DataFlow fundamentals


A DataFlow is a container that combines dependency injection with event dispatch for realtime data processing. In technical terms, all elements within a DataFlow are organised into a topologically sorted directed acyclic graph (DAG).

The DataFlow container can be thought of as a spreadsheet on steroids: each node is like a formula cell, and if any input to a formula changes, the spreadsheet forces a recalculation. When the DataFlow receives an event, it evaluates which nodes are connected to the handler and triggers only those connected elements for recalculation.

Following the spreadsheet analogy, the programmer provides the formula cells and immediate dependencies. The spreadsheet calculates the global set of relationships and manages the recalculation of formulas when any cell is updated. Delegating the mechanical but difficult task of calculating global dependencies to an algorithm allows us to build spreadsheets that are complex but very predictable.

DataFlow brings this spreadsheet-like paradigm to the realtime processing world. Functions are the formula cells; references between nodes are the formula dependencies. The DataFlow analyser uses the dependency information to calculate the global set of relationships. When the DataFlow determines that a node has been updated by an external event, any dependent functions are called.

DataFlow can be classified as a combination of incremental computation and data flow programming.
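The spreadsheet analogy can be sketched in a few lines of Python. This is a hypothetical, minimal model, not the DataFlow API: `Cell` and `Sheet` are illustrative names, and the dependent search is deliberately naive. Updating an input recalculates only the formulas connected to it; unconnected cells are untouched.

```python
# Hypothetical sketch of spreadsheet-style incremental recalculation.
# Cell/Sheet are illustrative names, not part of the DataFlow API.

class Cell:
    def __init__(self, name, formula, deps):
        self.name = name          # cell identifier
        self.formula = formula    # callable over dependency values (None for inputs)
        self.deps = deps          # names of cells this formula reads
        self.value = None

class Sheet:
    def __init__(self, cells):
        self.cells = {c.name: c for c in cells}

    def _dependents(self, name):
        # cells whose formula reads `name`, recursively, in dependency order
        out = []
        for c in self.cells.values():
            if name in c.deps:
                out.append(c)
                out.extend(self._dependents(c.name))
        return out

    def set_input(self, name, value):
        # update an input cell, then recalculate only the connected formulas
        self.cells[name].value = value
        recalculated = []
        for cell in self._dependents(name):
            if any(self.cells[d].value is None for d in cell.deps):
                continue   # not all inputs are set yet
            cell.value = cell.formula(*(self.cells[d].value for d in cell.deps))
            recalculated.append(cell.name)
        return recalculated

sheet = Sheet([
    Cell("price", None, []),
    Cell("qty", None, []),
    Cell("total", lambda p, q: p * q, ["price", "qty"]),
    Cell("unrelated", None, []),
])
sheet.set_input("qty", 10)
print(sheet.set_input("price", 3.0))   # ['total'] — only the connected cell recalculates
print(sheet.cells["total"].value)      # 30.0
```

The "unrelated" cell is never recalculated, which is the essential property: the cost of an update is proportional to the affected subgraph, not the whole sheet.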

Event dispatch rules

Notification connections between nodes are calculated at construction time, using the same data that adds nodes to the DataFlow together with the annotations that mark recalculation methods.

When the proxy event handler method is called on the DataFlow, it dispatches with the following logic:

  • Any top level event handler is invoked with the arguments provided
  • The event handler method indicates whether child instances should be notified with a Boolean return type
  • Any dependent child node in the DAG is conditionally notified of the parent event handler completing processing
  • A child instance can only be notified if all of its parents have finished processing their notifications
  • The trigger method of the child returns a Boolean indicating whether the event notification should propagate
  • The DataFlow recursively works through the child references and trigger methods in the DataFlow
  • Dispatch callbacks are in strict topological order, with the event handler the root of the call tree
  • Each instance is guaranteed to be invoked at most once per event processing cycle
  • Any instances not connected to an executing root event handler will not be triggered in the cycle
  • Connections can be either direct or through a reference chain
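The rules above can be sketched as a small topological dispatcher. This is a hypothetical model, not the DataFlow implementation: `Node`, `trigger`, and `dispatch` are illustrative names. It shows the key guarantees: a child fires only after all of its parents have finished, each node is triggered at most once, a `False` return from `trigger` stops propagation along that edge, and unconnected nodes are never touched.

```python
# Hypothetical sketch of the dispatch rules; names are illustrative,
# not the real DataFlow API.
from collections import deque

class Node:
    def __init__(self, name, parents, propagate=True):
        self.name = name
        self.parents = parents      # parent nodes this node depends on
        self.propagate = propagate  # what trigger() will return
        self.triggered = 0

    def trigger(self):
        # Boolean return controls whether the notification propagates further
        self.triggered += 1
        return self.propagate

def dispatch(root, nodes):
    """Notify children of `root` in topological order, each at most once."""
    children = {n: [m for m in nodes if n in m.parents] for n in nodes}
    # restrict dispatch to nodes reachable from the executing root handler
    reachable, stack = {root}, [root]
    while stack:
        for c in children[stack.pop()]:
            if c not in reachable:
                reachable.add(c)
                stack.append(c)
    # Kahn's algorithm: a node fires only after all its parents have finished
    indeg = {n: sum(p in reachable for p in n.parents) for n in reachable}
    propagated = {root: True}   # did this node pass the event on?
    order, ready = [], deque([root])
    while ready:
        n = ready.popleft()
        if n is not root:
            # conditionally notified: at least one parent must have propagated
            if any(propagated.get(p, False) for p in n.parents):
                propagated[n] = n.trigger()
                order.append(n.name)
        for c in children[n]:
            if c in reachable:
                indeg[c] -= 1
                if indeg[c] == 0:
                    ready.append(c)
    return order

root = Node("root", [])
a = Node("a", [root])
b = Node("b", [root], propagate=False)  # vetoes propagation past itself
c = Node("c", [a, b])                   # diamond: two parents, one notification
d = Node("d", [])                       # not connected to the root handler
order = dispatch(root, [root, a, b, c, d])
print(order)  # ['a', 'b', 'c'] — c fires once, after both parents; d never fires
```

Note that `c` is still notified even though `b` vetoed propagation, because its other parent `a` propagated the event; `d` is outside the call tree and is not triggered in the cycle.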

Combining dependency injection and event processing

The introduction of dependency injection gave developers a consistent approach to linking application components. DataFlow extends dependency injection to support container managed event driven nodes. Aligning to a familiar development pattern has the following benefits:

  • Shallow learning curve for developers to use DataFlow effectively
  • Consistent programming model for event driven logic increases developer productivity
  • Re-use of an industrial-quality, predictable event dispatch model

DataFlow as a container

DataFlow builds a container from configuration information given by the programmer. Functions supported by the container include: creating instances, injecting references between nodes, setting properties, calling lifecycle methods, factory methods, singleton injection, named references, and constructor and setter injection.

DataFlows are very lightweight and designed to run within an application. Multiple DataFlows can be used within a single application, each instance providing specialised business processing logic.
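A toy container illustrates two of the functions listed above: constructor injection of references between nodes, and singleton injection by name. This is a hedged sketch; `Container`, `register`, and `get` are hypothetical names, not the DataFlow configuration API.

```python
# Hypothetical sketch of container-managed wiring; register/get are
# illustrative names, not the real DataFlow configuration API.
class Container:
    def __init__(self):
        self._factories = {}   # named reference -> (factory, dependency names)
        self._instances = {}   # singleton cache: each node is built once

    def register(self, name, factory, deps=()):
        self._factories[name] = (factory, deps)

    def get(self, name):
        # singleton injection: build each named node at most once
        if name not in self._instances:
            factory, deps = self._factories[name]
            # constructor injection: resolve referenced nodes first
            args = [self.get(d) for d in deps]
            self._instances[name] = factory(*args)
        return self._instances[name]

class PriceFeed:
    pass

class Pricer:
    def __init__(self, feed):
        self.feed = feed       # reference injected by the container

c = Container()
c.register("feed", PriceFeed)
c.register("pricer", Pricer, deps=("feed",))
p = c.get("pricer")
print(p.feed is c.get("feed"))   # True: both resolve to the same singleton
```

The programmer only declares each node and its immediate dependencies; the container resolves the construction order, mirroring how the DataFlow builds its object graph.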

Automatic event dispatch

DataFlow exposes event consumer end-points, routing events as method calls to nodes within the container via an internal dispatcher. The internal dispatcher propagates event notifications through the object graph. A default onEvent end-point is always available on a DataFlow.

DataFlow leverages the familiar dependency injection workflow for constructing the object graph. Annotated event handler and trigger methods are dispatch targets. The DataFlowBuilder uses annotations to calculate the call trees for the internal dispatcher. A node can export multiple service interfaces that the DataFlow will expose. Invoking a service method on the DataFlow follows the documented dispatch rules.
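The idea of annotated dispatch targets can be sketched with Python decorators standing in for the annotations. Everything here is hypothetical: `on_event`, `on_trigger`, and `find_dispatch_targets` are illustrative stand-ins for the real annotations and for the scanning the builder performs when calculating call trees.

```python
# Hypothetical sketch: decorators as stand-ins for the event handler and
# trigger annotations; names are illustrative, not the real DataFlow API.
def on_event(fn):
    fn._is_event_handler = True
    return fn

def on_trigger(fn):
    fn._is_trigger = True
    return fn

class MarketDataHandler:
    @on_event
    def handle(self, price):
        self.last = price
        return True            # True: notify dependent child nodes

class MovingAverage:
    def __init__(self, source):
        self.source = source   # reference that links it into the call tree
        self.values = []

    @on_trigger
    def recalculate(self):
        self.values.append(self.source.last)
        return True

def find_dispatch_targets(node):
    # roughly what a builder does: scan a node for annotated methods
    return [name for name in dir(node)
            if getattr(getattr(node, name), "_is_event_handler", False)
            or getattr(getattr(node, name), "_is_trigger", False)]

h = MarketDataHandler()
avg = MovingAverage(h)
print(find_dispatch_targets(h))     # ['handle']
print(find_dispatch_targets(avg))   # ['recalculate']
```

The builder combines this per-node metadata with the injected references to pre-calculate the call tree, so no reflection or scanning is needed at dispatch time.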

Building and executing

There are two components the developer uses to execute a DataFlow: the builder and the runtime library.

DataFlow-Builder

The builder analyses the configuration information provided by the programmer and builds a DataFlow that houses all the nodes, combined with pre-calculated event dispatch. The in-memory DataFlow that is built runs in an interpreted mode.

DataFlow-Runtime

The runtime provides the DataFlow with a core set of libraries required at runtime.

DataFlow-Compiler

A DataFlow-Compiler utility takes the outputs from the builder and generates one of the following:

  • An in-memory DataFlow running in an interpreted mode
  • A DataFlow generated and compiled in process
  • A DataFlow generated ahead of time and serialised to code

An AOT-generated container only requires the runtime to function; no compiler libraries are required. Compiled DataFlows have significant performance gains of 10X over interpreted mode, saving startup time and running costs.