The Human Factor

So far, fast data processing has been a purely technology-driven topic, with a clear focus on mastering the tools needed to provide reliable and robust frameworks for working with moving data. From that perspective, good progress has been made: hard challenges such as exactly-once processing have been solved (see the data Artisans blog on how Apache Flink addresses this).

Now it is time to look into a future where data processing is not just getting faster, but where streaming technologies have a significant impact on the implementation of what John Launchbury, director of DARPA’s Information Innovation Office (I2O), calls the third wave of AI systems (https://www.youtube.com/watch?v=-O01G3tSYpU). In this context, constructing explanatory models for classes of real-world phenomena to enable contextual adaptation requires constant development over time – a perfect match for stream processing.

Before we lose ourselves in the future, let’s get back to the present. Although sophisticated technologies are available and could help us make significant progress, we are often hesitant, or even refuse, to apply them. Why?

During a presentation, someone asked me about use cases unique to stream processing technologies, ones that could not be implemented with batch-oriented approaches.

At first I was a bit puzzled, as I could not come up with an answer ad hoc, especially one that did not sound too sophisticated, like supporting explanatory models. Although I could have pointed to fraud detection or behavior-driven recommendation systems, these were not the exclusive use cases the inquirer expected.

In the end my answer was that while there might be some special use cases, for now fast data adds one crucial aspect to existing ones: the timeliness of data.

Most current use cases are based on historical data, collected and provided through batch processes built on the concept of data-at-rest. One way to gain a significant edge over competitors is to offer services based on more current data. This is where data-in-motion enters the scene.

But that still leaves the question of why transforming businesses and processes from data-at-rest towards data-in-motion happens so sluggishly. Is it simply a question of technology? Probably not: frameworks like Akka have been around for quite a while now, and even newer ones like Apache Flink have recently matured into the mainstream.

From my point of view, and from what I have experienced so far, this is neither a matter of technology nor a question of budget and resources. It is a human factor.

People tend to stick with traditions, existing processes, and proven knowledge. At some point they were trained to work with SQL on data-at-rest, and they were introduced to the concept of reproducible results.

In stream processing architectures, however, these characteristics – data persistence and reproducible results – give way to uncertainty and time: there is no guarantee when a piece of information will arrive, or whether it will arrive at all. Furthermore, data must be processed the moment it becomes available, since it is immediately superseded by the next event.

Although adding temporary caches or mixing in data from persistent storage is a well-established approach, people must come to terms with the nature of this paradigm to fully understand and leverage it.
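
To make the caching pattern concrete, here is a minimal sketch against Flink’s DataStream API; the customer ids, the lookup stand-in, and the fallback value are invented for illustration. A RichMapFunction loads reference data into a local cache once and then enriches each event as it flows past.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

import java.util.HashMap;
import java.util.Map;

// Enriches a stream of customer ids with names from a locally cached
// reference table. The cache is loaded once when the operator starts up;
// a production version would refresh it periodically, keep it in Flink's
// managed state, or resolve misses via asynchronous lookups.
public class EnrichWithCustomerName extends RichMapFunction<String, String> {

    private transient Map<String, String> namesById;

    @Override
    public void open(Configuration parameters) {
        // Hypothetical stand-in for a query against a persistent store.
        namesById = new HashMap<>();
        namesById.put("c-42", "ACME Corp.");
    }

    @Override
    public String map(String customerId) {
        // With data in motion, an event may arrive before its master data does,
        // so handling cache misses is part of the design, not an afterthought.
        return namesById.getOrDefault(customerId, customerId + " <unknown>");
    }
}
```

Usage is a plain `stream.map(new EnrichWithCustomerName())`; the cache-miss branch is the detail that distinguishes this from a classic join on data-at-rest.
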
To turn your fast data architecture into a success, do not only tackle architectural questions but also focus on the people who will use your platform. Let them embrace the future, address their fears, and make them first-class citizens of your platform.

From Data At Rest To Data In Motion – Reasons From A High-Level Business Perspective

In 2006, Clive Humby coined the phrase “data is the new oil”. While this was already true for companies like Google and Facebook, it was the promotion of Hadoop to an Apache top-level project in 2008 that made the technology accessible to a broader public and turned his vision into a global reality.

In the years that followed, companies began to collect as much data as possible from a variety of sources, hoping it would some day turn into gold if only the right algorithm were applied to it.

Some figured out how to transform raw data into business insights; others are still struggling with that question while their mostly uncatalogued data gets (c)older and less valuable.

Although the data of the aforementioned companies is aging as well, they still know how to turn it into action and keep up with the market. But while they settle into a false sense of security, the next evolutionary stage is just around the corner.

Not merely keeping up with the market but getting ahead of it requires immediate insights from data generated only a moment ago. That is why we are currently seeing a shift from data at rest (batch processing) to data in motion (stream processing).

As in the oil business, success is not only about owning the raw material; it also requires mastering the refinement process.

In contrast to the early stages of big data processing, this phase does not only require new technology but also demands a paradigm shift.

While batch processing was bound to stable datasets, could be performed with a widely known language based on set theory (SQL), and produced reproducible results, working with data in motion is more complex, since time, context, and uncertainty become significant characteristics.
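
To illustrate the contrast, consider the question “how many orders does each customer place?”. On data at rest this is a single SQL statement over a stable table; on data in motion the same question only makes sense relative to a window of time, and the pipeline must state explicitly how much out-of-order lateness it tolerates. The following is a minimal sketch against Flink’s DataStream API; the sample events and the five-second lateness bound are invented for illustration.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

import java.time.Duration;

// Data at rest:   SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;
// Data in motion: the same count, but per minute of event time, with a bounded
// tolerance for events that arrive late or out of order.
public class OrdersPerCustomerPerMinute {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(            // invented sample events: (customerId, epochMillis)
                Tuple2.of("c-1", 1_000L),
                Tuple2.of("c-1", 2_000L),
                Tuple2.of("c-2", 1_500L))
           // Accept events arriving up to five seconds out of order.
           .assignTimestampsAndWatermarks(
                WatermarkStrategy
                    .<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((event, previous) -> event.f1))
           .map(event -> Tuple2.of(event.f0, 1L))
           .returns(Types.TUPLE(Types.STRING, Types.LONG))
           .keyBy(event -> event.f0)
           // The GROUP BY of the streaming world: a bounded slice of event time.
           .window(TumblingEventTimeWindows.of(Time.minutes(1)))
           .sum(1)
           .print();

        env.execute("orders per customer per minute");
    }
}
```

Note that the result is never “final” in the batch sense: a window’s count is emitted once the watermark passes its end, and events arriving later than the configured bound are dropped unless additional lateness is explicitly allowed.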

Thus, evolving a big data architecture to incorporate data in motion must go hand in hand with a significant shift in how data is understood and worked with.

Although both aspects imply change, the latter is the more important: a technology simply provides the foundation, but it is the people who exploit it who turn data into insights and, finally, into revenue.

Like any evolutionary step, this one implies a cultural change as well.

The mission of this blog is to shed some light on the various aspects of fast data architectures and to provide a practical guide on how to turn them into a success – from a technological as well as a cultural perspective.