WSO2 Data Analytics Platform: the Swiss Army Knife of Analytics

With analytics becoming an organizational necessity in the modern business context, WSO2 combined the capabilities of several older products into a single product called WSO2 Data Analytics Server (DAS). The goal was to offer the same capabilities, and more, in one product that could be used to build comprehensive solutions. Since we generally talk about solutions we have to build, not just about products, we will discuss a Data Analytics Platform built on the capabilities of WSO2 Data Analytics Server. This article intends to give an overall idea of the key capabilities. It should help you decide whether this platform is something worth studying further, or something you can pass over right away to save time.

The basics

If we think of a feature-rich middleware platform that would help your organization address all types of analytics-related requirements, there are basically three main factors to consider.

  1. Inbound Channels for Events/Data
  2. Internal Events/Data Analyzing Mechanisms
  3. Outbound Channels for Events/Data

If we look into the core architecture of the WSO2 Data Analytics Platform, we can observe this concept being applied, as depicted below (figure 1.0).


Figure 1.0

The inbound channel basically consists of two main layers, which read events from external sources and interpret them into an events-data format understood by the internal analytics engines. In the WSO2 context, the components that read events are called Event Receivers, and the internally consumed common events-data format is generally called an Event Stream. You can picture the nature of such an event stream by comparing it to a stream of several fluids with different densities. Just as fluids of different densities layer over each other and flow together without mixing, a particular event stream may carry many parameter fields.
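As an illustration, an event stream in DAS is typically defined by a name, a version and a set of typed fields. A minimal sketch of such a stream definition (the stream name and fields here are hypothetical) might look like this:

```json
{
  "name": "org.example.temperature.stream",
  "version": "1.0.0",
  "payloadData": [
    {"name": "deviceId", "type": "STRING"},
    {"name": "roomNo", "type": "INT"},
    {"name": "temp", "type": "DOUBLE"}
  ]
}
```

Every event flowing through this stream would then carry these three fields, which is what lets the internal engines treat the stream as a well-defined, typed data source.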


Figure 1.1

This interpretation activity, or the collection of event streams, does not exist as a physical component of the platform. It is nevertheless depicted in figure 1.0 as a component, because from the analytics engine's point of view an event stream can be considered a dynamically created component that exists virtually in memory and carries events data.

These event streams are consumed by the internal analytics engines (represented by the Analyse Events component in figure 1.0), which perform processing activities and deliver the expected outcome through the outbound channels.

For the time being, we will keep our focus on the events-data processing part, leaving matters related to other key components, such as event receivers and event publishers, for later discussion.

Thanks to the comprehensive events-data processing features the platform is equipped with, it addresses four main sub-domains in analytics. Let us now look at what these four sub-domains are, and how the platform performs in each case to deliver different outputs from the same sources of events data.

Espresso, Americano, Caffè Mocha and Cappuccino from the Same Bean

The four main sub-domains in analytics that the WSO2 Data Analytics Platform was built to address are:

  1. Real-time analytics
  2. Batch analytics
  3. Interactive analytics and
  4. Predictive analytics

Let us take each of these in turn and discuss how the platform performs it using the same events-data collection, and which components and mechanisms are involved in the process.

Real-time Analytics


Figure 2.0

The activity flow of real-time analytics is depicted in figure 2.0 and can be summarized as follows.

  • The Event Receiver receives events and
  • publishes the events data into the real-time analytics engine
  • in the form of an Input Event Stream.
  • The real-time analytics engine processes these events
  • according to a predefined Query (a block of logic and rules) and
  • publishes its output, in the form of an Output Event Stream,
  • to an Event Publisher.

WSO2 'Siddhi'—the Real-time analytics engine

'Siddhi' is a Complex Event Processing engine that initially powered the WSO2 Complex Event Processor (CEP) product. When WSO2 Data Analytics Server (DAS) was created, the same engine, with many enhancements, was embedded to provide the WSO2 Data Analytics Platform with real-time analytics.

The real-time analytics engine, built on 'Siddhi', is capable of reading events from event streams. Execution Plans are then defined using an SQL-like query language called the 'Siddhi' Query Language. Inside a particular execution plan, it is possible to use one or more input event streams and process their events in real time. The final output is pushed into an Output Event Stream, which follows a structure similar to an input event stream but belongs to the outbound channel of the flow. These output event streams are understood by Event Publishers, which trigger the real-time notifications or alerts and deliver them to the intended destinations.
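As a minimal sketch of such an execution plan (the stream and field names are hypothetical), a 'Siddhi' query that filters a temperature stream and emits alert events could look like this:

```sql
-- Hypothetical input event stream fed by an Event Receiver
define stream TempStream (deviceId string, roomNo int, temp double);

-- Filter readings above a threshold and emit them on an output stream
@info(name = 'highTempAlert')
from TempStream[temp > 50.0]
select deviceId, roomNo, temp
insert into HighTempAlertStream;
```

An Event Publisher subscribed to `HighTempAlertStream` would then deliver each matching event, as it arrives, to the intended destination.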

Batch Analytics


Figure 3.0

The activity flow of batch analytics is depicted in figure 3.0 and can be summarized as follows.

  • The Event Receiver receives events and
  • persists the events data inside the Raw Data Persistence unit,
  • using a few temporary Data Tables.
  • The batch analytics engine summarizes these events data
  • according to predefined Queries (written in Apache Spark SQL) and
  • persists the Summarized Information inside the Processed Data Persistence unit,
  • using Data Tables.

If you observe the above activity flow, you can see how it parallels the real-time analytics flow discussed earlier. The platform persists inbound events data in an internal (inbound) raw data storage (RDBMS), and the batch analytics engine, which is built on Apache Spark libraries, consumes this persisted data instead of consuming event streams directly as input. Just as 'Siddhi'-based execution plans were used in the earlier case, it is possible to script plans for batch analytics using Apache Spark SQL. Unlike 'Siddhi' execution plans, these scripts are executed periodically at a given interval. The summarized information is persisted inside another (outbound) data storage (RDBMS) at the end of execution, and this information can be used with the web-based Dashboard Creation utility tool provided by the platform, presenting information such as statistics on web-based dashboards with gadgets such as charts (pie, scatter, bar, line) and data grids.
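A minimal sketch of such a batch script (the table names and schema are hypothetical) maps the persisted raw events into Spark, then periodically writes a summary back to the processed data store:

```sql
-- Map the persisted raw events table into Spark
CREATE TEMPORARY TABLE TempRaw USING CarbonAnalytics
  OPTIONS (tableName "ORG_EXAMPLE_TEMPERATURE_STREAM",
           schema "deviceId STRING, roomNo INT, temp DOUBLE");

-- Table backing the summarized (processed) information
CREATE TEMPORARY TABLE TempSummary USING CarbonAnalytics
  OPTIONS (tableName "TEMP_SUMMARY",
           schema "roomNo INT, avgTemp DOUBLE");

-- The periodically executed summarization query
INSERT OVERWRITE TABLE TempSummary
  SELECT roomNo, AVG(temp) AS avgTemp
  FROM TempRaw
  GROUP BY roomNo;
```

The resulting `TEMP_SUMMARY` table is the kind of processed data set a dashboard gadget (for example, a bar chart of average temperature per room) would read from.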

Interactive Analytics


Figure 4.0

The activity flow of interactive analytics is depicted in figure 4.0 and can be summarized as follows.

  • The already persisted events data (raw data) and
  • summarized information (processed data) are Indexed by the platform.
  • Using the web-based Data Explorer user interface
  • and the web-based Activity Explorer user interface,
  • the user (a human) queries data and monitors activities
  • in an interactive manner.

When a human user has to query a data set interactively, applying different filters and selection criteria, it is important to have that data set properly indexed. The indexing and searching mechanisms the platform is equipped with (for interactive analytics) are based on Apache Lucene. In addition to the default drop-down-menu-based search option available in the Data Explorer UI, the platform also offers a way to write custom Lucene Queries for filtering data and viewing the results in a data table.
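As a sketch of such a custom query (field names are hypothetical, and the exact range syntax depends on how each field is indexed), a Lucene query that narrows the indexed events to one device within a temperature band could look like:

```text
deviceId:"sensor-42" AND temp:[40 TO 100]
```

Standard Lucene syntax applies here: field-scoped terms, boolean operators (`AND`, `OR`, `NOT`) and inclusive `[x TO y]` ranges can be combined to build up arbitrarily specific filters over the indexed data.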

Predictive Analytics


Figure 5.0

The activity flow of predictive analytics is depicted in figure 5.0 and can be summarized as follows.

  • The already Persisted Events Data (raw data) and
  • Persisted Summarized Information (processed data) are
  • used by the platform to build Machine Learning Models.
  • Once these machine learning models are built,
  • it is possible to process any or all of
  • the input event streams, output event streams,
  • persisted raw data and persisted processed data
  • against these models for Predictive analytics.

WSO2 Machine Learner is a comparatively new product in the WSO2 middleware platform, and a project that has kept actively evolving over the last couple of months. The same set of WSO2 Machine Learner features is used by the WSO2 Data Analytics Platform to provide the capabilities required for building Machine Learning Models. Generally, the already persisted raw data and processed data are used as the source data sets when the Machine Learner builds these models.

Once these machine learning models become available within the Data Analytics Platform, it is possible to analyse the four sources mentioned above (raw data, processed data, input event streams and output event streams) against such models for prediction.
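To sketch how a stream can be scored against a model in real time, assuming the WSO2 ML Siddhi extension is available in the execution plan (the stream, field names and model registry path below are all illustrative), a prediction query could look roughly like this:

```sql
-- Hypothetical stream of feature values to score
define stream PatientDataStream (age int, bmi double, glucose double);

-- Score each incoming event against a previously built model;
-- the registry path and result type argument are illustrative
from PatientDataStream#ml:predict(
    'registry://_system/governance/ml/models/diabetes-model', 'double')
select *
insert into PredictionResultStream;
```

Each event on `PredictionResultStream` would then carry the original fields plus the model's prediction, which an Event Publisher can deliver like any other output stream.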

Summary

Now you have an overall idea about:

  • the purpose that the WSO2 Data Analytics Platform was built for,
  • the different aspects of analytics that it addresses,
  • how these capabilities are delivered by the platform, and
  • how the same capabilities can be used to address your requirements.

What we did not discuss here is:

  • how the Event Receivers and Event Publishers work,
  • how they use Event Adapter Types (receiver and publisher types)
  • to comply with and communicate across different transport-level protocols, and
  • how to map events-data fields against the fields of event streams.

These topics were purposely kept for a future post, because this blog post was written to help solutions engineers and architects who evaluate platforms against their analytics-related requirements. From what is explained in this post, you can see that this particular platform allows you to address four different aspects of analytics with the same set of data, published once into its event receivers. In the WSO2 analytics context, this is simply called the Lambda (λ) Architecture. These Data Analytics Platform components can also be used in different combinations with other WSO2 products, such as WSO2 API Manager, WSO2 Data Services Server and WSO2 Enterprise Service Bus, to build comprehensive solutions that address your unique organizational requirements.

For example, if you wish to expose the processed data (summarized information) in the form of APIs, so that application developers in your organization could use them in the applications they build, you can follow my previous blog post, 'Your data, as their APIs', to expose the processed-data persistence unit (RDBMS) of this platform with managed API standards enforced over it.

If you believe that the capabilities mentioned above would address some of your requirements, or that the WSO2 middleware platform has the potential to help you build your future analytics platform with room for expansion and extension, you can go to the next level by exploring the WSO2 official documentation, articles and other resources such as videos and webinars to gather more knowledge on the specifics.
