How to Get Faster Results from Streaming Analytics

A promising new approach for real-time analytics combines in-memory computing technology with a derivative of the digital twin model, providing security and safety for the Internet of Things.
Published: March 23, 2022

Today’s streaming analytics platforms can take minutes or even hours to produce results. This poses a serious challenge for managers of live systems, such as computer networks and other critical infrastructures, who must spot issues in the moment and respond quickly. Because these systems are typically inundated with incoming telemetry messages from many data sources, they usually process messages using rudimentary ETL pipelines that, for the most part, just store data offline for later analysis. For example, they may save data into log files or historian databases for automated and manual query, or they may stage data in a data lake for batch analysis. Getting analytics results within milliseconds or a few seconds remains a daunting challenge.

Consider, for example, the security and safety systems needed to protect a geographically distributed power grid spanning thousands of nodes. These systems continuously ingest and analyze telemetry from Internet of Things (IoT) devices and control systems throughout the network, and they must quickly identify threats and react in real time. For instance, security systems must be able to detect unauthorized intrusions and assess their scope and severity. Safety systems must determine whether transmission components, such as power lines and transformers, are experiencing unusual stresses and are likely to fail or cause fires.

Given the huge volume of data that needs to be processed, today’s security information and event management (SIEM) solutions typically employ big-data techniques to identify patterns in the telemetry and signal alerts. Likewise, safety systems usually log events into databases for query by operations personnel. By relying on offline processing or manual introspection to detect issues, these systems add unwanted latency that delays action—often when it’s most critically needed. How can their work be accelerated to enable fast, effective responses?

A promising new approach to tackling these challenges for real-time analytics combines the power of in-memory computing (IMC) technology with a derivative of the digital twin model popularized for use in product design. IMC has evolved throughout the last decade to provide a fast and highly scalable software platform for hosting streaming data in memory and analyzing it with low latency. It avoids the need to stage data in offline storage prior to analysis, and it integrates computing and memory-based storage to keep latency as low as possible. The digital twin model takes advantage of IMC technology by offering a simple yet powerful technique for structuring streaming analytics code.

Digital twins change the way developers think about streaming analytics. Instead of organizing application code as an event-processing pipeline, digital twins track the state of each data source that produces telemetry (such as a software security agent or an IoT device). Developers can create digital twins to hold relevant, dynamic-state information about data sources and run analytics code within milliseconds after messages arrive. This code continuously looks for unusual behaviors and creates alerts when needed. For example, it can run a machine-learning algorithm to detect anomalies in streaming data that might otherwise go undetected.
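To make the contrast with an event-processing pipeline concrete, here is a minimal, single-process sketch of the idea in Python. It is not ScaleOut's API or any product's implementation: the names `DeviceTwin`, `process_message` and `handle_message`, the five-reading warm-up, and the three-standard-deviation threshold are all assumptions chosen for illustration, and a simple running-statistics check stands in for the machine-learning algorithm mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DeviceTwin:
    """Hypothetical digital twin: holds dynamic state for ONE data source."""
    device_id: str
    count: int = 0        # number of readings seen so far
    mean: float = 0.0     # running mean of readings
    m2: float = 0.0       # running sum of squared deviations (Welford's method)
    alerts: List[str] = field(default_factory=list)

    def process_message(self, reading: float, threshold: float = 3.0) -> Optional[str]:
        """Compare a new reading with this source's own history, then update state."""
        alert = None
        if self.count >= 5:  # assumed warm-up period before alerting
            std = (self.m2 / (self.count - 1)) ** 0.5
            if std > 0 and abs(reading - self.mean) > threshold * std:
                alert = f"{self.device_id}: unusual reading {reading}"
                self.alerts.append(alert)
        # Welford's online update keeps mean and variance without storing history.
        # Note: this sketch folds anomalous readings into the statistics as well.
        self.count += 1
        delta = reading - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reading - self.mean)
        return alert


# One twin per data source, kept in memory and looked up by the source's ID.
twins: Dict[str, DeviceTwin] = {}


def handle_message(device_id: str, reading: float) -> Optional[str]:
    """Route an incoming telemetry message to the twin for its data source."""
    twin = twins.setdefault(device_id, DeviceTwin(device_id))
    return twin.process_message(reading)
```

Calling `handle_message("transformer-17", 50.1)` for each incoming message keeps all context for that device in one place, so the analytics code never has to query offline storage to decide whether a reading is unusual.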

IMC technology enables digital twins to deliver highly scalable performance and provide the timely streaming analytics needed to identify threats and respond to them in the moment. IMC software runs on a cluster of servers hosted either in the cloud or on-premises and can host many thousands (or even millions) of digital twins. Together, these digital twins track the state of very large infrastructures, such as power grids, large computer networks or vehicle fleets. As infrastructures grow, IMC software can add more servers to seamlessly scale processing throughput.

IMC technology can provide real-time alerts and feedback without the delays of offline analysis. In addition, it can aggregate state information from all digital twins into a visual dashboard that gives operations personnel a complete, up-to-the-second picture of a live system. The combined use of continuous real-time analysis and data aggregation enables personnel to quickly identify important issues that otherwise might be buried in telemetry stored offline and awaiting analysis.
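As a rough illustration of this kind of aggregation, the Python sketch below rolls up per-twin state into counts that a dashboard could display. Everything in it is hypothetical: the `TwinState` fields, the alert-level labels and the `summarize` function are assumptions, and an IMC platform would typically run such a roll-up in parallel across its servers rather than in a single loop.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class TwinState:
    """Hypothetical snapshot of the state one digital twin holds in memory."""
    node_id: str
    region: str
    alert_level: str  # assumed labels: "normal", "warning" or "critical"


def summarize(twins: Iterable[TwinState]) -> Dict[str, Counter]:
    """Count twins at each alert level, grouped by region."""
    summary: Dict[str, Counter] = {}
    for twin in twins:
        summary.setdefault(twin.region, Counter())[twin.alert_level] += 1
    return summary
```

A dashboard could call `summarize` every few seconds and highlight any region whose `"critical"` count is above zero, which is the kind of system-wide view that is hard to get from telemetry sitting in log files.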

With the rapid growth in the size and complexity of mission-critical live systems, the need to immediately assess incoming telemetry has become even more critical to uninterrupted operations. This challenge demands new ways of structuring streaming analytics. Integrating IMC with digital twins offers users a compelling combination of deep introspection and fast results. It has the potential to unlock powerful new capabilities for maximizing situational awareness.

William Bain is the CEO and founder of ScaleOut Software.