Are Data Silos Holding Your IoT Strategy Back?

By Will Ochandarena

Rigid data silos are preventing businesses from using new techniques in data processing and machine learning to improve their productivity and quality by as much as 30 percent.

This is an exciting time for anyone working in the industrial space, whether in the manufacturing, oil and gas, mining or automotive sectors. Several companies I've spoken with believe they can improve both productivity and quality by up to 30 percent over the next five years by using new techniques in data processing and machine learning, an improvement an order of magnitude greater than they would have dreamed of five years ago. What has been holding them back? Rigid data silos.

The issue of data silos impacts companies across nearly all verticals, but I would argue it affects the industrial sector the most. Why? Because the data silos that exist in factories and industrial sites, usually called historians, are smaller and more limited in capability than those of other industries. Their limited size forces down-sampling of data, such as collecting data points every minute instead of every second, as well as sprawl, since it may take multiple historian systems to capture data from all machines and sensors within a single factory.

This is, of course, made worse when you consider the "edge" problem: remote wells, refineries and mines that have their own ecosystems of machinery and sensors, all dangling off spotty satellite or rural broadband connections. It's no wonder analysts and data scientists in this space have been struggling to show value.

Modern, scale-out data systems are finally making it easy to overcome these challenges. Such systems typically combine general-purpose data storage with file and database semantics, the ability to handle real-time streaming data, and the ability to perform analytics through both traditional SQL and programmatic machine learning. Some of these systems also scale down to an edge-appropriate form factor, with built-in data replication to stay coordinated with the mothership.

Most companies take their optimization journey in steps. An easy first step is to export data from the various historian systems into the new data platform and use business intelligence dashboarding tools to look for patterns or correlations that provide insight. This, however, doesn't solve the data-fidelity issue, so the next step is to bypass the historians, connect machinery (PLCs, DCSs and the like) and sensors directly to the data platform, and crank up the rate of data collection to once per second or higher.
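To make that second step concrete, here is a minimal sketch of what direct, once-per-second collection might look like. It assumes a Kafka-compatible streaming endpoint (broker address and topic name are placeholders) and a hypothetical read_plc_tag() helper standing in for a real OPC UA or Modbus driver; none of these specifics come from the article.

```python
# Sketch: push one reading per second from a PLC tag into a stream.
# Assumptions (not from the article): a Kafka-compatible endpoint at
# "broker:9092", a topic named "sensor-readings", and a hypothetical
# read_plc_tag() helper standing in for a real OPC UA / Modbus driver.
import json
import time

from kafka import KafkaProducer  # pip install kafka-python


def read_plc_tag(tag: str) -> float:
    """Hypothetical stand-in for a PLC/DCS read; replace with a real driver."""
    raise NotImplementedError


producer = KafkaProducer(
    bootstrap_servers="broker:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    reading = {
        "ts": time.time(),
        "machine": "press-07",           # illustrative asset name
        "tag": "hydraulic_pressure",     # illustrative tag
        "value": read_plc_tag("hydraulic_pressure"),
    }
    producer.send("sensor-readings", value=reading)
    time.sleep(1)  # once per second, versus once per minute from the historian
```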

Once high-fidelity data is in the system, data scientists can use sophisticated machine-learning and deep-learning algorithms to predict component failures before they happen, detect inefficiencies in the end-to-end process, spot production defects and more.
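As one illustration of the failure-prediction idea, the sketch below trains an anomaly detector on healthy sensor history and flags readings that drift away from normal operation. The data here is synthetic and the features (vibration, temperature, pressure) are assumptions; a real pipeline would pull its features from the platform described above, and the article does not prescribe this particular algorithm.

```python
# Sketch: flag anomalous sensor readings as possible precursors to failure.
# Synthetic data; features and thresholds are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Features per reading: vibration (mm/s), temperature (C), pressure (bar).
normal = rng.normal(loc=[2.0, 60.0, 5.0], scale=[0.3, 2.0, 0.2], size=(5000, 3))
model = IsolationForest(contamination=0.01, random_state=42).fit(normal)

# Score fresh readings: -1 means anomalous, a possible early failure signal.
fresh = np.array([
    [2.1, 61.0, 5.1],   # looks healthy
    [4.8, 85.0, 6.9],   # vibration and temperature spiking
])
print(model.predict(fresh))  # e.g. [ 1 -1 ]
```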

Once tangible business results appear at the main location, companies can extend this intelligence to remote edge locations. There are several options for edge data platforms, ranging from cloud vendors to large technology conglomerates and startups. Companies are usually most successful when they can ship their applications and machine-learning models to the edge as-is, rather than rewriting them for an edge-specific platform.

Most critically, network limitations make it infeasible to send all data to a central location for analysis, so the edge system needs to be able to act independently of the core while staying in tight synchronization. An architectural pattern that lends itself well to this is "learn globally, act locally": the heavy lifting of building predictive failure models is delegated to the core location, where computing power is plentiful, while those models are synchronized out to the edge sites so that predictions can happen locally, eliminating latency and keeping the process running even when the network isn't.
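A rough sketch of that pattern, under the assumption that the platform (or any file-sync mechanism) replicates a serialized model from the core out to each edge site; the file path and model choice are placeholders, not a specific product's API:

```python
# Sketch of "learn globally, act locally": the core trains on full history
# and writes a model file that gets replicated to edge sites; each edge
# loads its latest synced copy and scores readings locally, so predictions
# keep flowing even when the WAN link is down.
import joblib
import numpy as np
from sklearn.ensemble import IsolationForest

# --- At the core (plentiful compute, full history) ---
history = np.random.default_rng(0).normal(size=(10_000, 3))  # stand-in data
core_model = IsolationForest(random_state=0).fit(history)
joblib.dump(core_model, "failure_model.joblib")  # replicated out to edge sites

# --- At an edge site (limited compute, local data only) ---
edge_model = joblib.load("failure_model.joblib")  # latest synced copy
local_reading = np.array([[2.3, 0.1, -0.4]])
is_anomalous = edge_model.predict(local_reading)[0] == -1  # act locally
print("alert" if is_anomalous else "ok")
```

The key design choice is that only small model artifacts cross the spotty link, not the raw sensor stream, and scoring never depends on the core being reachable.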

Some think this type of solution is a pipe dream, but it isn't—companies are already adopting these solutions for the Industrial Internet of Things (IIoT) across multiple industries and use cases, including oil exploration, oil refining, mining, manufacturing and automotive.

Will Ochandarena is the director of product management at MapR Technologies, where he is responsible for user experience and the cloud. Prior to MapR, Will worked in the SeaMicro group at AMD, where he was responsible for networking and cloud strategy. Before that, he was a product manager for the Nexus family of data-center switches at Cisco. Will has an engineering degree from Rensselaer Polytechnic Institute and an MBA from Santa Clara University.