Jul 08, 2019

Data consumed and produced by Internet of Things (IoT) devices is growing at an ever-expanding rate—there will be nearly 31 billion connected IoT devices by 2020. But the data generated from IoT devices is only valuable if you can analyze it, and performing that analysis presents its own set of challenges:
• Sensor and machine data is highly unstructured, which makes it difficult to use with traditional analytics and business intelligence (BI) tools that are designed to process structured data.
• Object storage is generally used for storing this data because of its flexibility, scalability and low cost—but object storage doesn't connect easily to analytics and BI tools.
• IoT data is massive, and analyzing it calls for elastic computing resources that are independent of storage, and that can easily adapt to heavy analytics workloads.
• Building a semantic layer is critical, given the breadth of data that's available for analysis and the difficulty of interpreting it.
As a result, many companies leave their IoT data untouched, and it becomes an underperforming asset. How can companies fix this situation?
Making Data Consumable
IoT device data is often kept in object stores such as Amazon S3. Today, users typically have to transform that object-store data by hand into a format their tools can consume. And as data volumes grow, traditional "extract, transform, load" (ETL) processes fail to keep pace, so performance suffers as datasets increase in size and complexity.
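As a concrete illustration of that manual step, the sketch below flattens nested JSON sensor records into CSV for a BI tool. The record shape and field names are hypothetical, invented purely for this example:

```python
import csv
import io
import json

def etl_sensor_readings(raw_lines):
    """Flatten newline-delimited JSON sensor records into CSV text.

    The record shape and field names ("device", "ts", "payload", "temp")
    are hypothetical, chosen only to illustrate the manual ETL step.
    """
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["device_id", "timestamp", "temperature_c"])
    for line in raw_lines:
        record = json.loads(line)
        writer.writerow([
            record["device"]["id"],         # nested field -> flat column
            record["ts"],
            record["payload"].get("temp"),  # missing value -> empty cell
        ])
    return out.getvalue()

raw = [
    '{"device": {"id": "d-1"}, "ts": "2019-07-08T00:00:00Z", "payload": {"temp": 21.5}}',
    '{"device": {"id": "d-2"}, "ts": "2019-07-08T00:00:05Z", "payload": {}}',
]
csv_text = etl_sensor_readings(raw)
```

Every new device model or schema change means revisiting code like this, which is exactly why hand-rolled ETL struggles to keep up with growing IoT data.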
Users need a platform that lets them connect their favorite BI or data science tools directly to their data, regardless of where it is located or how it is structured, without compromising on performance. The platform also needs to expose data from any source through the robustness and flexibility of SQL, since SQL is the data-access language most widely known by users in the majority of enterprises.
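To show the idea of exposing raw records through SQL, here is a toy-scale sketch using Python's built-in sqlite3 module rather than a real DaaS engine; the table, field names and values are all hypothetical:

```python
import sqlite3

# Hypothetical device readings; in practice these might sit in an object
# store, with a DaaS engine exposing them through SQL at scale.
events = [
    {"device_id": "d-1", "metric": "temp", "value": 21.5},
    {"device_id": "d-1", "metric": "temp", "value": 22.0},
    {"device_id": "d-2", "metric": "temp", "value": 19.0},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device_id TEXT, metric TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (:device_id, :metric, :value)", events
)

# Analysts query with plain SQL instead of parsing raw records.
averages = conn.execute(
    "SELECT device_id, AVG(value) FROM readings"
    " GROUP BY device_id ORDER BY device_id"
).fetchall()
```

Once the data is reachable through SQL, any BI tool that speaks ODBC/JDBC can consume it without custom integration code.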
Building a semantic layer is critical, too. The semantic layer provides meaning and context to the underlying data so that business users don't need to build a sophisticated understanding of the underlying ways in which the information is stored. When data is properly tagged, catalogued and made searchable, its value increases because teams can more easily build a shared understanding that helps them reach, and act on, conclusions.
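A semantic layer can be pictured as a mapping from raw storage fields to business-friendly labels, descriptions and searchable tags. The sketch below is a deliberately minimal stand-in for such a layer; every field name in it is hypothetical:

```python
# A toy semantic layer: business-friendly metadata mapped onto raw
# storage fields. All names here are hypothetical.
SEMANTIC_LAYER = {
    "sensor_tmp_c": {
        "label": "Temperature (°C)",
        "description": "Ambient temperature reported by the device.",
        "tags": ["environment", "telemetry"],
    },
    "dev_uid": {
        "label": "Device ID",
        "description": "Unique identifier assigned at manufacturing.",
        "tags": ["identity"],
    },
}

def search_fields(keyword):
    """Return business labels of fields whose tags match a keyword."""
    return sorted(
        meta["label"]
        for meta in SEMANTIC_LAYER.values()
        if keyword in meta["tags"]
    )

def relabel(record):
    """Translate a raw record into business-friendly column names."""
    return {
        SEMANTIC_LAYER[key]["label"]: value
        for key, value in record.items()
        if key in SEMANTIC_LAYER
    }
```

With metadata like this, a business user can search for "telemetry" and get readable column names, without ever learning that the warehouse calls the field `sensor_tmp_c`.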
Finally, IoT datasets are generated by a huge number of devices recording a broad array of data, which enables use cases ranging from maintenance to operations optimization to supply-chain management. But the value of this information increases when it is combined with existing enterprise data sources, such as sales, customer and product information.
Enter data-as-a-service (DaaS). Data-as-a-service platforms address key needs in terms of simplifying access, accelerating analytical processing, securing and masking data, curating datasets, and providing a unified catalog of data across all sources. Rather than moving data into a single repository, DaaS platforms access the data where it is managed, and perform any necessary transformations and integrations of data dynamically.
In addition, DaaS platforms provide a self-service model that enables data consumers to explore, organize, describe and analyze data regardless of its location, size or structure, using their favorite tools such as Tableau, Python and R. Some data sources may not be optimized for analytical processing and may be unable to provide efficient access to the information. DaaS platforms therefore provide the ability to optimize physical access to the data independently of the schema used to organize and query it.
With this ability, individual datasets can be optimized without changing how data consumers access the data, and without changing the tools they use. These optimizations can be made over time to address the evolving needs of data consumers. In short, DaaS gives business users a platform to easily discover, curate and share data from any source, then analyze it with their favorite tools, all without depending on IT.
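The decoupling described here, a stable consumer interface over a physical source that the platform can transform or accelerate underneath, can be sketched as a toy "virtual dataset". The class and its methods below are illustrative only, not any real platform's API:

```python
class VirtualDataset:
    """Toy 'virtual dataset': consumers call rows() through one stable
    interface while the platform is free to transform the source on the
    fly or swap in an accelerated physical copy underneath. Purely
    illustrative of the DaaS idea, not a real platform API.
    """

    def __init__(self, fetch_rows, transform=lambda row: row):
        self._fetch_rows = fetch_rows   # where the data actually lives
        self._transform = transform     # applied dynamically, no ETL copy
        self._accelerated = None        # optional optimized materialization

    def accelerate(self):
        # Materialize once; consumers keep calling rows() unchanged.
        self._accelerated = [self._transform(r) for r in self._fetch_rows()]

    def rows(self):
        if self._accelerated is not None:
            return list(self._accelerated)
        return [self._transform(r) for r in self._fetch_rows()]

def slow_source():
    # Stand-in for a remote object-store scan.
    return [{"temp_f": 70.0}, {"temp_f": 32.0}]

dataset = VirtualDataset(
    slow_source,
    transform=lambda r: {"temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)},
)
```

Whether `accelerate()` has been called or not, consumers see the same rows through the same call, which is the point: physical optimization without breaking anyone's dashboards.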
Based on Open Source
In data analytics, the future is open source. Infrastructure based on open source delivers a number of benefits to enterprises, including faster development cycles (building on the work of the community of open-source contributors), more secure and thoroughly reviewed code, and no vendor lock-in.
For example, data infrastructure built on Apache Arrow allows enterprises to combine columnar data structures with in-memory computing, providing dramatic advantages in speed and efficiency. Open-source DaaS platforms, such as Dremio, are built on Arrow, as well as a number of other open-source projects, which results in extremely robust performance.
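Arrow's advantage comes from laying each column's values out contiguously in memory. The pure-Python sketch below only illustrates the row-versus-column layout idea; Arrow itself adds a standardized binary format, zero-copy sharing between processes, and vectorized kernels on top of it:

```python
import array

# Row-oriented: each reading is its own object; an aggregate over one
# field still has to touch every record.
rows = [{"device_id": i % 10, "temp": 20.0 + i % 5} for i in range(1000)]
row_total = sum(r["temp"] for r in rows)

# Column-oriented (the layout Arrow standardizes, sketched here with a
# typed array): one field's values sit contiguously, so a scan touches
# only the bytes it needs, which is what enables SIMD-style kernels.
temp_column = array.array("d", (20.0 + i % 5 for i in range(1000)))
col_total = sum(temp_column)

assert row_total == col_total  # same answer, very different memory layout
```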
Tomer Shiran is the co-founder and CEO of Dremio. Previously he was the VP of product at MapR, where he was responsible for product strategy, roadmaps and new feature development. As a member of the executive team, he helped grow the company from five employees to more than 300 employees and 1,000 enterprise customers. Prior to working at MapR, Tomer held numerous product-management and engineering positions at Microsoft and IBM Research. He holds an MS degree in electrical and computer engineering from Carnegie Mellon University and a BS degree in computer science from Technion—Israel Institute of Technology. Tomer is also the author of five U.S. patents. You can contact him on LinkedIn, on Twitter or via email.