Top Eight Questions to Ask When Choosing a Data-Storage Strategy for Your IoT Solution

By Jay Srinivasan

What should engineering directors and managers ask when selecting a database, to make sure every option has been considered?

Internet of Things (IoT) solutions have a lot of moving parts that need to work together to make something useful. Each architectural choice impacts performance, from the end-device hardware, to the protocol used to transmit data, to back-end infrastructure such as hosting and databases. Here, we'll look at how the choice of database affects an IoT solution, and at some of the key questions to ask to help make the right choice.

Let's say you are the decision maker at your company who has the unenviable responsibility of coming up with the right database strategy for your IoT solution. Actually, finding answers is not that hard with all the information out there. What's difficult is asking the right questions. This article will focus on what questions an engineering director or manager should be asking in order to ensure that he or she has considered all options when selecting a database.

The first question is how much it costs, right? Wrong. This should actually be the last question. Although it is tempting to ask about cost first, that question is hard to answer until you have answered several others. Here's where I would start:

1. What type of database do you want?
There are SQL databases, NoSQL databases and, for IoT-specific workloads, time-series databases. It may help to understand the strengths and weaknesses of each of these, and to decide which major direction you want to take. The next seven questions will help direct this highest-level decision.

2. What type of workloads do you have?
Do you have transactional workloads, analytics workloads or a combination of both? The underlying storage engine differs considerably from one workload to the other, so a database can be great at one and poor at the other. Take rowstore vs. columnstore as an example. Typically (but not always), rowstore-based databases are efficient at transactional queries that read or update whole records, but are not optimized for reading a few selected columns across many records for analytical purposes. Columnstore-based databases are efficient at scanning large volumes of data for analytics, but are not so good when you have to perform a transactional update.
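To make the rowstore vs. columnstore distinction concrete, here is a toy sketch in Python. It is only an illustration: the sample readings and function names are made up for this example, and real storage engines add pages, indexes and compression on top of the basic layout.

```python
# Toy illustration only. The readings and all names below are hypothetical.
readings = [
    {"device_id": "a1", "ts": 1, "temp_c": 21.5},
    {"device_id": "a2", "ts": 1, "temp_c": 19.0},
    {"device_id": "a1", "ts": 2, "temp_c": 22.5},
]

# Rowstore layout: all fields of one record are kept together.
row_store = [dict(r) for r in readings]

# Columnstore layout: all values of one field are kept together.
column_store = {key: [r[key] for r in readings] for key in readings[0]}


def update_row(store, index, field, value):
    """Transactional-style update: touches a single record in the rowstore."""
    store[index][field] = value


def average_from_rows(store, field):
    """Analytical read on a rowstore: every whole record is visited."""
    return sum(row[field] for row in store) / len(store)


def average_from_columns(store, field):
    """Analytical read on a columnstore: only the one column is scanned."""
    values = store[field]
    return sum(values) / len(values)
```

Both layouts return the same average, but the columnstore gets there by reading one list, while the rowstore has to walk every record. Conversely, changing one reading is a single-record update in the rowstore, whereas a columnstore would have to touch one list per field.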

3. How scalable does your solution need to be?
Are you just connecting up tens or even hundreds of devices, or are you dreaming in the millions? Most implementations start small, but it's important to understand reasonable expectations for the near future. Some databases may be great to start with (low cost, high performance), but will not necessarily scale easily beyond a certain capacity—that is, unless you make changes to your application code, which brings us to the next question.
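A rough back-of-the-envelope estimate can make the gap between "hundreds" and "millions" of devices tangible. The sketch below is only an assumption-laden illustration: the message rate (one per device per minute) and the message size (200 bytes) are hypothetical values, not figures from any particular deployment.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Fleet:
    """Hypothetical description of a device fleet for sizing purposes."""
    devices: int
    messages_per_device_per_minute: float
    bytes_per_message: int

    def writes_per_second(self) -> float:
        # Sustained write rate the database has to absorb.
        return self.devices * self.messages_per_device_per_minute / 60

    def gigabytes_per_day(self) -> float:
        # Raw payload volume per day, before indexes, replication or compression.
        bytes_per_day = (
            self.writes_per_second() * SECONDS_PER_DAY * self.bytes_per_message
        )
        return bytes_per_day / 1e9


# Assumed values: one 200-byte message per device per minute.
pilot = Fleet(devices=100, messages_per_device_per_minute=1, bytes_per_message=200)
target = Fleet(devices=1_000_000, messages_per_device_per_minute=1, bytes_per_message=200)
```

Under these assumptions the pilot writes fewer than two rows per second and stores about 0.03 GB per day, while the million-device fleet writes roughly 16,700 rows per second and stores about 288 GB per day. A database that is comfortable with the first load may need a very different design for the second.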

4. How dependent should your application be on your database?
Each application can be tied to a particular database through the way its data is partitioned, scaled and so forth. Tying an application to one database makes certain things easier and more streamlined, but can require a lot of new code if a switch is needed down the line. Keeping an application decoupled from any particular database is very desirable, so that you do not lock yourself out of other options. It is also consistent with the principle of programming to an interface, not to a specific implementation.
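Programming to an interface can be sketched as follows. This is a minimal illustration, not a recommended design: `ReadingStore`, `InMemoryStore` and `record_and_check` are hypothetical names invented for this example, and a real implementation would wrap an actual database client behind the same two methods.

```python
from typing import Dict, Optional, Protocol, Tuple


class ReadingStore(Protocol):
    """Hypothetical interface: the only thing application code depends on."""

    def save(self, device_id: str, ts: int, value: float) -> None: ...

    def latest(self, device_id: str) -> Optional[float]: ...


class InMemoryStore:
    """Stand-in backend; a SQL, NoSQL or time-series backend would expose
    the same two methods and could be swapped in without touching callers."""

    def __init__(self) -> None:
        self._latest: Dict[str, Tuple[int, float]] = {}

    def save(self, device_id: str, ts: int, value: float) -> None:
        current = self._latest.get(device_id)
        # Keep only the newest reading per device; ignore late, older ones.
        if current is None or ts >= current[0]:
            self._latest[device_id] = (ts, value)

    def latest(self, device_id: str) -> Optional[float]:
        entry = self._latest.get(device_id)
        return None if entry is None else entry[1]


def record_and_check(
    store: ReadingStore, device_id: str, ts: int, value: float, limit: float
) -> bool:
    """Application logic written against the interface, not a database.

    Saves the reading and reports whether the device's latest value
    exceeds the given limit.
    """
    store.save(device_id, ts, value)
    latest = store.latest(device_id)
    return latest is not None and latest > limit
```

Because `record_and_check` only knows about `ReadingStore`, switching databases means writing one new class rather than revisiting every place the application reads or writes data.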

5. What's the expertise of your current (and near-future) team?
If all you have is a bunch of SQL developers, you're going to ruin their productivity by forcing them to use a NoSQL database, and vice versa. So ask yourself how quickly your team can switch and ramp up. Do you have the in-house database administrator (DBA) expertise to manage and support the developers as needed? Also, are you taking into account the new team members who are in your recruiting pipeline? Get the most out of your database by pairing it with the right team.

6. What database dependencies can you accept—and what can you not accept?
Some database products come with an option to host them yourself, while others are cloud-based services that are ready to use. Both have their pros and cons. In the former case, you're dependent on an experienced DBA who can take responsibility for hosting and scaling your database as your business grows. In the latter, you're dependent on an external cloud provider, which naturally influences where your applications will be deployed. For example, if you choose Google Cloud Spanner, it will probably be more performant to run your applications on Google Cloud as well, so you don't pay the added latency or the network egress charges of moving data between providers.

7. What other pieces do you need to complete the story?
Once the database is selected, there is still a lot to decide based on how you want to use your data: What about caching? Do you need other services on top, like Memcached? How about visualizing the data in your database: do you need services like Tableau or Dundas? If so, do they integrate well with the database of your choice? Are you going to store all the data in the database, or are you considering moving older data to cheaper storage or a separate data warehouse such as Amazon Redshift or Google BigQuery?

8. Finally, the big elephant in the room: what are the costs of licensing, developing and hosting?
A database that has a license fee of $100,000 a year may appear outrageous compared to an open-source alternative, but if the open-source option requires a dedicated DBA, whose typical salary is around $150,000, then is it really cheaper? You also need to take into account the costs of hosting the database, which vary widely at different scales. For example, an option that's cheaper for the first 10 terabytes may not be cheaper once you reach the 100-terabyte range, as is evidenced by the analysis provided below, comparing five databases:
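Separately from that analysis, the license-versus-DBA trade-off described above can be sketched as simple arithmetic. The $100,000 license fee and $150,000 DBA salary come from the text; the per-terabyte hosting rates are purely hypothetical numbers chosen to show how the cheaper option can flip with scale.

```python
def annual_cost(license_fee, dba_salaries, hosting_per_tb, terabytes):
    """Total yearly cost: licensing + staffing + hosting at a given data size."""
    return license_fee + dba_salaries + hosting_per_tb * terabytes


# Figures quoted in the article.
LICENSE_FEE = 100_000
DBA_SALARY = 150_000

# Hypothetical hosting rates in dollars per terabyte per year (assumptions).
LICENSED_HOSTING_PER_TB = 2_000
OPEN_SOURCE_HOSTING_PER_TB = 1_000


def licensed_option(terabytes):
    # Licensed product, assumed here not to need a dedicated DBA.
    return annual_cost(LICENSE_FEE, 0, LICENSED_HOSTING_PER_TB, terabytes)


def open_source_option(terabytes):
    # No license fee, but one dedicated DBA on staff.
    return annual_cost(0, DBA_SALARY, OPEN_SOURCE_HOSTING_PER_TB, terabytes)


def cheaper_option(terabytes):
    """Name of the cheaper option at this data size (ties go to 'licensed')."""
    if licensed_option(terabytes) <= open_source_option(terabytes):
        return "licensed"
    return "open source"
```

With these assumed rates, the licensed option costs $120,000 a year at 10 terabytes against $160,000 for the open-source one, but at 100 terabytes the order reverses ($300,000 against $250,000). The real numbers will differ, which is exactly why cost can only be estimated once the earlier questions have been answered.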

By understanding the needs of the solution and how a database will be used, you can be more comfortable that there will not be surprises down the line. It's important to think beyond the pure database technology and numbers to what you can expect in the future. Once you have your answers clear and your databases shortlisted, you can be confident in making the right choice.

Jay Srinivasan is the senior director of engineering at infiswift, where he is responsible for the overall architecture of infiswift's IoT platform and vertically integrated solutions for the solar and agriculture domains. He has more than 18 years of experience at Microsoft, Google and infiswift, building scalable distributed enterprise software, including the Software Load Balancer that powers all of Microsoft Azure.