Podcast EP 159 - How to reliably connect 100 million devices - Dominik Obermaier, CTO, HiveMQ


Jan 10, 2023

This week, our guest is Dominik Obermaier, CTO of HiveMQ. HiveMQ helps companies connect devices to the internet by enabling fast, secure, efficient, scalable, and reliable bi-directional data movement between devices and the Cloud.

In this talk, we dive deep into the IoT protocol landscape, focusing on MQTT and the surrounding standards and enabling software. We also explore the critical role of open standards and open-source software in an efficient and dynamic marketplace that promotes innovation.

Key Questions:

  • What is an IoT protocol, and how does it differ from a standard and connectivity technology?
  • What are the different MQTT alternatives for communication with devices?
  • How has the functionality of MQTT changed over time?
  • How do you define the appropriate communication protocol for particular use cases?

Erik: Dominik, thanks for joining us on the podcast today.

Dominik: Yeah, thank you, Erik, for the invitation.

Erik: Great. I love hosting entrepreneurs, because it's always interesting to understand how they convinced themselves to devote themselves to this journey of starting a company. Dominik, I think your case is particularly interesting. Because you've, right out of university, set up a company devoted to protocols. That's not the topic, I think, that most university students are passionately dedicated to. So, we'd love to understand a little bit of your background. What was it that sparked, that led you to say, hey, I'm going to devote a significant portion of my life to building a business to solve this set of problems?

Dominik: Yeah, great question, Erik. Thank you for that. Yeah, it was not as straightforward. Honestly, when I was in university, I was not the protocol guy. I did a Bachelor in Computer Science in Germany, but I always was working on the side. So, I was exposed to basically some real-world applications of technology very early on.

Very early on in my career, I had the chance to meet a gentleman called Arlen Nipper. He designed a communication protocol called MQTT in 1999. We can also go into that. But there were a lot of things that came together that led me, and also our CEO, Christian, to the realization: okay, this really, really makes sense now. We could not believe nobody else was connecting these dots. Then we decided to build a company out of that. This is how it started. I'm also happy to dive into more details on that.

Erik: I think our audience is coming from different places. Some of them are going to know a lot about MQTT and maybe use it every day at work. For others, this is going to be somewhat of a fresh concept. So, why don't we do a little bit of a 101, and start with just the question of what is an IoT protocol? How does it differ from a standard? How does it differ from connectivity technology like NB-IoT? Then we can go from there.

Dominik: Yeah, absolutely. One of the interesting things about Internet of Things communication protocols is that this is nothing particularly new. If we look at the technologies of the last 40, 50 years, there's basically a protocol stack that is used to communicate over the internet. For example, most people will know that there is TCP/IP as a transport communication protocol. Also, chances are that people listening to this podcast right now are using underlying internet technologies to stream it, for example IP, UDP, and TCP, without even knowing.

This is important, because computer networks always need a common language in order to communicate with each other. People settled very early on these different communication layers and standardized on them. For example, today a Windows machine can use the same internet as a Linux machine or a macOS machine, because people settled on the same communication layers. In general, this is a bit more complicated. There is also the OSI model, which describes in detail how many layers there are.

In general, I think for this conversation the idea is that there are underlying connectivity technologies being used. Then we have these application protocols on top. Because just being able to connect networks doesn't mean that they can understand each other.

One of the best examples for the Internet of humans is a protocol called HTTP. HTTP was also developed in the '90s, basically, to access websites or to have a client-server protocol. There, a client (in most cases, a web browser) can go to a server and request a web page. For example, I could go now to Wikipedia. Go to wikipedia.com, and look for an article. My browser would ask the Wikipedia server over something called HTTP. The Wikipedia server would respond with an answer and give me the article, on a very, let's say, high level. This is great. Basically, this is a request-response type of protocol. Somebody is requesting data, and then a server serves the data. This is how most technologies for the Internet of humans work.

The problem here is now that if we look at Internet of Things, we have a bit of a different problem. To give a sense of the magnitude of the problem, we now have connected, I think, 60% of all people on Earth. Let's say, for the sake of conversation, 6 billion people almost are connected to the internet, 5 to 6 billion people. Depending on which kind of analysis you read, we have approximately 40 or even more billion devices connected to the internet. So, we have almost 10 times the amount of devices using the same internet technologies as humans.

It's pretty clear that technologies built for the Internet of humans are not suited for the Internet of Things. Because if you think about it, devices (which could be fridges, factories, cars, and really anything you can connect to the internet) are constantly producing data and constantly need to receive data. So, you have almost this streaming approach where the devices don't request data, but produce data anytime and need to receive data anytime. This also has to happen in a very, very fast way.

Also, there are other things especially if you have mobile networks in the mix. This means you have high latency, so it takes a long time for data packets to be transmitted. Also, the bandwidth is not so good. I'm now connected with my computer to you. Now the bandwidth is pretty good. So, we can communicate very fast. If I drive with a car over a mobile network somewhere in the US or in Europe, in some remote areas, chances are the internet connectivity is not so good. This is why we need dedicated communication protocols that are designed to handle these use cases. There are some of them — I'm happy to go into that later, like MQTT and also others.

These are communication protocols designed exactly for the Internet of Things and not for the Internet of humans. We see that now in a lot of surveys, for example. The recent Eclipse Developer Survey basically showed that most developers these days working on anything IoT-related are using dedicated Internet of Things protocols.

So, a lot of people who are working on these devices have exposure. People who have not worked on IoT use cases probably won't have so much exposure to these protocols yet. But this is only a matter of time, because Internet of Things is not an industry. It's really something going through all industries, though some of them are earlier than others.

Erik: Okay. Great. That's a great foundation. So, we're certainly not going to cover the entire protocol ecosystem. But let's discuss a few of the alternatives to MQTT for communication with devices. Then we can also get into what differentiates them, and why you have chosen in particular to focus on building a company to connect via MQTT. But first, what is the other set of protocols that would be in that ecosystem?

Dominik: Yeah, that's a great question. Around 2014 or 2015, there was something that people sometimes called the Protocol Wars, where we had a lot of technologies competing for this Internet of Things connectivity. But that has been over for quite some time.

But there are many, let's say, different protocols used for specific purposes. Back then, there were protocols like XMPP, which comes more from the chat side of things. Google Chat, for example, used that back then. AMQP, which comes more from the traditional message queuing world, was one of the, I would say, contenders. MQTT, as basically an Internet of Things protocol, was designed from the ground up for doing that. Then of course, there's also things like HTTP, which is also being used for IoT use cases. So, I think these would be the ones.

Today, I think you usually only see MQTT, HTTP, and AMQP. But AMQP is only for very specialized use cases. HTTP is usually something that is being used either for some very special use cases or for, let's say, deployments where the architects are not aware of dedicated IoT protocols like MQTT.

Erik: So, I guess folks are familiar with HTTP and, to some extent, the evolution of it. These protocols also evolved in terms of functionality to some extent. What did MQTT look like when it was born in 1999, and then in 2012 when you set your company up, and today? How has the functionality evolved over the past two decades?

Dominik: Great question. One quick thing I wanted to mention first is why we even used MQTT, and why we decided to dedicate the time. MQTT was developed in 1999. It was developed for a specific use case: oil pipeline monitoring. Basically, a SCADA system use case. Phillips 66 back then basically had one of the first TCP modules that were used for satellite communication. One of the issues was that satellite communication, even today, is still pretty expensive compared to other alternatives. Back then, it was even more expensive.

So, the designers of the protocol, Andy Stanford-Clark, who is CTO at IBM in the UK, and Arlen Nipper, who is now with a company called Cirrus Link, decided to solve the problem and save a lot of bandwidth, basically, over mobile networks, over TCP. They asked the question: what is the most lightweight and fastest protocol we can develop over TCP/IP, while also utilizing all the characteristics of TCP/IP? For example, you have guaranteed message ordering, it is guaranteed to not lose data, and stuff like that. They built a very thin layer on top of TCP/IP as a so-called publish/subscribe layer. MQTT is a publish/subscribe protocol, which completely decouples the producers of data and the consumers of data.
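To make "a very thin layer on top of TCP/IP" concrete, here is a minimal sketch of how a simple publish message could be laid out on the wire. It assumes the MQTT 3.1.1 packet layout for a QoS 0 PUBLISH (later versions add further fields), and the function names and the example topic are illustrative only, not taken from the conversation.

```python
def encode_remaining_length(length: int) -> bytes:
    """Encode a length as an MQTT variable byte integer (7 bits per byte)."""
    if not 0 <= length <= 268_435_455:
        raise ValueError("remaining length out of range")
    encoded = bytearray()
    while True:
        byte = length % 128
        length //= 128
        if length > 0:
            byte |= 0x80  # continuation bit: more length bytes follow
        encoded.append(byte)
        if length == 0:
            return bytes(encoded)


def encode_publish_qos0(topic: str, payload: bytes) -> bytes:
    """Build a QoS 0 PUBLISH packet, assuming the MQTT 3.1.1 layout."""
    topic_bytes = topic.encode("utf-8")
    if len(topic_bytes) > 0xFFFF:
        raise ValueError("topic too long")
    # Variable header: 2-byte big-endian topic length, then the topic itself.
    variable_header = len(topic_bytes).to_bytes(2, "big") + topic_bytes
    remaining = variable_header + payload
    # Fixed header: packet type PUBLISH (3) in the high nibble, flags 0.
    fixed_header = bytes([0x30]) + encode_remaining_length(len(remaining))
    return fixed_header + remaining


# Hypothetical topic and reading, chosen only for illustration.
topic = "pipeline/pump1/pressure"
payload = b"4.2"
packet = encode_publish_qos0(topic, payload)
overhead = len(packet) - len(topic.encode("utf-8")) - len(payload)
print(f"packet is {len(packet)} bytes, protocol overhead is {overhead} bytes")
```

In this sketch the protocol adds only a handful of bytes around the topic and payload, which is the kind of saving that matters on an expensive satellite or mobile link.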

Because if you think about it, you do not want to reconfigure all your sensors and actuators in the field just because you changed something on the server side, or because you have new producers of data or new consumers of data somewhere in the data center. So, they built this very dynamic protocol where data producers and consumers can enter and leave your system at any point in time.
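The decoupling described here can be sketched with a toy in-memory broker. This is not a real MQTT implementation and involves no network; `TinyBroker`, the handler functions, and the topic name are hypothetical. It only illustrates that the producer's code stays the same while consumers join and leave.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, bytes], None]


class TinyBroker:
    """In-memory stand-in for a broker; exact-match topics only."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def unsubscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].remove(handler)

    def publish(self, topic: str, payload: bytes) -> int:
        """Deliver to whoever is subscribed right now; return the count."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


broker = TinyBroker()
dashboard: List[bytes] = []
archive: List[bytes] = []


def on_dashboard(topic: str, payload: bytes) -> None:
    dashboard.append(payload)


def on_archive(topic: str, payload: bytes) -> None:
    archive.append(payload)


TOPIC = "pipeline/pump1/pressure"  # hypothetical topic name

# The sensor only ever calls publish(); it never learns who is listening.
broker.subscribe(TOPIC, on_dashboard)
delivered_first = broker.publish(TOPIC, b"4.2")

broker.subscribe(TOPIC, on_archive)        # a new consumer joins at runtime
delivered_second = broker.publish(TOPIC, b"4.3")

broker.unsubscribe(TOPIC, on_dashboard)    # an old consumer leaves
delivered_third = broker.publish(TOPIC, b"4.1")
```

Note that nothing on the publishing side changes across the three calls, which is the point made above about not reconfiguring devices in the field.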

Also, they built in some features for remote monitoring, basically. If a device goes offline for some reason, because it's broken, or the battery is out, and so on, what is called the MQTT broker, the message distribution software that usually resides in the cloud, is aware of that fact and can also then notify other participants: "Hey, this sensor is not available anymore. There might be an issue." So, they built some very unique features like that into the protocol.
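In MQTT this mechanism is known as the Last Will and Testament: a client registers a message when it connects, and the broker publishes it on the client's behalf if the connection is lost unexpectedly. The sketch below models only that bookkeeping in memory; `WillAwareBroker`, its method names, and the topics are assumptions made for illustration, and a real broker detects lost connections through network errors and keep-alive timeouts rather than an explicit call.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Will:
    topic: str
    payload: bytes


class WillAwareBroker:
    """Toy model of last-will handling; no networking involved."""

    def __init__(self) -> None:
        self._wills: Dict[str, Optional[Will]] = {}
        self.published: List[Tuple[str, bytes]] = []

    def connect(self, client_id: str, will: Optional[Will] = None) -> None:
        # The will is stored at connect time, before anything goes wrong.
        self._wills[client_id] = will

    def disconnect(self, client_id: str) -> None:
        """Graceful disconnect: the stored will is discarded, not sent."""
        self._wills.pop(client_id, None)

    def connection_lost(self, client_id: str) -> None:
        """Unexpected loss: publish the stored will on the client's behalf."""
        will = self._wills.pop(client_id, None)
        if will is not None:
            self.published.append((will.topic, will.payload))


broker = WillAwareBroker()
broker.connect("sensor-1", Will("sensors/sensor-1/status", b"offline"))
broker.connect("sensor-2", Will("sensors/sensor-2/status", b"offline"))

broker.connection_lost("sensor-1")  # e.g. the battery ran out
broker.disconnect("sensor-2")       # planned shutdown, so no alert
```

Only the first sensor's "offline" message ends up published, so subscribers are alerted about failures but not about planned shutdowns.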

But then, unfortunately, this protocol was shelved. It was not used anymore. Why? Because it was proprietary. Around 2010, IBM decided to open the protocol and make it royalty-free, which means they basically promised not to sue you if you implemented the protocol.

Then the first open source software projects were started; the open source broker Mosquitto was one of them. Basically, we started a company because we thought, hey, this whole MQTT thing, why isn't everybody using that? This exactly solves all the issues we have these days with large-scale deployments. We figured out, okay, you cannot really use it in enterprise environments. This is how we started the company: to build software that makes it suitable for commercial deployments, where reliability is key. So, we developed something called clustering, and really made a lot of innovations in the space to fix some of the issues that prevented companies from using MQTT technologies in production.

Then, fast forward to 2014, we also helped as a company to specify the standard. MQTT is an ISO standard. We helped specify that with other players like Cisco and others. In 2018, Version 5, which is the newest version of the protocol, was released. We also worked on that standard. This is now the protocol used for the majority of devices connected around the world.

Erik: Okay. Interesting. Would it be correct to say that the fundamental structure has remained the same, but the innovation has been around defining the standards and tailoring them to the needs of the market, and then also building software around MQTT to address some of the challenges inherent in this solution?

Dominik: Yeah, I think the general design of MQTT is still the same. Also, the design principles, I think, are even more important today than they probably were back then. Because MQTT has some really huge advantages when you want to scale something. I'll give you an example. We have customers who have tens of millions of devices on a single installation. For people who are familiar with message queueing technologies, we have a customer that has more than 150 million so-called topics, or topic structures, on a single deployment. Traditional message queues usually have around 100, at most 1,000. With MQTT technologies, you can really scale that very fast in a very dynamic way.
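For readers new to the term, an MQTT topic is a slash-separated name such as `plant/line1/temperature`, and subscribers can use wildcard filters to select many topics at once, which is part of why very large topic counts stay manageable. The following is a simplified sketch of the standard filter rules (`+` matches exactly one level, `#` matches everything below a level). It ignores special cases such as topics beginning with `$`, and the topic names are hypothetical.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT-style topic filter matches a topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for index, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: valid only as the last level, and it
            # matches the parent level plus any number of child levels.
            return index == len(filter_levels) - 1
        if index >= len(topic_levels):
            return False  # the filter is longer than the topic
        if level != "+" and level != topic_levels[index]:
            return False  # literal level differs ("+" accepts any one level)
    # No "#" was seen, so both must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


examples = [
    ("plant/+/temperature", "plant/line1/temperature"),
    ("plant/+/temperature", "plant/line1/oven/temperature"),
    ("plant/#", "plant/line1/oven/temperature"),
    ("plant/line1", "plant/line2"),
]
for topic_filter, topic in examples:
    print(topic_filter, topic, topic_matches(topic_filter, topic))
```

A production broker would not scan filters one by one like this; it would typically index subscriptions in a tree, but the matching rules are the same.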

When it comes to the innovation, one part, of course, is the standardization. The feature set has also evolved, of course. You see the same with HTTP. I mean, HTTP is also pretty old. There were multiple iterations, and the core protocol remained the same with HTTP. But there were a lot of, let's say, tweaks and more standards, and also some features added. The same is true for MQTT, especially Version 5. It's full of features that the market basically demanded.

Some of the proprietary functionality we had in our product, for example, we also helped bring back into the standard, so everybody can share in that. Because I believe in open standards. Not everybody using MQTT will be a customer of ours. Every standard lives by multiple implementations, and it's good to have them available. I believe the best approach is not to rely on proprietary functionality, but to bring things back into the standard so everybody in the market can benefit from the feature sets, and not only people using a specific implementation.

Erik: Great. I'd love to get into the topic of open source. But before we go there, let's touch a bit on maybe we can say the verticals and the horizontals. IoT is a very broad topic. Pretty much, as you mentioned, every industry — healthcare, transportation, manufacturing, et cetera — is using IoT data to some extent, and the devices. They each have their own regulatory environment. They each have their own requirements. Then within those industries, obviously, there's heavy use cases that require very high volumes of data. There's use cases that require putting a sensor out in the field and letting it sit there for 10 years, and send one piece of data every 30 minutes. So, you have also a wide variety of uses in every industry. In this very diverse landscape, how do you define what is the right messaging or communication protocol for a particular use case? How do you make that determination?

Dominik: The answer, as we often say, is that it depends. It depends on multiple things. For example, when it comes to different verticals, what you see across most verticals is that as soon as you have internet connectivity, MQTT is the dominant protocol to use. MQTT is really designed for internet communication. It's super lightweight. It's extremely fast. It relies on underlying communication technologies like TCP/IP, for which there are often very efficient implementations, also over mobile networks, for example. So, this is one thing.

As a mental framework, MQTT is usually used when there's internet connectivity. These days, it's also very often used in places like factories, where you have multiple data producers and consumers. We do a lot of work with factories that are, on the one side, autonomous factories that cannot rely on the internet connection. So, they build an MQTT system in the factory, which is almost the central communication bus, and then they add another MQTT layer for communicating to the cloud.

The thing is, you usually don't see MQTT alone in a factory, for example. Every industry has its own set of protocols. In a factory, for example, we often see things like OPC UA. You also see Modbus and a lot of other legacy technology. What you often find is that there is a bridge connecting the old world and the old legacy protocols to an MQTT layer. This is what you see with many of, let's say, the large companies and corporations these days, and with multiple customers in the US that have almost 100 factories around the globe. They connect everything with the same technology, with MQTT, and they get real-time insights into everything that's happening globally.
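As a rough illustration of what such a bridge does, the sketch below translates a raw reading from a legacy register into a topic and a JSON payload that an MQTT client could then publish. Everything in it is hypothetical: the register numbers, scale factors, units, and topic names are invented for the example, and the actual reading and publishing (for instance through a Modbus library and an MQTT client) are deliberately left out.

```python
import json
from typing import Dict, NamedTuple, Tuple


class RegisterMapping(NamedTuple):
    topic: str    # where the value should appear on the MQTT side
    scale: float  # factor that turns the raw integer into engineering units
    unit: str


# Hypothetical mapping table; a real bridge would load this from configuration.
REGISTER_MAP: Dict[int, RegisterMapping] = {
    40001: RegisterMapping("factory/plant1/line1/oven/temperature", 0.1, "C"),
    40002: RegisterMapping("factory/plant1/line1/conveyor/speed", 0.01, "m/s"),
}


def bridge_reading(register: int, raw_value: int) -> Tuple[str, bytes]:
    """Translate one legacy register reading into (topic, payload)."""
    mapping = REGISTER_MAP[register]  # raises KeyError for unmapped registers
    value = round(raw_value * mapping.scale, 3)
    payload = json.dumps({"value": value, "unit": mapping.unit})
    return mapping.topic, payload.encode("utf-8")


topic, payload = bridge_reading(40001, 2153)
print(topic, payload.decode("utf-8"))
```

The design choice worth noting is that the legacy addressing scheme stays inside the bridge, so consumers on the MQTT side only see descriptive topics and self-describing payloads.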

This is something you can do when bringing IoT protocols to these domains as well, and you get the advantages of that. You also see this in transportation, for example. You see MQTT being used inside complex vehicles, but also for cloud connectivity. Of course, you usually still have the old world. This is why you usually bridge MQTT technologies with others.

Erik: Okay. Thank you. Well, let's discuss open source now. Maybe the less interesting question is why open source is important. Maybe more interesting is why it is important to HiveMQ. Why have you, as the CTO of a company, decided to also devote your time and effort to supporting the OASIS initiative? How does open source interface with your business?

Dominik: I would quickly like to differentiate between open standards and open source. We believe in both. Open standards, from my perspective, are the absolute foundation for any communication technology. To make it short, the history of the internet showed that open standards are required for any kind of connectivity. All the proprietary protocols went away, even when you had companies like Microsoft pushing for them. For me, it's clear. Any kind of communication technology must be built on open standards.

This is why I'm actually a bit unhappy that many companies are just trying to make a proprietary version of open standards. It's a bit sad. You see that with the big cloud vendors. You have MQTT support, for example, with Microsoft and with AWS. But the reality is that they don't really support MQTT to the full extent of the open specification.

Customers don't get the full experience and end up with a proprietary version of it, which you see with a lot of vendors. But especially the big vendors usually do that, because this creates a lock-in effect. I do not believe that communication protocols should be used for lock-in. So, I'm a heavy advocate for open standards and specifications.

I dedicate a lot of my time to working on these open standards, because I believe this is the only thing that drives the industry forward. When it comes to open source, HiveMQ, for example, has an open source version. Most of the standard MQTT functionality is available for free on the server side, but also on the client side. We earn our money with customers who have mission-critical use cases. Really, the factories that cannot have any downtime. There are connected car scenarios where they have millions of customers who rely on a good user experience, things like that. So, we're really doing the mission-critical things, while the open source version is especially for startups and other companies that don't have a budget for mission-critical deployments.

Of course, open source software is good. Open source software is free. Also, there's a community around it. Let me add one thing here. When it comes to communication protocols, I also believe that especially the client implementations should be open source. So, we're also heavy sponsors of the Eclipse Paho Project. We sponsor the maintainers for maintaining free and open source, Apache 2 licensed and Eclipse licensed implementations that customers can use for putting things into a car, into a fridge, into back-end software, and so on, to communicate with MQTT. All of that is completely free and open source.

We are investing in that because we believe in growing the pie for everybody. For us, as a company, it's the right move, really supporting the whole ecosystem. We don't care if somebody wants to use another MQTT software with that client. That's absolutely fine. I do not believe communication protocols should lock you into one vendor. The decision for a vendor in the cloud should really rely on what's needed for the business. But we're not fans of locking customers or users into a specific piece of software.

I think these times are actually over. In 2022, I don't think this is the right approach. The approach is really that customers should have the freedom of choice, should use MQTT technologies, and should choose the vendor on the cloud and on the back-end side that serves them best.

Erik: Yeah, well, I hope you're right. That is the path forward. Obviously, larger players always have some incentive to lock the ecosystem into a status quo. But it does sound like from the end customer side also, there's a lot of push towards open standards. Hopefully, that wins the day.

Let's discuss your solution, then. I think we've already covered the business and the value proposition to an extent, in terms of supporting mission-critical use cases that have to scale across thousands and millions of devices in a fast, secure, efficient, scalable way. But let's then look at what you are actually providing in terms of your flagship product, and then the other solutions in your portfolio.

Dominik: Our flagship product is the HiveMQ MQTT platform. This is basically a technology component called an MQTT broker, which is data distribution software designed for mission-critical use cases. So, we have customers all around the globe like BMW and Audi on the automotive side. We also have the largest logistics companies in the world as customers, who are relying on our technologies for delivering packages to their end customers. We have airports as customers who basically rely on smooth operations. So, these customers need mission-critical software that runs all the time. It also has to scale with their needs, because very often, we see that they start smaller. Then they roll it out across the whole company. This is what we're focusing on.

Our MQTT platform is, to this day, the most scalable version of anything. We've shown that we can scale to more than 100 million devices with a single installation. We have customers from all industries. For example, Netflix is also a customer of ours, which I can talk publicly about. They use all of that for making sure the business just runs. Similar to a database, you don't want to care about it. It just needs to work. But the scale you're operating at is usually very high. This is why, and I think we were honestly the first in the market back then, we brought elastic clustering to the market, which means you can add and remove data nodes at runtime as you wish. For example, you can scale up and scale down, if you want. We focus on security. It's a huge part of what we do.
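The conversation does not say how HiveMQ implements elastic clustering, so the following is not a description of the product. It is a generic sketch of one well-known technique, consistent hashing, that shows why adding or removing a node at runtime does not have to reshuffle every client. The class, the node names, and the client ids are all hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (a generic technique)."""

    def __init__(self, virtual_nodes: int = 64) -> None:
        self._virtual_nodes = virtual_nodes
        self._ring: List[int] = []         # sorted ring positions
        self._owners: Dict[int, str] = {}  # ring position -> node name

    def add_node(self, node: str) -> None:
        for replica in range(self._virtual_nodes):
            point = _hash(f"{node}#{replica}")
            bisect.insort(self._ring, point)
            self._owners[point] = node

    def remove_node(self, node: str) -> None:
        self._ring = [p for p in self._ring if self._owners[p] != node]
        self._owners = {p: n for p, n in self._owners.items() if n != node}

    def node_for(self, client_id: str) -> str:
        if not self._ring:
            raise LookupError("ring has no nodes")
        # First ring position clockwise from the client's hash, wrapping around.
        index = bisect.bisect(self._ring, _hash(client_id)) % len(self._ring)
        return self._owners[self._ring[index]]


ring = HashRing()
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)

clients = [f"device-{i}" for i in range(1000)]
before = {c: ring.node_for(c) for c in clients}

ring.add_node("node-d")  # scale up at runtime
after = {c: ring.node_for(c) for c in clients}
moved = [c for c in clients if before[c] != after[c]]
print(f"{len(moved)} of {len(clients)} clients moved to the new node")
```

With this scheme, the only clients that change owner when a node is added are the ones that move to the new node; everything else stays where it was, and removing the node again restores the previous assignment.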

Reliability is our core business. But we also brought a lot of innovation to the market, for example, around observability. If you have thousands or even millions of devices, it's a pretty hard problem to identify which of these devices are having issues. So, we provide technologies for identifying issues and fixing them before the end customer has detected an issue, and things like that. It's really about making MQTT technology ready for the enterprise. That is our enterprise platform, which consists of the broker and multiple additional modules like AWS Kinesis integration, Event Hubs integration, Kafka integration, and a lot of other things.

We also have what we call HiveMQ Swarm, which is a quality assurance and load testing tool that allows our customers to continuously monitor how their performance is, how their deployments are doing, and how their backends are doing too. Also, you can do rollout simulations. Customers use it, for example, if they enter a new market. For example, automotive companies entering a new market want to test their end-to-end system. Does it really scale if we add about 10 million additional users to our systems? Does the broker have enough resources, and are our back-end services, our microservices, still working correctly in the cloud? Are the databases we use sized sufficiently? Things like that.

We can basically make it easier for our customers to understand the impact of the decisions they make. Also, they can see how the whole system they're building is scaling. Then we have a lot of additional tools, for example, for running things on Kubernetes in the cloud, and so on. We're constantly working on new innovations we bring to market.

Erik: Let me see if I can get you to visualize this for me using a few examples. Let's look at three different hypothetical customers. Let's say one is Netflix, which is primarily streaming content, streaming video. The second hypothetical customer would be Volkswagen, with fleets of vehicles, mission-critical real-time communication, et cetera. Then we have, let's say, Total, managing oil and gas infrastructure in remote regions. So, three different customers, different scenarios. Would they be using the same toolkit entirely from HiveMQ, or would they be using your solutions in different ways?

Dominik: No, actually, they would use it in the same way. I think the main features and functionality they would rely on would be a bit different. For example, such a hypothetical streaming example would probably require a lot of bandwidth. Not so many users compared, for example, to connected cars, but still a lot of users, and elastic scalability, because you would have to scale the system up and down in order to save costs. That's something you would rely on, but you would also rely on a lot of message throughput. So, you would really focus on the reliability of the solution.

Connected car is also an interesting one. You would also have the scalability aspect there. As soon as you add more users, as you sell more cars, of course your user base would grow. You would also very likely, in these examples, use the extension system to integrate with your own enterprise systems. Automotive companies usually have very complicated back-end systems. They would probably use the open source extension system to integrate with these custom security systems and so on, and easily hook their software in.

For our oil and gas use case, which is also an asset monitoring use case, it's not like you would add millions of new devices per year or so. Here, you would focus on the reliable throughput of data and also the reliability aspects of that. Overall, the installation would look pretty much the same. It could be either hosted on our cloud or hosted on their own clouds. But the details of the platform, what you would be using, would be a bit different. Also, the USPs would be a bit different for the different customers. But overall, it's the same software. All of them would benefit from new modules we release, and benefit from new integrations and new features.

Erik: Okay. If we look at the business model behind HiveMQ, I guess we have a few different variables. We have number of devices. We have volume of messages. We have volume of data across those messages. There's probably other things, but we don't have to get too much into the details there. But what would the pricing model be? Is this a standardized equation, or do you look at each customer and say, let's figure out something that makes sense given your unique situation?

Dominik: What we have is we have our cloud offering. Many customers started with that. So, we have this actually completely free up to 100 devices, which is for some use cases even sufficient. And it has a pay per use model. That is usually good until 1000 or 10,000 devices. Above that, this really doesn't make so much sense. Because pay per use models, you see this also with public clouds. They work very well if you have low volume. But in a mission-critical production case, usually, this is getting honestly more expensive at some point of time.

Another approach would be better there. So, we price per business but also per industry. For example, let's say we take a connected car platform. You have millions of devices. The customers usually think differently, because it's basically a cost-per-car, per-device calculation. While if you're in a factory, you usually have a handful or hundreds of devices, not millions of devices. The value unlocked here is totally dependent on the reliability. It's basically about the mean time between failures and the mean time to recover. This is how the value is usually perceived. This is why we have multiple editions for multiple industries that are really tailored to the industry needs, to have the most effective pricing model for all of them.
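The two reliability metrics mentioned here are commonly combined into a steady-state availability figure, availability = MTBF / (MTBF + MTTR). The sketch below applies that standard formula; the function names are made up for the example, and the numbers are hypothetical and not taken from any HiveMQ customer.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and
    mean time to recover, both expressed in hours."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected hours of downtime in a non-leap year."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


# Hypothetical factory line: one failure every 999 hours, one hour to recover.
line_availability = availability(999, 1)
line_downtime = expected_downtime_hours_per_year(999, 1)
print(f"availability {line_availability:.4f}, downtime {line_downtime:.2f} h/year")
```

Multiplying the expected downtime by the cost of one hour of stopped production gives a rough sense of why a factory values reliability differently from a per-device cost calculation.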

But for people who just want to get started, we have our cloud model that is just pay per use. You pay, at the end of the day, per connection in our cloud. But for mission-critical use cases, we have tailored versions for all the industries to serve them best.

Erik: Okay. Great. It sounds like a very pragmatic approach. Well, Dominik, this must be an exciting time for you. I think you've been in the IoT game since 2012. Back then, I know the question was very much what is IoT, and now we're getting to the point where, basically, every company is using it to some degree and often in fairly high volumes. We're getting to that hockey stick, or at least sustained high-level growth. If we look at your business, what is it that excites you about the future? Obviously, you have a mature platform today. Looking forward 3 years, 5 years, 10 years, where do you see HiveMQ growing?

Dominik: At HiveMQ, we're building a central nervous system for our world. Eventually, we believe we will help connect every device on Earth with the software we build, with open source on the client side and on the server side. For me, it's really crazy how things have developed. Back then, it was a vision. Honestly, a bit of a crazy vision. Now we really see that things are starting.

Everything around IoT, it's still day one. It's just so crazy. I speak to so many developers, also at conferences. Because I have lived in the IoT bubble for quite some time now, one could have the impression that, yes, IoT is really here now. But it's not even there yet. Some industries are starting it. The customers we work with, what they are building and what they are going to release to the market in the next three years, this is just crazy.

What I like most about my job is that there's really no way that I can be a pessimist. Because seeing what our customers are working on, the hard problems our customers are tackling, this is why I'm a massive optimist. It is just crazy what companies do these days. Also, I'm a bit humbled to see what our platform, what our technology, can enable them to do.

It's not only about saving costs. Very often, it is cost saving. But it's also that customers can do things they couldn't do before. They can serve their customers better. This excites me every day. This is what IoT, for me, is about. IoT, for me (this might be a long conversation), is not really another industry. It's really an aspect that will transform everything we do as humans over time. We're building the central backbone and the central nervous system for that.

Erik: Yeah, that's a good way to think about it. It's a little bit like how do you talk about the internet. The internet is not really a technology. It's a thing that exists and influences many, many different industries and businesses, and has many different applications. IoT is just bringing that thing to the rest of the world.

When I set my business up a few years after yours in 2016, I think it was with a similar mentality that this is going to be just an exciting set of challenges to solve for the next 10, 20, 30, 40, 50 years. As long as you want to keep playing in the space, there will be new challenges to face. Certainly, as you said, we're still in the opening stretch here. Dominik, I think we've covered a good bit of territory here. Wrapping up, any other things that are important for folks to know?

Dominik: I think we've covered most of that so far. Especially for people who haven't heard about MQTT, the best way is to Google MQTT. For people who really want to get their feet wet with the technology, we have a 10-piece blog post series that many people use to get started with MQTT. We have a YouTube channel and also a series where I try to explain the core concepts in a more digestible three-to-five-minute format to developers and architects, and to people who own parts of a business and want to learn more about how technologies like MQTT can help them become more effective, save costs, and also do things they couldn't do before.

So, we also have a lot of white papers. A lot of customer use cases are also on our website. So, I invite people to look at that. Also, I'm always up for a chat. People can feel free to connect over LinkedIn. Drop me a message. I'm always happy to help and share thoughts with people.

Erik: Awesome. Dominik, thanks for your time today.

Dominik: Thank you.

 
