IoT Spotlight - EP 163 - How is the cloud transforming lab operations - Nathan Clark, Founder, Ganymede

Podcast Date	Feb 14, 2023
Podcast Description	This week, we interviewed Nathan Clark, founder of Ganymede. Ganymede is the cloud-native data platform for life science companies, focused on integrating physical lab instruments with digital workflows. In this episode, we discussed the challenge of integrating and automating processes in life science research labs and manufacturing facilities. We also explored what the lab of the future will look like and how companies are leveraging the cloud today to improve productivity and guarantee compliance. Key Questions: ● What are the critical elements in automating the manufacturing process? ● What are the roles of the primary and secondary users in setting up the data infrastructure for the lab? ● How would you structure the tech stack, and where are you interfacing with other legacy systems and technologies? ● How do people measure the objective benefits in the manufacturing and R&D side of the business?

Contents

Erik: Nathan, thanks for joining us on the podcast today.

Nathan: Absolutely. A pleasure to be here.

Erik: Nathan, you're calling in from Boston. I've got to tell you, that's my wife's favorite city. I'm here in Shanghai. She works with Philips, so she spent some time at their campus there. She loves it because she was there in the summer, and she was boating around. She wants to move there. I said, okay, you've got to spend a winter there first, and then tell me about moving. It's a cool city. It's also obviously the right city for you, given the space you're in. I'd like to quickly touch on how you ended up there. Because just looking at your CV, you started as a trader at Goldman Sachs. Then you were working with a company, Affirm, which is also in the payment space. And so, finance is a lucrative industry. It's often an industry that people aspire to break into, especially a company like Goldman Sachs. What was the journey that led you from that into the world of life science?

Nathan: Yeah, I loved my work at Goldman, and I love finance academically as well. But I think, for me, I've always been drawn to technology and drawn to trying to find ways to automate things. I think in the role of something like trading, it's somewhat of a mechanical thing. You have this big machine of a bank that you've built. You're oiling it and taking care of it, and operating it. But I was really always drawn to building the machine. And so, I think that was the reason I went to Affirm, which is a buy now, pay later company. I saw it had a very strong growth trajectory and reputation on the technology side. So, for me, it was an opportunity to help build out a financial institution from the ground up, get in at an early level, and work with really strong engineers. I found, when I went there, that I loved that. So, that was a great place for me.
But as Affirm grew, I think I also started realizing, okay, as much as I love finance, I'm really drawn to the life sciences as a place to make an impact. I think finance can be quite helpful for the world, net, ethically. But I think that the life sciences are really what I'm drawn to. So, we always pushed in that direction, myself and my co-founder, whom I met at Affirm. I went to work at a company called Benchling, which is sort of like a CRM for wet lab life sciences operations: tracking your experiments, tracking the cells you're using, tracking the DNA, et cetera. But pretty quickly, I ended up deciding to launch Ganymede itself. It's been a pretty consistent trajectory, I'd say, over time. I love finance, but I just feel the gravitation towards technology and towards the life sciences.

Erik: That decision to start a company is a heavy decision, because you're giving up a stable life and making a leap into a high-risk venture with a high probability of failure. You were working with quite an interesting company in the domain that you were interested in, so what was it that led you to make the leap at that time? Was it that you made a decision that you wanted to be an entrepreneur, or was it that you just saw a problem and said, "I can't believe nobody's solving this problem; I think I can do it"? What was it that triggered that idea at that time?

Nathan: I think it was really the latter. I like to tell people when they are interested in launching a startup, especially in the enterprise B2B space, that ideas, in a way, are a dime a dozen. You can go around and ask people what their problems are, and they'll tell you their problems. You can build a pretty big list of problems pretty quickly. Especially in a big enterprise company, you can observe a lot of those upfront.
But what really pushed me over the line was, exactly as you said, that I started talking to people and asking around and seeing if people would be interested in something like what Ganymede does, which we can dive into. They didn't just say, "Yes, this is my problem," but they actually were saying, "Hey, when can you start building this? Because I need this right now." So, between that and also finding and really, I think, syncing up with my co-founder, Benson, who was very eager to start, we realized, okay, this is the time. We need to put in our notice and start working on this immediately. I think that was finding the idea where people are actually asking to pay you. That's the resonance that really pushed us over the line.

Erik: We're a consulting company, right? It's a digitalization consulting company, so we're in that natural position of talking to people and understanding their problems, and hopefully solving some of them or pointing them towards companies like yours. Just in the past 12 months, a couple of companies in the consumer product space (perfume, and so forth) mentioned the exact same problem. In their lab, there are a bunch of manual processes, long lead times, decision bottlenecks, et cetera. I don't know. I imagine there's some overlap. There are some areas where there are unique problems in life sciences. It's interesting, because we weren't digging into that space at all. They just floated to the top. So, you get a sense that this is something. There's really a hunger for solutions. Before we get into the details of what Ganymede does, let's try to give a nice comprehensive view of the problem landscape here. What does it look like today, and what are the pain points that people are having?

Nathan: I'll contextualize this by saying that a lot of where we started was very much analytical instrumentation in R&D labs, very life sciences-focused.
We're expanding a lot into manufacturing and into broader areas like that, so I have some comments on that. Especially in R&D, I think the problem space that we observe is that things are incredibly complex. They're complex for a good reason. Biology as a science is a huge mess because of human evolution. There's not much abstraction built in to how our bodies are made compared to something like an integrated circuit, where every layer is carefully human-designed. Especially in R&D, and then even once your R&D works and you start manufacturing some therapy or a drug that you've created, you've come up with something custom and de novo. R&D is about trying new things. And so, that doesn't just mean that you're trying different arguments into the same function. It means that you're trying different functions. You're trying different processes. All of these things collide to make biology and pharmaceuticals very, very hard to describe and hard to pin down in terms of what their data structure is. I think, historically, and it's something I certainly observed in all the research that I was doing, it's very hard to build apps in this space. It's very hard to actually automate anything. Because what are you automating? You can make the hardware work, but it's very hard to understand what the software is. It's very hard to understand what the process is that you're trying to capture in some automated setup. Especially at the R&D level, that has really always made it almost impossible for people to have any digitization that automates things rather than just capturing data in a more structured format, from what we've seen. There's sometimes point-to-point automation. People also talk about islands of automation when it comes to automated lab, robot arm kind of setups, called 'work cells' in the space. But nothing is unified across the whole facility.
Nothing is actually truly digitized in the way that I would certainly expect from a traditional enterprise standpoint of saying, hey, can I go log into some web portal and see my data here, my production process? No, because there's no way to describe it or capture it. I think that seems, to us, to be the core problem that really holds back especially R&D for scientific biology. It's that it's too complex to pin down. And so, it's really hard to do anything with the data other than just log things in a disconnected way. That really inspired Ganymede. I think a lot of people have traditionally approached that by saying, "Hey, well, let's come together and pin down some standard for what all the concepts in science are. How can we create some taxonomy of these different things, so that we can start having lab instruments or apps output their data in this consistent format?" But that doesn't work. Those efforts have always eventually stalled. Because you can't pin down all the concepts in science. You can't really develop a no-code solution here. You can't say, "Oh, I'm going to go OPC style and have all these tags that I'm going to map from A to B, like a middleware." It doesn't work because there are too many concepts floating around. I think that's where we really observed that there's a real need for software developers here to handle this level of complexity.

Erik: Let's dig into that a little bit more. Because I guess, on a very high level, a lot of people would look at this R&D process and see the manufacturing process as kind of a synonym. Okay, I can map out the flows. We have equipment. Data is coming off equipment. We take actions. We put data into systems. On paper, they might end up looking actually quite similar. Is it that a manufacturing process is designed to be standardized, and you don't have too much room for the human brain to play or interpret? In this process, that's a very essential element.
It's having a human in the loop that's making autonomous decisions. Is that the critical element that's difficult, that has stymied previous efforts to automate this process and connect the dots?

Nathan: I'd say that's a big one, yeah. I'd say there are really three characteristic data problems here that make it difficult, purely from a math perspective, to get this into traditional CRM-like tools or middleware-like tools. One is exactly what you said, where if you are in a manufacturing process, you can define your process. It'll be stable maybe for a year or two, at least, hopefully. In R&D, it seems to change every week. And so, that means that it's very difficult to have separate IT people or third-party consultants come in and work with the scientists to stabilize the process. Because it's going to change even as you're stabilizing it. So, there needs to be some devolution, we find, oftentimes to the scientists on the ground to do the last mile, with the central parties or third parties building up the backbone in a more modular way. The second, I'd say, is that there's a lot of aggregation of data. That is the thing that breaks the OPC style point-to-point mapping paradigm of saying, okay, I have tags A, B, and C, and these go to X, Y, and Z. In a lot of analytical instruments, we oftentimes find the flavor is actually more like, "Hey, I did this experiment seven times, and I need to aggregate that down to an average. An average is what I care about." And so, you get this aggregation style mapping: one to many, or many to one. Those sorts of more complex relationships between data points are very hard to describe in no-code tools. But they're everywhere here. Then the third, I'd say, is exactly as you said as well. In R&D, when you do some analysis and you get some result out that's interesting, the interesting part means that you're going to make some complex decision and have branching logic and branching paths from there.
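[Editor's note: the many-to-one aggregation described above, where several replicate runs are reduced to a single average, can be sketched in a few lines of Python. This is only an illustration of the idea, not Ganymede's code; the sample IDs, values, and the `aggregate_replicates` helper are hypothetical.]

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw readings: (sample_id, replicate_number, measured_value).
readings = [
    ("sample-A", 1, 10.0),
    ("sample-A", 2, 12.0),
    ("sample-A", 3, 11.0),
    ("sample-B", 1, 20.0),
    ("sample-B", 2, 22.0),
]


def aggregate_replicates(rows):
    """Collapse many replicate readings into one summary row per sample."""
    grouped = defaultdict(list)
    for sample_id, _replicate, value in rows:
        grouped[sample_id].append(value)
    # Many input rows map to one output row: the relationship is many-to-one,
    # which a tag-to-tag (one-to-one) mapping cannot express.
    return {
        sample_id: {"n": len(values), "mean": mean(values)}
        for sample_id, values in grouped.items()
    }


summary = aggregate_replicates(readings)
```

[The point of the sketch is that the output row depends on a whole group of input rows, so the mapping has to be written as logic rather than as a fixed list of tag pairs.]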
It's not like you're doing QC and you're saying, okay, well, QC is pass or fail. With pass or fail, that's a very simple decision. But in R&D, you probably have many, many branching points. So, for all these reasons: it evolves over time. The statistics are a little more complex, so the relationships are not just one-to-one. You have lots and lots of branching even within the space of the experiment. I like to say that it's very complex over space and time.

Erik: Got you. Okay. Well, let's get into what Ganymede does. But maybe before we get into the details, it'd be useful just for you to outline, at a high level, what the aspiration is. What are you trying to accomplish in the lab?

Nathan: With all that said, I think hopefully that makes our approach pretty clear as a solution, which is to say we see this as a software engineering problem. It's very hard to have any no-code approach that works here. So, we want to make the perfect developer platform to get software engineers into the space, or let scientists and data scientists in the space code themselves. This has already happened outside of the wet lab in higher scale, data-driven parts of biology like gene sequencing. There's a whole discipline of data science now, bioinformatics, where people are building out pipelines. They're writing Python scripts. They're doing this large-scale computation pretty efficiently and creating a lot of code and custom business logic. So, we want to bring that into the wet lab for the first time, which I think is a very different approach than anyone has ever taken. I think everyone looks at the complexity here and always says, hey, let's get scientists involved. Let's give them no-code tools. That's how the last mile will be solved. We have perhaps a more pessimistic take, which is to say, no, it really needs to be someone who can code. The scientists can help advise them, but that person is going to be the bridge to handle this complex logic.
It has been a very effective paradigm so far. Our aspiration long term is to create, I would say, the bioinformatics and developer community of wet lab biology and manufacturing. There are a few phrases that people in the space throw around that we definitely identify with, like LabOps. That's a big one. There's a company, Elemental Machines, that coined that. They do a lot of great work in the sensor and data space. I think that's similar to the way that industries like machine learning have gone, where in the old days, everyone was a data scientist and the data scientists would do the analysis. But there wasn't much of an infrastructure behind it. Nowadays, with companies like Databricks, everyone has also developed this discipline of machine learning ops, where you have people who are dedicated to just building out the flow and the infrastructure, connecting the different parts of the machine learning development process. Similarly, we want to bring that to the wet lab space. That's our long-term aspiration. Tactically, there's a lot of work to do in lab instrument connectivity and data infrastructure, the actual meat of our product that we're working on now. But long-term, we see this as something that needs a software engineering solution.

Erik: If we look at the users in the deployment today and the problem space, we have the research teams that are using the lab. We have consumers of that information, who might be in the business or in management, and who are consuming heavily processed output from the lab. Then we have the people that are managing the technology, deploying the technology and managing those systems, who historically are probably the IT department, maybe IT people embedded in the research team in some companies, but basically IT. So, that's really hooking systems up and integrating them. Now we're moving into this data science era, where you get to the question of, okay, do we have data scientists?
Where do they sit? Do they sit with IT? Do they sit with the lab? In my experience here, often, it'd be like one data science guy. Maybe at headquarters in larger companies, they have a proper team. But in a lot of cases, here in China, it's like, this is our data scientist, which is a daunting task for that individual. Among those different people, who would be the primary users, and who would be the secondary users? Then what would be their roles, not just in using Ganymede but in general, in setting up this data infrastructure for the lab?

Nathan: I think the overall set of users is very similar to what you described. I think, in a lot of ways, as you said earlier, biology R&D is a special case of manufacturing. It's very process-driven. It has these, I think, very complex phenomena. So, it has the same IT people, data scientists, technicians, scientists, and people on the floor involved. Our business ends up somewhat barbelled between, I would say, mid-to-late-stage R&D companies and large-scale pharmaceutical manufacturing. There's a space in the middle that we don't really touch, which is clinical trials. That's in between those stages. I'd say, on the R&D side, we end up usually working with the data scientists or process engineers pretty directly, sometimes scientists themselves pretty directly. We do always look for some sponsor or champion, I'd say, that can code and be that person in the lab. Then on the large-scale pharmaceutical manufacturing side, it's exactly as you described. IT is oftentimes our client. IT is more centralized. And oftentimes, I would say, in bio and pharmaceuticals, we're seeing there are a lot of data scientists that are embedded with the business. There's more than one. So, there's a pretty good discipline and base of coding ability that is nascent, that's starting to appear out there in the business, in the facilities, wherever they are. And so, we can take those people and make them our champions.
We tell them, hey, if you've been coding in scripts before, now you can code in Ganymede. Look, it makes this nice no-code interface out of it for the scientists. But ultimately, the user that's benefiting is definitely always the scientist at the end of the day. I think what we're trying to do is say, okay, the data scientists, or whoever's coding in Ganymede (sometimes the IT team themselves directly, or sometimes Ganymede as part of services), are building tools and code that will then be used in no-code form by the scientists directly, and be to the benefit of the scientists or the factory technicians. They automate their workflows, automate their data entry, and automate away whatever they were doing when they were moving data around on USB sticks between computers in the past. No more.

Erik: Okay. That makes sense. Let's look then into the tech stack. The way I'd usually think about this is, you're touching the machine and then interfacing to the cloud, and then processing, and then sending data to applications so that they can be acted on by people. Maybe there's a different way that you structured this. How would you structure the tech stack? Then where are the layers where you're playing today? Where are you interfacing with other legacy systems or other technologies?

Nathan: We, I would say, have two main layers. One is what we call our agent layer, which is the interface to the machine, and also the interface to apps that are out there on the network or in other clouds. It gets the data however is needed. We have a whole variety of technologies for that: everything from a physical device that can be plugged in to USB, Ethernet, RS-232, et cetera, to a Windows-based file watcher and script runner, to a purely browser-based system, to an API-based network agent that can continuously sync data off of apps over APIs. All of those things, for us, what they do is they will get the data, and they'll sync it up and create a real-time interface to the data for the Ganymede cloud.
The Ganymede cloud will consume this data and keep it synced. Then every single data source is turned into a data frame, a table for us. So, we try to build almost, I would say, a database out of all the data sources we see. This is where things start to get quite distinct from the status quo, which is, we think that the best way to deal with this complexity is to treat it as a database software engineering problem. Rather than saying, hey, we're going to have logs of unstructured data in a JSON style or OPC style format, everything's a table. Our agents are going to put data into a tabular format. Our agents are not going to do any business logic. No business logic is allowed on-prem. All business logic has to be defined in the cloud. That includes the drivers. So, we will send raw signals off of the devices and put them in the cloud, and then the driver to parse those is all cloud-based, which is great. Because it allows developers to go in and say, "Hey, I can see all the raw data coming off of any instrument. I can update the driver just here in the cloud, instantly." And so, that's where our cloud comes in, which is taking all of these data frames, putting them in a single control plane for you, and allowing people to actually write code, build code, run code, and version it in Ganymede as a true cloud platform. I'd say, although we're definitely far, far more specific than something like AWS or GCP, our capabilities to let users write and run their own code, store data, version it, and manage it are, more or less, on par with a true cloud provider. So, we've started at a very, very low level here and are trying to build up in the right direction for the life sciences, and generally anything that has this high level of complexity.

Erik: Got you. Then at the application level, do you do data visualization? Do you do analysis, or are you primarily processing the information and making it available for other systems?
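[Editor's note: the split described above, where agents forward raw data untouched and all parsing lives in cloud-side drivers that turn each source into a table, can be illustrated with a small Python sketch. It is not Ganymede's API. The `RAW_STORE` list, the plate-reader payload, and the `driver_v1`, `driver_v2`, and `build_table` names are all hypothetical.]

```python
import csv
import io

# Raw payloads exactly as a (hypothetical) agent forwarded them.
# Nothing was parsed or interpreted on-prem.
RAW_STORE = [
    {"source": "plate-reader-1", "payload": b"well,od600\nA1,0.52\nA2,0.61\n"},
]


def driver_v1(payload: bytes):
    """First cloud-side driver: one table row per CSV record, values kept as text."""
    text = payload.decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def driver_v2(payload: bytes):
    """Updated driver: same rows, but the reading is converted to a float."""
    return [
        {"well": row["well"], "od600": float(row["od600"])}
        for row in driver_v1(payload)
    ]


def build_table(driver):
    """Re-run a driver over every stored raw payload to rebuild the table."""
    rows = []
    for record in RAW_STORE:
        for row in driver(record["payload"]):
            rows.append({"source": record["source"], **row})
    return rows


table_v1 = build_table(driver_v1)
table_v2 = build_table(driver_v2)
```

[Because the raw payload is kept, swapping `driver_v1` for `driver_v2` only requires re-running the parse in one place. Nothing on the instrument side has to change, which is the benefit the no-business-logic-on-prem rule is meant to deliver.]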

Nathan: We have some native analysis and visualization built in. I would say we don't try to specialize in that. It's very hard to compete against something like Power BI or Tableau. And so, those can be hooked up. Our data layer stores everything in a data lake format. That's very easy to connect to any of the BI tools that people may have. But we do have some native visualizations. Because we allow people to write arbitrary code, we can host the drivers. We can host the parsers for all these instruments and apps that are integrated. But we can also host the scientific analysis. That's the killer app here, which is that we're not just integrating the instruments and moving data from A to B, but we're also actually doing full statistics. There are some cases where we have full machine learning models that are just in the middle here. That's what makes it very effective for the life sciences, and I think has been the big gap. It's that all this stuff is so intertwined and tangled that you really do need a place where you can say, hey, my instrument driver is right here. Then my statistical model is right here. They bleed together. And you know what? That's fine. We'll host them both. We're, I think, doing a pretty complete end-to-end process of saying, "Hey, you ran your instrument. We'll collect all the data. We'll do that analysis of saying, hey, you ran it seven times. Here's the standard deviation. We're going to put the standard deviation statistic into your app to record it." That's, I'd say, the end-to-end that ends up being built in Ganymede. That's all user-defined code. We have a lot of it off the shelf, but you can always open it up and edit it or build your own.

Erik: Question about the task of getting data off the machine. So, I was working on a project earlier this year with a chemical company in their lab environment. They were trying to get data off some machines.
In some cases, the only solution that seemed to work was having a camera look at the machine and read the data off the interface. Okay, it's 45. Then all you basically get are the numbers that are physically visible, maybe a red light that flashes on and off. Chemicals is maybe a more price-sensitive industry. Maybe they're working with older equipment than a lot of the life science companies. They're also just older companies, so maybe they have more legacy equipment. But what's the situation in the companies that you're working with? Is it 99% of equipment that you're able to plug into and extract digital data from? Do you also have to find workarounds to get analog data from devices that just don't interface?

Nathan: We definitely have to find workarounds. I would say, luckily, because of the statistical nature of what a lot of the instruments are doing in the life sciences, they do have a higher chance of emitting a file that you can parse and process. But still, what you described, where here's some instrument that's completely locked down, there's no port, and you've got to point a camera at the screen and read the data off, definitely happens as well. I think that's where we've said our paradigm is going to be that, you know what, we know there's no silver bullet. There's no single standard that everything's going to conform to. We're not going to treat the entire world like it's going to be OPC accessible or have a web API or something like that. Instead, we're going to assume that everything is always going to be like that (super fragmented, really hard to connect to) and build different methods of getting the data with different agents. Then we let users define how they want to parse the business logic. There are cases where data is super clean, and we can get it off some API or some file that's very easy to parse and bring in. Then there are cases where data is super messy, and you have very awkward ways of getting the data in.
And so, that's oftentimes where we can do OCR. We have a system for that in Ganymede for parsing things like images or PDFs as well, if that's how the data is coming in. We can integrate with a barcode scanner. We can also just let developers create a form in Ganymede for users to type their data in, if needed. I think we try to assume that there's no silver bullet, and that these really difficult integrations will happen and be everywhere. We'll just try to glue it together as much as we can. But we're not going to say, "Oh, here's the only way that we can connect. And so, we can't touch this." In practice, I think we end up being able to cover 95%, 99% of what people have. Because we'll go as far as the instrument will allow us in terms of integrating it. We'll do what we can and try to create the smoothest experience possible. But it may not always be possible to actually have it be completely automated end to end. But in many cases, it is.

Erik: Yeah, I guess this job at least should get easier over time, right? I think the newer equipment coming out tends to be more connectable.

Nathan: Mostly.

Erik: Quick question around... Yeah, please.

Nathan: I was just going to say mostly. I think the industry is turning in the right direction, but it's a slow process. Hopefully, everything gets better.

Erik: Yeah, that's right. Because it's not just a technical issue, right? There's a whole mentality around building walls versus making your solution connected. I guess that's a corporate strategy decision as much as a technical one.

Nathan: That's right, yeah.

Erik: Let's see. So, you mentioned earlier that you don't touch clinical trials. Why not? What are the unique complications around that? Is it just not interesting as a business problem? What's the reason?

Nathan: I'd say, it's just a very different area. Our specialty has been saying, "Hey, you have all these lab instruments. You have all these statistics that you're doing.
We can bring them all into one place to let you write code on top of them and automate them." With clinical trials, there's a whole different raft of software there. It's much more about doctor-patient interactions. It's also much more regulated. We don't want to touch any data that has human identifiable information in it. In terms of what our platform is meant for, our platform is meant for just process and operational data. And so, it's a different regulatory regime. It's a very different data problem, I'd say. There is actually a much clearer set of tools that you're using in these clinical trials. You know the different acronyms for the five or seven different types of applications you're using. It's more that the data is incredibly messy within them. It's more of a data cleaning problem oftentimes, we find. There are other companies that are focused on data cleaning, and we'll let them do their thing. We're more focused on data structuring and analysis structuring, which suits the wet lab and physical processes better.

Erik: Let's try to make this as tangible as possible. I think the best way to do that is to walk through an end-to-end case, from the challenge that they had to deploying the solution. You could maybe blend two or three cases together as well, if that helps. But I think it would also be interesting to understand what the KPIs or the specific impacts are that people are looking to realize. Is it shortening a certain process time or improving quality control to a certain measure? What are the objective benefits that are aimed at here?

Nathan: Yeah, I'll give you two cases: one in higher scale manufacturing, and then one on the more R&D side of the business. On the manufacturing side, we have a partnership with a company called Apprentice. Apprentice is a manufacturing execution system for pharma. They themselves are a startup, much bigger than us. They've raised over $100 million. They're fantastic to work with.
What they do is, through their MES, they have an application that technicians on the factory floor in pharma manufacturing are using to describe their processes, to capture data, to have their electronic batch records, and so on. Because of that, Apprentice does have a pretty big desire to get more natively integrated with a lot of the instruments and machinery that people are using on the factory floors. Pharma manufacturing is incredibly locked down and regulated, so it's very high value to the clients and to the IT and compliance folks in those spaces to automate data entry, not just for speeding up the technicians, although that's a big plus, but also for compliance reasons. Being able to record data and have certainty that whatever people recorded was actually true is quite important. They came to us, Apprentice, and said, "Hey, we have a whole bunch of clients that could really benefit from having scales, laboratory balances, integrated more deeply." In the pharma manufacturing process, when you're manufacturing a drug, one of the first steps oftentimes is to just weigh out different components of the drug or the pill that you're going to prepare: the active pharmaceutical ingredients, the matrix that those are going to sit in, whatever it is. One of the problems that they face in that space is that they'll establish tolerances of, okay, how much do you need to weigh out of this? You may say, okay, I can weigh out 50 to 55 milligrams, because the recipe says 52 milligrams. So, my tolerance is 50 to 55. If it's above that or below that, I'm going to have to reject the batch and throw it out. What ends up happening is that people will weigh out 56 milligrams because they're not being careful. They'll say, "You know what? Actually, it's good enough. I'll call it 55. So, it's in tolerance. I can continue with my day." That's a huge problem. That's because you have people recording data manually. It's not coming directly off of anything.
They're just looking at the screen of a scale and writing down the number. For them, what we did was we said, okay, let's get these scales more natively integrated. How do scales integrate? For the most part, they have no PC. They have no network component. They just have a USB port, or an Ethernet port, or an RS-232 port. And so, we use our edge devices to connect with them. Rather than having a parser or a business logic layer for standardizing the data on the edge device, the edge devices actually just take the signals coming off the USB port, or RS-232, or Ethernet, and send them up to the cloud verbatim. Equivalently, they'll take signals from the cloud and put them back into the scale, verbatim, to control the scale. I'd say, this is where our approach helps a lot. Because these scales all have super different interfaces. It's not OPC. It's literally strings coming over the wire. You'll have something like w19G, which means we weighed 19 grams, or T for tare. Every scale has completely different implementations. They're poorly documented. Some of the documentation is just wrong, especially when you're going across multiple manufacturers. So, that's where our cloud-based approach is very effective. Because we were able to say, hey, we'll just get all the strings off the scale, and then process them with a cloud-based driver in real time. Then put them into a web API for the Apprentice application to consume, to read from and write to. Through that, we're able to, very efficiently and very quickly, write new drivers for new scale types. By contrast, a lot of the manufacturers have some connectivity layers that they've built already. For instance, Mettler Toledo, a big scale manufacturer, has what is called LabX. That is a data layer for their scales, but it's very specific. It's very hard to set up. It's filled with the semantics of how Mettler Toledo thinks about the world.
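
To make the cloud-based driver idea concrete, here is a minimal Python sketch of what such a driver could look like. It is an illustration only, not Ganymede's implementation: the wire format is assumed from the "w19G" and "T" examples in the conversation, and the names `ScaleEvent`, `parse_scale_string`, and `within_tolerance` are hypothetical. Real scales use different, manufacturer-specific strings.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed wire format, based on the examples in the interview:
#   "w19G" -> a weight reading of 19 grams
#   "T"    -> a tare signal
# Real scales vary by manufacturer; this pattern is hypothetical.
WEIGHT_PATTERN = re.compile(r"^w(?P<value>\d+(?:\.\d+)?)(?P<unit>[A-Za-z]+)$")


@dataclass
class ScaleEvent:
    kind: str                      # "weight" or "tare"
    value: Optional[float] = None  # numeric reading, if any
    unit: Optional[str] = None     # unit as reported, lowercased


def parse_scale_string(raw: str) -> Optional[ScaleEvent]:
    """Cloud-side driver: map one raw string to a structured event."""
    text = raw.strip()
    if text == "T":
        return ScaleEvent(kind="tare")
    match = WEIGHT_PATTERN.match(text)
    if match:
        return ScaleEvent("weight", float(match["value"]), match["unit"].lower())
    return None  # unrecognized strings are left unparsed, never guessed at


def within_tolerance(event: ScaleEvent, low: float, high: float) -> bool:
    """Check a weight reading against a tolerance band (same unit assumed)."""
    return event.kind == "weight" and event.value is not None and low <= event.value <= high
```

With the reading taken directly from the instrument, the 56-milligram case from the interview can no longer be written down as 55: `within_tolerance(parse_scale_string("w56mg"), 50, 55)` evaluates to `False`. Because the edge device only forwards strings, supporting a new scale type would mean changing only this cloud-side function.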
Whereas Apprentice's experience and, by extension, Apprentice's clients' experience with Ganymede is that we're effectively a magic wand, where when you point it at a scale, it doesn't matter what manufacturer it is. As long as it has some connector, we'll hook up the edge device. We'll just say, hey, we see all the strings coming off. We can look up the documentation, and then create this mapping very quickly into what these strings mean. It's all purely web-based. Once the thing is connected on-prem, you don't need anyone there anymore. You can just start coding on the web from anywhere in the world and get it done very quickly. For a manufacturing execution system and for these regulated manufacturing clients, I think that's pretty transformative. Because they're able to really radically accelerate this from something that would take months to integrate scales and might not be worth it, to something that now takes days or weeks per scale type. Similarly, I think the same holds true in scientific wet labs, where the problem is a little bit different but the same value prop holds of saying, it's very important for me here. Less compliance, more accelerating my scientists because they spend so much time doing data entry today. But it's the same thing. Again, if I have all these lab instruments I bought, the lab instruments just will emit a file onto the PC that's attached to them. That's it. They're done. People end up having to open up those files locally and do some analysis in Excel or import the files into some software to do some analysis, and then do that analysis and put it into an app like a Benchling, or a manufacturing execution system, or LES, or whatever it may be. There's this very consistent flow, which is very error-prone and very manual and data-entry intensive, of saying, hey, I got to get this file. I got to manually do an analysis. Then I got to put it by hand into some app or something.
We, I think, have a much better approach. We're saying, okay, we're going to hook up to the instrument. We'll get the file, just host it raw in Ganymede, and then automatically have the parser built out in the software layer. Automatically do the analysis. Then automatically put that final result into where it needs to go. For the scientists, it's the same value prop in a way, more focused on saving time than compliance here. But they speed up by probably 10% to 20%, I would say. They spend hours and hours and hours every week doing data entry. It just vanishes. So, it really radically accelerates people. In science, it reduces the cycle times of how long it takes to do an experiment and then see the data come in to debug it and decide what your next experiment will be. Also, in an environment like this, it helps save headcount. Because you don't need as many scientists to run the same lab processes.

Erik: Okay. Great. It sounds like a very high-value solution in both cases, and somewhat intuitive that this is a solution that the market needs. You already mentioned that people are basically asking you, when can I buy it? I just had a look at some of your figures. You're a relatively young company, 10 or 11 months old. You've raised something like US$15 million. Obviously, investors are quite confident that there's a business here. What's the answer to the question, why now? This feels like a problem that needs to be solved. Why haven't Mettler Toledo, why hasn't X large equipment manufacturer that's been selling into these companies for the past decade, why hasn't this basically been a solved problem?

Nathan: We've definitely grown super, super quickly, so far. I think it is because there's such a need here, and it's so unsolved. It's a good question. I would say, there are probably two big reasons that this is fairly unsolved.
One is that, the way that people have approached this is always saying it's a very local solution or very app-like or internal company platform-like solution that they end up developing. Say, Mettler Toledo, for instance. They're great, but they don't specialize in software. As you said, manufacturers are often incentivized to try and reach these local optima, where they're only offering a solution that connects to their machinery. And so, that undermines the value prop of people being able to connect to everything. They end up having to have a dozen different connection layers for the dozen different instrument manufacturers and apps that they have involved across their business. Oftentimes, people sometimes do have these things already installed. Then we end up being the connection layer between all these connection layers, just because they're all so fragmented and disconnected from each other. Then the other piece, I'd say, is that there's very little true software engineering in the space. It's a very hard space to be a software engineer in. Because people want to build apps oftentimes, and you can't really build apps here. People want to build no-code solutions, which are very effective in other industries. But you can't really do that here. The traditional paradigm of what software engineers do doesn't work that well. The only thing that we observe that really works is developer platforms and cloud infrastructure at a very raw level, which is, I would say, a very extreme type of software engineering — which you don't get much of in bio because there's very few cloud infrastructure engineers paying attention to this space for good reason, traditionally. But I think finally now, for us, we're really bringing in a ton of software engineering talent, a ton of cloud infrastructure engineering talent, and really bringing a sledgehammer into the space in a way that hasn't happened before. I'd say, that's the other aspect. 
We're able to build at a much deeper level and leverage a lot more deep software engineering talent than exists within the industry today.

Erik: Fascinating. Well, obviously, you're a sharp guy. It looks like you have a great founding team. It sounds like you've really found the right time and the right problem as well. If we look forward next 12 months, 24 months, what is on the horizon for you? What are the problems that you need to tackle in the next couple development cycles?

Nathan: 12 to 24 months is a very long time in startup land, I'll say. We have a lot of plans during that time. Where we are right now as a company, I think, within the year here, in 2022, we've built out a lot in terms of our core back end. We have the rails and the capabilities at a back-end level to do really any implementation, to get any instrument or device or app wired up. Now, from here, where we're working at the moment is productizing more of a web app to be able to, instead of just interacting with Ganymede's systems over a command line, actually have a graphical interface for that. Then the next step, going into next year, is to say, okay, this web app is now available for people to self-serve, log into. I think, as much as we're doing a lot of big traditional enterprise sales, we're very, very focused on saying that we want to create a tool that's very good as a self-service tool for developers at these biotechs or at these pharma companies so they can use Ganymede without ever talking to us. That's a huge advantage for them. Because then, they can skip the whole enterprise sales cycle. They can move much faster. They don't have to talk to us. Although we will always have big enterprise deals where we're very deeply involved, it also helps us because we're making sure that the platform is robust and modular enough that someone could use it on a self-serve basis. That also helps us internally. Because our internal developers will also benefit from that.
We see them somewhat as equivalent to the external developers. Long way to say, I think that self-service capabilities are a big one for us. The ability to go onto our website and actually just open up and spin up again a new environment from nothing is big for us. It's very difficult for, I think, a lot of companies in the space to have solutions like this to do because they are not focused on cloud infrastructure. The idea of automatically spinning up an entire cloud infrastructure environment just through the click of a button is very daunting. But that's where we've come from. Natively, that's in our DNA. So, we want to lean into that, really leverage that, and go further from there. Looking further into 2023, I think a couple other themes that we'll be touching on are, one, right now, Ganymede is a very cohesive, integrated platform where you can do a lot of things. But we want to start decoupling that and making it more modular in saying, hey, if you just want to use the Ganymede's database because it's a great database for manufacturing for the life sciences, here it is. Go crazy with it on its own. If you just want to use our computing layer because it's very well set up, or if you just want to use our agents and install them yourself, you can use them in isolation. We'll start making things more open source, open core, so that you can download and use Ganymede within your systems. We'll never, I think, do fully on-prem because we see the future as being in the cloud. But we're working towards people being able to self-host Ganymede in their cloud. Since that's oftentimes a big demand from customers, they've invested in these big cloud layers that they've built out internally as their private clouds. Why not be able to run Ganymede in that environment? 
Then I think, longer term, where we're really focused is although we work very much with the biotechs and the pharmaceutical companies who are getting the direct value out of the automation, long term, if they're more self-service, we actually want to mostly be focused on the things that are integrating into this. We want to be able to say things like, hey, anytime that you get a new instrument from a manufacturer that we're partnered with and you turn it on, it's going to create a new Ganymede cloud of, say, my instrument.client.com. You'll just be able to go into there and see all the data there on the web. No more having to go into files on the local PC. Everything just writes its data directly to the cloud natively. Same for apps. We want to build the right infrastructure and connectivity layer to have all these different things in the bio lab talk to each other at that infrastructure level, and really start building out a better layer for how things connect. That's our long-term goal. I think looking out at where the industry has gone, there have been so many failed attempts here to try and build better centralized layers for the lab. A lot of those have failed because people try to weave in the scientific semantics to try and make the analysis itself just work out of the box. Our take is that it'll never work out of the box. The business logic will always need to be defined de novo. But what you can do is provide the right infrastructure layer for the data to be available. And so, that's what we're obsessed with. I think that's a lot of what we'll start focusing on in 2024. It's very strong tooling for instrument manufacturers, for apps, et cetera to be able to couple Ganymede very deeply as their cloud layer that represents the cloud presence of this on-prem machine.

Erik: Awesome. Well, I'm going to have to invite you back on maybe in 12 months. You are right, 12 months is a lot of time in the startup world.
You guys clearly move fast. I've worked on a fair number of corporate innovation projects, where 12 months is just long enough to have some internal alignment on what direction you want to move in. Nathan, great business that you're building. Thank you for walking us through it today. Really, I would love to chat in 12 months, 18 months, and see where you are there.

Nathan: Yeah, I would love to check in. Some things will still be in the same sales pipeline, exactly as you said. The corporate innovation processes, in 12, 18 months, they'll be exactly where they are. Then hopefully, our engineering team can have built out the entire universe in the meantime. So, we'll see how far we can get.

Erik: Nathan, thanks for joining us on the podcast today.

Nathan: Absolutely. A pleasure to be here.

Erik: Nathan, you're calling in from Boston. I got to tell you that's my wife's favorite city. I'm here in Shanghai. She works with Philips, so she spent some time at their campus there. She loves it because she was there in the summer, and she was boating around. She wants to move there. I said, okay, you got to spend a winter there first, and then tell me about moving. It's a cool city. It's also obviously the right city for you, given the space you're in.

I'd like to quickly touch on how you ended up there. Because just looking at your CV, you started as a trader at Goldman Sachs. Then you were working with a company, Affirm, which is also in the payment space. And so, finance is a lucrative industry. It's often an industry that people aspire to break into, especially a company like Goldman Sachs. What was the journey that led you from that into the world of life science?

Nathan: Yeah, I loved my work at Goldman, and I love finance academically as well. But I think, for me, I've always been drawn to technology and drawn to trying to find ways to automate things. I think in the role of something like trading, it's somewhat of a mechanical thing. You have this big machine of a bank that you've built. You're oiling it and taking care of it, and operating it. But I was really always drawn to build the machine. And so, I think that was the reason I went to Affirm, which is a buy now, pay later company. I saw it had a very strong growth trajectory and reputation on the technology side. So, for me, it was an opportunity to help build out a financial institution from the ground up, and get in at an early level and work with really strong engineers. I found, when I went there, that I loved that. So, that was a great place for me.

But as Affirm grew, I think I also started realizing, okay, as much as I love finance, I'm really drawn to the life sciences as a place to make an impact. I think finance can be quite helpful for the world, not ethically. But I think that the life sciences are really what I'm drawn to. So, we always pushed in that direction, myself, and my co-founder who I met at Affirm. I went to work at a company called Benchling, which is sort of like a CRM for wet lab life sciences operations: tracking your experiments, tracking the cells you're using, tracking the DNA, et cetera. But pretty quickly, I ended up deciding to launch Ganymede itself. It's been a pretty consistent trajectory, I'd say, over time. I love finance, but I just feel the gravitation towards technology and towards the life sciences.

Erik: That decision to start a company is a heavy decision, because you're giving up a stable life and making a leap into a high-risk venture, with a high probability of failure. What was it that led you at that time — because you were working with quite an interesting company in the domain that you were interested in — to make the leap? Was it, you made a decision that I want to be an entrepreneur, or was it that you just saw a problem and said, I can't believe nobody's solving this problem; I think I can do it? What was it that triggered that idea at that time?

Nathan: I think it was really the latter. I like to tell people when they are interested in launching a startup, especially in the enterprise B2B space, that ideas, in a way, are a dime a dozen. You can go around and ask people what their problems are, and they'll tell you their problems. You can build a pretty big list of problems pretty quickly. Especially in a big enterprise company, you can observe a lot of those upfront.

But what, for me, really pushed me over the line was, exactly as you said, I started talking to people and asking around and seeing if people would be interested in something like what Ganymede does, which we can dive into. They didn't just say, "Yes, this is my problem," but they actually were saying, "Hey, when can you start building this because I need this right now?" So, between that and also finding and really, I think, syncing up with my co-founder, Benson, who was very eager to start, we realized, okay, this is the time. We need to put in our notice and start working on this immediately. I think that was finding the idea where people are actually asking to pay you. That's the resonance that really pushed us over the line.

Erik: We're a consulting company, right? It's a digitalization consulting company, so we're in that natural position talking to people and understanding their problems, and hopefully solving some of them or pointing them towards companies like yours. Just in the past 12 months, a couple of companies in the consumer product space — perfume, and so forth — mentioned the exact same problem. In their lab, a bunch of manual processes, long lead times, decision bottlenecks, et cetera. I don't know. I imagine there's some overlap. There are some areas where there's unique problems in life sciences.

It's interesting, because we weren't digging into that space at all. They just floated to the top. So, you get a sense that this is something. There's really a hunger for solutions. Before we get into the details of what Ganymede does, let's try to give a nice comprehensive perspective over the problem landscape here. What does it look like today, and what are the pain points that people are having?

Nathan: I'll contextualize this by saying that a lot of where we started was very much analytical instrumentation in R&D labs, very life sciences-focused. We're expanding a lot into manufacturing and in broader areas like that, so I have some comments on that. Especially in R&D, I think the problem space that we observe is that things are incredibly complex. They're complex for a good reason.

Biology as a science is a huge mess because of human evolution. There's not much abstraction built in to how our bodies are made compared to something like an integrated circuit, where every layer is carefully human-designed. Especially in R&D, and then even once your R&D works and you start manufacturing some therapy or a drug that you've created, you've come up with something custom and de novo. R&D is about trying new things. And so, that doesn't just mean that you're trying different arguments into the same function. It means that you're trying different functions. You're trying different processes.

All of these things collide to make biology and pharmaceuticals very, very hard to describe and hard to pin down in terms of what their data structure is. I think, historically, something I certainly observed in all my research that I was doing, it's very hard to build apps in this space. It's very hard to actually automate anything. Because what are you automating? You can make the hardware work, but it's very hard to understand what the software is. It's very hard to understand what is the process that you're trying to capture in some automated setup. Especially at the R&D level, this has really always made it almost impossible for people to have any digitization that automates things rather than just capturing data in a more structured format, from what we've seen.

There's sometimes point-to-point automation. People also talk about islands of automation when it comes to automated lab, robot arm kind of setups called 'work cells' in the space. But nothing is unified across the whole facility. Nothing is actually truly digitized in the way that I think I would certainly expect from a traditional enterprise standpoint of saying, hey, can I go log into some web portal, see my data here, my production process? No, because there's no way to describe it or capture it. I think that seems, to us, to be the core problem that really holds back, especially R&D for scientific biology. It's that it's too complex to pin down. And so, it's really hard to do anything with the data other than just log things in a disconnected way.

That really inspired Ganymede. I think a lot of people have traditionally approached that by saying, "Hey, well, let's come together and pin down some standard for what are all the concepts in science. How can we create some taxonomy of these different things, so that we can start having lab instruments or apps output their data in this consistent format?" But that doesn't work. Those efforts have always eventually stalled. Because you can't pin down all the concepts in science. You can't really develop a no-code solution here. You can't say, "Oh, I'm going to go OPC style and have all these tags that I'm going to map into from A to B, like a middleware." It doesn't work because there's too many concepts floating around. I think that's where we really observed that there's a real need for software developers here to handle this level of complexity.

Erik: Let's dig into that a little bit more. Because I guess, on a very high level, a lot of people would look at this R&D process. You could look at manufacturing process as kind of a synonym. Okay, I can map out the flows. We have equipment. Data is coming off equipment. We take actions. We put data into systems. On paper, they might end up looking actually quite similar. Is it that, in a manufacturing process, it's designed to be standardized, and you don't have too much room for the human brain to play or interpret? In this process, that's like a very essential element. It's having a human in the loop that's making autonomous decisions. Is that the critical element that's difficult, that has stymied previous efforts to automate this process and connect the dots?

Nathan: I'd say that's a big one, yeah. I'd say there are really three characteristic data problems here that make it difficult, purely from a math perspective, to get this into traditional CRM-like tools or middleware-like tools. One is exactly what you said, where if you are in a manufacturing process, you can define your process. It'll be stable maybe for a year or two, at least, hopefully. In R&D, it seems to change every week. And so, that means that it's very difficult to have separate IT people or third-party consultants come in and work with the scientists to stabilize the process. Because it's going to change even as you're stabilizing it. So, there needs to be some devolution, we find, oftentimes to the scientists on the ground to do the last mile, and then have the central parties or third parties building up the backbone in a more modular way.

The second, I'd say, is that there's a lot of aggregation of data. That's the thing that breaks the OPC-style point-to-point mapping paradigm of saying, okay, I have tags A, B, and C, and these go to X, Y, and Z. In a lot of analytical instruments, oftentimes we find the flavor is actually more like, "Hey, I did this experiment seven times, and I need to aggregate that down to an average. An average is what I care about." And so, you get this aggregation-style mapping: one-to-many, or many-to-one. Those sorts of more complex relationships between data points are very hard to describe in no-code tools. But they're everywhere here.
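
The many-to-one mapping described here, seven replicate runs collapsing into one average, can be sketched in a few lines of Python. This is an illustrative example only: the row layout and the field names `sample_id` and `value` are assumptions, not a format mentioned in the conversation.

```python
from collections import defaultdict
from statistics import mean


def aggregate_replicates(rows):
    """Collapse many replicate rows into one averaged row per sample.

    `rows` is an iterable of dicts with hypothetical keys
    "sample_id" and "value".
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["sample_id"]].append(row["value"])
    # One output row per sample: a many-to-one relationship that a
    # tag-to-tag (A -> X) mapping cannot express.
    return [
        {"sample_id": sample_id, "n": len(values), "mean_value": mean(values)}
        for sample_id, values in grouped.items()
    ]
```

The point of the sketch is the shape of the relationship: the number of output rows depends on the data itself, which is why a fixed point-to-point tag map falls short.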

Then the third, I'd say, is exactly as you said as well. In R&D, when you do some analysis and you get some result out that's interesting, the interesting part means that you're going to do some complex decision and have branching logic and branching paths from there. It's not like you're doing QC and you're saying, okay, well, QC is pass, fail. With pass, fail, that's a very simple decision. But in R&D, you probably have many, many branching points. For all these reasons, it evolves over time. The statistics are a little more complex, so the relationships are not just one-to-one. You have lots and lots of branching even within the space of the experiment. I like to say that it's very complex over space and time.

Erik: Got you. Okay. Well, let's get into what Ganymede does. But maybe before we get into the details, it'd be useful just for you to outline, at a high level, what's the aspiration. What are you trying to accomplish in the lab?

Nathan: With all that said, I think hopefully that makes our approach pretty clear as a solution, which is to say we see this as a software engineering problem. It's very hard to have any no code that works here. So, we want to make the perfect developer platform to get software engineers into the space, or let scientists and data scientists in the space code themselves. This has already happened outside of the wet lab in higher scale, data-driven parts of biology like gene sequencing. There's a whole discipline of data science now, bioinformatics, where people are building out pipelines. They're writing Python scripts. They're doing this large-scale computation pretty efficiently and creating a lot of code and custom business logic. So, we want to bring that into the wet lab for the first time, which I think is a very different approach than anyone has ever taken.

I think everyone looks at the complexity here and always says, hey, let's get scientists involved. Let's give them no-code tools. That's how the last mile will be solved. We have perhaps a more pessimistic take, which is to say, no, it really needs to be someone who can code. The scientists can help advise them, but that person is going to be the bridge to handle this complex logic. It has been a very effective paradigm, so far.

Our aspiration long term is to create, I would say, the bioinformatics and developer community of wet lab biology and manufacturing. There are a few phrases that people, I think, in the space throw around that we definitely identify with, like LabOps. That's a big one. There's a company, Elemental Machines, that coined that. They do a lot of great work in the sensor and data space. I think that's similar to the way that industries like machine learning have gone, where in the old days, everyone was a data scientist and the data scientists would do the analysis. But there wasn't much of an infrastructure behind it.

Nowadays, with companies like Databricks, everyone has also developed this discipline of machine learning ops, where you have people who are dedicated to just building out the flow and the infrastructure, connecting the different parts of the machine learning development process. Similarly, we want to bring that to the wet lab space. That's our long-term aspiration. Tactically, there's a lot of work to do in a lab instrument connectivity data infrastructure, the actual meat of our product that we're working on now. But long-term, we see this as something that needs a software engineering solution.

Erik: If we look at the users in the deployment today and the problem space, we have the research teams that are using the lab. We have consumers of that information — who might be business, who might be in management who are consuming heavily-processed output from the lab. Then we have the people that are managing the technology, deploying the technology and managing those systems who historically are probably like the IT department, maybe IT people embedded in the research team in some companies but basically IT. So, that's really hooking systems up and integrating them.

Now we're moving into this data science era, where then you get into the point, the question of like, okay, do we have data scientists? Where do they sit? Do they sit with IT? Do they sit with the lab? In my experience here, often, it'd be like one data science guy. Maybe at headquarters in larger companies, they have a proper team. But in a lot of cases, here in China, it's like this is our data scientist, which is a daunting task for that individual. Among those different people, who would be the primary users, and who would be the secondary users? Then what would be their roles in, not just using Ganymede but in general, in setting up this data infrastructure for the lab?

Nathan: I think the overall set of users is very similar to what you described. I think, in a lot of ways, as you said earlier, biology R&D is a special case of manufacturing. It's very process-driven. It has these, I think, very complex phenomena. So, it has the same IT people, data scientists, technician scientists, people on the floor involved. Our business ends up somewhat barbelled between, I would say, mid-late-stage R&D companies and large-scale pharmaceutical manufacturing. There's a space in the middle that we don't really touch, which is clinical trials. That's in between those stages.

I'd say, on the R&D side, we end up usually working with the data scientists or process engineers pretty directly, sometimes scientists themselves pretty directly. We do always look for some sponsor or champion, I'd say, that can code and be that person in the lab. Then on the large-scale pharmaceutical manufacturing side, it's exactly as you described. IT is oftentimes our client. IT is more centralized. They are, oftentimes, I would say, in bio and pharmaceuticals, we're seeing there are a lot of data scientists that are embedded with the business. There's more than one. So, there's a pretty good discipline-invasive coding ability that is nascent, that's starting to appear out there in the business, in the facilities, wherever they are. And so, we can take those people and make them our champions. Tell them, hey, if you've been coding in scripts before, now we can code in Ganymede. Look, it makes this nice no-code interface out of it for the scientists.

But ultimately, the user that's benefiting is definitely always the scientists at the end of the day. I think what we're trying to do is say, okay, the data scientists, whoever's coding in Ganymede (sometimes the IT team themselves directly, or sometimes Ganymede as part of services), they are building tools and code that will then be used in no code by the scientists directly, and be to the benefit of the scientists or the factory technicians, and automate their workflows, automate their data entry, automate whenever they were moving around data on USB sticks between computers in the past. No more.

Erik: Okay. That makes sense. Let's look then into the tech stack. The way I'd usually think about this is, you're touching the machine and then interfacing to the cloud, and then processing, and then sending data to applications so that they can be acted on by people. Maybe there's a different way that you structured this. How would you structure the tech stack? Then where are the layers where you're playing today? Where are you interfacing with other legacy systems or other technologies?

Nathan: We, I would say, have two main layers. One is what we call our agent layer, which is the interface to the machine, and also the interface to apps that are out there on the network or in other clouds. Getting the data however is needed. We have a whole variety of technologies for that: everything from a physical device that we have that can be plugged in to USB, Ethernet, RS-232, et cetera, to a Windows-based file watcher and script runner, to a purely browser-based system, to an API-based network agent that can continuously sync data off of apps over APIs.

All of those things, for us, what they do is they will get the data, and they'll sync it up and create a real-time interface to the data for the Ganymede cloud. The Ganymede cloud will consume this data, keep it synced. Then every single data source is turned into a data frame, a table for us. So, we try to build almost, I would say, a database out of all the data sources we see. This is where things start to get quite distinct from the status quo, which is, we think that the best way to deal with this complexity is to treat it as a database software engineering problem.

Rather than saying, hey, we're going to have logs of unstructured data in a JSON style or OPC style format, everything's a table. Our agents are going to put data into a tabular format. Our agents are not going to do any business logic. No business logic allowed on-prem. All business logic has to be defined in the cloud. That includes the drivers. So, we will send raw signals off of the devices and put them in the cloud, and then the driver to parse those is all cloud-based, which is great. Because it allows developers to go in and say, "Hey, I can see all the raw data coming off of any instrument. I can update the driver just here in the cloud, instantly." And so, that's where our cloud comes in, which is taking all of these data frames, putting them in a single control plane for you, and allowing people to actually write code, build code, run code, and version it in Ganymede as a true cloud platform.
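
The split described here (agents forward raw signals, all parsing lives in the cloud, every source ends up as a table) can be sketched roughly as follows. This is a minimal illustration, not Ganymede's actual API: `RawSignal`, `forward`, `to_rows`, and the "name,value" line format are all hypothetical names and formats made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RawSignal:
    """What a hypothetical on-prem agent forwards: the untouched payload plus metadata."""
    source_id: str
    received_at: datetime
    payload: bytes


def forward(source_id: str, payload: bytes) -> RawSignal:
    """Agent side: wrap the bytes verbatim. No parsing, no business logic."""
    return RawSignal(source_id, datetime.now(timezone.utc), payload)


def to_rows(signals: List[RawSignal], driver: Callable[[bytes], Dict]) -> List[Dict]:
    """Cloud side: apply a swappable driver to each raw payload, producing table rows."""
    rows = []
    for signal in signals:
        row = {
            "source_id": signal.source_id,
            "received_at": signal.received_at.isoformat(),
        }
        row.update(driver(signal.payload))
        rows.append(row)
    return rows


def csv_line_driver(payload: bytes) -> Dict:
    """Example driver for a made-up 'name,value' line format (an assumption)."""
    name, value = payload.decode("ascii").strip().split(",")
    return {"name": name, "value": float(value)}


signals = [
    forward("ph-meter-1", b"pH,7.02\r\n"),
    forward("ph-meter-1", b"pH,7.05\r\n"),
]
table = to_rows(signals, csv_line_driver)
```

Because the raw payloads are kept as received, a corrected driver can be re-run over the same signals later without touching anything on-prem, which is the benefit Nathan points to.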

I'd say, although we're definitely far, far more specific than something like AWS or GCP, our capabilities to let users write and run their own code, store data, version it, manage it are, more or less, on par with a true cloud provider. So, we've started at a very, very low level here, and we're trying to build up in the right direction for the life sciences, and generally anything that has this high level of complexity.

Erik: Got you. Then at the application level, do you do data visualization? Do you do analysis, or you are primarily processing the information and making it available for other systems?

Nathan: We have some native analysis and visualization built in. I would say we don't try to specialize in that. It's very hard to compete against something like Power BI or Tableau. And so, those can be hooked up. Our data layer stores everything in a data lake format. That's very easy to connect to any of the BI tools that people may have. But we do have some native visualizations.

Because we allow people to write arbitrary code, we can host the drivers. We can host the parsers to all these instruments and apps that are integrated. But we can also host the scientific analysis. That's the killer app here, which is that we're not just integrating the instruments and moving data from A to B, but we're also actually doing full statistics. There are some cases where we have full machine learning models that are just in the middle here. That's what makes it very effective for the life sciences, and I think has been the big gap. It's that all this stuff is so intertwined and tangled that you really do need a place where you can say, hey, my instrument driver is right here. Then my statistical model is right here. They bleed together. And you know what? That's fine. We'll host them both.

We're, I think, doing a pretty complete end-to-end process of saying, "Hey, you ran your instrument. We'll collect all the data. We'll do that analysis of saying, hey, you ran it seven times. Here's the standard deviation. We're going to put the standard deviation statistic into your app to record it." That's, I'd say, the end-to-end that ends up being built in Ganymede. That's all user-defined code. We have a lot of it off the shelf, but you can always open it up and edit it or build your own.
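
The "ran it seven times, here's the standard deviation" step can be illustrated with a short sketch using Python's standard library. The readings are invented for the example, and `summarize_runs` and the shape of the resulting record are assumptions rather than anything Ganymede defines.

```python
import statistics
from typing import Dict, List


def summarize_runs(readings: List[float]) -> Dict[str, float]:
    """Summarize repeated instrument runs with count, mean, and sample standard deviation."""
    if len(readings) < 2:
        raise ValueError("need at least two runs to compute a standard deviation")
    return {
        "n": len(readings),
        "mean": statistics.mean(readings),
        # statistics.stdev is the sample (n - 1) standard deviation.
        "stdev": statistics.stdev(readings),
    }


# Seven made-up readings from the same instrument.
runs = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.0]
summary = summarize_runs(runs)

# Hypothetical record that would be written into the downstream app.
record = {"assay": "example-assay", **summary}
```

The sample standard deviation is used here on the assumption that the seven runs are a sample of the process; a different analysis could just as easily be dropped in, which is the point of keeping this step as user-defined code.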

Erik: Question about the task of getting data off the machine. So, I was working on a project earlier this year with a chemical company in their lab environment. They were trying to get data off some machines. In some cases, the only solution that seemed to work was having a camera look at the machine and read the data off the interface. Okay, it's 45. Then all you basically get are the numbers that are physically visible, maybe a red light that flashes on and off. Chemicals is maybe a more price-sensitive industry. Maybe they're working with older equipment than a lot of the life science companies. They're also just older companies, so maybe they have more legacy equipment. But what's the situation in the companies that you're working with? Is it 99% of equipment that you're able to plug into and extract digital data? Do you also have to find workarounds to get analog data from devices that just don't interface?

Nathan: We definitely have to find workarounds. I would say, luckily, because of this statistical nature of what a lot of the instruments are doing in the life sciences, they do have a higher chance of emitting a file that you can parse and process. But still, what you described of, hey, here's some instrument that's completely locked down, there's no port, you got to point a camera at the screen and read the data off, that definitely happens as well. I think that's where we've said our paradigm is going to be, that you know what, we know there's no silver bullet. There's no single standard that everything's going to conform to. We're not going to treat the entire world like it's going to be OPC accessible or have a web API or something like that. Instead, we're going to assume that everything is always going to be like that — super fragmented, really hard to connect to — and build different methods of getting the data with different agents. Then let users define how they want to parse the business logic.

There are cases where data is super clean, where we can get it off some API or some file that's very easy to parse and bring in. Then there are cases where data is super messy, and you have very awkward ways of getting the data in. And so, that's oftentimes where we can do OCR. We have a system for that in Ganymede for parsing things like images or PDFs as well, if that's how the data is coming in. We can integrate with a barcode scanner. We can also just let developers create a form in Ganymede for users to type their data in, if needed. I think we try to assume that there's no silver bullet, and that these really difficult integrations will happen and be everywhere. We'll just try to glue it together as much as we can. But we're not going to say, "Oh, here's the only way that we can connect. And so, we can't touch this."

In practice, I think we end up being able to cover 95%, 99% of what people have. Because we'll go as far as the instrument will allow us in terms of integrating it. We'll do what we can and try to create the smoothest experience possible. But it may not always be possible to actually have it be completely automated end to end. But in many cases, it is.

Erik: Yeah, I guess this job at least should get easier over time, right? I think the newer equipment coming out tends to be more connectable.

Nathan: Mostly.

Erik: Quick question around — yeah, please.

Nathan: I was just going to say mostly. I think the industry is turning in the right direction, but it's a slow process. Hopefully, everything gets better.

Erik: Yeah, that's right. Because it's not just a technical issue, right? There's a whole mentality around building walls versus making your solution connected. I guess, that's a corporate strategy decision as much as technical.

Nathan: That's right, yeah.

Erik: Let's see. So, you mentioned earlier that you don't touch clinical trials. Why not? What are the unique complications around that? Is it just not interesting as a business problem? What's the reason?

Nathan: I'd say, it's just a very different area. Our specialty has been saying, "Hey, you have all these lab instruments. You have all these statistics that you're doing. We can bring them all into one place to let you write code on top of them and automate them." Clinical trials, there's a whole different raft of software there. It's much more about doctor-patient interactions. It's also much more regulated. We don't want to touch any data that has human identifiable information in it. In terms of what our platform is meant for, our platform is meant for just process operational data. And so, it's a different regulatory regime. It's a very different data problem, I'd say.

There are actually a much more clear set of tools that you're using in these clinical trials. You know the different acronyms for the five, seven different types of applications you're using. It's more like the data is incredibly messy within them. It's more of a data cleaning problem oftentimes, we find. There are other companies that are focused on data cleaning, and we'll let them do their thing. We're more focused on data structuring and analysis structuring, which suits the wet lab and physical processes better.

Erik: Let's try to make this as tangible as possible. I think the best way to do that is to walk through an end-to-end case, from what was the challenge that they had to deploying the solution. You could maybe blend two or three cases together as well, if that helps. But I think it would also be interesting to understand what are the KPIs or the specific impacts that people are looking to realize. Is it shortening a certain process time or improving quality control to a certain measure? What are the benefits, the objective benefits that are aimed at here?

Nathan: Yeah, I'll give you two cases: one in higher scale manufacturing, and then one on the more R&D side of the business. On the manufacturing side, we have a partnership with a company called Apprentice. Apprentice is a manufacturing execution system for pharma. They themselves are a startup, much bigger than us. They've raised over $100 million. They're fantastic to work with. What they do is, through their MES, they have an application that technicians on the factory floor in pharma manufacturing are using to describe their processes, to capture data, have their electronic batch records, and so on. Because of that, Apprentice does have a pretty big desire to get more natively integrated with a lot of the instruments and machinery that people are using on the factory floors. Because pharma manufacturing is incredibly locked down and regulated, it's very high value to the clients and to the IT and compliance folks in those spaces to not just automate data entry for speeding up the technicians, although that's a big plus, but also for compliance reasons. Being able to record data and have certainty that whatever people did was actually true is quite important.

They came to us, Apprentice, and said, "Hey, we have a whole bunch of clients that could really benefit from having scales, laboratory balances integrated more deeply." In the pharma manufacturing process, when you're manufacturing a drug, one of the first steps oftentimes is to just weigh out different components of the drug or the pill that you're going to prepare — the active pharmaceutical ingredients, the matrix that those are going to sit in, whatever it is.

One of the problems that they face in that space is that they'll establish tolerances of, okay, how much do you need to weigh out of this? You may say, okay, I can weigh out 50 to 55 milligrams, because the recipe says 52 milligrams. So, my tolerance is 50 to 55. If it's above that or below that, I'm going to have to reject the batch and throw it out. What ends up happening is that people will weigh out 56 milligrams because they're not being careful. They'll say, "You know what? Actually, it's good enough. I'll call it 55. So, it's in tolerance. I can continue with my day." That's a huge problem. That's because you have people recording data manually. It's not coming directly off of anything. They're just looking at the screen of a scale and writing down the number.
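
The tolerance example can be made concrete with a small sketch: when the reading comes straight off the scale, the value that is stored is the value that was measured, and the in-tolerance flag is computed rather than judged by eye. `Tolerance` and `record_weighing` are hypothetical names for illustration; the 50 to 55 mg window and the 56 mg reading are the figures from the conversation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Tolerance:
    """Acceptable weighing window in milligrams (inclusive at both ends)."""
    low_mg: float
    high_mg: float

    def accepts(self, reading_mg: float) -> bool:
        return self.low_mg <= reading_mg <= self.high_mg


def record_weighing(reading_mg: float, tolerance: Tolerance) -> Dict:
    """Store the instrument reading as-is and derive the tolerance flag from it."""
    return {
        "recorded_mg": reading_mg,  # never rounded or adjusted by hand
        "in_tolerance": tolerance.accepts(reading_mg),
    }


window = Tolerance(low_mg=50.0, high_mg=55.0)
on_target = record_weighing(52.0, window)
overweight = record_weighing(56.0, window)  # flagged, not quietly written down as 55
```

Whether the window should be inclusive at its edges is an assumption here; the real rule would come from the batch recipe.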

For them, what we did was we said, okay, let's get these scales more natively integrated. How do scales integrate? For the most part, they have no PC. They have no network component. They just have a USB port, or an Ethernet port, or an RS-232 port. And so, we use our edge devices, connect with them. The edge devices, rather than having a parser or a business logic layer for standardizing the data on the edge device, they actually just take the signals coming off the USB port, or RS-232, or Ethernet, and send them up to the cloud verbatim. Equivalently, they'll take signals from the cloud and put them back into the scale, verbatim, to control the scale.

I'd say, this is where our approach helps a lot. Because what that means is that these scales are all super different. It's not OPC. It's literally strings coming over the wire. You'll have something like w19G, which means we weighed 19 grams, or T for tare. Every scale has completely different implementations. They're poorly documented. Some of the documentation is just wrong, especially when you're going across multiple manufacturers.

So, that's where our cloud-based approach is very effective. Because we were able to say, hey, we'll just get all the strings off the scale, and then process them with a cloud-based driver in real time. Then put them into a web API for the Apprentice application to consume, to read from and write to. Through that, we're able to, very efficiently and very quickly, write new drivers for new scale types.
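
A cloud-side driver of the kind described here is essentially a mapping from each scale's raw strings to a common record. The sketch below is a guess at the shape of such a driver, not Ganymede's implementation: the `w19G` and `T` strings come from the conversation, while the second vendor's `NET:0019.0;g` format, the `vendor_a`/`vendor_b` names, and every function name are invented for illustration.

```python
import re
from typing import Callable, Dict

# Vendor A: strings like "w19G" (weight) and "T" (tare), as mentioned above.
WEIGHT_A = re.compile(r"^w(?P<value>\d+(?:\.\d+)?)(?P<unit>[A-Za-z]+)$")
# Vendor B: a made-up format used only to show a second mapping.
WEIGHT_B = re.compile(r"^NET:(?P<value>\d+(?:\.\d+)?);(?P<unit>[A-Za-z]+)$")


def _weight(match: "re.Match[str]") -> Dict:
    """Build the common weight record from a regex match."""
    return {
        "event": "weight",
        "value": float(match["value"]),
        "unit": match["unit"].lower(),
    }


def parse_vendor_a(raw: str) -> Dict:
    text = raw.strip()
    if text == "T":
        return {"event": "tare"}
    match = WEIGHT_A.match(text)
    return _weight(match) if match else {"event": "unknown", "raw": raw}


def parse_vendor_b(raw: str) -> Dict:
    match = WEIGHT_B.match(raw.strip())
    return _weight(match) if match else {"event": "unknown", "raw": raw}


# Adding support for a new scale type means adding one entry here.
DRIVERS: Dict[str, Callable[[str], Dict]] = {
    "vendor_a": parse_vendor_a,
    "vendor_b": parse_vendor_b,
}


def parse(model: str, raw: str) -> Dict:
    """Dispatch a raw string from the wire to the driver for that scale model."""
    return DRIVERS[model](raw)
```

Unrecognized strings are kept with their raw text rather than dropped, so a driver can be corrected later when the documentation turns out to be wrong.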

Versus a lot of the manufacturers, who have some connectivity layers that they've built already. For instance, Mettler Toledo, a big scale manufacturer, has what is called LabX. That is a data layer for their scales, but it's very specific. It's very hard to set up. It's very filled with semantics of how Mettler Toledo thinks about the world. Whereas Apprentice's experience and, by extension, Apprentice's clients' experience with Ganymede is that we're effectively a magic wand, where you point it at a scale and it doesn't matter what manufacturer it is. As long as it has some connector, we'll hook up the edge device. We'll just say, hey, we see all the strings coming off. We can look up the documentation, and then create this mapping very quickly into what these strings mean. It's all purely web-based. Once the thing is connected on-prem, you don't need anyone there anymore. You can just start coding on the web from anywhere in the world and get it done very quickly.

For a manufacturing execution system and for these regulated manufacturing clients, I think that's pretty transformative. Because they're able to really radically accelerate this from something that would take months to integrate scales and might not be worth it, to something that now takes days or weeks per scale type. Similarly, I think the same holds true in scientific wet labs, where the problem is a little bit different but the same value prop holds of saying, it's very important for me here. Less compliance, more accelerating my scientists because they spent so much time doing data entry today. But it's the same thing.

Again, if I have all these lab instruments I bought, the lab instruments just will emit a file onto the PC that's attached to them. That's it. They're done. People end up having to open up those files locally and do some analysis in Excel or import the files into some software to do some analysis, and then do that analysis and put it into an app like a Benchling, or a manufacturing execution system, or LES, or whatever it may be. There's this very consistent flow, which is very error-prone and very manual, data-entry intensive, of saying, hey, I got to get this file. I got to manually do an analysis. Then I got to put it by hand into some app or something.

We, I think, have a much better approach. We're saying, okay, we're going to hook up to the instrument. We'll get the file, just host it raw in Ganymede, and then automatically have the parser built out in the software layer. Automatically, do the analysis. Then automatically, put that final result into where it needs to go. For the scientists, it's the same value prop in a way, more focused on saving time than compliance here. But they speed up by probably 10% to 20%, I would say. They spend hours and hours and hours every week doing data entry. It just vanishes. So, it really radically accelerates people. In science, it reduces the cycle times of how long it takes to do an experiment and then see the data come in to debug it and decide what your next experiment will be. Also, in an environment like this, it helps save headcount. Because you don't need as many scientists to run the same lab processes.

Erik: Okay. Great. It sounds like a very high-value solution in both cases, and somewhat intuitive that this is a solution that the market needs. You already mentioned that people are basically asking you, when can I buy it? I just had a look at some of your figures. You're a relatively young company, 10, 11 months old. You've raised something like USD 15 million. Obviously, investors are quite confident that there's a business here. What's the answer to the question, why now? This feels like a problem that needs to be solved. Why haven't Mettler Toledo, why hasn't X large equipment manufacturer that's been selling into these companies for the past decade, why hasn't this basically been a solved problem?

Nathan: We've definitely grown super, super quickly, so far. I think it is because there's such a need here, and it's so unsolved. It's a good question. I would say, there's probably two big reasons that this is fairly unsolved. One is that, the way that people have approached this is always saying it's a very local solution or very app-like or internal company platform-like solution that they end up developing. Say, Mettler Toledo, for instance. They're great, but they don't specialize in software. As you said, manufacturers are often incentivized to try and reach these local optima, where they're only offering a solution that connects to their machinery. And so, that undermines the value prop of people being able to connect to everything. They end up having to have a dozen different connection layers for the dozen different instrument manufacturers and apps that they have involved across their business.

Oftentimes, people do have these things already installed. Then we end up being the connection layer between all these connection layers, just because they're all so fragmented and disconnected from each other. Then the other piece, I'd say, is that there's very little true software engineering in the space. It's a very hard space to be a software engineer in. Because people want to build apps oftentimes, and you can't really build apps here. People want to build no-code solutions, which are very effective in other industries. But you can't really do that here. The traditional paradigm of what software engineers do doesn't work that well.

The only thing that we observe that really works is developer platforms and cloud infrastructure at a very raw level, which is, I would say, a very extreme type of software engineering, and one you don't get much of in bio because there are very few cloud infrastructure engineers paying attention to this space, for good reason, traditionally. But I think finally now, for us, we're really bringing in a ton of software engineering talent, a ton of cloud infrastructure engineering talent, and really bringing a sledgehammer into the space in a way that hasn't happened before. I'd say, that's the other aspect. We're able to build at a much deeper level and leverage a lot more deep software engineering talent than exists within the industry today.

Erik: Fascinating. Well, obviously, you're a sharp guy. It looks like you have a great founding team. It sounds like you've really found the right time and the right problem as well. If we look forward next 12 months, 24 months, what is on the horizon for you? What are the problems that you need to tackle in the next couple development cycles?

Nathan: 12 to 24 months is a very long time in startup land, I'll say. We have a lot of plans during that time. Where we are right now as a company, I think, within the year here, in 2022, we've built out a lot in terms of our core back end. We have the rails and the capabilities at a back-end level to do really any implementation, to get any instrument or device or app wired up. Now, from here, where we're working at the moment is productizing more of a web app to be able to, instead of just interacting with Ganymede's systems over a command line, actually have a graphical interface for that. Then the next step, going into next year, is to say, okay, this web app is now available for people to self-serve, log into.

I think, as much as we're doing a lot of big traditional enterprise sales, we're very, very focused on saying that we want to create a tool that's very good as a self-service tool for developers at these biotechs or at these pharma companies so they can use Ganymede without ever talking to us. That's a huge advantage for them. Because then, they can skip the whole enterprise sales cycle. They can move much faster. They don't have to talk to us. Although we will always have big enterprise deals where we're very deeply involved, it also helps us because we're making sure that the platform is robust and modular enough that someone could use it self-service. That also helps us internally. Because our internal developers will also benefit from that. We see them somewhat as equivalent to the external developers.

Long way to say, I think that self-service capabilities are a big one for us. The ability to go onto our website and actually just open up and spin up a new environment from nothing is big for us. It's very difficult, I think, for a lot of companies in the space that have solutions like this to do, because they are not focused on cloud infrastructure. The idea of automatically spinning up an entire cloud infrastructure environment just through the click of a button is very daunting. But that's where we've come from. Natively, that's in our DNA. So, we want to lean into that, really leverage that, and go further from there.

Looking further into 2023, I think a couple other themes that we'll be touching on are, one, right now, Ganymede is a very cohesive, integrated platform where you can do a lot of things. But we want to start decoupling that and making it more modular in saying, hey, if you just want to use Ganymede's database because it's a great database for manufacturing for the life sciences, here it is. Go crazy with it on its own. If you just want to use our computing layer because it's very well set up, or if you just want to use our agents and install them yourself, you can use them in isolation. We'll start making things more open source, open core, so that you can download and use Ganymede within your systems.

We'll never, I think, do fully on-prem because we see the future as being in the cloud. But we're working towards people being able to self-host Ganymede in their cloud. Since that's oftentimes a big demand from customers, they've invested in these big cloud layers that they've built out internally as their private clouds. Why not be able to run Ganymede in that environment?

Then I think, longer term, where we're really focused is, although we work very much with the biotechs and the pharmaceutical companies who are getting the direct value out of the automation, long term, if they're more self-service, we actually want to mostly be focused on the things that are integrating into this. We want to be able to say things like, hey, anytime that you get a new instrument from a manufacturer that we're partnered with and you turn it on, it's going to create a new Ganymede cloud of, say, my instrument.client.com. You'll just be able to go into there and see all the data there on the web. No more having to go into files on the local PC. Everything just writes its data directly to the cloud natively. Same for apps. We want to build the right infrastructure and connectivity layer to have all these different things in the bio lab talk to each other at that infrastructure level, and really start building out a better layer for how things connect. That's our long-term goal.

I think looking out at where the industry has gone, there have been so many failed attempts here to try and build better centralized layers for the lab. A lot of those have failed because people try to weave in the scientific semantics to try and make the analysis itself just work out of the box. Our take is that it'll never work out of the box. The business logic will always need to be defined de novo. But what you can do is provide the right infrastructure layer for the data to be available. And so, that's what we're obsessed with. I think that's a lot of what we'll start focusing on in 2024. It's very strong tooling for instrument manufacturers, for apps, et cetera to be able to couple Ganymede very deeply as their cloud layer that represents the cloud presence of this on-prem machine.

Erik: Awesome. Well, I'm going to have to invite you back on maybe in 12 months. You are right, 12 months is a lot of time in the startup world. You guys clearly move fast. I've worked on a fair number of corporate innovation projects, where 12 months is just long enough to have some internal alignment on what direction you want to move in. Nathan, great business that you're building. Thank you for walking us through it today. Really, I would love to chat in 12 months, 18 months, and see where you are there.

Nathan: Yeah, I would love to check in. Some things will still be in the same sales pipeline, exactly as you said. The corporate innovation processes 12, 18 months, it'll be exactly where it is. Then hopefully, our engineering team can have built out the entire universe in the meantime. So, we'll see how far we can get.
