Case Studies
Prevent Future Technical Issues by Centralizing Alerts, Events, and Metrics
CircleCI's team was using several patched-together monitoring tools. As CircleCI's application infrastructure scaled, it became tedious to track the health and performance of their servers, databases, and other IT components as they had to spend hours every week manually correlating the outputs of their existing monitoring solutions. The final straw occurred when CircleCI missed an outage that should have been caught early by its monitoring system. Lowe knew then that he had “hit the limit with [their] tools” and needed to implement a more effective and sensitive monitoring solution that would scale automatically with CircleCI’s growth.
Gaining Infrastructure-Wide Visibility in a Private OpenStack Cloud
Revinate, a company providing software services to hotels, was facing challenges with its growing customer base. The company's configuration had grown quickly to 25 physical servers supporting 400 virtual instances. The company's Software-as-a-Service (SaaS) offering is hosted in a Rackspace Private Cloud (RPC) environment that utilizes the OpenStack architecture. However, the RPC offering lacked the robust management capabilities needed for efficient operation. The available data came only infrequently in the form of a weekly email, forcing the company to track any trends manually. The company needed a tool that would enable them to instrument the entire stack from top-to-bottom, including the applications, all underlying services, the virtual instances and hypervisor, and the physical server resources.
Improving Staff Productivity by Providing Developers with a Workflow-Oriented Operational Monitoring System
As SimpleReach’s platform grew, teams began spending more time tracking and comparing performance metrics during infrastructure updates. Their existing open source monitoring tools made it difficult to assess the performance implications of frequent changes in the production environment. The underlying problem was a familiar one: a disconnect between development and operations. “The developers didn’t realize how the changes they were making were affecting the production environment,” Lubow recalls. “Some of the impacts were significant, and the need for frequent changes in both application and system software was making the situation untenable.”
Easy-to-Use, Scalable Monitoring with Minimal Maintenance
Vistar Media, a company providing a programmatic platform for the digital out-of-home advertising industry, was facing a challenge as its infrastructure grew. They needed a powerful way to monitor their real-time service but didn't want to spend time managing their own monitoring solution. They were looking for a scalable, low-overhead solution that would integrate with their existing stack, including Python, StatsD, Amazon EC2, and Amazon S3, so that their teams could focus on serving clients. In their previous setup, they had relied heavily on the open-source project Graphite and had more than 12 servers dedicated solely to collecting metrics for monitoring. As the company grew, they had to spend an increasing amount of time making sure that the monitoring servers scaled at the same rate as the company.
Powering the Growth of a High-Volume Video Advertising Platform
SpringServe, a video ad server platform, was facing challenges with its self-hosted monitoring stack, which made it difficult to identify and resolve issues before they directly impacted the customer experience. As SpringServe's infrastructure was becoming more dynamic and distributed to provide better service around the world, significant blind spots hampered those efforts. They could not track application performance across regions, nor could they correlate metrics between systems to uncover the source of issues. SpringServe needed a reliable, real-time monitoring solution that could keep pace with its auto-scaling infrastructure, allow them to adopt innovative technologies, and let them keep growing quickly without sacrificing the speed or consistency their customers depend on.
Cloud Evolution
As Nordcloud expanded its reach throughout Europe, they faced the challenge of developing their next generation cloud services. They recognized that some needs weren’t easily or efficiently tackled in-house. Nordcloud sought Independent Software Vendors (ISVs) with a natural alignment to fill specific gaps in their offerings. They needed a partner to help extend their cloud solutions services, particularly in the area of monitoring. Monitoring a cloud solution is the final stage of a managed service environment and is key to reducing costs and SLA failures. Nordcloud needed to address situations where clients have multitenant environments or need more visibility than open source tools or legacy network monitoring tools could provide.
E-Commerce Platform Increases Resilience at Scale with Datadog and AWS
Neto was looking to move its existing legacy infrastructure to the cloud in order to drive automation and support their customers’ growth. However, their existing monitoring tools were unable to scale dynamically and could not track services across ephemeral infrastructure components. This posed a challenge as they needed a monitoring solution that could provide real-time visibility across a highly-automated environment. Prior to moving to the Amazon public cloud (AWS), maintaining and scaling Neto’s legacy infrastructure was slow, reactive, and prone to technical difficulties. Neto’s infrastructure environments often drifted out of sync, making it hard to increase capacity or deploy changes to production without engaging in manual, time-consuming processes.
Improving Application Performance and DevOps Collaboration with a Unified Monitoring Platform
HashiCorp’s self-hosted monitoring tools had poor usability, which led to a lack of visibility into their systems. This left engineers without real-time feedback on new product features and ill-equipped to troubleshoot issues effectively. The limited access to real-time monitoring and alerting hindered the team’s responses to issues, causing unnecessary delays in incident diagnosis and resolution. Without the ability to track and compare current and historical states, troubleshooting became a reactive, time-consuming, and tedious task.
Ensuring Cloud Reliability with Infrastructure Monitoring
CMD Solutions required visibility into cloud workloads and infrastructure so they could facilitate safe and reliable cloud migrations for their customers. They also needed to be able to discern between critical signals and false alarms in order to mitigate any performance or availability issues that might occur in their customers' dynamic environments. CMD must maintain a high-level view of all of their customers’ environments at once. But with all customers requiring high-touch service simultaneously, CMD needed tools and processes that would allow them to migrate and support customer workloads without engaging in time-consuming, manual work that could take attention away from critical system issues.
How Zendesk Enables Greater Developer Productivity with AWS and Datadog
Zendesk was transitioning to a highly dynamic, container-based environment and needed a robust monitoring solution that integrated with AWS and Kubernetes. Their existing monitoring tools created silos between teams and required manual correlation of metrics, traces, and logs, which made it difficult to resolve issues. To keep up with their customers’ evolving needs, Zendesk’s developers need the freedom to build new features quickly. Historically, Zendesk had used a monolithic, on-premises architecture for its production workloads, while its nonproduction workloads ran on Amazon Web Services (AWS). This setup created a lot of friction for their developers and made it difficult to scale.
Ensuring a Highly Available and Highly Scalable Platform
Braze, a cloud-based company that develops customer relationship management software, was facing challenges in scaling their systems and resolving customer support tickets. The company processes more than 8 billion API requests and sends more than 2.7 billion daily messages to a network of over 2.4 billion monthly active users. With a growing engineering organization, teams had different techniques and tools for evaluating the performance of their applications and scaling their systems. The organization needed a uniform way of determining whether provisioned infrastructure was appropriate for the traffic, forecasting future needs, and investigating performance-related issues. Additionally, many technical customer support tickets were escalated directly to the Product and DevOps engineering teams, effectively bypassing the Global Services and Support team when the questions were about performance, uptime, or throughput. This led to distractions for engineering and deprived Support and Success of the ability to quickly resolve customer tickets.
Provide a Flexible Solution to Suit a Service-Based Architecture and Scale With a Rapidly Growing Business
Airbnb, a leading community-driven hospitality company, faced the challenge of maintaining the reliability of their services while adapting quickly to new business opportunities. They developed a service-based architecture for some components of the site, while other components continued to be part of their main application. Separate engineering teams were created to support the separate components and features. Over time, they added many different systems for monitoring, some reporting to the central dashboard application, others being more standalone. This approach became difficult to scale, leading them to look for a comprehensive and more holistic operations performance solution.
Growing a Global Company
Fintonic, a personal financial planning and mobile banking service, was experiencing rapid growth and expansion into new markets. They adopted a microservices-based architecture orchestrated with Kubernetes to facilitate their global expansion. However, with their existing monitoring tools, it took time to onboard new engineers, who needed to learn a specialized query language to manually correlate logs across multiple tools. Teams struggled with alert coverage and prioritization since their monitoring tools could not target alerts based on tags. They needed to adopt a stateless architecture with Kubernetes, Terraform, and Ansible to be able to replicate in Chile and Mexico what they had built for Spain.
Game Server Monitoring
EA DICE was preparing for the scale of traffic they expected following Battlefield V’s beta launch. They wanted to plan the launch to ensure stability and low latency and be confident that their customers could enjoy the new game with no interruptions. The game server team was continually on the lookout for a central log management solution to complement their infrastructure monitoring with Datadog, and they had evaluated a number of logging solutions. From the outset, they were only interested in a solution that could provide insight into their logs without their team needing to run and maintain the logging system or incur any other overhead. A second requirement was finding a cost-effective logging solution for game server monitoring due to the large volume of logs. Finally, they wanted a logging solution that integrated with everything in their tech stack.
Taking Monitoring to the Next Level
Devsisters, a leading mobile gaming company, needed visibility into the health of their applications to meet the demands of their rapidly expanding user base. Their existing tools added complexity that made it increasingly difficult to pinpoint user-facing issues. As Devsisters’ engineering team set out to monitor and ensure the reliability of their cloud-native systems, they initially adopted a handful of open source tools for their perceived low cost. However, implementing and integrating these tools with their tech stack required a significant time investment from the engineering team, both upfront and continually. More importantly, Devsisters realized that these open source tools could not handle the scale and complexity of their modern environments.
Detecting Malicious Activity in Real Time
PedidosYa, a member of the Delivery Hero group, faced a challenge when the company introduced free food vouchers for new users. Users were creating several accounts from different IP addresses to receive multiple vouchers, but this behavior was difficult to pinpoint and prevent at scale. The team’s threat detection workflow at the time involved manually creating firewall detection rules for every domain they operate, which was grueling, time-consuming, and required lots of maintenance. As fraudulent activity increased, it became impossible to create individual rules for every IP address that needed to be blocked. This process led to a month-long delay in detection, which gave the malicious actors enough time to achieve their goal.
A Proactive Approach to Data-Driven Observability
Compass, a real estate brokerage company, was experiencing exponential growth and needed a sustainable monitoring strategy that could scale along with the engineering organization. They were using a suite of different monitoring products, which meant that engineers typically had to loop through multiple tools to solve one problem. This constant context switching created friction between teams and contributed to engineer burnout. Additionally, their point solution for monitoring the frontend stack introduced excessive administrative overhead due to its poor support for user provisioning and often generated false positives due to poor configuration. As Compass quickly expanded its engineering team, this process became a bottleneck and was no longer acceptable.
Scaling a Region’s Leading Live Streaming Service with Confidence
Vidio, a leading video streaming service in Indonesia, needed a monitoring solution to ensure a smooth and latency-free experience for their users. The company required visibility into their dynamic cloud environment, which was crucial for maintaining application uptime and consistent high stream quality. The challenge was to find an intuitive platform that would not add extra overhead to its Engineering and DevOps teams. This was particularly important during live sports events, which can attract an audience of up to eight million viewers.
Complete Observability of IoT Systems
Automotus, a curb management company, needed a robust monitoring solution that would provide visibility into their IoT devices as well as their scaling cloud resources. Their manual and reactive approach to monitoring was proving to be inefficient. They were unable to collect important hardware metrics, such as network throughput, I/O load, and memory, which meant they often missed the first signs of degraded device performance. If their devices stopped sending messages, they were forced to SSH into the system and sort through logs by hand, an extremely time-consuming process that required all hands on deck. They also didn't have visibility into the management and backend services that are crucial to their system, such as AWS IoT Core. These problems were compounded by the absence of a centralized platform to view and analyze this data in context. The resulting blind spots stymied their troubleshooting process, leaving them to cross their fingers that nothing would go wrong.
Partners Find a New Revenue Stream in the Datadog Marketplace
RapDev, a Boston-based technical consulting company, wanted to expand its integration and implementation service offerings to unlock more revenue growth potential. They saw an opportunity in the Datadog Marketplace to augment Datadog's existing monitoring capabilities by providing support for legacy OSs and internal IT. The challenge was to leverage the Datadog platform to implement projects and transformations at scale, diversify their customer base, and create a new revenue stream.
How Simplified Monitoring Helped Fundbox Build a DevOps Culture
Fundbox sought to enhance its DevOps processes by boosting responsiveness to software issues and reducing time spent maintaining its many monitoring tools. The company wanted one holistic solution that could empower developers to quickly spot and fix issues without being weighed down by so many monitoring systems. The complexity of the Fundbox monitoring infrastructure made it hard for DevOps engineers to quickly spot and resolve issues in production. And a few of these individual monitoring tools required extensive DIY customizations and ongoing maintenance. All this extra overhead reduced the time available for engineers to perform core DevOps functions, such as updating Fundbox services.
Datadog Helps SNCF Take High-Speed Track to Digital Transformation
SNCF, France’s state-owned railway operator, embarked on a major digital transformation initiative in 2016. The goal was to update its IT infrastructure and improve its competitiveness by migrating 90% of its applications to the cloud and embracing PaaS and containerization. However, SNCF discovered that it had no coordinated approach to monitoring. Business units had been adopting monitoring solutions independently, leading to the company using a total of 11 different monitoring tools. This lack of a single, standard monitoring tool severely restricted the scope of what each team monitored, making it difficult for different IT teams to cooperate on shared problems. This was a clear impediment to the organization’s goal to improve its competitiveness and agility. Additionally, SNCF’s existing monitoring tools weren’t cloud-native, leading to user friction and extra administrative overhead.
Reducing Operational Overhead and Building Value through Improved Visibility
As Arc XP grew, operational overhead increased, and it became more challenging for the Arc XP team to keep track of their environment and meet SLOs. The team needed to find a monitoring tool that could work in their complex environment and allow the organization to focus on building value for its customers. The company’s cloud infrastructure was expanding to multiple regions around the globe, and adequate visibility was vital to ensure a high quality experience for its customers. Furthermore, the engineering teams wanted a better tool to help them with technical support and IT operations. Engineers had no universal, proactive alerting system that could inform them as soon as issues arose, or a solution that could help them diagnose issues quickly. These technical obstacles, as significant as they were, ultimately presented the organization with an even more fundamental business challenge. The more time the Arc XP team spent bogged down in operations and support, the less it was using its strengths to pursue its organization’s founding mission.
Building a large-scale, high-throughput platform with Datadog APM and Continuous Profiler
Cvent, a market-leading meetings, events, and hospitality SaaS provider, had to pivot their entire product roadmap strategy to focus on building a new solution for virtual, hybrid, and in-person events due to the global COVID-19 pandemic. They planned to launch their new platform, the Cvent Attendee Hub, at Cvent CONNECT, their annual customer conference, and host the event on it. This meant that its performance had to be impeccable, and they only had six months to deliver. Ian Schell, Site Reliability Architect at Cvent, was tasked with ensuring that the Attendee Hub could accommodate the broad reach and increased registration volume of virtual events, which can often exceed that of in-person events. There were many unknowns surrounding usage patterns and scale, because the product was completely new, and the number of participants could be an order of magnitude higher compared to some in-person events.
Streaming Live Experiences to Millions, with Confidence
Seven.One Entertainment Group, a leading player in Germany's multi-channel entertainment industry, was facing a highly competitive market with rapidly changing viewer habits. Users were moving away from traditional TV and towards video-on-demand and interactive, second-screen experiences. The company needed to execute with the agility that only DevOps practices could provide. However, each team used its own monitoring solution, so no tool provided visibility over the whole application or enabled engineers to trace requests across services, which hindered their adoption of a DevOps mindset. This lack of adequate monitoring also made it challenging for Seven.One Entertainment Group to deliver live interactive shows, which draw up to 10 million simultaneous viewers online.
Ensuring Complete Visibility into Kubernetes Networks and Workloads
Delivery Hero, a leading local delivery platform, experienced a sharp increase in application traffic due to the global pandemic. They leveraged Kubernetes to scale and maintain their containerized environment, but visibility gaps threatened their ability to handle the increased traffic. Their existing open source tools could only monitor one of their clusters at a time, creating critical blind spots during updates or new cluster additions. They also lacked visibility into their DNS services, which are used by Kubernetes for service discovery and communication. This lack of visibility made them vulnerable to potentially large-scale outages and slowed down the issue detection and resolution process.
Partnership Brings Joint Success in LATAM
Econocom, a B2B reseller and technology consulting company, needed a monitoring solution to support the modernization and cloud migration projects of their customers. They required a solution that could deploy quickly and easily, allowing them to focus on growing their high-value-add consultative services. Econocom was looking for a product that could cover any customer’s stack and could be implemented quickly to accelerate their deal cycles. They also needed a modern monitoring platform designed to facilitate DevOps-style communication within ephemeral, cloud-based and Kubernetes-based systems.
Arc XP secures applications in production with real-time visibility from Datadog
Arc XP wanted to boost its security monitoring capabilities and its defense-in-depth strategy so it could quickly detect and respond to attacks on its web applications and APIs. As an organization with divisions that operate autonomously, Arc XP wanted a single source of truth that could enable more effective collaboration among its distinct teams. In addition, Arc XP needed to detect suspicious behavior in its customers' code. The Arc XP platform allows customers to run their own code inside the Arc XP application, creating a shared security responsibility model with Arc XP responsible for the platform and its customers responsible for their own code.
Eight Sleep achieves end-to-end observability with Datadog
Eight Sleep, a sleep fitness company based in New York City, was in need of a robust observability solution. The company wanted to better understand how users were experiencing its app and prevent issues before they occurred. At the time, Eight Sleep used a solution that performed uptime testing for a few public endpoints, but it was too basic and lacked configurability. Engineers often got paged in the middle of the night for what ultimately proved to be false alarms. With a small development team, Eight Sleep needed a tool that could help it accomplish tasks quickly and easily. The company's competitors had three to four times as many engineers, so it needed a tool that would do what it promised with minimal work required on the team's side.
Charm Industrial uses Datadog to access critical data in real time as they reduce the effects of climate change
Charm Industrial’s goal is to reduce the effects of global warming and climate change. Accomplishing that goal will require Charm to sequester gigatons of carbon dioxide (CO₂) from the atmosphere annually using a fleet of fast, mobile pyrolyzers. Charm will eventually operate tens of thousands of pyrolyzers 24/7. For Edward Young, Head of Software and Electronics/Staff Scientist at Charm, this presented a significant challenge. “When you have tens of thousands of systems, you can’t have operators at every single site,” he says. “To scale the business we needed a way to simultaneously monitor numerous systems in real time remotely.” Charm's pyrolyzer systems use high temperatures to decompose agricultural and forest biomass residue and convert it into bio-oil for use in carbon removal. These systems perform various jobs and have demanding safety standards. Each system includes sensors that measure critical data—such as temperature and pressure—to ensure Charm does not exceed safety thresholds. The team needs to monitor all that data in real time.