OONI's Transformation: Enhancing Internet Censorship Measurement with ClickHouse

ClickHouse
Application Infrastructure & Middleware - Database Management & Storage
Cybersecurity & Privacy - Database Security
Buildings
Construction & Infrastructure
Testing & Certification
The Open Observatory of Network Interference (OONI) is a non-profit organization that provides free software tools to document internet censorship worldwide. Their tools allow users to test their internet connection quality, detect censorship, and measure network interference. However, OONI faced significant challenges in handling the vast amounts of data generated from these tests. They initially used flat files, MongoDB, and PostgreSQL to store metadata from measurement experiments. As the dataset grew into hundreds of millions of rows, performance issues arose, requiring a shift from an OLTP database to an OLAP one. OONI needed a solution that could simplify their architecture while handling complex data visualizations and enabling searches and aggregations on their 1B+ row dataset.
The Open Observatory of Network Interference (OONI) is a non-profit free software project that empowers decentralized efforts in documenting internet censorship worldwide. Established over ten years ago, OONI aims to increase transparency about internet censorship. They provide free software tools for users to test their internet connection quality, detect censorship, and measure network interference. OONI collects the data generated by its global network of volunteers, analyzes it, publishes it as open data, and produces research that contextualizes the findings within specific countries, regions, and ongoing social or political events. They also collaborate with 43 partner organizations worldwide to disseminate findings, conduct advocacy, and support policymaking and legal actions.
OONI adopted ClickHouse as its data storage and analytics engine to handle the large volumes of data. ClickHouse enabled OONI to perform complex queries on the vast amounts of data collected through its network measurement tests. It also supported generating visualizations, which aided in identifying trends and patterns in the data. ClickHouse streamlined OONI's architecture, allowing easier data access for researchers. The raw data (compressed JSON files) is stored on S3, while the metadata is stored in a single large table. This table contains all the relevant metadata for analysis and aggregation, such as country, network, timestamp, target, and the outcome of the analysis. With a size of 1.4 billion records and 32 columns, it is used in many aggregation queries that power the OONI Measurement Aggregation Toolkit (MAT) and their internal data analysis tools.
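The single-table design described above can be illustrated with a minimal sketch. It is not OONI's actual schema or query: the table name, column names, and sample rows are hypothetical, only five of the 32 columns are modeled, and SQLite stands in for ClickHouse purely so the example runs without a server. The aim is only to show the kind of grouped count that a tool like MAT might issue against one wide metadata table.

```python
import sqlite3

# SQLite is a stand-in so the sketch is self-contained; OONI uses ClickHouse.
# Table and column names are hypothetical, not OONI's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE measurement_meta (
        country TEXT,  -- country code of the probe
        network TEXT,  -- network the probe ran on
        ts      TEXT,  -- measurement timestamp, ISO 8601
        target  TEXT,  -- tested target
        outcome TEXT   -- result of the analysis
    )
    """
)

# Made-up sample rows for illustration only.
rows = [
    ("IT", "AS1", "2023-01-01T10:00:00", "example.org", "ok"),
    ("IT", "AS1", "2023-01-01T11:00:00", "example.org", "anomaly"),
    ("IT", "AS2", "2023-01-02T09:00:00", "example.com", "ok"),
    ("BR", "AS3", "2023-01-01T12:00:00", "example.org", "ok"),
]
conn.executemany("INSERT INTO measurement_meta VALUES (?, ?, ?, ?, ?)", rows)


def outcomes_per_day(conn, country):
    """Count measurements per day and outcome for one country."""
    query = """
        SELECT substr(ts, 1, 10) AS day, outcome, COUNT(*) AS n
        FROM measurement_meta
        WHERE country = ?
        GROUP BY day, outcome
        ORDER BY day, outcome
    """
    return conn.execute(query, (country,)).fetchall()


result = outcomes_per_day(conn, "IT")
for day, outcome, n in result:
    print(day, outcome, n)
```

Because every dimension lives in the same table, changing the grouping (for example, by network instead of by day) only means changing the query, not the storage layout.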
The adoption of ClickHouse significantly improved OONI's operations. With a simpler architecture, researchers could access the data more easily and run any query directly on the dataset, which enabled faster iteration. Investigations also became more efficient, because the team could answer questions quickly instead of waiting hours for queries to complete, and internal data analysis improved as a result. Furthermore, ClickHouse let them perform aggregations directly on the dataset without maintaining intermediate counter representations, which simplified the data pipeline. This made real-time publication of measurements possible, helping human rights defenders respond rapidly to censorship events around the world.
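The difference between maintaining intermediate counters and aggregating directly on the dataset can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not OONI's pipeline: the field names, the sample rows, and the helper functions `build_counters` and `aggregate` are all hypothetical.

```python
from collections import Counter

# Hypothetical sample rows; field names are illustrative only.
measurements = [
    {"country": "IT", "network": "AS1", "day": "2023-01-01", "outcome": "ok"},
    {"country": "IT", "network": "AS1", "day": "2023-01-01", "outcome": "anomaly"},
    {"country": "IT", "network": "AS2", "day": "2023-01-02", "outcome": "ok"},
    {"country": "BR", "network": "AS3", "day": "2023-01-01", "outcome": "ok"},
]


def build_counters(rows):
    """Intermediate counters: the dimensions are fixed when data is ingested."""
    counters = Counter()
    for row in rows:
        counters[(row["country"], row["day"])] += 1
    return counters


def aggregate(rows, *dimensions):
    """Direct aggregation: the dimensions are chosen when the query is run."""
    return Counter(tuple(row[d] for d in dimensions) for row in rows)


# The precomputed counters can only answer the question they were built for.
precomputed = build_counters(measurements)

# Direct aggregation answers the same question and new ones alike.
by_country_day = aggregate(measurements, "country", "day")
by_network_outcome = aggregate(measurements, "network", "outcome")
```

In the counter-based variant, a new question such as a breakdown by network and outcome would require a new counter and a backfill. In the direct variant it is just another call over the same rows, which is only practical when the underlying engine can scan the full dataset quickly.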
Significant reduction in query time, with heavy queries that used to take up to 20 minutes on PostgreSQL now taking only hundreds of milliseconds in ClickHouse.
2x reduction in on-disk size for the database.
Ability to handle a dataset of 1.4 billion records and 32 columns.