Providing Scalability and Speed for a National Treasure

Progress

Technology Category

Analytics & Modeling - Real Time Analytics

Platform as a Service (PaaS) - Data Management Platforms

Applicable Industries

Education

Use Cases

Search & Rescue

Services

Data Science Services

System Integration

Challenge

The U.S. National Archives and the University of Virginia (U.Va.) Press faced the challenge of transforming Founders Online, a scholarly tool traditionally used by a limited number of researchers, into a national resource that could serve the general public. The original platform was not designed to handle many concurrent users: its performance deteriorated quickly under load, and testing suggested the architecture would support only 100 concurrent users. The original system was also designed to preserve the look and feel of the print volumes. This had little impact on smaller files, but longer, outlying collections stressed the system. In addition, the old platform recreated each search from scratch, placing an unnecessary burden on computing resources and slowing the system down. To make matters more complex, the organization could devote the equivalent of just 1.5 full-time programmers to the project.

About Customer

Founders Online is a free online tool, commissioned by the U.S. National Archives and implemented by the University of Virginia (U.Va.) Press, that gives the public access to the papers of six of America's Founding Fathers: Thomas Jefferson, Benjamin Franklin, George Washington, James Madison, John Adams and Alexander Hamilton. Funded by the National Historical Publications and Records Commission of the National Archives, Founders Online grew out of 50 years of scholarly work and offers unique insight into some of the brightest minds of the Age of Enlightenment. The website provides searchable access to more than 150,000 documents, a number projected to grow to 175,000.

Solution

MarkLogic, which its vendor bills as the only Enterprise NoSQL database platform, is used to build content systems that manage billions of data points quickly. It lets enterprises load data in whatever form they have it, rather than only in the rigid rows and columns of traditional structured databases. Because MarkLogic provides native search, navigation and rendering capabilities, U.Va. Press did not have to rebuild its existing platform from the ground up in order to scale it. Instead, the team started thinking about queries in the aggregate rather than on a document-by-document basis: whereas a traditional structured database crawls through millions of rows and columns, MarkLogic uses data mapping to locate relevant documents quickly. The team also created a static index of previous search results and, before the public launch, prepopulated it with the most common search terms and links. Now, when a known search term is entered, the system simply serves up the stored results instead of crawling through every stored document again. And the more people use the system, the richer that stored search cache becomes.
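The search-result cache described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not U.Va. Press's or MarkLogic's actual code: the class name `PrepopulatedSearchCache`, the stand-in `full_search` function and the three sample documents are all hypothetical, and an in-memory dictionary stands in for the "static index" of previous results. It shows the three behaviors the case study mentions: warming the cache with common terms before launch, serving a known term without searching again, and keeping each new term so the cache grows with use.

```python
from typing import Callable, Dict, Iterable, List


class PrepopulatedSearchCache:
    """Hypothetical sketch: stores results per search term, warmed before launch."""

    def __init__(self, search_fn: Callable[[str], List[str]]) -> None:
        # search_fn stands in for the expensive search over every stored document.
        self._search_fn = search_fn
        self._results: Dict[str, List[str]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(term: str) -> str:
        # Treat "John Adams" and "  john   ADAMS " as the same known term.
        return " ".join(term.lower().split())

    def prepopulate(self, common_terms: Iterable[str]) -> None:
        # Run the most common searches once, before the public launch.
        for term in common_terms:
            key = self._normalize(term)
            if key not in self._results:
                self._results[key] = self._search_fn(key)

    def search(self, term: str) -> List[str]:
        key = self._normalize(term)
        if key in self._results:
            # Known term: serve the stored results without searching again.
            self.hits += 1
            return self._results[key]
        # New term: search once, then keep the results for later users.
        self.misses += 1
        results = self._search_fn(key)
        self._results[key] = results
        return results

    def __len__(self) -> int:
        # Number of distinct terms currently cached.
        return len(self._results)


# Usage with hypothetical sample data.
DOCUMENTS = {
    "doc-1": "Letter from Thomas Jefferson to John Adams",
    "doc-2": "George Washington to Alexander Hamilton",
    "doc-3": "John Adams to Abigail Adams",
}
calls: List[str] = []


def full_search(term: str) -> List[str]:
    # Naive full search: checks every document on every call.
    calls.append(term)
    return sorted(d for d, text in DOCUMENTS.items() if term in text.lower())


cache = PrepopulatedSearchCache(full_search)
cache.prepopulate(["John Adams", "Hamilton"])
print(cache.search("john adams"))  # ['doc-1', 'doc-3'], served from the cache
print(cache.search("Washington"))  # ['doc-2'], searched once, then cached
print(len(cache), cache.hits, cache.misses)  # 3 1 1
```

A production version would also need a rule for refreshing cached entries as new documents are added, since the collection is projected to grow; the case study does not say how that was handled.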

Impact #1

Sub-second search: for the site's users, the public, the result is a fast, Google-like search experience across a remarkable collection of documents written more than 200 years ago.

Impact #2

Leverage existing IT resources: By using data mapping to look at aggregate groups of documents, the customer avoided irrelevant results and information bottlenecks and returned results quickly and accurately. It also built on its previous programming, reusing existing switches to duplicate processes already in place.
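The case study does not explain how MarkLogic's "data mapping" works internally. To illustrate the general idea of locating relevant documents without crawling through every one, the Python sketch below swaps in a plain inverted index, a standard search-engine structure; it is not MarkLogic's API or implementation, and the function names and sample documents are hypothetical. `scan_search` re-reads every document for each query, roughly analogous to rebuilding each search from scratch, while `index_search` answers from a word-to-documents map built once in advance.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def tokenize(text: str) -> Set[str]:
    # Lowercase alphabetic words only; real engines also handle stemming, phrases, etc.
    return set(re.findall(r"[a-z]+", text.lower()))


def scan_search(docs: Dict[str, str], *terms: str) -> Set[str]:
    # Document-by-document: every query re-reads every stored document.
    if not terms:
        return set()
    wanted = {t.lower() for t in terms}
    return {doc_id for doc_id, text in docs.items() if wanted <= tokenize(text)}


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    # Built once, ahead of time: map each word to the documents containing it.
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def index_search(index: Dict[str, Set[str]], *terms: str) -> Set[str]:
    # Query time: intersect the per-word document sets; no document is re-read.
    if not terms:
        return set()
    result = set(index.get(terms[0].lower(), set()))  # copy, so the index is not mutated
    for term in terms[1:]:
        result &= index.get(term.lower(), set())
    return result


docs = {
    "doc-1": "Thomas Jefferson to John Adams, 1813",
    "doc-2": "George Washington to Alexander Hamilton",
    "doc-3": "John Adams to Abigail Adams",
}
index = build_index(docs)
print(sorted(index_search(index, "John", "Adams")))  # ['doc-1', 'doc-3']
```

The index costs extra storage and build time up front, but each query then touches only the documents that contain the search words instead of the whole collection.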

Impact #3

Better data insight: The customer created a static index of previous search results and, before the public launch, prepopulated it with the most common search terms and links. Now, when a known search term is entered, the system simply serves up the stored results instead of crawling through every stored document again. And the more people use the system, the richer that stored search cache becomes.

Benefit #1

Cut the response time for a large, 90-page document from 19 seconds to just 1.86 milliseconds.

Benefit #2

During initial testing, when concurrent load was raised to 5,000 users (50 times the projected capacity), the average response time was still just 120 milliseconds.
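A quick check of the two benefit figures, using only numbers stated in this case study:

```latex
\frac{19\ \text{s}}{1.86\ \text{ms}}
  = \frac{19\,000\ \text{ms}}{1.86\ \text{ms}}
  \approx 10\,215,
\qquad
\frac{5\,000\ \text{users}}{100\ \text{users}} = 50
```

The 90-page document therefore loads roughly 10,000 times faster. The 50x concurrency figure matches 5,000 users divided by the 100 concurrent users that testing suggested the old architecture could support, which suggests that 100-user limit is the "projected capacity" being referred to.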
