Zillow Provides Near-Real-Time Home-Value Estimates Using Amazon Kinesis

Amazon Web Services
Analytics & Modeling - Real Time Analytics
Infrastructure as a Service (IaaS) - Cloud Computing
Platform as a Service (PaaS) - Data Management Platforms
Business Operation
Sales & Marketing
Predictive Quality Analytics
Real-Time Location System (RTLS)
Remote Asset Management
Cloud Planning, Design & Implementation Services
Data Science Services
Zillow Group, the owner and operator of the largest online real-estate and home-related brands, was struggling to provide timely and accurate home valuations, known as Zestimates, for all new homes. The company's in-house machine-learning framework, which ran on premises and was built for vertically scaling workloads, could not scale fast enough to keep up with the growing volume of data and the increasing complexity of the machine-learning models needed for accurate Zestimates. The company specifically sought a distributed platform that would enable the fast creation and execution of massively parallel machine-learning jobs. The existing technology took too long to compute Zestimates, sometimes more than a day, which meant that customers weren’t getting updated information fast enough.
Zillow Group owns and operates a portfolio of the largest online real-estate and home-related brands, including the Zillow website. Tens of millions of users search Zillow daily for information about 110 million homes and apartments across the U.S. The most popular feature of the Zillow website is the Zestimate—a home-valuation tool that provides buyers and sellers with the estimated market value for a specific home. Zillow currently offers Zestimates for more than 100 million homes in the U.S., with hundreds of attributes for each property. The company uses a wide variety of public-record data—including tax assessments, sales transactions, images of homes, MLS listing data, and other information provided by homeowners—as inputs to its Zestimate algorithm.
Zillow decided to expand its use of Amazon Web Services (AWS) to solve the scalability and performance problems it faced with the Zestimate tool. Zillow chose to run Apache Spark on Amazon Elastic MapReduce (Amazon EMR). By running its machine-learning algorithms using Spark on Amazon EMR, Zillow can quickly create scalable Spark clusters and use Spark’s distributed processing capabilities to process large data sets in near real time, create features, and train and score millions of machine-learning models. Zillow uses Amazon Kinesis Streams to ingest a variety of data, including public property records, home tax assessments, sales transactions, images and video, MLS listing data, and user-provided information. All this data is ingested and pushed into Spark on Amazon EMR, which runs machine-learning models and gives users near-real-time Zestimates.
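As a rough illustration of the ingestion step, the sketch below shows how a single home-related event might be packaged for a Kinesis stream. It is not Zillow's code: the stream name `home-events`, the event fields, and the choice of `property_id` as partition key are all assumptions made for the example. The keyword arguments mirror those of the `put_record` call on a boto3 Kinesis client (`StreamName`, `Data`, `PartitionKey`), and a small stand-in client is used so the sketch runs without AWS credentials.

```python
import json
from typing import Any, Dict, List


def build_put_record_kwargs(stream_name: str, event: Dict[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for a Kinesis PutRecord call.

    The partition key determines which shard receives the record. Using the
    (hypothetical) property_id sends all events for one home to the same shard.
    """
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        "PartitionKey": str(event["property_id"]),
    }


class RecordingClient:
    """Offline stand-in for a boto3 Kinesis client; it only records calls."""

    def __init__(self) -> None:
        self.sent: List[Dict[str, Any]] = []

    def put_record(self, **kwargs: Any) -> Dict[str, Any]:
        self.sent.append(kwargs)
        # Placeholder response shape; real values come from the service.
        return {"ShardId": "shardId-000000000000", "SequenceNumber": str(len(self.sent))}


def publish(client: Any, stream_name: str, event: Dict[str, Any]) -> Dict[str, Any]:
    """Send one event to the stream through whichever client is supplied."""
    return client.put_record(**build_put_record_kwargs(stream_name, event))


# Made-up event, for illustration only.
client = RecordingClient()
event = {"property_id": 12345, "type": "sale", "price_usd": 450000}
response = publish(client, "home-events", event)
```

In a real deployment the `RecordingClient` would presumably be replaced by an actual Kinesis client, and a Spark job on Amazon EMR would consume the stream on the other side.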
Zillow can execute massively parallel machine-learning jobs on a distributed platform, spreading the work across multiple nodes to calculate Zestimates.
Zillow can compute Zestimates faster and more frequently, because Amazon Kinesis Streams and Spark on Amazon EMR enable near-real-time data processing.
Zillow does not have to be concerned with managing and scaling a fleet of servers for ingesting real-time streaming data.
Zillow can compute Zestimates in seconds, as opposed to hours.
Zillow manages petabytes of data in its Amazon S3 data lake.
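The idea of training many independent models in parallel can be sketched in a few lines. The example below is only an analogy, under stated assumptions: it fits one tiny linear model per region (price as a function of square footage) and uses a local thread pool as a stand-in for a Spark cluster. The regions, the sales figures, and the model form are invented for illustration and say nothing about how the actual Zestimate algorithm works.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Sale = Tuple[str, float, float]  # (region, square_feet, sale_price)
Model = Tuple[float, float]      # (slope, intercept)


def fit_line(points: List[Tuple[float, float]]) -> Model:
    """Ordinary least squares for price = slope * sqft + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_per_region(sales: List[Sale]) -> Dict[str, Model]:
    """Group sales by region, then fit each region's model concurrently."""
    by_region: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for region, sqft, price in sales:
        by_region[region].append((sqft, price))
    regions = list(by_region)
    # Each region is independent, so the fits can run in parallel.
    with ThreadPoolExecutor() as pool:
        models = list(pool.map(fit_line, (by_region[r] for r in regions)))
    return dict(zip(regions, models))


def estimate(models: Dict[str, Model], region: str, sqft: float) -> float:
    """Score one home with its region's model."""
    slope, intercept = models[region]
    return slope * sqft + intercept


# Made-up sales records, for illustration only.
sales = [
    ("A", 1000.0, 200000.0), ("A", 2000.0, 400000.0), ("A", 1500.0, 300000.0),
    ("B", 1000.0, 150000.0), ("B", 2000.0, 250000.0),
]
models = train_per_region(sales)
```

Because no model depends on another, the same grouping-then-fitting pattern can be distributed across the nodes of a cluster rather than the threads of one machine.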