
Improving home insurance pricing with synthetic geolocation data

Analytics & Modeling - Data-as-a-Service
Analytics & Modeling - Predictive Analytics
Application Infrastructure & Middleware - Data Exchange & Integration
Finance & Insurance
Business Operation
Quality Assurance
Regulatory Compliance Monitoring
Data Science Services
System Integration
Home insurance pricing was a risky business for our client. The insurance company covered homes across the United States in areas with vastly different climate features and risk profiles. CCPA and HIPAA prohibited the data science team from using customers’ personal data, such as their addresses, in modeling, so the team could not assess location-specific risk or reflect it in pricing.
The customer is a large insurance company operating across the United States, providing home insurance to a diverse range of clients. Pricing policies accurately is difficult because climate features and risk profiles vary widely between regions. The company is also bound by strict regulations such as CCPA and HIPAA, which prevent it from using personal data like customer addresses in its risk assessment models. This limitation made it hard to assess risk and set appropriate prices for its insurance policies.
The insurance company supplied its modeling teams with synthetic geolocation data. The team could use synthetic home addresses to look up five climate features, such as fire and flood hazards, in public databases. The pricing model trained on synthetic data scored as well as the model trained on real data. Using synthetic home addresses eliminated the risk of re-identification and unlocked new insights. The team established a synthetization framework tailored to modeling based on privacy-risk classification and shortened time-to-data from 6 months to 3 days. The process kept 100% utility of the data, retaining the statistical dispersion of the original and providing an as-good-as-real alternative for training.
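The claim that a model trained on synthetic data scores as well as one trained on real data is typically checked by training one model on each dataset and evaluating both on the same held-out real data. The sketch below illustrates that comparison on toy data only. Everything in it is an assumption made for illustration: the single hazard feature, the loss formula, the simple linear model, and the stand-in "synthesizer" (fresh draws from the same toy process) are hypothetical and are not the client's data, features, or pricing model.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_abs_error(model, xs, ys):
    """Mean absolute error of a fitted (intercept, slope) model."""
    intercept, slope = model
    return statistics.fmean(
        abs(intercept + slope * x - y) for x, y in zip(xs, ys)
    )


def make_rows(rng, n):
    # Toy process (assumption): hazard score in [0, 1] drives annual loss.
    xs = [rng.random() for _ in range(n)]
    ys = [200 + 1000 * x + rng.gauss(0, 50) for x in xs]
    return xs, ys


rng = random.Random(0)
real_train = make_rows(rng, 500)
real_test = make_rows(rng, 200)
# Stand-in for a real synthesizer (assumption): new draws from the same process.
synthetic_train = make_rows(rng, 500)

# Both models are scored on the same held-out real data.
mae_real = mean_abs_error(fit_line(*real_train), *real_test)
mae_synth = mean_abs_error(fit_line(*synthetic_train), *real_test)
print(f"MAE, trained on real data:      {mae_real:.1f}")
print(f"MAE, trained on synthetic data: {mae_synth:.1f}")
```

If the two error figures are close, the synthetic data has preserved the relationship the model needs. In practice the comparison would use the actual pricing model, all of its features, and the metric the team already reports.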
Using synthetic home addresses eliminated the risk of re-identification and unlocked new insights.
The team established a synthetization framework tailored to modeling based on privacy-risk classification.
Time-to-data was shortened from 6 months to 3 days.
15M synthetic home addresses generated.
60x shorter time-to-data.
100% utility of the data retained.