
Improving home insurance pricing with synthetic geolocation data


Technology Category

Analytics & Modeling - Data-as-a-Service

Analytics & Modeling - Predictive Analytics

Application Infrastructure & Middleware - Data Exchange & Integration

Applicable Industries

Finance & Insurance

Applicable Functions

Business Operation

Quality Assurance

Use Cases

Regulatory Compliance Monitoring

Services

Data Science Services

System Integration

Challenge

Home insurance pricing was a risky business for our client. The insurance company covered homes across the United States in areas with vastly different climate features and risk profiles. CCPA and HIPAA prohibited the data science team from using customers' personal data, such as their addresses, in modeling, so the team could not assess location-specific risk or reflect it in pricing.

About Customer

The customer is a large insurance company operating across the United States, providing home insurance to a diverse range of clients. Pricing policies accurately is difficult because climate features and risk profiles vary widely between regions. The company is also bound by strict regulations such as CCPA and HIPAA, which prevent it from using personal data like customer addresses in its risk assessment models. This limitation has made it hard to assess risk accurately and set appropriate prices for its insurance policies.

Solution

The insurance company supplied its modeling teams with synthetic geolocation data. The teams could use synthetic home addresses to look up five climate features, such as fire and flood hazards, in public databases. The pricing model trained on synthetic data performed as well as the model trained on real data. Using synthetic home addresses eliminated the risk of re-identification and unlocked new insights. The team established a synthetization framework tailored to modeling, based on privacy-risk classification, and shortened time-to-data from 6 months to 3 days. The process reportedly retained 100% of the data's utility, preserving the statistical dispersion of the original and providing an as-good-as-real alternative for training.
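
The case study does not say how the comparison between the synthetic-data model and the real-data model was carried out. One common way to check such a claim is a "train on synthetic, test on real" comparison, paired with a check that a feature's dispersion (standard deviation and interquartile range) matches between the two datasets. The Python sketch below illustrates that idea on toy data only. Every name in it (`make_dataset`, `fit_line`, `hazard`, `loss`) and every number is a hypothetical stand-in, and the "synthetic" set is simply a second random draw from the same toy distribution, not the output of any real generator.

```python
import random
import statistics


def make_dataset(n, seed):
    """Toy data (assumption): one hazard score per home and a noisy annual loss."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hazard = rng.uniform(0.0, 1.0)  # stand-in for a looked-up climate feature
        loss = 500.0 + 2000.0 * hazard + rng.gauss(0.0, 100.0)
        rows.append((hazard, loss))
    return rows


def fit_line(rows):
    """Ordinary least squares for loss = intercept + slope * hazard."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_abs_error(model, rows):
    """Mean absolute error of a fitted (intercept, slope) pair on held-out rows."""
    intercept, slope = model
    return statistics.fmean(abs(intercept + slope * x - y) for x, y in rows)


def dispersion(rows):
    """Population standard deviation and interquartile range of the hazard feature."""
    xs = [x for x, _ in rows]
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return statistics.pstdev(xs), q3 - q1


real_train = make_dataset(2000, seed=1)
real_test = make_dataset(1000, seed=2)        # held-out real data
synthetic_train = make_dataset(2000, seed=3)  # stand-in for generator output

# Both models are scored on the same held-out real data.
mae_real = mean_abs_error(fit_line(real_train), real_test)
mae_synth = mean_abs_error(fit_line(synthetic_train), real_test)

std_real, iqr_real = dispersion(real_train)
std_synth, iqr_synth = dispersion(synthetic_train)

print(f"MAE, trained on real:      {mae_real:.1f}")
print(f"MAE, trained on synthetic: {mae_synth:.1f}")
print(f"hazard std (real vs synthetic): {std_real:.3f} vs {std_synth:.3f}")
print(f"hazard IQR (real vs synthetic): {iqr_real:.3f} vs {iqr_synth:.3f}")
```

If the two error figures and the two dispersion figures are close, the synthetic set is a plausible substitute for training this particular model. A real evaluation would cover all five climate features and would also include a separate privacy assessment, which this sketch does not attempt.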


Impact #1

Using synthetic home addresses eliminated the risk of re-identification and unlocked new insights.

Impact #2

The team established a synthetization framework tailored to modeling based on privacy-risk classification.

Impact #3

The time-to-data was significantly shortened from 6 months to 3 days.


Benefit #1

15 million synthetic home addresses generated.

Benefit #2

60x shorter time-to-data (from 6 months to 3 days).

Benefit #3

100% utility of the data retained.
