In order to test their payment processing application, the QA team at this financial services company determined that their data feeds must be simulated in a highly controlled fashion. To reproduce complex transaction data feeds, the team copied a subset of their production data and prepared it for testing. Production data is attractive because it contains real transactions in the proper data interchange format. However, to prepare the data for testing, it had to be laboriously reworked by hand to create the data variations and permutations needed for test cases while removing all sensitive customer and merchant information. It took the QA staff 160 man-hours (an entire man-month) to build a test data set. Because the data interchange format was revised every six months, the number of man-hours required for test data provisioning effectively doubled over the course of a year. The tedious provisioning process limited the variety of test data available for functional, integration, and regression testing, and the limits on the volume of data provisioned were impairing their ability to perform the load and performance testing required to simulate heavy transaction loads. In the end, they concluded there were too many problems associated with using production data alone for testing purposes. The following summarizes their rationale.

Production data is not controlled data

Without manual modification, test data copied from production can only test for conditions represented by that particular data subset. It does not give the QA team the data necessary to test edge-case conditions, the presence of invalid data values, or the specific input value combinations that might uncover software defects. To maximize code coverage under all potential operating conditions, test data must be controlled to simulate data feeds that contain all of the data variations required by each test case and its assertions.
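The difference between a copied production subset and controlled test data can be sketched in a few lines. This is a minimal illustration only, not the team's actual tooling; the record fields, field values, and function name are assumed for the example. The point is that enumerating value combinations deliberately, including boundary and invalid values, yields a known, repeatable input set that no sampled production subset can guarantee.

```python
import itertools

# Hypothetical transaction fields; real interchange feeds are far more complex.
AMOUNTS = ["0.00", "0.01", "9999999.99", "-5.00"]   # boundary and invalid values
CARD_TYPES = ["VISA", "MC", "AMEX", "??"]           # "??" is a deliberately invalid code
RECORD_TYPES = ["SALE", "REFUND", "ADJUST"]

def generate_test_records():
    """Enumerate every field combination so each test case receives a
    controlled, known input rather than whatever production happened to contain."""
    return [
        {"record_type": rtype, "card_type": card, "amount": amount}
        for amount, card, rtype in itertools.product(AMOUNTS, CARD_TYPES, RECORD_TYPES)
    ]

records = generate_test_records()
print(len(records))  # 4 amounts x 4 card types x 3 record types = 48 variations
```

In practice the cross-product would be pruned or pairwise-reduced for large field sets, but even this toy version covers edge cases (zero, negative, maximum amounts; an invalid card code) that a production sample may never contain.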
Production data is not secure data

Business and IT leaders at this financial services company were deeply concerned about data privacy. Given the legal and financial consequences, the risk of a data breach exposing sensitive customer credit information was too great. The risk was further compounded by the fact that much of the testing was performed by offshore contract resources, limiting internal control over the handling of sensitive customer data.

Secure, high-volume production test data is not practical

Data masking is the conventional approach for mitigating the security risks of working with production data. However, masking all of the PII contained in the transaction data feeds used by payment processing systems is a monumental task. Transaction data feeds are complex, nested, fixed-file data structures that contain control codes, record types, accumulated transaction values, and calculations for reward points and cash-back incentives, alongside real cardholder and merchant account numbers and credit information. Finding and masking the sensitive information in this complex data stream while preserving the referential integrity of the data values is both daunting and time consuming.
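The referential-integrity constraint mentioned above is why masking is harder than simple redaction: the same account number may appear in many records, and joins between them must still line up after masking. One common technique (an illustrative sketch, not the approach the article's company used; the key and token format are invented for the example) is deterministic tokenization with a keyed hash, so identical inputs always produce identical masked values while the originals remain unrecoverable without the key.

```python
import hmac
import hashlib

# Hypothetical key, held only in the test environment; never a production secret.
MASKING_KEY = b"test-env-only-key"

def mask_account(account_number: str) -> str:
    """Deterministically replace an account number with an opaque token.

    The same input always yields the same token, so relationships between
    records (cardholder to transaction, merchant to settlement) survive
    masking, yet the real number cannot be derived from the token alone.
    """
    digest = hmac.new(MASKING_KEY, account_number.encode(), hashlib.sha256).hexdigest()
    return "ACCT" + digest[:12].upper()

token_a = mask_account("4111111111111111")
token_b = mask_account("4111111111111111")
assert token_a == token_b               # referential integrity: joins still match
assert token_a != "4111111111111111"    # sensitive value is gone
```

This handles one field type; the article's larger point stands: locating every sensitive field inside nested, fixed-format records with control codes and accumulated totals, and keeping derived values (record counts, batch sums) consistent after substitution, is where the real effort lies.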