Kaplan Test Prep[1] is well known for helping students prepare for college-entrance exams, such as the SAT and ACT; post-grad admissions tests, such as the GRE and GMAT; and licensure exams for medical, legal, nursing, financial, and other professional careers.
Unfortunately, the company wasn't making the grade when it came to using all available information for data-driven decision-making.
Founded in 1938, Kaplan has decades of historical data, scores of legacy systems and diverse applications. From 2013 to 2015, it made a methodical move to a virtual private network and cloud-based application stack on Amazon Web Services[2] (AWS). That effort helped Kaplan modernize its infrastructure and consolidate from 12 data centers down to four. From an analytical perspective, however, Kaplan continued to rely on siloed tools and reporting capabilities. It lacked a centralized store where it could consolidate and analyze data from many data sources.
"We had one, small [Microsoft SQL Server] data warehouse that was ingesting data from just two systems; that's it," says Tapan Parekh, director of analytics and data architecture. "It wasn't a complete view of data, and nobody was happy."
When he joined Kaplan in November 2015, Parekh immediately began developing an architecture for an analytical data platform. Given that the majority of data sources were now running on AWS, Parekh was considering Amazon Redshift[4], the vendor's columnar database service. His biggest challenge was figuring out how to get data into Redshift.
"We have many different applications using different underlying databases and technologies," says Parekh. "We