Case Study: Cornell University Automates Data Warehouse Infrastructure
Cornell University was founded in 1865 as a private research university. Its 14 colleges and schools serve approximately 22,000 students, and it ranks in the top one percent of universities in the world.
Chris Stewart, VP and general manager, USA at WhereScape, and Jeff Christen, data warehousing manager at Cornell University and adjunct faculty in Information Science, spoke with DATAVERSITY® about how Cornell dealt with the end-of-life for the primary product they used to manage their data warehouse.
The Main Problem
Cornell was transforming and merging data into an Oracle data warehouse using Cognos Data Manager. After IBM acquired Data Manager, it chose to discontinue support for the product. “Unfortunately, Data Manager had millions of lines of code written in it, so we had to look for a successor,” Christen explained. He saw the change as a chance to add capabilities that would improve the efficiency of the data warehouse.
The Evaluation
“Our prior tool would merely log it if there was an issue,” Christen explained, “but then we couldn’t load the warehouse because some network glitch that perhaps took seconds was enough to wipe out our nightly ETL processing.”
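One common way to keep a seconds-long glitch from sinking a whole nightly run is to retry the failed step after a short pause. The sketch below is only an illustration of that general idea, not a description of how Data Manager, WhereScape, or Cornell's jobs work. The names `TransientError`, `run_with_retry`, and `flaky_load` are hypothetical, and it assumes the load step can safely be run again.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for a short-lived failure, e.g. a dropped connection."""


def run_with_retry(step, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run step(); on TransientError, pause and try again, up to `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except TransientError:
            if attempt == attempts:
                raise  # out of retries: surface the failure to the scheduler
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_load():
        # Stand-in for one ETL step that fails twice, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise TransientError("connection reset")
        return "loaded"

    # Use a short delay here so the demonstration finishes quickly.
    print(run_with_retry(flaky_load, attempts=3, base_delay=0.1))
```

Retrying only a dedicated exception type is a deliberate choice in this sketch: genuine data or logic errors still fail immediately rather than being retried.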
Out-of-date documentation was also a concern. Stewart jokes with his customers about having to document a data warehouse. “There are two kinds of documentation: nonexistent and incorrect documentation. People laugh, but no one ever disputes that argument since it’s the one thing no one wants to do,” Stewart explained.
As an academic institution, Cornell had to consider licence and staffing costs. Stewart frequently sees this in government and higher education organisations, where data demands keep growing but the pool of available staff is small, as with Christen’s four-person team.
Automation, according to Stewart, can relieve much of that workload, allowing employees to accomplish more in less time. “You can’t just add two more individuals on the spot. You need to get more out of your existing personnel if you have additional work,” Stewart explained.
Identifying a Solution
Christen began looking for ETL tools with the goal of making some enhancements. When researching vendors, he evaluated documentation, licence fees, performance improvements, and the ability to work within existing staffing levels. Christen attended the Higher Education Data Warehousing conference in 2014 to learn more about his options.
WhereScape was one of the conference’s exhibitors, and one of the features that drew his interest was its approach to documentation. “Our consumers are used to obsolete and insufficient documentation, and WhereScape obviously had a grip on that,” he said.
Most of the solutions Cornell investigated required CPU-based licensing, which would have been prohibitively expensive because Cornell’s large data warehouse system was sized for end-user query performance.
“We’ve got a lot of CPUs,” Christen explained. The cost of CPU-based licensing would have been enormous, so the team would have had to re-architect the entire system to shrink the CPU footprint enough to make the licensing affordable, a process that would impose further constraints. Because WhereScape licenses by developer seat, Cornell, with four full-time warehouse developers, needed to purchase only four named-user licences.
“With WhereScape, there’s no separate licensing for the CPU run-time environment, so if we’re successful, we’ll convert everything,” Christen explained. “However, there’s no penalty for how we build the warehouse for end-user performance or query performance.”
Company And Automation
It was a clear advantage to be able to integrate and use the product without having to hire more developers. “That’s been a big motivator for companies looking towards automation for their staff,” Stewart remarked.
Cornell didn’t base its decision solely on marketing materials. In an on-site proof of concept, one of its ETL developers worked with the product on a section of the principal general ledger model and was able to build a parallel environment with minimal WhereScape support. That the developer had received no formal training showed that the learning curve would be reasonable.
The proof of concept allowed a nearly apples-to-apples comparison with Data Manager, which revealed “dramatic improvements” in load time performance. “It was a robust tool, but it was also intuitive enough to be learned in a few weeks,” Christen explained.
What is WhereScape?
WhereScape helps IT organisations of all sizes automate the design, development, deployment, and operation of data infrastructure.
“We learned a long time ago that there are patterns in data warehousing that really transcend any industry area or any corporate size,” Stewart explained.
WhereScape automates everything from the design and modelling of the data warehouse through to the physical build, because building out a data warehouse is mainly mechanical and most of the work is common across data warehousing organisations.
“Even deployments, as you move a project from development to QA, and then to production, we’re scripting all of that out as well,” Stewart explained. These are all tasks that corporations often address with several tools — a resource-intensive approach that can result in a silo for each tool.
“We have a single tool suite that covers data warehousing from start to finish, and it’s simply one set of tools to master,” Stewart explained. Instead of licensing a separate tool for each aspect of building a data warehouse, finding somewhere to install them all, and spending weeks on staff training and management, teams can learn and use a single product. Handing the build over to WhereScape’s automated process frees up time and energy, allowing the organisation to focus on using the data and producing relevant analytics.