Data warehousing was an elective I took in my second-to-last semester. For much of the course, we focused on building a data warehouse on MongoDB and used Python to carry out the Extract, Transform and Load (ETL) process on the data. Later, we also performed other operations, such as designing a star schema and slicing and dicing attributes.

The end-of-semester project was theoretical: a research paper on the subject, made up of the components given below.

A 5- to 6-page research paper on the following topic:
“What possible factors can change the dynamics of Data warehousing if we replace the SQL infrastructure with NoSQL?”
Abstract
Introduction: DWH, NoSQL, SQL, and affecting factors
Summarization and Tabulation
Result
Conclusion
Research references

Since a case study was required to show how and why a company might transition from SQL to NoSQL, after considerable research I chose Netflix’s move from Oracle to Cassandra. You can read the paper’s abstract below; for the full PDF version, please click here.

Abstract – With the advent of IoT technologies, more and more connected services and personalized features rely on user data, which has become key to providing insights into consumer needs. Because this data is generated at an exponential rate, it must be stored somewhere for analysis. Traditional data warehouses, with their relational databases and fixed structure, are no longer up to the task of storing such variable data, nor are they easily scalable. This paper presents a case study of an organization moving from an SQL-based to a NoSQL-based data warehouse and outlines the factors that influenced this decision.

Index Terms – SQL, NoSQL, Data warehouse, Cassandra, Netflix