Friday, June 20, 2014

Traditional Data Warehouse being squeezed



When the Data Warehouse was introduced, it brought a simple and consistent dimensional design. Reporting tools benefited from the simplicity and consistency of the underlying database structures and prospered. Operational systems were happy to specialize in the transactional area and shed other responsibilities. So any view of the data that did not fit on a single screen was delegated either to a dimensional database provided by the vendor, or to a stand-alone data warehouse.
The analytical groups grudgingly accepted the Data Warehouse. The dimensional design did not match the internal structures of the statistical packages, but there were no alternatives for getting the data. An additional benefit was that the data was already cleansed and verified.
The landscape is starting to change with the proliferation of in-memory databases and Big Data solutions.

In-memory database technology improves performance, and that spare database speed can be used to solve new problems or to allow less optimal designs for existing problems. In particular, it becomes tolerable to report directly from operational data structures that are not optimized for reporting. Nowadays ERP systems running on an in-memory database, such as SAP HANA, can take on more reporting. And if the ERP is the main data source for a Data Warehouse, the remaining sources can easily be absorbed by the ERP system, eliminating the Data Warehouse completely.
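To make the idea of reporting directly from operational structures concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for an in-memory operational system. The table names, columns and sample rows are invented for illustration and do not come from any real ERP schema.

```python
import sqlite3

# In-memory SQLite is only a stand-in for an in-memory operational database.
conn = sqlite3.connect(":memory:")

# Hypothetical normalized operational tables (no star schema, no fact table).
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    order_date TEXT
);
CREATE TABLE order_lines (
    order_id INTEGER REFERENCES orders(id),
    product TEXT,
    quantity INTEGER,
    unit_price REAL
);
""")

# Made-up sample transactions.
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "East"), (2, "West")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, "2014-06-01"), (11, 2, "2014-06-02"), (12, 1, "2014-06-03")])
conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?, ?)",
                 [(10, "widget", 2, 5.0), (10, "gadget", 1, 20.0),
                  (11, "widget", 4, 5.0), (12, "gadget", 3, 20.0)])


def revenue_by_region(connection):
    """Aggregate revenue per region by joining the transactional tables directly."""
    return connection.execute("""
        SELECT c.region, SUM(l.quantity * l.unit_price) AS revenue
        FROM order_lines AS l
        JOIN orders AS o ON o.id = l.order_id
        JOIN customers AS c ON c.id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """).fetchall()


for region, revenue in revenue_by_region(conn):
    print(region, revenue)
```

The report query joins three transactional tables on every run, which is exactly the kind of less optimal design that a dimensional model was meant to avoid. The bet is that when the data sits in memory, the cost of those joins becomes acceptable.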
Though Big Data is a vague marketing term, it involves a lot of important database innovation. The distributed file system that underlies Hadoop can be had inexpensively, and soon it will start to divert data and resources from the Data Warehouse. Of course, it is not a zero-sum game; the Big Data reservoirs will also get new data sources: device-generated data, medical records, social networks, etc. Still, it will be a loss for the DW.
So the traditional Data Warehouse will be squeezed from two directions: less data will be extracted from operational databases, and more data will be diverted to the Big Data reservoirs instead of the DW.