Creating an enterprise BI system is complex and costly, as we all know. Many projects in this field are never completed as planned and often become irrelevant before they are delivered. In many organizations, despite multi-million-dollar investments in BI solutions, management will tell you that they have no BI.
Traditional BI calls for a data warehouse (DWH) built on a relational database (SQL Server, Oracle, Sybase, MySQL) consisting of tables and indices, usually modeled as a "Star" or "Snowflake" schema. These models were created because of technological restrictions: to reduce data volume in tables, shrink index size and improve performance. In practice, they make the ETL process very slow and complex (and, naturally, costly), because every row and field of incoming data must be scanned and looked up in the appropriate dimension table.
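To illustrate why those per-row dimension lookups slow ETL down, here is a minimal Python sketch of a star-schema load step. The table and column names are hypothetical, chosen only for illustration:

```python
# Hypothetical star-schema ETL step: every incoming fact row must be
# matched against a dimension table to obtain its surrogate key.
customer_dim = {"ACME": 1, "Initech": 2}  # natural key -> surrogate key
next_key = 3

def load_fact_row(row):
    """Resolve the customer surrogate key, inserting a new dimension
    row if the customer has not been seen before."""
    global next_key
    key = customer_dim.get(row["customer"])
    if key is None:                 # lookup miss: the dimension must grow
        key = customer_dim[row["customer"]] = next_key
        next_key += 1
    return {"customer_key": key, "amount": row["amount"]}

facts = [load_fact_row(r) for r in [
    {"customer": "ACME", "amount": 100},
    {"customer": "Globex", "amount": 250},
]]
```

Multiply this lookup (plus the insert-on-miss path) by every field of every row in a nightly load, and the cost of the schema becomes clear.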
Even if we manage to overcome this hurdle, the fact remains that the databases mentioned above were not designed to handle large volumes (over 1 TB), so it becomes necessary to create hundreds of partitions, aggregation tables, pre-calculated tables, splits across different databases, OLAP cubes and other workarounds. Licensing and hardware costs for the data warehouse and OLAP grow constantly, reaching hundreds of thousands of dollars or more – and that is before we have considered clustering, replication, scalability and high availability.
Now we need to implement a BI tool that is supposed to interface with the entire data warehouse / OLAP setup described above. Good BI tools can generate queries and aggregations on their own, but with such a data warehouse we are forced to forego this option and instead write and maintain our own queries and procedures. Any possibility of ad-hoc, self-service or heavy reports (such as bucket analysis, session reports, customer analysis, advanced segmentation and trends) is, of course, out of the question!
With traditional BI, management must specify in advance exactly what will be viewed, how, when and how much. The data warehouse is updated nightly through complex ETL, staging, calculations, aggregations and cube updates, so that (hopefully) by morning reports return within a reasonable time. Most important and costly is the need for full-time DBA and BI staff tasked with maintenance at all times: ensuring performance, monitoring, preventing report overruns, alerting, upgrading, replacing hardware and changing the model. Any minor change becomes its own project on the never-ending Gantt chart that is BI.
So what is the alternative? Big Data technology is an appropriate solution even for customers with as little as 0.5 TB of data.
Selecting a column-based data warehouse does not require holding dozens of TB; it is appropriate even for customers with as little as 0.5 TB. Moreover, companies such as HP Vertica offer their product to customers at no charge (up to 1 TB of data, 3 servers in a grid). In a column-based solution, tables may be flattened and the dimension tables skipped. There is no need for aggregations, no calculated tables, no partitions and no full-time DBA, and often the ETL process becomes as simple as loading files.
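A sketch of what "ETL as simple as loading files" means with a flattened table: each fact row carries its dimension attributes inline, so the load step is just producing a delimited file for the database's bulk loader (e.g. a COPY-style command). The field names below are hypothetical:

```python
import csv
import io

# Hypothetical flattened (denormalized) rows: each fact carries its
# dimension attributes inline, so no dimension-key lookup is needed.
rows = [
    {"date": "2013-01-01", "customer": "ACME",   "region": "EMEA", "amount": 100},
    {"date": "2013-01-01", "customer": "Globex", "region": "APAC", "amount": 250},
]

# "ETL" reduces to writing a delimited file ready for bulk loading.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["date", "customer", "region", "amount"])
writer.writeheader()
writer.writerows(rows)
flat_file = buf.getvalue()
```

No surrogate keys, no lookups, no staging area: the file is the load.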
The system is designed to load massive volumes of data per minute, which means you don't have to wait for nightly runs. It allows for scalability, low-cost hardware, continuity, data redundancy, query performance up to 1,000 times faster, support for thousands of concurrent users and low-cost disk storage. The ETL process becomes rapid, short, simple and efficient, and you can easily modify or add fields and dimensions. The data in the warehouse is compressed by a factor of 10 to 60 and distributed among multiple servers, which decreases the per-server data volume and makes archiving or deleting data unnecessary.
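The high compression ratios come from storing each column contiguously: values within one column are similar and often repeated, so simple encodings such as run-length encoding shrink them dramatically. A minimal sketch of the idea (real column stores combine several such encodings):

```python
from itertools import groupby

def run_length_encode(column):
    """Collapse runs of identical values into (value, count) pairs -
    the kind of per-column encoding a column store applies."""
    return [(value, len(list(group))) for value, group in groupby(column)]

# A sorted column with many repeated values compresses very well:
region_column = ["EMEA"] * 4 + ["APAC"] * 3 + ["US"] * 2
encoded = run_length_encode(region_column)  # 9 values -> 3 pairs
```

In a row store the same values are interleaved with every other field of each row, so this opportunity is lost.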
The BI tool connects directly to the database and runs each query as needed, with very rapid turnaround (typically under one second). It is important to select a BI tool that can work with a column-based database and generate simple, efficient queries (the best in this field is MicroStrategy).
What about dimensions? This is simple: run a DISTINCT query on the flattened table, which returns in hundredths of a second; if absolutely required, a view can be generated for the dimension, so you will never miss it. The BI tool does not care either way, there is no need for cubes, and you can use ROLLUP to perform OLAP-style queries directly against the database. In a column-based system you do not need to pre-specify all your indexes, and most queries are processed rapidly.
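The derived-dimension idea can be sketched with any SQL engine (here Python's built-in sqlite3 stands in for the column store, and the table is hypothetical): the dimension is just a DISTINCT over the flattened table, optionally wrapped in a view so BI tools see a familiar object.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales_flat (sale_date TEXT, customer TEXT,
                             region TEXT, amount INT);
    INSERT INTO sales_flat VALUES
        ('2013-01-01', 'ACME',   'EMEA', 100),
        ('2013-01-01', 'Globex', 'APAC', 250),
        ('2013-01-02', 'ACME',   'EMEA',  80);
    -- The "dimension" is derived on the fly, not maintained by ETL.
    CREATE VIEW dim_region AS SELECT DISTINCT region FROM sales_flat;
""")

regions = [r[0] for r in
           con.execute("SELECT region FROM dim_region ORDER BY region")]
```

Nothing in the ETL pipeline maintains `dim_region`; it is always consistent with the facts by construction.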
Of course it becomes possible (and encouraged!) to run any type of query or business enquiry – ad-hoc, self-service BI, drill-down session analysis or anything the manager's or user's heart desires. There is no need for an army of DBA and BI staff, as the system is simple and has no performance issues.
The combination of Vertica and MicroStrategy has a proven track record in Israel and around the world, and user satisfaction is very high.
We would be delighted to showcase what may well be your future BI solution!