Long gone are the days of pricey, cumbersome BI visualisation products that took months or years to integrate, required powerful servers and produced mediocre visuals. I worked with Oracle Discoverer in 1998, Business Objects in 2002, SSRS in 2007, Cognos in 2008 and OBIEE in 2011. They were all really powerful for their time, but they usually required a well-designed data warehouse underneath.
Beyond the license cost, which was usually high, you would have to hire at least one developer to create the reports, and even if you had the budget, hiring was often a challenge. In 2009 it took me three months to hire a Cognos expert, and then, when I wanted to grow the team, I just couldn’t find additional developers.
Enter self-serve BI, with Qlik, Tableau and the recent rising star, Power BI, to name a few. Creating reports is suddenly quick and easy, so you don’t need to hire developers. You don’t need a data warehouse anymore, because you can extract data from your operational sources into the tool and create the relationships there. Most of these tools offer cloud-based hosting, so there’s no need for fancy servers anymore. And the best part: license cost is usually much, much lower, with Power BI setting the bar very low, at $10 per Pro user per month.
Sounds good, right? Well, it IS really good. But it can be really bad. Here’s why, and how to make it really good.
1. Which version of the truth would you like today, sir?
One of the most important rules of a reliable BI system is “one version of the truth”, or “one source of the truth”. In the good old, pricey and slow days, the organisation would run its reports over a data warehouse. The ETL processes brought data from the operational sources, put it in one place, and that one place, with its calculated measures and KPIs, was the single source of the truth. Yes, it took months or years to build. Yes, it took days for a developer to add a report. But at least “Total Sales Previous Week” showed the same figure in all reports and dashboards.
With self-serve BI, the temptation to use whatever source you want can lead Rob to use his own Excel sheet while Rachel uses hers. Rob maintains his sheet manually, while Rachel downloads hers once a day from the sales system. They both use a self-serve BI tool to create eye-catching dashboards, and they both publish them easily and share them with their peers. Those peers are impressed with the donut charts, the cool date slider and the perfect interactions between the visuals on the dashboard, but then they realise that Rob’s numbers are slightly different from Rachel’s. There’s also a new report from Craig, while Sarah is working on her own version, and… none of these cool-looking dashboards is reliable anymore.
And even if their figures did match, Rob may have segmented the sales data differently from Rachel, because Rob likes to merge small towns into regions, but Rachel doesn’t.
2. This looks awesome, can you add one more field?
When it’s so easy and quick to create dashboards, and they get published and shared quickly, there’s a temptation to just run and do it, without planning the data structure. Mapping Power BI or Tableau to a few tables in your operational database, or to CSV files or spreadsheets, can be done so easily that organisations may just skip the data architecture step. A cool new dashboard is ready, and then the customer services manager asks a simple question: “This looks awesome, can you add a new field showing how many phone calls we had with the customer?” You may have that number somewhere, but you may not. And if you do, you might have it segmented by sales person, date, product and sales region, but you also might not. If you skipped the part where you investigate what’s needed and create a suitable database and processes for it, don’t be surprised if the shiny, quickly built report ends up not being good enough.
3. No need for dedicated, strong servers, right?
Right. Kind of.
If everyone creates their own reports, and they all refresh their data once a day or once an hour from the operational databases, you’ll end up demanding more and more resources from those databases. The old-school data warehouse wouldn’t do that: once a day or once an hour, data would be collected into the warehouse, and all reports would be served from there, not from the operational databases.
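To make the contrast concrete, here is a minimal sketch of the warehouse-style pattern, with SQLite standing in for both databases and all table and column names invented for illustration: one scheduled extract copies the data into a shared reporting store, and every report reads that copy, so the operational database is hit once per refresh instead of once per report.

```python
import sqlite3

# Hypothetical operational database (SQLite stands in for the real system).
operational = sqlite3.connect(":memory:")
operational.execute("CREATE TABLE sales (region TEXT, amount REAL)")
operational.executemany("INSERT INTO sales VALUES (?, ?)",
                        [("North", 100.0), ("South", 250.0), ("North", 50.0)])

# Shared reporting store: the single place all dashboards read from.
reporting = sqlite3.connect(":memory:")
reporting.execute("CREATE TABLE sales (region TEXT, amount REAL)")

def refresh():
    """One scheduled extract per day/hour -- the only query that
    touches the operational database."""
    rows = operational.execute("SELECT region, amount FROM sales").fetchall()
    reporting.execute("DELETE FROM sales")
    reporting.executemany("INSERT INTO sales VALUES (?, ?)", rows)

refresh()

# Any number of reports can now query the reporting copy, not the source.
totals = dict(reporting.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(totals)
```

Whether ten or a thousand dashboards refresh, the operational system still sees only the one extract query per cycle.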
So – what’s the solution?
In my opinion, data governance is the key.
Don’t centralise your reports, centralise your data.
Self-serve BI is here to stay, and for good reason. It’s absolutely great. We cannot, and don’t want to, go back to the old breed of BI solutions, and all of the bad scenarios I described can be avoided if you still centralise your data and control it.
Let Rob, Rachel and Sarah create their own reports if they wish to. They will enjoy it, and they will be more likely to use their own reports than those created for them by someone from IT whom they have never met.
But give Rob, Rachel and Sarah a good dataset to work with. Have a strong data team. Have a business analyst constantly work with the organisation on what kind of data it needs. Have a data architect and database developers create modern data marts or data lakes in an agile fashion for their users. It doesn’t have to be a huge data warehouse whose data requirements have changed twice by the time you finish building it. Small data marts can do the job, and a good architect will be able to design them quickly and reuse them. If the organisation is small, the analyst, architect and developer may be one all-in-one “data guy” (my favourite role). It doesn’t have to be pricey, but it does have to be under the control of professionals.
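One way to picture the governed-data-mart idea is a sketch like the following (again SQLite as a stand-in, with invented table and view names): the measure definition lives once, in the mart, as a view, and every self-serve report reads that view, so Rob’s and Rachel’s figures cannot drift apart.

```python
import sqlite3

# A small data mart; SQLite stands in, and all names here are hypothetical.
mart = sqlite3.connect(":memory:")
mart.execute(
    "CREATE TABLE sales (sale_date TEXT, town TEXT, region TEXT, amount REAL)")
mart.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("2024-01-01", "Smallville",  "North", 120.0),
    ("2024-01-02", "Springfield", "South",  80.0),
])

# The governed definition lives in the mart, not in anyone's spreadsheet:
# every dashboard that needs sales-by-region reads this one view.
mart.execute("""
    CREATE VIEW v_sales_by_region AS
    SELECT region, SUM(amount) AS total_sales
    FROM sales
    GROUP BY region
""")

# Rob's and Rachel's dashboards both query the shared view...
robs_numbers = mart.execute("SELECT * FROM v_sales_by_region").fetchall()
rachels_numbers = mart.execute("SELECT * FROM v_sales_by_region").fetchall()

# ...so by construction their figures match.
assert robs_numbers == rachels_numbers
```

The users still get the freedom of self-serve tools on top; what the data team controls is the dataset and the calculated measures underneath it.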
Please let me know what you think in the comments or IM me. Happy to help on the matter.