With Power BI, as is the case with other BI products, you can load your data and start analysing it. The data can come from almost any possible source: relational databases, excel spreadsheet, online services, text files and what not. Furthermore, the data can come from more than one source, allowing you to use Power BI as an integration tool.
I’m pretty vocal against using Power BI, or any other BI platform to integrate data from different sources. I’ve talked about it a little bit here.
The gist is that it’s better to integrate your sources into a data warehouse or data marts,and connect Power BI to them. That way they are independent of trends in the BI space, and can serve as a single source of truth, rather than repeating the logic in each and every Power BI dataset. You can also query the data mart directly to answer ad hoc questions.
But I digress a little bit as the topic of this article is calculated columns and measures.
No matter where you get your data from, Power BI allows you to create calculated columns and measures, using the DAX language. I’ll explain what they are, what the difference is between calculated columns and calculated measures and when, and to what capacity they should be used.
A calculated column is a formula you apply on an existing column(or columns) to create a new one. For example, you can use a DAX function to extract year from a date field. When you do that, the table you’ve loaded will have an additional column, and for each row, the new column will be calculated as a result of data in that row.
For example, if a date field in row 1 has the value of 2010-01-01, the calculated year column will hold the value 2010, and if the value of the date field in row 2 is 2018-07-01, the year column of that row will be 2018. It is evaluated for each row, at load time. When you refresh data in your model, the calculated column is populated and stored as part of the model. From that point on, it is used like any other column. Since it is stored, it takes up space.
Calculated measures, on the other hand, are just a formula. They don’t take up space and they are evaluated at report run time. A calculated measure does not look at data at the row level, but rather looks at the entire results of the visual.
For example, if you display a table with sales amount per category and your filter is set to show results of 2017, adding an “average number of items sold” calculated measure to the table, will show an average per category, in 2017.
This is a simple average example, but calculated measures can be much stronger than that. Here’s a typical case:
Imagine you create a report for a bank. You want to show, for a specific customer, what is their account activity summary for last week. You’ll show how much money was withdrawn, how much was deposited, and the total fee amount. So far it’s simple, assuming you have some sort of a transaction fact table, you can just aggregate over it.
But…you also want to show the balance at the end of the week. Let’s imagine that you have another table that holds, per customer, their balance at the end of each day. In this case, you cannot just use a SUM function (which will show 7 times the balance), you actually want to display the balance at the end of the week. DAX enables you to write a function that will only show the latest value in the dataset, for that specific customer and that specific week.
This cannot be done in a calculated column, because it is not calculated on a specific row but rather on a result set.
The DAX calculation will look like this:
Latest Balance =
CALCULATE (
SUM ( ‘Daily Balance’[balance]),
FILTER (
‘Daily Balance’,
‘Daily Balance'[DateID] = MAX ( ‘Daily Balance'[DateID] )
)
)
I won’t get into explanations about this calculation in this article, but just wanted to show a more complicated DAX calculation used by a calculated measure.
So – what’s the bottom line – can we use calculated columns and measures?
Well… measures- yes, columns- usually not.
As I explained earlier, I think data modelling should not be done in Power BI but rather in a data mart underneath.
Meaning, if you need a “year” column, create it in your database. Same goes for “net revenue”(sales amount minus cost of sales) or any other calculation that can be done on the row level. It can then be loaded and used in Power BI, and can be reused in many datasets without going into the trouble of creating the calculated column again and again in each dataset.
An exception to this rule is when you do something as a one-off. If I create a model that I’m not going to reuse, and I’m using files(spreadsheets, CSV etc.) as sources (no data mart layer), it makes sense to create calculated columns.
Calculated measures are necessary in almost every model, as demonstrated above. You cannot really avoid them, but you can try and keep them to a minimum. I recommend pre-calculating whatever you can in the database and let calculated measures kick in when you need to work on a result set.
Other examples for calculated measures, from the top of my head: cumulative revenue, category sales amount as percent of total sales, variance of a KPI in comparison to its value the previous year, etc.
Thanks for reading, and if you like it, be sure to subscribe for updates- look for the subscribe box on your right (if you are using a computer) or at the bottom of the page (if you are using a mobile device).
Cheers.