Qlik developer primer for what's different in Power BI: Power Query (part 1/2)

M[mm]...

Most Business Intelligence platforms can solve similar problems, but how they do it is different. It's usually a mistake to try to implement a solution the exact same way as you would in another platform, just because that is what is familiar to you.

Several consultants at Axis have picked up Power BI as a second or third BI platform as we have added more Power BI customers over the last several years. We have a Power BI onboarding path like we have for all major platforms, but we don't have an introduction to orient experienced consultants based on the deltas and new concepts compared to existing expertise in Qlik, the platform most commonly known among consultants in my practice. This kind of primer would be more ideal to pre-empt dead ends and eliminate the ambiguity when you learn any new platform -- "Is this not possible or have I just not figured out how to do it yet?"

  • How similar are the platforms?
  • What are the big mindset shifts?
  • When does what work happen, and where?
  • How do you take advantage of what makes it unique?

I will attempt do that here, starting with Power Query, the interface and engine for getting data into a Power BI Dataset (data model), the equivalent of which is the Qlik load script.

I also hope this will be relevant to people who only know Qlik, so they are aware of some of the details of how other prominent technologies in the market differ. (Power Query, in particular, is hiding in Excel, even if you don't use Power BI.) I find this to be a frequent area where I add can add value as a consultant to customers who own every major platform out there.

General concepts

  • Power Query creates the tables that make up your data model. Each enabled query will generate one table in the data model with its results. Accordingly: if you don't need a query's results to persist as a table in the data model, disable the query. This is like dropping a resident table in Qlik.
  • Building the relationships between tables is a separate step done outside of Power Query, whereas the data model builds its own relationships in Qlik based on like field names.
  • Power Query is accessible through Power BI Desktop only. You cannot build a Power BI Dataset in your browser/natively in the service, although you can build Reports there using published Datasets. This has many implications regarding sensitive or voluminous data transferring to your laptop and the RAM/processing power of your machine as a constraint, driving various development best practices, like pushing work to the database with query folding and temporarily filtering data during development. Powerful, remote VMs are also an option, particularly if they enable you to do content creation in the Azure cloud when the data source is also in Azure.
  • Also implied by the last point is that by default you will be managing versions of your work on your desktop, which may not be accessible to others, between publishings. You can save without data by saving as a .pbit (template) file.
  • In Power Query, you can create parameters, which are functionally like hardcoded Qlik variables (unfortunately not dynamic). and functions, which can be like Qlik subroutines or parameterized variables. Individual M queries also leverage variables throughout.
  • There is no simple way to centralize code for reuse across many Datasets, like inheriting parameters and functions defined at the environment level, as is a common practice in Qlik, using included code files. The scope of parameters and functions is the queries in your immediate Power BI Dataset.
  • You can't store data out of Power Query to validate or reuse elsewhere (or as a data dump for users) or maintain data snapshots not available in the data source. There are no layers of Power Query outputs that efficiently stage and then transform data. More than one layer is a use case for Dataflows. Validation is still easily done in Power Query using queries, even if you can't store those queries out to something else.
  • There's no undo button or keyboard shortcut. Save often. :)

Transformation steps

  • Power Query has wizards to do many common transformations that shield you from learning M code, but you will almost certainly have to stray outside the wizards, even on your first project. By comparison, Qlik has only few wizards, but Qlik load script is much more SQL-like and less intimidating to learn.
    • Sometimes wizards generate inefficient code that you can simplify or improve, like calculating days between dates in a way that does not break query folding, using Number.From() on the dates prior to subtracting.
    • Even when wizards don't exist to do exactly what you want, it is often helpful to use a wizard to generate code similar to what you are trying to do, then manually change it, rather than start from scratch.
    • There are some things you will need to do that have no wizard at all, like doing a join between a fact table and a slowly changing dimension that maintains history, returning time of transaction dimensional data. In Qlik, you would use the IntervalMatch load prefix, but you will have to Google an example and write some freehand M to accomplish this in Power Query.
  • A preview of transformation results is shown immediately as steps are added to a query, returning limited results (think: SELECT TOP n). This is better than demoting an "exit script" over and over, in my experience. Immediate feedback can be helpful for data exploration and unit testing, but has other implications, too. For example, if one of your transformation steps requires aggregating the data (GROUP BY), that step will be done in its entirety to update the preview, even if only the first 1000 resulting rows are displayed in the preview. In these cases, each subsequent step and updated preview may cause a long wait. One strategy for this is to temporarily filter the data during development in an early, foldable query step, like limiting to a short time period or list of just a few customer IDs, then open it up later to validate the full results. In Qlik I rarely found this necessary, based on a combination of factors: prototyping expressions in the UI before adding them to the script because the back end/front end share a language, reloads that are generally fast enough to run in their entirety, and checking my results using a combination of the data model viewer and simple UI constructs.
  • There is no penalty for selecting a whole table in the first step of a query if only a subset of the columns is used in later steps. Having all columns as a reference can be helpful for browsing your source data, and Power Query is smart enough to disregard unnecessary overhead when it runs the ultimate queries. In Qlik, you want to minimize carrying unnecessary columns throughout, to minimize RAM consumption and speed things up.
  • Power Query calls a Join a "Merge" and calls Unioning "Combining". Qlik invented its own unnecessary term for Unioning.

  • Learn the difference between the Transform and Add Column menus to keep your queries lean, for efficiency/maintenance and sanity. Transform changes a column's values, in place, without creating a new column. Add Column will create a new column but is often also coupled with a transformation of an existing column. Use the latter when you want to keep the original column. (Similar shortcuts exist under both menus.) For example: You have a field with email addresses and want to get just the domains from the emails.


    • To add a new email domain column while keeping the original email column, under the Add Column menu, click Extract, Text After Delimiter. Edit the generated M code to assign the desired field name when it is created. (1 query step)
    • To transform the column in place and get rid of everything before the "@",  under the Transform menu, click Extract, Text After Delimiter. Rename the field. (2 query steps)
      • What not do: duplicate the email column (Add Column), extract the domain (Transform), rename the new column, remove the original column. (4 query steps)
  • When using DirectQuery (dynamic SQL generation), the supported Power Query transformations are very limited. That's a topic for another post, though.
  • Like a database, Power BI cares more about data types than Qlik and won't load unexpected values, like a text value in a field that has been defined as numeric. The proper data types are often inferred, but you must stay on top of them, to optimize storage and performance.
  • If something can't be computed (ex., divide by zero, data type conflicts), you often must write additional logic to specifically catch and handle those scenarios to avoid getting errors in your results. Qlik just defaults these to null and moves on with its life.

In the next post, I will pick this up with queries, M language, query folding, and refreshing data.

Contact Form

Name

Email *

Message *