Qlik Developer Primer for What's Different in Power BI: Data Modeling (Part 2/2)

In this second part of our Qlik developer primer series, we'll dive into how data models behave in Power BI, the available storage modes, and other noteworthy differences from Qlik to be aware of.

You can find part 1 here.

How the Data Model Behaves

Key field identities

In Qlik, associated fields are treated as one field in the interface. You can't distinguish between the values of a field coming from one table and the values of the same field in a related table -- or many related tables, when several tables have the same key.

Many Country fields as one, from https://help.qlik.com

In Power BI, related fields remain distinct, enabling more granular validation. Field references in DAX are qualified, meaning they include the table name. As a result, a calendar table related to both a sales table and a quota table can keep unique key field names (e.g., Order Date, Quota Date), and formulas explicitly refer to the date values from one specific table -- for example, performing a distinct count of the customer key in a sales fact table to identify customers who have made purchases.
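For example, here's a minimal sketch of a qualified field reference in a measure (the Sales table and CustomerKey column are hypothetical names):

```
-- Counts distinct customers that appear in the Sales fact table only.
-- Sales[CustomerKey] is fully qualified: table name, then column name.
Customers With Purchases =
DISTINCTCOUNT ( Sales[CustomerKey] )
```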

Relationship directions

In Qlik, all fields filter one another continuously, no matter where they are in the data model, as long as there is a path between them via associations. Qlik is efficient at exercising these relationships, and they enable the green/white/gray selection states Qlik is known for. In my opinion, this is also usually helpful because it makes for predictable behavior for end users.

However, this can lead to ambiguity in some Qlik data models. For example, in a data model with a link table, quota fact, and actual fact, selecting a dimension related only to the actual fact will also filter the quota data and metrics, but in a way that is unclear to end users.

In Power BI, filtering follows the direction of relationships (arrows) and then stops. In a star schema with all dimensional relationships pointing inward toward the fact table -- the most common design, and a best practice -- filtering in one dimension table will not affect values in other dimension tables. In charts this is rarely noticeable, but it is visible in the filter values themselves: in the report filters, filtering to one customer will not narrow the product filter down to only the products that customer purchased.

Join directions, from https://blogs.perficient.com

Using bidirectional relationships in Power BI is possible but generally not recommended due to performance considerations, as it generates more queries when filters are applied -- queries that filter the filters, so to speak.
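When cross-dimension filtering is genuinely needed, a lighter-weight alternative is to enable it per measure rather than model-wide. A minimal sketch, assuming hypothetical Sales and Product tables related on ProductKey:

```
-- CROSSFILTER makes the relationship bidirectional only while this
-- measure evaluates, instead of for the whole model.
-- Table and column names are illustrative.
Products Sold =
CALCULATE (
    DISTINCTCOUNT ( Sales[ProductKey] ),
    CROSSFILTER ( Sales[ProductKey], 'Product'[ProductKey], Both )
)
```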

Field and relationship metadata

The table view in Qlik provides at-a-glance information about field distinctness, field values, and table row counts, which can be helpful for debugging.

In Power BI, the main metadata available at a glance in the model view is relationship cardinality. However, third-party tools like DAX Studio make other metadata -- richer than what is available in Qlik -- readily available, including for remote models. Power BI also really forces you to recognize cardinality, whereas in Qlik I frequently see unintended (and previously undiscovered) many-to-many relationships.
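As one example, here is a DAX query you can run from DAX Studio to get Qlik-style, per-column statistics for the whole model:

```
-- Returns one row per column with statistics such as cardinality
-- and min/max values, for every table in the model.
EVALUATE COLUMNSTATISTICS ()
```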

Resilience to poor data quality, unintended relationships, or nonsensical visuals

What I found is that data models are pretty resilient in both platforms, and calculation results are the same between Qlik and Power BI, even for data models with link tables or with two fact tables related directly in a many-to-many relationship. Whether that's a good thing or a bad thing is debatable, but a terribly designed data model, a misunderstanding of your source data, or a nonsensical combination of dimensions will still give consistent results in UI calculations, according to my testing.

Scenarios I tested include:

  • Relating fact tables directly (e.g., Order Header and Order Line Item) and aggregating facts from both tables
  • Relating fact tables through a link table with many-to-many relationships and aggregating facts from both tables
  • Grouping a fact by unrelated dimensions (e.g., Quota by Salesperson and Product, when Quota is only defined at the Salesperson level)
  • Handling many-to-many relationships between facts

Power BI, like Qlik, avoids overcounting facts when relating facts to facts. Also like Qlik, Power BI provides no feedback for nonsensical combinations, often displaying the same number repeatedly across rows. This resilience means that, while the models might perform poorly, they still function correctly. Of course you have the option of modeling this data differently, which I recommend.
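To illustrate the repeated-number behavior, here's a minimal sketch with a hypothetical Quota table defined only at the salesperson grain:

```
-- Quota has a relationship to Salesperson but none to Product.
-- Grouping this measure by Product repeats each salesperson's full
-- quota on every product row: consistent, but nonsensical.
Total Quota =
SUM ( Quota[QuotaAmount] )
```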

Data storage modes

In Qlik, the only robust storage mode is in-memory; Qlik's secret sauce relies on having the data in memory. On-Demand App Generation (ODAG) is not a seamless user experience and is really just another way of getting data into memory on demand. Direct Query is emerging but is not compatible with many of the features that make Qlik unique.

Power BI offers a variety of storage modes, and they are more seamlessly integrated, i.e., users don't know where the data resides or where compute is happening. In-memory ("Import") data can be combined with live source ("DirectQuery") data in the same semantic model (a composite model), with tables switching modes automatically ("Dual" storage mode) for the most efficient performance. Tables can even be partitioned into live and import partitions, AKA hybrid tables. And Direct Lake combines some features of Import and DirectQuery models.

Truly, the number of options is even more complicated than that, once you consider that you can use DirectQuery to connect to Dataflows with the enhanced compute engine enabled, or even to other Semantic Models. The downside is that it can be complicated to decide which approach to use, but Import mode is typically the default unless the use case specifically dictates otherwise.

Incremental refresh

Qlik supports any kind of incremental refresh you want, as long as you write the load script to do it. For experienced developers, this also lends itself well to creating extract frameworks that efficiently stage large amounts of data with little effort.

Power BI supports incremental refresh but lacks an upsert feature to efficiently stitch together only changed data. Its incremental refresh is more of a trailing n-period refresh, without many efficiencies when only a few rows have changed. Personally, I also find the configuration a little hokey in Semantic Models, although it's much better in Gen 1 Dataflows.

Aggregate awareness, AKA user-defined aggregations

This feature is pretty awesome. You can create a hierarchy of increasingly aggregated versions of the same fact table in the Semantic Model, then leave Power BI to automatically figure out the smallest version of the table it can use to answer a DAX query. This video from Havens Consulting does a good job demonstrating it -- including combining it with a composite model to get import-level performance most of the time while hitting the database only when needed.
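As a sketch of the idea (all names hypothetical), the aggregation table holds the same facts at a coarser grain, such as one row per product per day; the Manage aggregations dialog then maps its columns back to the detail table:

```
-- Illustrative calculated table showing a typical aggregation grain.
-- Real user-defined aggregations usually import a table like this
-- from the source rather than calculating it in the model.
Sales Agg =
SUMMARIZECOLUMNS (
    Sales[ProductKey],
    Sales[OrderDateKey],
    "SalesAmount", SUM ( Sales[Amount] )
)
```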

Island tables

In Power BI, island tables are still often employed, although the term itself is more common among Qlik developers; in Power BI circles they are usually called disconnected tables. Similar to Qlik, they are used to store picklists and metadata, as a workaround for the lack of Qlik's "alternate states", and to store Field Parameters.
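A common picklist pattern with a disconnected (island) table, assuming hypothetical measures [Total Sales] and [Total Quota]:

```
-- 'Metric Picker' is an island table with no relationships; a slicer
-- on its Metric column drives which measure this returns.
-- Table, column, and measure names are illustrative.
Selected Metric =
SWITCH (
    SELECTEDVALUE ( 'Metric Picker'[Metric] ),
    "Sales", [Total Sales],
    "Quota", [Total Quota],
    BLANK ()
)
```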

Conclusion

One of the first things I teach Qlik developers moving to Power BI is to unlearn concatenated fact and link table data models, because Power BI offers superior options. I would say "options" are the theme of what's different about Power BI. Qlik has fewer options but executes them very well, often in a simple, predictable way. Power BI has lots of options, which are good when you need them, but they can lead to complexity, more to learn, and more decisions to make.
