Simple Power Query functions to troubleshoot join results


When creating the relationship between a fact and dimension in Power BI, a many-to-many relationship warning was triggered, to my surprise. I expected the key in DimProduct, ProductID, to be unique, but had introduced multiple rows per Product ID value somewhere along the way in my transformations.

 

DimProduct was created by joining six tables together -- the source Product table load followed by five merges (joins) of other tables. Which step was causing the issue of increasing my row count? Which table had a granularity I misunderstood? This could be a lot to unravel.


The data preview in Power Query for each query step can be helpful, but is based on a limited number of rows (1000). The only option for changing this is to use all rows, which slows every query preview to a crawl.

Fortunately, two simple M functions employed in basic, reusable queries enabled me to quickly and systematically debug the transformations, without leaving the Power Query interface. Here I'll review those functions and the specifics of how I debugged this join scenario.

Table.Profile

Table.Profile returns field-level aggregates with summary metadata about a target query. This is useful when trying to understand the general properties of a data source, whether fields are fully distinct, whether they contain null values that should be accounted for or replaced, etc. This function provides comprehensive results from the source, while you can never be totally confident based on the field profile in a 1000-row preview .

To generate the output above, from Power Query, click New Source, then Blank Query. In the formula bar at the top, type (including the equals sign):

= Table.Profile(#"DimProduct")

Rename the query to something more descriptive, like Table Profile, then right-click to disable the load, so the results do not persist in the data model.

For both this query and the next, you can reuse it over and over again during your development. Just enter a different target query name and rerun it.

Table.RowCount

Table.RowCount returns the number of non-distinct rows resulting from a target query. This information is also available as the "Count" column from the Table Profile query results above, but that profile can take some time to run on large datasets, and row counts alone are sometimes sufficient to answer basic questions, particularly in my accidental many-to many situation.


To generate the output above, from Power Query, click New Source, then Blank Query. In the formula bar at the top, type (including the equals sign):

= Table.RowCount(#"DimProduct")

Follow the same instructions from above to rename as Table RowCount and disable the load.

A variation you may sometimes wish to use is to find the number of distinct rows in a target query. Here's is the syntax:

= Table.RowCount(Table.Distinct(#"DimProduct"))

How I used the functions to debug where extra rows originated

My goal was to isolate which of the five merges (joins) created additional rows. To do this, I checked the row count after each merge, starting from the end and deleting merges successively until the resulting row count went down to the expected number (one per distinct ProductID). When it did, I knew I had found the problematic join and source table to research.
  1. Run Table Profile against DimProduct, just to confirm the many-to-many. Sure enough, there were more rows in DimProduct than there were distinct values in ProductID. Darn.
  2. Run Table RowCount against the source Product table -- the first table used in building DimProduct. ProductID was fully distinct in this table, and it had fewer rows than the DimProduct table I created. Now I confirmed it was not a source data issue and was instead something I introduced.
  3. Duplicate the DimProduct query (right-click query name, then Duplicate), so I could freely delete steps from the copy while debugging without having to worry about losing my original work. Rename query as DimProductDebug.
  4. Starting from the end of the DimProductDebug query, delete all query steps from the last merge through the end (right-click final merge step, then Delete Until End). Run Table RowCount against DimProductDebug. Has the resulting row count changed -- decreased back to the row count of the source Product table?
  5. If the row count is still too high, repeat the process. The merge that was deleted was not the culprit. Starting from the end of the DimProductDebug query, delete all query steps from the new final merge through end, then recheck the DimProductDebug row count.
  6. Once the problem merge that created rows is identified, run the Table Profile query against the table that was joined in that step, to better understand its grain and inform how it should be joined.

Other quality and profiling ideas in Power Query

These are simple applications of the functions, but they get the job done. And there is no overhead from saving these disabled queries off in their own query group, to use as needed.
 
Here are some other data quality and profiling ideas, along the lines of the row counts and table profiling above:
  • Field profile, a slimmed down version of Table Profile, run against only one field (with less overhead)
  • Field distribution, because the quality/distribution data built into Power Query is based on just the first 1000 rows 
  • Relationship profile to determine the distinctness of keys and integrity of a relationship before you build it in the model view or start using it to drive visualizations
  • Query compare, to ensure results of two queries are the same, ex., after rewriting a query to optimize performance, you would want to ensure you got the same results (spoiler: solution here)
  • Regression queries specific to your Dataset to rerun each time you make changes, and ensure the results have not unexpectedly changed, or to compare data in the source to data in the Dataset
For more advanced Power BI developers and enterprises, packaging these queries as custom functions (with enhancements, ex., including non-distinct and distinct row counts in the results of the same function) and including them in your template Report file will be a boon to developers.

Contact Form

Name

Email *

Message *