Does the order of fields affect Qllik aggregation/GROUP BY performance?


The short answer is: not really.

It's not an interesting conclusion, but I am documenting the negative result here in case somebody in the future wonders whether this variable is worth exploring.

The slightly longer answer is that ordering the GROUP BY fields in cardinality (number of distinct values) from highest to lowest was on average about 2% faster than lowest to highest in my tests, but that's not worth the time investment to explore as a variable in my work. It's not the difference of getting near real-time data or squeezing a load into a tight window of time.

If you care...

I tested by generating 20 million rows of random data with three dimensions and three measures and storing it to QVD. The dimensions had 3, 100, and 10000 distinct values, respectively (Dim1, Dim2, Dim3). I then loaded the QVD into memory and did a RESIDENT load aggregating all three measures, grouping by all three dimensions, resulting in a table about half of the original size, by row count.

The two scenarios for that RESIDENT load were listing the fields in the GROUP BY in:
  • Dim1, Dim2, Dim3 order (3, 100, 10000 distinct values): averaged 2:35
  • Dim3, Dim2, Dim1 order (10000, 100, 3 distinct values): averaged 2:32
So ordering from most to least distinct helped a tiny bit in this case, I guess.

Let me know if you find something different.

Contact Form

Name

Email *

Message *