Ranking within a group (quickly) in Qlik Load Script

A customer recently came to Axis with a problem: They are trying to rank within a group in the script, and it is taking a ridiculously long time -- 24 minutes for less than 1M records -- and they would like to perform this on upwards of 70M records. Revisiting the ranking expression reduced the time to generate the results to just 4 seconds, a 99.7% improvement.

What is ranking within a group?

Think of it as ranking a column of values in a data source table, but resetting the rank based on a dimension's values, similar to AGGR, in the front end. For example: ranking each Customer's orders by OrderAmount, such that each Customer in the data has an order ranked 1, 2, 3, ...up to however many total orders that Customer has. By comparison, ranking without a group would just rank all Orders in the whole data set, without regard for the Customer, from 1 to the row count of the table.

Before

There is a popular solution on Qlik Community I found that appears to have been the inspiration for what the customer was doing. This approach appears in multiple posts:

AutoNumber(OrderAmount, Customer) as Rank

...
ORDER BY
    Customer,
    OrderAmount desc

It ranks the OrderAmount values per Customer, and, importantly, relies on ordering the data by Customer, then OrderAmount (descending) . Because of the ORDER BY, the OrderAmount values are introduced to the AutoNumber in order from highest to lowest, for each Customer. In my customer's data, this approach seems to start okay, but slows to a crawl, as the row count gets higher in the final table.

The logic itself is overkill, computationally: AutoNumber compares new field values to existing field values, but we don't care about the OrderAmount values themselves, because the data have already been sorted. We just want to count up from 1 to however many rows there are for each Customer, then start over at 1 for the next Customer.

After

I tested a few approaches that arrive at the same result, and this was the fastest:

If(Customer = Previous(Customer), Peek(Rank) + 1, 1) as Rank

...
ORDER BY
    Customer,
    OrderAmount desc

This checks to see whether the Customer on a given row is the same as the Customer on the previous row. If so, increment the ranking up by 1. If not, restart the rankings at 1. Again, this relies on ordering the data by Customer, then OrderAmount (descending). This expression also outperformed using Peek in both places instead of the Previous function.

With this change, the customer can now rank all 70M records 5x faster than it was taking to do just 1M before.

Some of you may be wondering: what if there is an OrderAmount tie, and I want to assign the same rank to both amounts? I tested adding conditional logic to accommodate that, and it did not meaningfully affect the reload time.

If(Customer <> Previous(Customer), 1, 
    If(OrderAmount <> Previous(OrderAmount), Peek(Rank) + 1, Peek(Rank))) as Rank

Contact Form

Name

Email *

Message *