Dynamic cohort analysis in Tableau…in one chart

Screenshot of design team mockup, to be implemented in Tableau

The challenge

Our design team recently came up with an analysis sheet that enabled users to define any cohort and compare to any other cohort they like. On the surface it may not sound challenging: create a visualization for each cohort and give each visualization its own set of filters.

Not so fast, though! Both cohorts are in the same visualization (ex., Cohort 1, Cohort 2, Delta, Delta %, …), which is more challenging to implement in Tableau. This ruled out two different sets of filters because there is only one set of filters for the viz. You may also consider the common workaround of overlaying two similar charts, but that still didn't enable us to calculating deltas between the cohorts or even keep the axes in sync with unpredictable cohort sizes.

The last material requirement was that the cohorts must be completely independent. We were comparing two sets of data that could overlap, not in the cohort to out of the cohort, as in many examples I found.

The solution: Set Controls

I had a sense that Sets would help, if I could make it easy to govern what went into the sets. Fortunately, Set Controls were introduced in 2020. Set Controls look like filters but really facilitate creating arrays of values that make up a Set. Making selections in Set Controls, i.e., changing Set membership, does not affect any visualization unless you explicitly refer to the Sets in calculations.

Set Controls (they just look like filters)

This KB article ("Create a Multiple Value Selection Parameter"?) seeded the solution in my mind, which shows an example of how to do this using one field. In my case, we needed to account for multiple filters you may use to define the Sets, so it is a slight extension of the concept.

Follow along

Here are the steps to implement, using the 2022.4 version of Superstore. 

Create a new Worksheet and select the Sample - Superstore data source, which contains Profit and a handful of interesting filters we can use to define our cohorts. We are going to compare the Profitability between two dynamic cohorts.

 
 
Decide what filters you would like to make available to define the cohorts. We'll pick Region, Category, and Subcategory.
 
Create Sets for those filters. For each of those fields, create two sets: one for cohort A and one for cohort B. Right-click the field name, then Create, Set. You will do this a total of six times. (Technically it will be faster to create Sets for cohort A, duplicate them, then rename for cohort B.)
 
Create Set using Region

Rename for Cohort A (optionally, click All)


Create Calculated Fields for cohort membership to use in all subsequent calculations, returning a binary true/false. The Set references themselves are also evaluated as a binary true/false. The reason for creating this is to centralize what Fields/Sets we consider for cohort membership, so we don't have to retype the list of Sets in other calculations. (Again, it will likely be faster to create a field for cohort A, then duplicate and edit.)
Repeat this for cohort B, as well

Create measures for each Cohort. You can create any measure you like, following this pattern. I created Profit, Sales, and Profit % (i.e., Profit / Sales) for each cohort.
Create these for cohort A, then duplicate and update for cohort B

Lastly, create a delta Profit % measure, to compare the cohorts.
 
Now we need to show the Set Controls so an end user can define the cohorts. For each of the six Sets we created, right-click and Show Set. They appear on the right, just as filters would.


Build some visuals using the cohort-specific measures. I created a bar chart, then duplicated the Worksheet and made a pivot table, but go wild. Note that right now, the measure values are identical because the Set membership identical.




 
Define your cohorts by making selections in the Set Controls. For the cohort A Category I selected "Furniture" and then "Office Supplies" for cohort B. Note how the chart results update based on our dynamic cohorts. This visualization is telling us that Office Supplies have historically been much more profitable than Furniture.
 
Control for empty rows. As you define narrower cohorts or group by dimensions with many distinct values (AKA high cardinality), you will start to see more blank rows in your visualization, where neither cohort has any measure values for the dimension values in the viz. Because our cohorts are only part of the aggregation functions -- SUM IF -- no filtering of the data as a whole is applied, i.e., nothing in the WHERE clause.
 
To generate blank rows in this screenshot, I switched Customer Name for Region in my Pivot table, then defined cohort A as "Art" and cohort B as "Binders". Note that many customers have bought neither. Now we'll hide those rows.
 
2 out of 3 Chads buy neither art nor binders

Add one more Calculated Field that considers membership in cohort A or B returns true.
 
Add the new Calculated Field to the Filter shelf and set as "True". The blank rows in the table will disappear because this new filter limits the data for the whole visualization to only what is in a cohort.
 
Download my working file here: Superstore with Cohort Analysis.twbx

A few implementation notes

We built the calculations such that you only need to modify Is in Cohort A and Is in Cohort B if you want to change the Sets that define the cohorts. Everything else would be inherited automatically from those -- the measures, the filter. I did this for consistency and to simplify accommodating changes over time.

In calculations, Set membership is evaluated like a SQL IN operator. The logic lives entirely in the SELECT part of the query in our SUM IF calculations, unless you use a Set or calculation with a Set as a filter, in which case it will be evaluated in the WHERE clause.

From a performance standpoint, my two cautions are:

  1. Limit the number of fields you use to define cohorts
  2. Try to use low cardinality (few distinct values) fields

Keep in mind that any higher level filters applied to the visualizations will apply uniformly to both sets, ex., if you wanted to see both cohorts for the same time period. Other filters continue to work normally.

NOT the solution

I struggled to find examples of cohort analysis or comparative analysis in Tableau. Most I found were focused on mutually exclusive cohorts -- in the cohort or out of the cohort, more like brushing -- not truly dynamic sets of data, which may overlap. Needing to combine the dynamic cohorts in one viz also prevented us from creating a viz for each cohort, side by side, which is much simpler.

Here are some things I considered and why they wouldn't work to meet the requirement. I find it to helpful to specifically note what I didn't choose and why because understanding the constraints of a platform makes it easier to come up with creative solutions to challenges in it.

  • There is no Tableau expression syntax to selectively respect or ignore filters, so I could not create copies of the filter fields for defining the cohorts and pick which fields to respect in the measure calculations. Everything in a viz responds to the same filters.
  • All data source tables must be related, so I could not create unrelated tables ("island tables") to act as pick lists to refer to in the expressions, ex., Filter = [Filter Island Value].
  • You can't use fields from multiple sources in the same viz without blending, so I couldn't duplicate the data source and use one for each cohort in the chart.
  • So then I thought I would try blending copies of the same data source, but blending does not enable you to apply a filter to one blended data source at a time; only both, so I could not filter each blended source independently to create the cohorts.
  • Parameters avoid filtering the data and can be referred to independently, but a parameter can have only one value, so I could not use parameters to create truly dynamic cohorts.

Other platforms

To do this in Qlik, you would likely use Alternate States or expressions using Set Analysis to refer to filters in island tables.

To do this in Power BI, you would do something similar to the island tables in Qlik, referring to them using DAX. You could also explicitly ignore or respect filters using DAX.

Contact Form

Name

Email *

Message *