Screenshot of design team mockup, to be implemented in Tableau |
The challenge
Our design team recently
came up with an analysis sheet that enabled users to
define any cohort and compare to any other cohort they like. On the surface
it may not sound challenging: create a visualization for each cohort and give
each visualization its own set of filters.
Not so fast,
though! Both cohorts
are in the same visualization (ex., Cohort 1, Cohort 2, Delta, Delta %,
…), which is more challenging to implement in Tableau. This ruled out two different sets of filters because there is only one set of filters for the viz. You may also consider the common workaround of overlaying two similar charts, but that still didn't enable us to calculating deltas between the cohorts or even keep the axes in sync with unpredictable cohort sizes.
The last material
requirement was that the cohorts must be completely independent. We were
comparing two sets of data that could overlap, not in the cohort to out
of the cohort, as in many examples I found.
The solution: Set Controls
I had a sense that Sets would help, if I could make it easy to govern what went into the sets. Fortunately, Set Controls were introduced in 2020. Set Controls look like filters but really facilitate creating arrays of values that make up a Set. Making selections in Set Controls, i.e., changing Set membership, does not affect any visualization unless you explicitly refer to the Sets in calculations.
Set Controls (they just look like filters) |
This KB article ("Create a Multiple Value Selection Parameter"?) seeded the solution in my mind, which shows an example of how to do this using one field. In my case, we needed to account for multiple filters you may use to define the Sets, so it is a slight extension of the concept.
Follow along
Here are the steps to implement, using the 2022.4 version of Superstore.
Create a new Worksheet and select the Sample - Superstore data source, which contains Profit and a handful of interesting filters we can use to define our cohorts. We are going to compare the Profitability between two dynamic cohorts.
Create Set using Region |
Rename for Cohort A (optionally, click All) |
Create Calculated Fields for cohort membership to use in all subsequent calculations, returning a binary true/false. The Set references themselves are also evaluated as a binary true/false. The reason for creating this is to centralize what Fields/Sets we consider for cohort membership, so we don't have to retype the list of Sets in other calculations. (Again, it will likely be faster to create a field for cohort A, then duplicate and edit.)
Repeat this for cohort B, as well |
Create measures for each Cohort. You can create any measure you like, following this pattern. I created Profit, Sales, and Profit % (i.e., Profit / Sales) for each cohort.
Create these for cohort A, then duplicate and update for cohort B |
Lastly, create a delta Profit % measure, to compare the cohorts.
Build some visuals using the cohort-specific measures. I created a bar chart, then duplicated the Worksheet and made a pivot table, but go wild. Note that right now, the measure values are identical because the Set membership identical.
2 out of 3 Chads buy neither art nor binders |
Add one more Calculated Field that considers membership in cohort A or B returns true.
A few implementation notes
We built the calculations such that you only need to modify Is in Cohort A and Is in
Cohort
B if you want to change the Sets that define the cohorts. Everything else would be inherited automatically from those -- the measures, the filter. I did this for consistency and to simplify accommodating changes over time.
In
calculations, Set membership is evaluated like a SQL IN operator. The logic lives
entirely in the SELECT part of the query in our SUM IF calculations,
unless you use a Set or calculation
with a Set as a filter, in which case it will be evaluated in the WHERE
clause.
From a performance standpoint, my two cautions are:
- Limit the number of fields you use to define cohorts
- Try to use low cardinality (few distinct values) fields
Keep in mind that any higher level filters applied to the visualizations will apply uniformly to both sets, ex., if you wanted to see both cohorts for the same time period. Other filters continue to work normally.
NOT the solution
I struggled to find examples of cohort analysis or comparative analysis in Tableau. Most I found were focused on mutually exclusive cohorts -- in the cohort or out of the cohort, more like brushing -- not truly dynamic sets of data, which may overlap. Needing to combine the dynamic cohorts in one viz also prevented us from creating a viz for each cohort, side by side, which is much simpler.
Here are some things I considered and why they wouldn't work to meet the requirement. I find it to helpful to specifically note what I didn't choose and why because understanding the constraints of a platform makes it easier to come up with creative solutions to challenges in it.
- There is no Tableau expression syntax to selectively respect or ignore filters, so I could not create copies of the filter fields for defining the cohorts and pick which fields to respect in the measure calculations. Everything in a viz responds to the same filters.
- All data source tables must be related, so I could not create unrelated tables ("island tables") to act as pick lists to refer to in the expressions, ex., Filter = [Filter Island Value].
- You can't use fields from multiple sources in the same viz without blending, so I couldn't duplicate the data source and use one for each cohort in the chart.
- So then I thought I would try blending copies of the same data source, but blending does not enable you to apply a filter to one blended data source at a time; only both, so I could not filter each blended source independently to create the cohorts.
- Parameters avoid filtering the data and can be referred to independently, but a parameter can have only one value, so I could not use parameters to create truly dynamic cohorts.
Other platforms
To do this in Qlik, you would likely use Alternate States or expressions using Set Analysis to refer to filters in island tables.
To do this in Power BI, you would do something similar to the island tables in Qlik, referring to them using DAX. You could also explicitly ignore or respect filters using DAX.