Evaluating technical options (or not) when building BI solutions


An associate learning a new technology with a complex expression language asked (paraphrasing):

Is there an "easy" way to tell which functions and calculation approaches are going to be more resource intensive than others without having to do detailed debugging and performance measurement?

It's a good question and one that applies to learning any new technology. Here are my thoughts.

1. When performance and data volumes are not a concern, save decision making effort by learning/using best practices and following generally accepted development patterns

When data volumes, reload windows, and/or concurrency are not huge, it probably won't make a noticeable difference even if you manage to squeeze more performance out of a solution. (And that's assuming you can find a better way, which is uncertain.) Most solutions fall into that category.

In these situations, default to best practices, and prioritize making your work maintainable and understandable to others over possibly trivial differences in efficiency. Avoiding low-value decisions and avoiding pointless variation in how things are done are good practices in their own right. These decisions are also reversible: you can deploy a solution sooner and revisit possible optimizations later.

I think of two categories of best practices.

General analytics best practices

These are development patterns you build up over time with experience, and they synergistically enable you to learn new technologies and concepts faster. They hold true across most BI platforms and even databases. Some examples: push known work upstream, prefer star schemas, and remember that integer or Boolean comparisons evaluate faster than text comparisons.
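As a small illustration of the comparison-cost point, here is a hedged micro-benchmark sketch in Python. The data and timings are hypothetical stand-ins; absolute numbers depend on the machine, and a BI engine's cost model will differ from Python's, but the general pattern (cheap integer/Boolean equality vs. costlier text equality) tends to transfer:

```python
import timeit

# Hypothetical sample data: 1,000 rows keyed by integer, Boolean, and text.
ints = list(range(1000))
bools = [i % 2 == 0 for i in range(1000)]
texts = [f"customer_status_code_{i}" for i in range(1000)]

def count_int():
    # Integer equality: a single machine-level comparison per row.
    return sum(1 for x in ints if x == 500)

def count_bool():
    # Boolean test: the cheapest comparison of all.
    return sum(1 for x in bools if x)

def count_text():
    # Text equality: character-by-character comparison per row.
    return sum(1 for x in texts if x == "customer_status_code_500")

for name, fn in [("int", count_int), ("bool", count_bool), ("text", count_text)]:
    elapsed = timeit.timeit(fn, number=2000)
    print(f"{name:>5}: {elapsed:.3f}s")
```

The relative gap matters more than the absolute times; on wide fact tables, replacing text flags with integers or Booleans compounds this saving across every row scanned.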

Platform-specific best practices

These are specific and sometimes idiosyncratic, and you're unlikely to spontaneously arrive at them. They're unique to a platform and may even contradict general best practices. Some examples are organizing transformations to leverage Query Folding in Power BI, learning the few things you can do in Qlik while maintaining optimized QVD loads, or using containers to build a Tableau dashboard.

Ideally you also learn what makes them best practices. Best practices are usually best practices because they complement how a platform works under the hood -- how the "engine" works, how the cache is used, what order of operations is followed, and so on. Many platforms also have built-in or third-party automated checks for best-practice adherence, but understanding how the platform works is often needed to interpret the results and propose superior solutions.

2. When performance matters a lot or you must create a unique solution, confirm that your alternatives do the same thing, then do more detailed performance measurement

Not all problems fit the common situations that best practices address, and those require unique approaches. And sometimes you do need to trim the footprint of a large or high-concurrency solution as far as it will go.

In these cases, treat this problem as you would any consulting problem.

1. Articulate the desired outcome/insights

The trick here is to not get too caught up in a specific technical experience, which severely limits your options. "We want to click this button, then hide this and show this other thing." No, let's back up. What insights would a user be trying to uncover? What's the point of the problem we're trying to solve? There may be several ways to get there, some of which work much better with how the platform works.

2. Generate options

I have no shortcut to increase your creativity, but here are two things to keep in mind.

  • Use native features as much as possible. Perhaps with a small concession from users, out-of-the-box features can solve the problem in their entirety, which often leads to good performance and low maintenance.
  • When you have multiple ideas, ensure that they achieve the outcome, noting minor differences to review with users. Behaviors may be subtly different and more or less desirable and easy to understand. For example, in Qlik, there is different filtering behavior when you use a link table compared to a concatenated fact table. Or when writing formulas a few different ways, one way may respond dynamically to filtering while another option is fixed.
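The "do they achieve the same outcome" check can often be automated before any performance testing. Below is a minimal Python sketch of the idea; the two aggregation functions and the sample rows are hypothetical stand-ins for your real alternative implementations (e.g., a link table vs. a concatenated fact table, or two formula variants):

```python
from collections import defaultdict

# Hypothetical sample data standing in for a fact table.
sales = [
    {"region": "East", "amount": 100.0},
    {"region": "West", "amount": 250.0},
    {"region": "East", "amount": 50.0},
]

def totals_v1(rows):
    """Option A: accumulate running totals in one pass."""
    out = defaultdict(float)
    for r in rows:
        out[r["region"]] += r["amount"]
    return dict(out)

def totals_v2(rows):
    """Option B: collect distinct keys first, then sum per key."""
    regions = {r["region"] for r in rows}
    return {k: sum(r["amount"] for r in rows if r["region"] == k)
            for k in regions}

# Confirm both options produce the same outcome before comparing speed.
assert totals_v1(sales) == totals_v2(sales) == {"East": 150.0, "West": 250.0}
print("options agree")
```

Only once the alternatives are confirmed equivalent (or their differences are documented and accepted by users) is a performance comparison between them meaningful.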

3. Evaluate options/prototype

Once you have generated options, prototype the few most promising and measure their performance for comparison.

Each platform has its own ways of doing this, depending on what you are measuring (reload performance, UI performance, data model footprint, …), but the one thing to keep in mind is to adequately isolate variables so you can be confident in your results. A few things to watch for: whether cached results are being returned, whether the BI environment is busy at the time of your testing, and whether the source data environment is busy at the time of your testing.
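As a generic sketch of isolating those variables when timing something yourself (Python here; real BI platforms have their own profilers, and only the warm-up/repeat pattern is meant to transfer), the helper and workload below are hypothetical:

```python
import statistics
import time

def measure(fn, *, warmups=2, trials=9):
    """Time fn, discarding warm-up runs so cache effects don't skew results.

    Reporting the median of several trials reduces the impact of the
    environment being momentarily busy during any single run.
    """
    for _ in range(warmups):  # prime caches before collecting samples
        fn()
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {"median": statistics.median(samples),
            "min": min(samples),
            "max": max(samples)}

# Hypothetical workload standing in for a query or expression under test.
stats = measure(lambda: sum(i * i for i in range(100_000)))
print(f"median: {stats['median']:.4f}s  (min {stats['min']:.4f}s)")
```

A wide spread between min and max is itself a signal that something external (a busy server, cache behavior) is contaminating the measurement and the comparison is not yet trustworthy.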

Some may see this exercise as tedious, but this is how you learn about exceptions to best practices and other unique hacks that are worth sharing.
