29 maart 2024

Dealing with missing and late arriving values with OtherSymbol and other solutions

Deel dit bericht
Dealing with missing and late arriving values in your data model using the OtherSymbol variable and other solutions

At Bitmetric we continuously hone our skills, and we like to help you do the same. That’s why we regularly post a new Qlik certification practice question to our LinkedIn company page. Some time ago we asked the following Qlik Data Architect certification practice question about dealing with missing or late arriving values in your source data.

Dealing with missing dimensions or values in Qlik with the OtherSymbol value handling variable.


This was somewhat of a trick question, as there are multiple (somewhat) correct answers.

The correct answers are A and C, and in some cases even D

Of course, the right solution often depends upon the context of the problem. Let’s take a look at all three of these possible solutions and see how they work, and in which cases they might be useful. But first, what problem are we solving?

Consider the (simplified) data model below. When looking at the Salesperson and Sales tables, we see there are sales for SalespersonId 4 and 5 that don’t have a corresponding salesperson.

A datamodel with missing dimension values. This could be due to early arriving facts, data quality issues, etc.

After we load this data into Qlik, the issue becomes clear. When no data is selected, sales made by these missing SalespersonId’s is included in the total. These sales are excluded when all salespersons are selected. This can be confusing for users and lead to incorrect conclusions being drawn from the data. It’s advisable to do something about it.

Qlik, selecting all values yields a different value than selecting no values.

Now that we know what the issue is, let’s look at 3 ways we can resolve it.

The first possible solution is the one least likely to be correct. Instead of dealing with the missing dimension values, we simpy remove the facts that don’t have a corresponding dimension value:

Delete facts without corresponding dimensions

We can accomplish this using statements such as KEEP, or WHERE EXISTS. For example with the following script:

Salesperson:
RIGHT KEEP (Sales)
LOAD * INLINE
[
SalespersonId, Salesperson
1, Winnie the Pooh
2, Tigger
3, Piglet
];

Problem solved, right? Well… only if we can be absolutely sure that these facts aren’t needed. For example because it’s test data, or when the business rule states that only sales that can be attributed to a salesperson are considered actual sales.

At Bitmetric, we don’t encounter this scenario frequently. Typically, it’s better to make it clear to the user that certain dimension values are missing, and to make them selectable for further analysis. We’ll look at two different ways to accomplish this next.

The first approach is to add the missing values to the dimension table in the script. The steps to do this are:

  • Copy all dimension keys in the dimension table to a temporary field, in a temporary table;
  • Concatenate the dimension keys in the fact table to the dimension table, but only those keys that do not exist in the temporary field we created in the previous step. For this we use WHERE NOT Exists().
  • Drop the temporary table and field.

In our example, the script looks like:

Now the missing dimensions values are visible to the user, and selecting all values yields the same result as selecting no values:

This solution works well, but there’s another solution that’s even lazier simpler. Let’s take a look at that next.

The second approach is to use the OtherSymbol value handling variable. This approach seems to be less well known than the previous approach, but is a lot easier to implement. The steps are:

  • Set the OtherSymbol variable to a ‘dummy’ value that will be used as the key value for the missing dimension value
  • Add the unknown value as the last record of the dimension table

In our example, this results in the following script:

Of course, in a typical scenario we would CONCATENATE the record to the existing dimension table. The result is shown below. All unknown salespersons are now mapped to the Unknown salesperson and are now selectable by the user.

OtherSymbol

This works, but how? The OtherSymbol method works by adding values that exist in the same field in previously loaded tables, but haven’t been found yet in the table that is currently being loaded. this means that there are two things we need to keep in mind:

  • The fact table must be loaded before the dimension table
  • The unknown/OtherSymbol record need to be the last record of the dimension table

Another thing to keep in mind is that OtherSymbol creates a row for each unknown value. Because only a single salesperson Unknown is show we might assume all sales are linked to a single record. In reality, the same record is created for each value:

You may wonder what the best method is better. As mentioned at the start of this post, it all depends on the context of the problem. To get the first question out of the way, performance between both methods is roughly equal. Besides performance, the pro’s and cons of both methods are:

✅ Pro

  • Well known and understood pattern
  • More flexible when scripting, for example including the ID in the label

Con

  • More script

✅ Pro

  • Simple
  • Less script

❌ Con

  • Very specific load pattern, which leaves room for error
  • Less flexible

If pressed, at Bitmetric we’ll choose the WHERE NOT Exists() method over the OtherSymbol method. It’s good to be aware of both methods though, hopefully now you are too 😉

That’s it for this week, hope to see you again on a future blog post!

Take your Qlik skills to the next level!

Since 2013, the Masters Summit for Qlik is the premier advanced training for Qlik. Join us in Vienna and take your Qlik skills to the next level.

Join the team!

Do you want to work within a highly-skilled, informal team where craftsmanship, ingenuity, knowledge sharing and personal development are valued and encouraged? Check out our job openings.

Data Model Friday Qlik Test Prep Qlik Solution

Hoe kunnen we je ondersteunen?

Barry beschikt over meer dan 20 jaar ervaring als architect, developer, trainer en auteur op het gebied van Data & Analytics. Hij is bereid om je te helpen met al je vragen.