29 maart 2024 Dealing with missing and late arriving values with OtherSymbol and other solutions Deel dit bericht At Bitmetric we continuously hone our skills, and we like to help you do the same. That’s why we regularly post a new Qlik certification practice question to our LinkedIn company page. Some time ago we asked the following Qlik Data Architect certification practice question about dealing with missing or late arriving values in your source data. This was somewhat of a trick question, as there are multiple (somewhat) correct answers. The correct answers are A and C, and in some cases even D Of course, the right solution often depends upon the context of the problem. Let’s take a look at all three of these possible solutions and see how they work, and in which cases they might be useful. But first, what problem are we solving? Facts without a corresponding dimension, what’s the problem? Consider the (simplified) data model below. When looking at the Salesperson and Sales tables, we see there are sales for SalespersonId 4 and 5 that don’t have a corresponding salesperson. After we load this data into Qlik, the issue becomes clear. When no data is selected, sales made by these missing SalespersonId’s is included in the total. These sales are excluded when all salespersons are selected. This can be confusing for users and lead to incorrect conclusions being drawn from the data. It’s advisable to do something about it. ℹ Intermezzo: what causes missing dimension values?There can be multiple causes for missing dimension values. A very common one being timing discrepancies known as early-arriving facts (or late-arriving dimensions), the data for the dimension table is simply lagging behind and will be loaded later. Another cause can be that sometimes the dimension value is just not available. For example, in a retail setting, some customers might pay with cash and the retailer has no further information about them. Now that we know what the issue is, let’s look at 3 ways we can resolve it. Remove non-corresponding values from the fact table The first possible solution is the one least likely to be correct. Instead of dealing with the missing dimension values, we simpy remove the facts that don’t have a corresponding dimension value: We can accomplish this using statements such as KEEP, or WHERE EXISTS. For example with the following script: Salesperson:RIGHT KEEP (Sales)LOAD * INLINE [ SalespersonId, Salesperson 1, Winnie the Pooh 2, Tigger 3, Piglet ]; Problem solved, right? Well… only if we can be absolutely sure that these facts aren’t needed. For example because it’s test data, or when the business rule states that only sales that can be attributed to a salesperson are considered actual sales. At Bitmetric, we don’t encounter this scenario frequently. Typically, it’s better to make it clear to the user that certain dimension values are missing, and to make them selectable for further analysis. We’ll look at two different ways to accomplish this next. Add missing values with WHERE NOT EXISTS The first approach is to add the missing values to the dimension table in the script. The steps to do this are: Copy all dimension keys in the dimension table to a temporary field, in a temporary table; Concatenate the dimension keys in the fact table to the dimension table, but only those keys that do not exist in the temporary field we created in the previous step. For this we use WHERE NOT Exists(). Drop the temporary table and field. In our example, the script looks like: Now the missing dimensions values are visible to the user, and selecting all values yields the same result as selecting no values: This solution works well, but there’s another solution that’s even lazier simpler. Let’s take a look at that next. The OtherSymbol value handling variable The second approach is to use the OtherSymbol value handling variable. This approach seems to be less well known than the previous approach, but is a lot easier to implement. The steps are: Set the OtherSymbol variable to a ‘dummy’ value that will be used as the key value for the missing dimension value Add the unknown value as the last record of the dimension table In our example, this results in the following script: Of course, in a typical scenario we would CONCATENATE the record to the existing dimension table. The result is shown below. All unknown salespersons are now mapped to the Unknown salesperson and are now selectable by the user. This works, but how? The OtherSymbol method works by adding values that exist in the same field in previously loaded tables, but haven’t been found yet in the table that is currently being loaded. this means that there are two things we need to keep in mind: The fact table must be loaded before the dimension table The unknown/OtherSymbol record need to be the last record of the dimension table Another thing to keep in mind is that OtherSymbol creates a row for each unknown value. Because only a single salesperson Unknown is show we might assume all sales are linked to a single record. In reality, the same record is created for each value: WHERE NOT Exists() or OtherSymbol You may wonder what the best method is better. As mentioned at the start of this post, it all depends on the context of the problem. To get the first question out of the way, performance between both methods is roughly equal. Besides performance, the pro’s and cons of both methods are: WHERE NOT Exists() ✅ Pro Well known and understood pattern More flexible when scripting, for example including the ID in the label ❌ Con More script OtherSymbol ✅ Pro Simple Less script ❌ Con Very specific load pattern, which leaves room for error Less flexible If pressed, at Bitmetric we’ll choose the WHERE NOT Exists() method over the OtherSymbol method. It’s good to be aware of both methods though, hopefully now you are too 😉 That’s it for this week, hope to see you again on a future blog post! More from the Bitmetric team Take your Qlik skills to the next level! Since 2013, the Masters Summit for Qlik is the premier advanced training for Qlik. Join us in Vienna and take your Qlik skills to the next level. Join the team! Do you want to work within a highly-skilled, informal team where craftsmanship, ingenuity, knowledge sharing and personal development are valued and encouraged? Check out our job openings. Data Model Friday Qlik Test Prep Qlik Solution Hoe kunnen we je ondersteunen? Barry beschikt over meer dan 20 jaar ervaring als architect, developer, trainer en auteur op het gebied van Data & Analytics. Hij is bereid om je te helpen met al je vragen. Bel ons Mail ons 8 oktober 2024 Artificial Intelligence, Machine Learning, and Deep Learning Explained: How They Impact Your Business In today’s rapidly evolving technological landscape, Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) are transforming industries and redefining how businesses operate. In this blog post, we will break down these three definitions and elaborate on them. AI 25 september 2024 Building Ethical AI: Practical Frameworks for Responsible Innovation AI is transforming industries with innovation and efficiency. But with great power comes great responsibility. The real question is: How do you turn ethical principles into actionable guidelines for AI development? And what steps should your team take to make it happen? AI 17 september 2024 What is AI Ready Data Data quality is all about how accurate, consistent, complete, and up-to-date your data is. If your data is good, you’ll get reliable insights and be able to make smarter decisions. It’s a key part of making sure your AI and machine learning projects are successful. AI Qlik
8 oktober 2024 Artificial Intelligence, Machine Learning, and Deep Learning Explained: How They Impact Your Business In today’s rapidly evolving technological landscape, Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) are transforming industries and redefining how businesses operate. In this blog post, we will break down these three definitions and elaborate on them. AI
25 september 2024 Building Ethical AI: Practical Frameworks for Responsible Innovation AI is transforming industries with innovation and efficiency. But with great power comes great responsibility. The real question is: How do you turn ethical principles into actionable guidelines for AI development? And what steps should your team take to make it happen? AI
17 september 2024 What is AI Ready Data Data quality is all about how accurate, consistent, complete, and up-to-date your data is. If your data is good, you’ll get reliable insights and be able to make smarter decisions. It’s a key part of making sure your AI and machine learning projects are successful. AI Qlik