Using the subset ratio in Qlik to spot errors in key fields

Checking the subset ratio on key fields is a very good way to spot errors in key field associations in Qlik Sense and QlikView

Every Friday at Bitmetric we’re posting a new Qlik certification practice question to our LinkedIn company page. Last Friday we asked the following Qlik Data Architect certification practice question about subset ratios in Qlik Sense:

A Data Architect checks the data model viewer and finds that the Customer in the Sales table has a subset ratio of 72%. What does this mean?

The correct answer is C: 28% of the CustomerID’s have never placed an order

To validate and check the quality of the data model which you have just created, the data model viewer is an important tool. One of the things you can check here is the subset ratio. The subset ratio is the percentage of present distinct values within that table compared to the total distinct values of the chosen field in the whole data model . To demonstrate this see the following image:

In the image above we have selected the key field CustomerID in the Sales table. We can see in the bottom half of the screen that the field CustomerID has a total of 100 distinct values in all tables in the model, not just in the selected table, visible as Total distinct values. By selecting this field within the Sales table however, we see that there are 72 distinct values in this table alone, visible as the Present distinct values. So of all 100 distinct CustomerID’s within all tables we now know that there are 72 distinct CustomerID’s in the Sales table. So by diving these we will receive the subset ratio:

Present distinct values / total distinct values = 72 / 100 = 0,72 or 72%

Now knowing that there is a total of 100 values this means that the Customer table must be filled with 100 distinct CustomerID’s. Having a look in the data model viewer confirms this:

Using the data model viewer in Qlik Sense

There are 100 distinct values present in the Customer table. Now by subtracting the 72% of the Sales table of the total 100% we end up with 28% of CustomerID’s in the total model (in this case all present in the Customer table) which have never placed an order.

Other things to keep in mind

Some other things to keep in mind about the subset ratio:

What if the subset ratio of the dimension table is also lower then 100%?

If this amount would have been lower there would have been a discrepancy between the Customer table and the Sales table in which both tables would have had values not present in the other. For a fact table it is not uncommon to have a subset ratio of lower then 100%, however a dimension table, like the Customer table in the example, with a subset ratio of less then 100% means that you should have a look at the data in the model. If for example the subset ratio in the Customer table would have been 90%, it means that we have 10% of distinct CustomerID’s present in the Sales table, which are not being matched with a CustomerID in the Customer table.

What if the total of the subset ratio’s is 100%?

If the combined total of the subset ratio’s of all tables would be 100% it means that there are no matching values between the tables. Good luck 😉

FAQ

Addressing or correcting the errors identified by a low subset ratio in Qlik Sense or QlikView typically involves a manual review of the data model and the relationships between tables. Once a low subset ratio is identified, indicating discrepancies or missing data, users should examine the key fields and relationships to understand the cause of these discrepancies. This might include checking for incorrect key field associations, missing entries, or data quality issues in the source data. Corrective actions can vary, ranging from modifying the data load script to adjust how tables are joined, correcting data entry errors in the source data, or redefining relationships within the data model to ensure all relevant data is accurately associated.

Qlik Sense and QlikView do not have built-in tools that automatically fix discrepancies highlighted by the subset ratio. However, they offer robust data modeling and scripting capabilities that allow users to manually address and rectify these issues. Users can leverage the data model viewer and the script editor to make adjustments and improvements to the data model. Additionally, Qlik’s community forums and documentation provide extensive resources and examples on how to handle common data modeling challenges.

The subset ratio’s impact on the performance and accuracy of data analysis in Qlik Sense or QlikView is significant. A low subset ratio can indicate that not all data is being correctly associated, which may lead to incomplete or inaccurate analysis outcomes. For instance, if customer IDs are not matching correctly between tables, sales analysis could miss out on certain transactions, leading to underreported sales figures. Furthermore, discrepancies in key field associations can lead to performance issues, as the data model might be more complex or less optimized than intended. Ensuring a high subset ratio, where appropriate, helps in maintaining the integrity and reliability of the data analysis, enabling more accurate and efficient reporting and insights.

That’s it for this week. See you next Friday?

More from the Bitmetric team

Take your Qlik skills to the next level!

Since 2013, the Masters Summit for Qlik is the premier advanced training for Qlik. Join us in Vienna and take your Qlik skills to the next level.

Join the team!

Do you want to work within a highly-skilled, informal team where craftsmanship, ingenuity, knowledge sharing and personal development are valued and encouraged? Check out our job openings.

Data Model Friday Qlik Test Prep Solution

Hoe kunnen we je ondersteunen?

Barry beschikt over meer dan 20 jaar ervaring als architect, developer, trainer en auteur op het gebied van Data & Analytics. Hij is bereid om je te helpen met al je vragen.

Bel ons

Mail ons

The correct answer is C: 28% of the CustomerID’s have never placed an order

Other things to keep in mind

FAQ

How do you address or correct the errors once a low subset ratio is identified?

Are there any tools or features within Qlik Sense or QlikView that automatically fix or suggest fixes for discrepancies highlighted by the subset ratio?

How does the subset ratio impact the performance and accuracy of data analysis in Qlik Sense or QlikView?

More from the Bitmetric team

Take your Qlik skills to the next level!

Join the team!

Hoe kunnen we je ondersteunen?

Artificial Intelligence, Machine Learning, and Deep Learning Explained: How They Impact Your Business

Building Ethical AI: Practical Frameworks for Responsible Innovation

What is AI Ready Data