Using the GROUP BY clause in Qlik Sense

Every Friday at Bitmetric we’re posting a new Qlik certification practice question to our LinkedIn company page. Last Friday we asked the following Qlik Data Architect certification practice question about the GROUP BY clause in Qlik:

A Data Architect receives the table shown in figure 1. The current front end solution shows the wrong results. The request is to clean this table in the script so that the front end shows the total sales per customer by month compared to the target per month. Which solution should be used?

The correct answer is D.

The problem stated in this question was that the data in the front end showed wrong results. By examining the table in Figure 1 we can quickly determine that the problem lies in the Target. This was duplicated per row for each Customer and Month. So when using a Sum(Sales) and Sum(Target) in the frontend, Customer A in January would end up with a total sales amount of 41 and a total target amount of 100, which should have been 50. It was requested to fix this in the script.

To create the solution in the script we need to aggregate the sales for each customer for each month while loading the table. This should be done by using Sum() as the aggregation function. However just using Sum() in the load is not enough, when making aggregations we need to tell Qlik what to aggregate them by. This is done using the Group By clause.

So let’s have a look at how the final table should be loaded into the data model and which answer provides this.

It is apparent that we need to sum the sales per customer and month. This is done by adding the following script:

Totals:
LOAD
Customer
, Month
, Sum(Sales) As SalesAmount
Resident Sales
GROUP BY
Customer
, Month
;

Aggregating values while loading a table in the script always has to be accompanied by the Group By clause. In this case we need the Sum() of Sales per Customer and Month, so the Group By should include those as is shown above.

All that is left now is to add the target to the table. As we already figured out is that the target is duplicated per row, so if we check the possible answers, using a join and distinct load this can negate this:

Target:
LEFT JOIN (Totals)
LOAD DISTINCT
Customer
, Month
, Target
RESIDENT Sales
;

The script above will load the table shown below and join it to the final table:

If we wouldn’t have used distinct load the table below would have been loaded and joined duplicating rows in the joined table:

The target table, if we hadn't used DISTINCT.

This in turn could lead to bad results in the front end, since by the duplication of the rows not only the target would be duplicated but also the rows containing the SalesAmount, which is unwanted.

By combining these two options we would get the end result we are looking for and the answer to this question. First aggregating the values per customer and month, then adding the unique target per customer and month to that table.

Some other things to notice:

It is not needed to include all Group By fields into the load. If we only load Customer and sum(Sales) grouped by Customer and Month, we would have gotten the same aggregated results, be it without the Month field, which was needed in this example month to get the appropriate end result.
Loading fields and not adding them to the Group By clause won’t work. All fields that are being loaded into the table and are not aggregated, need to be mentioned in the Group By clause. For example; if we would load Customer, Month and Sum(Sales) and only group these on Customer we would receive an error. In the resulting table Qlik can’t aggregate the Sum(Sales), since it doesn’t know how to aggregate on the Month field, without telling to do so, it has only received to Group By Customer.

And then finally, before we round up a quick shoutout to everyone who answered C. This is technically speaking definitely a correct answer. The end results are not wrong and the data model is not bad either:

The two tables are connected on Customer and Month, resulting in a synthetic key, something which in the official documentation is noticed as something unwanted and a possible cause of low performance.

However, in this case the SalesAmount and Target are to be linked to each other trough Customer and Month. So instead of creating a composite key ourselves:

Totals:
LOAD
Customer & ‘-‘ & Month AS %CustomerMonthKEY

Qlik has done this for us in the form of a synthetic key and table. However, since we are preparing for Qlik certification questions, synthetic keys are ruled out. Be assured that in any Qlik certification you will be taking, a synthetic key would never be the correct answer.

That’s it for this week. See you next Friday?

More from the Bitmetric team

Qlik Cloud Backup

Protect your investment in Qlik with daily incremental backups stored in an encrypted environment with redundant storage. Available for as little as 2 Euro per day. Learn more.

Join the team!

Do you want to work within a highly-skilled, informal team where craftsmanship, ingenuity, knowledge sharing and personal development are valued and encouraged? Check out our job openings.

Friday Qlik Test Prep Performance Script Solution

Hoe kunnen we je ondersteunen?

Barry beschikt over meer dan 20 jaar ervaring als architect, developer, trainer en auteur op het gebied van Data & Analytics. Hij is bereid om je te helpen met al je vragen.

Bel ons

Mail ons

The correct answer is D.

More from the Bitmetric team

Qlik Cloud Backup

Join the team!

Hoe kunnen we je ondersteunen?

The Cost of Bad Data: What Is It Really Doing to Your Business?

Rethinking Pop-Ups in Qlik Cloud: A Simple Yet Effective Solution

Qlik Data Flow: Simplifying Data Transformation Without Code