19 October 2022 Qlik subset load with SAMPLE and FIRST Share this message Every Friday at Bitmetric we’re posting a new Qlik certification practice question to our LinkedIn company page. Last Friday we asked the following Qlik Data Architect certification practice question about Qlik subset creation in load scripts with the FIRST and SAMPLE statements: While many people knew about loading data subsets using First, it was clear that the Sample prefix is a lot less known. The correct answer is D: Random 10% of total rows are loaded from Customer.QVD Arguably one of the most encountered problems during development is long loading times of (very) large datasets. In order to speed up development it is obviously nice to be able to create a subset in the load of the dataset. There are various ways of doing this, but looking at the comments and answers to the question, First is one which came to mind by many. First is a prefix which limits the load to a specified number of rows. For example First 1000 will limit the load to the first 1000 rows from the table. Making the load and thus the development time quicker. So what is the catch? However, in some cases it could be that loading the first rows doesn’t cut it. For example: The first rows of the dataset are filled with data of questionable quality (empty rows or test data) Data is sorted on alphabet and only name records starting with A are loaded Dates. The data could be sorted on entrydate and you only load the first dates from the dataset Especially that last example makes sense if we are looking into front-end results. How do we compare year to date measures or point in time reporting if we only load a few dates? Take the following example dashboard from a car sales company. In this example we have used First 1000 as a prefix. We can clearly see that there is data in the dashboard, but that most of the charts and KPI objects are not really telling us much. In this way we cannot check whether the measures are performing as expected. Now using an equivalent sized Sample load, in this case using Sample (0.1) as prefix we get the following result: This dashboards speaks more to the imagination, giving us a better look at all the various data. Since we now have data randomly picked from the total dataset we can immediately see that year to date and in time measures are performing as expected. Using the SAMPLE prefix in Qlik The sample prefix has the following syntax: Sample p ( loadstatement | selectstatement ) The p argument is to indicate the probability that a record is read. It is optional to put this amount in parentheses. So looking at the example from the question we can tell that a random 10% is added to the loaded dataset. Things to keep in mind? Sample will look through the entire dataset. First will limit it to a fixed row number and stop. This means that sample could potentially still be slower. Every time you reload again the sample set will change. Keep that in mind. Each row is evaluated separately. So with Sample (0.1) each row will have 10% chance to end up in the resulting table. This means that if you load a dataset consisting of 1000 rows, it will on average load 100 rows into the table, but for each iteration this could deviate a little bit, since it will go through the total dataset and evaluate each row on 10%. Sample must be a value between 0 and 1. Combining First and Sample is possible. Wait what? Combining the FIRST and SAMPLE prefixes Yes it is possible to combine both prefixes. Should you and what does it do? Let’s have a look at the results first to understand what is happening. The following load script is used: FIRST 1000SAMPLE (0.1)LOAD*FROM [LIB:\\Customers.qvd](qvd); After this load we also have loaded Sample (0.2), Sample (0.3) and so on, untill Sample (1). If we then check the resulting data models record and row numbers we see the following results: As we can see the load is limited to 1000 rows, but still random sampled. There is no difference in order of these prefixes. So it doesn’t matter if we use first and then sample or sample and then first. However the logical conclusion of First 1000 Sample (0.1) would be that Qlik would load the first 1000 rows and then samples 10% of these, giving us probably 100 rows as result. But this is not the case. When combining first and sample the order doesn’t matter. Qlik will always evaluate sample first and then limit this load to the first 1000 samples. As we can see in the table above this is as expected. Sample(1) will result in a 100% chance for each row to return. This gradually moves to a bigger sample with how less the chance is a record is added. So sample (0.1) means that each row has a 10% chance of being added to the data model. First 1000 tells the load script to stop at row 1000, which could be a higher record number because of the chance of addition. So should you use this? All depends on the size of the data sets and what you wish to evaluate. But if you want to do a over time analysis on a subset of data, the usage of sample should be sufficient. That’s it for this week. See you next Friday? More from the Bitmetric team Qlik Cloud Backup Protect your investment in Qlik with daily incremental backups stored in an encrypted environment with redundant storage. Available for as little as 2 Euro per day. Learn more. Join the team! Do you want to work within a highly-skilled, informal team where craftsmanship, ingenuity, knowledge sharing and personal development are valued and encouraged? Check out our job openings. Friday Qlik Test Prep Solution How can we help? Barry has over 20 years experience as a Data & Analytics architect, developer, trainer and author. He will gladly help you with any questions you may have. Call us Mail us 10 March 2025 Qlik Data Flow: Simplifying Data Transformation Without Code Qlik Data Flow simplifies data transformation with a visual editor. Clean, join, and reshape data using drag and drop, no coding needed. Automatically generate Qlik script as you build. Learn how it works, see a step-by-step example, and compare it to Qlik Data Manager and Qlik Script. ETL No-Code Qlik Script 6 March 2025 Just because your data is in the cloud does not make it safe. Qlik Cloud does not automatically back up your data, leaving your apps vulnerable to loss, corruption, or accidental deletion. Learn why a backup solution is essential and how Bitmetric’s Qlik Cloud Backup keeps your data safe. Take action now to protect your analytics Governance Qlik Security Support 24 February 2025 Update: Qlik Cloud icons collection Discover our updated collection of 239 Qlik Cloud icons on GitHub. Featuring new Data Flow icons in SVG and PNG formats, perfect for enhancing your Qlik schematics and documentation. Qlik Visualization
10 March 2025 Qlik Data Flow: Simplifying Data Transformation Without Code Qlik Data Flow simplifies data transformation with a visual editor. Clean, join, and reshape data using drag and drop, no coding needed. Automatically generate Qlik script as you build. Learn how it works, see a step-by-step example, and compare it to Qlik Data Manager and Qlik Script. ETL No-Code Qlik Script
6 March 2025 Just because your data is in the cloud does not make it safe. Qlik Cloud does not automatically back up your data, leaving your apps vulnerable to loss, corruption, or accidental deletion. Learn why a backup solution is essential and how Bitmetric’s Qlik Cloud Backup keeps your data safe. Take action now to protect your analytics Governance Qlik Security Support
24 February 2025 Update: Qlik Cloud icons collection Discover our updated collection of 239 Qlik Cloud icons on GitHub. Featuring new Data Flow icons in SVG and PNG formats, perfect for enhancing your Qlik schematics and documentation. Qlik Visualization