2 november 2022

How to do an incremental BUFFER LOAD in Qlik Sense

Deel dit bericht
Qlik data modelling - Bitmetric Friday Qlik Test Prep question

Every Friday at Bitmetric we’re posting a new Qlik certification practice question to our LinkedIn company page. Last Friday we asked the following Qlik Data Architect certification practice question about incremental loads using BUFFER (incremental) LOAD:

Qlik buffering load statement question of the week

It was interesting to see that we received quite a mixed bag of responses. Some people went for the ‘logical’ answer, others for the ‘sales’ answer, but of course we were looking for the ‘certification’ answer 😉

The correct answer is B: Use the BUFFER (incremental) prefix

Whereas Answer A is obviously nonsense we also saw quite some comments that answered C or D. To be fair, in our consulting practice we will most likely also opt for answer D. Depending on the scale and future requirements we might even opt for answer C. From the Qlik certification perspective though, there is only one right answer: B. Let’s see how it works.

Buffer load statement

The BUFFER load statement is used in front of a LOAD or SELECT statement and tells Qlik to automatically create a QVD file of the result and use this to cache/buffer subsequent loads. The statement is as follows:

Buffer [(option [ , option])] ( loadstatement | selectstatement )
option::= incremental | stale [after] amount [(days | hours)]

After using the buffer statement a QVD buffer file is made and stored in the buffer subfolder. This is located in the Qlik program data folders. Buffer QVD files are removed when the app that created them no longer exists or no longer references the created buffer.

The same thing happens when you change the fields you load. If you make any change to the load or select statement the buffer will be deleted. This happens because the buffer filename is calculated by running a 160-bit hexadecimal hash function on the entire select and load statement.

When to use?

The buffer load is valuable in it’s ability to reduce load times by buffering the loaded table into a QVD. As exampled in this question, this is best used with large files like logfiles. Logfiles have the advantage that they are incremental in their nature by default. All new created logs are appended to previous loaded logs and no changes happen in the old data.

How to use?

In the following examples we are loading an almost 40 million row logfile in various ways. Let’s see what the difference between the different loads is, how buffer exactly works and what the different advantages and disadvantages are.

Incremental load:

First of we do an incremental load of the logfile:

Incremental load script

This results in the following load time:

Incremental load results

As we can see the total load takes one and a half minutes. Now let’s see what the buffer load does:

Buffer incremental load:

Buffer incremental load script

This results in the following load time:

The buffer load only takes 44 seconds. But wait, there is more to get here. Since buffer, as we have learned, stores a buffer QVD by itself, we don’t need to store the log file. In that case we can even win some more time:

As we can see this brings the buffer reloading time back to only 1/3 of the incremental load.

Why does this happen?

The answer to why the buffer load in this case is faster lies in the way the incremental load of the buffer works. The first incremental load works on a where exists clause. This means that it needs to fully read through both the stored log.qvd and the logfile to check on the logid field and see if they already exist or not.

Buffer works in a different way. The buffered QVD has a header with metadata:

In this metadata we can find the sourcefilesize field. This is the size the previous read version of the sourcefile (the logfile). When using the buffer incremental load syntax, Qlik simply skips to that position in the input file and starts loading data from that point onward.

This is the reason why this works well with logfiles, since they don’t (or shouldn’t) change after a line has been written. In case you have files where historical data can still change, this is not a right approach as we can see when we list all the advantages and disadvantages of buffer incremental load versus where exists incremental load.

Advantages and disadvantages of using BUFFER (incremental)

Advantages

  • Easy to script and understand
  • Faster load times compared to where exists
  • Good solution for loading large logfiles

Disadvantages

  • Changes in the load/select statement structure delete the buffer
  • Less control over the resulting file (stored in the Qlik program data folder)
  • Does not work when historical data can change

That’s it for this week. See you next Friday?

Take your Qlik skills to the next level!

Since 2013, the Masters Summit for Qlik is the premier advanced training for Qlik. Join us in Vienna and take your Qlik skills to the next level.

Join the team!

Do you want to work within a highly-skilled, informal team where craftsmanship, ingenuity, knowledge sharing and personal development are valued and encouraged? Check out our job openings.

Friday Qlik Test Prep Performance Script Solution

Hoe kunnen we je ondersteunen?

Barry beschikt over meer dan 20 jaar ervaring als architect, developer, trainer en auteur op het gebied van Data & Analytics. Hij is bereid om je te helpen met al je vragen.