Loading XLSX with an indefinite number of header rows in Qlik

 

We recently had a client with a large number of budget Excel files that needed to be loaded into Qlik, where each Budget file submitter was supposed to follow a spreadsheet template. As you would expect, there was variation across the files that created a challenge to load them consistently. 

However, another issue was more mysterious: Qlik would think there were an unexpected number of header lines to skip at the top of a sheet, even though the header was actually on a predictable, consistent row, e.g., row 16. So, we had a case where one sheet with the header on row 16 was thought by Qlik to have 16 header lines and another sheet with the header on the same row was thought by Qlik to have 15. There was nothing visible that was out of compliance with the Budget file to fix, yet it was causing reload errors and confusion. What would you do?

(If you're curious: This seemed to have something to do with Qlik *automatically* skipping rows that were completely empty and had no formatting as though they didn't exist at all, even in the preview when generating the load statement. But even this seemed inconsistent, where I could make a skipped row "appear" but then couldn't make it disappear again. At this point I decided troubleshooting the XLSX itself was no longer the best use of my time...)

We found an acceptable workaround, which requires loading from the sheet twice: once to index where it finds the header row, then again to load all the data on the sheet using that index as an input. Even if you are not seeing this weird behavior, the solution below makes loading user-maintained files more resilient because rows can be added or removed above the headers without consequence.

Solution

Finding the header row

This may seem abstract with the variables (some of which I removed that weren't relevant to the header problem), but in total we were loading 20+ spreadsheets with about 20 sheets each, so loops and variables were necessary. In total the final code volume to load them was relatively low.

The assumptions are that the header is within the first 20 rows and the first column is always "Department". The WHERE clause condition will vary for you, but just needs to be something that will evaluate as true on the header row only.

FindHeader:
FIRST 20 LOAD
    RecNo() as Index,
    F1
FROM
    [$(vFileNamePath)] (ooxml, embedded labels, table is [$(vSheetName)])
WHERE
    F1 = 'Department';

LET vHeaderRows = FieldValue('Index', 1);
DROP TABLE FindHeader;

TRACE *** -     Header is on row *$(vHeaderRows)*;

In this step, we are loading the XLSX as though there are embedded labels and assuming 0 header rows. Empty labels (where row 1 of the column is blank) translates to values like F1, F2, … in the load statement, hence the F1 in our load.


RecNo versus RowNo for finding the index is important, and this is a good example of the difference. RecNo evaluates against the source while RowNo applies to the result of the load. The result of this load always contains one row, so RowNo would always return 1, which isn't helpful. RecNo, on the other hand, returns the row number of the source spreadsheet, which is what we're trying to determine.

To demonstrate the difference, here are the results using both RecNo and RowNo with the source spreadsheet previewed above.


'Department' was found on the 15th row of the first column by Qlik, so we know we can reliably use that number for Qlik to skip the correct number of lines when we load all the data in the next step.

Loading the same sheet again, with the correct number of lines skipped

This load is straightforward and uses the vHeaderRows variable set during the first pass.

[Budget]:
LOAD
    FileDir()  as FileDir,
    FileName() as FileName,
    FileTime() as FileTime,
    *
FROM
    [$(vFileNamePath)] (ooxml, embedded labels, header is $(vHeaderRows) lines, table is [$(vSheetName)]);

I hope that's helpful for anybody else who runs into this situation with QlikView or Qlik Sense.

Contact Form

Name

Email *

Message *