How to automatically merge multiple CSV files into one XLS file with Alteryx

Suppose you have hundreds of CSV files with the same structure and you want to combine them in a single files for an effective hadling. One of the most efficient ways is that of using Alteryx, as it involves a super easy 1-step procedure.

Suppose the files are all in a single directory (here C:/SANDBOX), with different names.

Capture

Open Alteryx, then add Input Data tool connecting to C:\SANDBOX\*.csv, where the wildcard will filter and select all the files ending with a “.csv”.

As for file format, use “Comma-Delimited Text Files”, then you might want to flag “First Row Contains Field Names”.

Capture2

Finally, add on Output Data tool pointing e.g. to C:\SANDBOX\Database.xlsx|Output, where the text after the pipe identifies the sheet name and run the code.

Run the code, and in less than a second you will get a single pivot-friendly Excel file ready to use.

Capture2 Capture3

How to automatically download multiple files from a drop-down field with iMacros

In many cases it happens that data provided by websites can be downloaded only one column/row at a time (e.g., per year, per variable, etc.), requiring some manual downloading effort such as clicking on a drop-down field or on several links.

In this post I will explain ho this work can be easily automated using iMacro to automate the downloading process.

As an example, I will use an Italian provider of insurance information that allow to download 1 row of data at a time. For the view in scope, it allows to download the revenues across products, for a given year of a given company. From these, we will create a pivot with all the 200 companies of the database.

 

Capture

STEP 1

Assuming iMacro is already installed, the following script will allow to create a loop to

  • Change the company of the view
  • Automatically “click” on the “Export” link (files will be downloaded as a CSV)

To automate this, the following code is enough:


TAB T=1
TAG POS=1 TYPE=SELECT FORM=ID:riepilogoForm ATTR=ID:sImpresa CONTENT={{!LOOP}}
TAG POS=1 TYPE=FONT ATTR=TXT:ESPORTA
WAIT SECONDS=2

Specifically:

  • Row 1 ensures that focus of the script is Tab n° 1 (i.e., the firt on the left)
  • Row 2 restricts the focus of the script to the <FORM> element for which ID is “riepilogoForm”, and to its <SELECT> element for which ID is “sImpresa” (in our case, a drop-down field), actually selecting its {{!LOOP}}-th element
  • Row 3 clicks, within the document, on the <FONT> element for which its TXT attribute (a non-HTML attribute that checks the context of <FONT> is “Esporta” (the text triggering the JS that create the file)
  • Row 4 requires the script to wait 2 seconds. This might be non-necessary anymore as latest versions of iMacro holds the loop iterations until any triggered download is ended

As identifying the right commands to use might be sometimes uneasy, a practical shortcut could be that of using the “Record” option of iMacros, checking which elements are identified (and how) and then re-using the automatically generated script, editing it based on needs.

TIP: In order to effectively download the files without having the windows “Save as …” holding the process, it is advisable to pre-select a default directory as download target for the Firefox/Chrome browser.

By running the mentioned script, one would get as many CSV as the iteration specified, one for each company in the list.

SCALABILITY: Unfortunately standard iMacro language does not allow for nested loops, but this limitation can be worked out using some JavaScript code, a topic that will be focus of a future post.