Zorro on Steroids II - Data Conversion

Zorro stores its history data in binary format as a list of T1 or T6 structs. The advantage of this format is that the files occupy less disk space than ASCII files. You can download ready-to-use FXCM data from Zorro's download page. But what if you want to use data from a different broker? Or data covering a longer time period? Or what if you need to simulate a variable spread? Then you need to get the data from a different source and convert it.

There are plenty of data sources, some paid and some free, and quite often even the paid ones are of poor quality. But that's a topic for another article. One of the better sources for Forex data is Dukascopy. They provide free Forex data going back to 2004 (at least for the majors and some of the crosses), and several tools are available that allow you to download, manage and export it (e.g. StrategyQuant Tick Downloader). But how do we convert Dukascopy tick data to the Zorro format?

Dealing With Tick Data

If you work with tick data you'll notice that the files can be quite large. A comma-separated values file containing one year's worth of data can exceed one gigabyte, so it might be impossible to load a file spanning several years into memory. Unfortunately, Zorro stores its price structures in decreasing order of time stamps, while market data usually comes sorted in increasing order. This complicates the conversion, because on most systems it is easier to traverse a text file forwards than backwards.

There are several possible workarounds. To mention some of them:

  • Reverse the line order using the tac command (or a similar one) and convert the resulting file.
  • In a first pass, store the positions of all newline characters; in a second pass, traverse the file from the end using fseek (or its equivalent).
  • Traverse the file backwards, loading each successive part into a buffer, and extract and convert the lines currently loaded. Keep partially loaded lines for the next iteration.
  • Traverse the file backwards, loading one character at a time.
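To make the buffered variant more concrete, here is a minimal Python sketch of reading a file backwards in chunks while carrying partial lines over to the next iteration. It is only an illustration and not the code from the conversion script; the function name iter_lines_reversed and the chunk_size parameter are made up for this example.

```python
import os


def iter_lines_reversed(path, chunk_size=64 * 1024):
    """Yield the non-empty lines of a text file from last to first.

    Hypothetical helper for illustration: the file is read backwards in
    chunks of chunk_size bytes, so it never has to fit into memory.
    Blank lines are skipped, line endings are stripped.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        tail = b""  # partial line carried over from the chunk read before
        while position > 0:
            read_size = min(chunk_size, position)
            position -= read_size
            f.seek(position)
            # Append the carried-over fragment so split lines are rejoined.
            chunk = f.read(read_size) + tail
            pieces = chunk.split(b"\n")
            # The first piece may continue a line that starts in an earlier
            # part of the file, so keep it for the next iteration.
            tail = pieces[0]
            for piece in reversed(pieces[1:]):
                if piece.strip():
                    yield piece.rstrip(b"\r").decode("utf-8")
        # Whatever is left is the first line of the file.
        if tail.strip():
            yield tail.rstrip(b"\r").decode("utf-8")
```

Each yielded line could then be parsed and written out immediately, which keeps memory use bounded by the chunk size plus the longest line.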

In general I achieved better conversion times using a combination of Python and Zorro than with Zorro alone, so I decided to stick with this approach.

Implementation

The Python script source code can be downloaded from here. You'll also need the Zorro conversion script, which can be downloaded here.

I tried several different approaches, and the fastest was the one using the 'tac' command. Using two passes was slightly slower. The slowest method was reading the file backwards one character at a time. The following table compares the methods when converting a 1.3 GB file:

Method                          Elapsed time
Using 'tac'                     19m52s
Using two passes                22m10s
Reading characters backwards    40m36s

As far as I know, the 'tac' command is not available on Windows, so if you want to run the script from the Windows command line you have to use an equivalent or one of the slower methods. All of the methods are included in the script, and you can switch between them by changing the one used in the split_file function. Some other small changes may be required in order to run the program on Windows; I tested it only in the Cygwin environment.
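If 'tac' is not an option, the two-pass workaround listed earlier can be written in pure Python. The sketch below is my own illustration of the idea rather than the code in split_file; the function names are hypothetical. The first pass records the byte offset of every line start, the second pass seeks to those offsets in reverse order.

```python
from array import array


def index_line_offsets(path):
    """First pass: record the byte offset at which every line starts."""
    # A typed array keeps the index compact (8 bytes per line).
    offsets = array("Q")
    position = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(position)
            position += len(line)
    return offsets


def read_lines_backwards(path):
    """Second pass: seek to each recorded offset, last line first."""
    offsets = index_line_offsets(path)
    with open(path, "rb") as f:
        for offset in reversed(offsets):
            f.seek(offset)
            yield f.readline().rstrip(b"\r\n").decode("utf-8")
```

The index costs one entry per line, so for very large tick files it still needs a noticeable amount of memory, but far less than loading the data itself.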

Usage

You need to have Python and Zorro (version 1.50 or newer) installed. Copy ConvertData.c to the Zorro/Strategy folder, then edit it and set the basePath variable to your tick data path. You can also comment out the #define SPREAD directive if you want to produce only .t1 files containing ask prices. In the convert.py script you have to set the zorro_path variable (the directory containing Zorro.exe). And that's it. The easiest way to run the conversion is to call python convert.py [tick_data_directory]. This will produce both T1 and T6 data from all files in the directory. The T6 data are built from one-minute intervals and contain the spread at each opening price in the fVal variable. For more information run python convert.py -h.
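To illustrate what building one-minute data with the opening spread means, here is a rough Python sketch. It is not the logic of ConvertData.c: the input format (tuples of timestamp, ask, bid in increasing time order), the dictionary keys, and the choice to build the bars from ask prices are all assumptions made for this example.

```python
from datetime import datetime


def ticks_to_minute_bars(ticks):
    """Group (timestamp, ask, bid) ticks, sorted ascending, into minute bars.

    Hypothetical helper: bars are built from ask prices (an assumption),
    and "spread" holds ask - bid of the tick that opened the bar.
    """
    bars = []
    current = None
    for stamp, ask, bid in ticks:
        minute = stamp.replace(second=0, microsecond=0)
        if current is None or current["minute"] != minute:
            # First tick of a new minute opens a new bar.
            current = {
                "minute": minute,
                "open": ask, "high": ask, "low": ask, "close": ask,
                "spread": ask - bid,
            }
            bars.append(current)
        else:
            current["high"] = max(current["high"], ask)
            current["low"] = min(current["low"], ask)
            current["close"] = ask
    return bars


ticks = [
    (datetime(2015, 1, 5, 9, 0, 1), 1.1002, 1.1000),
    (datetime(2015, 1, 5, 9, 0, 30), 1.1005, 1.1003),
    (datetime(2015, 1, 5, 9, 1, 2), 1.1001, 1.0998),
]
bars = ticks_to_minute_bars(ticks)
```

Since Zorro expects the newest record first, a list like this would have to be written out in reverse, for example by iterating over reversed(bars).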

Please note that everything relies on the implicit assumption that the input files were produced using StrategyQuant Tick Downloader. If you want to use a different source you'll have to either produce .csv files in the same format or modify the scripts to accommodate the differences.

Also note that the asset being converted has to be defined in the AssetsFix.csv file.

Schizo Frenetik
