Thursday, March 21, 2013

Algorithmic Trading System Update 03/21/2013 - GLD GDX pairs trading

Up until now, all of the toy algorithms I have been using to test how well my algorithmic trading system fires trades have worked very well. I got ahold of GLD and GDX 1-second data and decided to implement a pairs trading algorithm. That's when I suddenly found a Grand Canyon of a flaw in my system: I had built it so that algorithms could only read one ticker at a time (lol?). My backtesting system was designed to read data files, but I had never written anything to merge the data together based on time. Luckily, it didn't take me long to rewrite portions of it to support multiple tickers.

I put quite a bit of thought into the merging portion because I knew it could become a bottleneck. Finally, I decided I would merge all of the tick data from the different symbols together once and dump out a set of merged files; when I read these files back in, I would get a stream of already-merged tick data. With 15 GB of uncompressed data, loading everything into memory first was out of the question, and for an algorithm that needed more than 2 symbols that approach would become unsustainable due to RAM limitations. By merging all of the tick data first and then streaming through the files, my process stays under 70 MB of RAM usage. The merge itself was pretty expensive: about 3 minutes for 6 years of data from 2 symbols. Done on the fly, that would have added 3 minutes to every backtest I ran, and since the data does not change between runs, that time would have been wasted.
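The streaming idea above can be sketched with a lazy k-way merge. This is a minimal Python illustration, not my actual backtester code; the CSV layout (`timestamp,symbol,price`, sorted by time within each file) and the function names are assumptions for the example.

```python
import csv
import heapq


def stream_ticks(path):
    """Yield (timestamp, symbol, price) tuples from one symbol's tick file.

    Assumes a hypothetical CSV layout of timestamp,symbol,price,
    already sorted by timestamp within the file.
    """
    with open(path, newline="") as f:
        for row in csv.reader(f):
            yield (float(row[0]), row[1], float(row[2]))


def merged_ticks(paths):
    """Lazily merge per-symbol tick streams into one time-ordered stream.

    heapq.merge holds only one pending tick per file in memory, so RAM
    use stays flat regardless of how many gigabytes sit on disk.
    """
    return heapq.merge(*(stream_ticks(p) for p in paths))
```

Because `heapq.merge` never materializes the inputs, adding a third or fourth symbol only adds one more buffered tick, which is what keeps the memory footprint small.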

After closing the gap in my system, I was quickly able to implement a draft version of the algorithm. Every time the day of the tick data changed, I used RDotNet to run a new cointegration calculation on end-of-day data, then recalculated my beta and p-value. For now, I'm ignoring the p-value and using only the beta as the hedge ratio. Running my backtest resulted in over 1000 calls to the RDotNet engine, and I kept mysteriously getting memory access violations in those calls, often at the same date iterations. After much confusion and unsuccessful attempts to fix my code, I found out that R has its own garbage collector, and some bug in R or RDotNet was probably causing a memory leak or segmentation faults. After calling R's garbage collector before performing each cointegration, I managed to consistently finish each backtest.
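The daily recalculation boils down to an OLS regression of one leg on the other. Here is a rough Python stand-in for the R step (my real code does this through RDotNet); the function name is made up for the example, and the Engle-Granger p-value is omitted since I'm ignoring it and trading on beta alone.

```python
import numpy as np


def hedge_ratio_and_spread(y, x):
    """OLS beta of y on x (the hedge ratio) and the resulting spread.

    Illustrative sketch only: y would be GLD end-of-day closes and x
    the GDX closes up to the current backtest date.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # OLS slope = cov(x, y) / var(x)
    beta = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    # The spread y - beta*x is what the pairs strategy mean-reverts on.
    spread = y - beta * x
    return beta, spread
```

Recomputing this once per simulated day, instead of per tick, is what kept the number of regression calls near the day count rather than the tick count.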

I dumped the list of trades to a file after each backtest. At first, the results were delightful; I was delighted to have any results at all. What gave me real joy was seeing that most of the trade log was flush with alpha, a nice lift from the terrible PnL my toy algorithms gave me. There were, however, some weird trades that resulted in large losses. Upon further inspection, I realized I had bad data: some ticks were just plain wrong. GDX, whose price has stayed below 70, had weird spikes printing at 249. It's great that I found these now, so I can address them before I run more backtests or run my system against real money.
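One way to catch prints like those 249 spikes is to flag any tick that jumps too far from a rolling median of recent prices. This is a hypothetical sanity filter, not something from my system; the window size and 10% threshold are arbitrary illustrative choices.

```python
import numpy as np


def flag_spikes(prices, window=50, max_jump=0.10):
    """Flag ticks deviating more than max_jump from a trailing median.

    Hypothetical bad-print detector: a 249 print in data that trades
    near 40 is far beyond 10% of the trailing median, so it gets flagged,
    while normal tick-to-tick moves pass through.
    """
    prices = np.asarray(prices, dtype=float)
    flags = np.zeros(len(prices), dtype=bool)
    for i in range(len(prices)):
        lo = max(0, i - window)
        # Median of the trailing window; fall back to the tick itself
        # when there is no history yet (first tick is never flagged).
        ref = np.median(prices[lo:i]) if i > lo else prices[i]
        flags[i] = abs(prices[i] - ref) > max_jump * ref
    return flags
```

Using the median rather than the mean matters here: a single bad print barely moves the trailing median, so the ticks after the spike are not falsely flagged.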

This is rather exciting, as the code I have for the GLD/GDX pair can easily be run on any other pair. Once I work out all of the annoyances, I'll have more confidence running it on other symbols.
