Friday, June 28, 2013

Algorithmic Trading System Update 6/28/2013

It's been a while since I've updated. It's not that I've been lazy; I've had to focus all of my energy on studying for the CFA exam. It was dreadful knowing I was losing precious hours and days that could have gone toward getting the system up and running, but I'm finally back on track. I doubt I'll ever actually earn the charter since I don't work at a finance company, and the charter requires 4 years of related experience. Still, the learning wasn't in vain. Even if it isn't all apparent today, I know I took away more from the curriculum than I realize, and having an exam forced on me definitely gave me a better understanding of many of the statistical methods.

As embarrassing as it is, I have to admit that the implementation of my statistical arbitrage strategies was wrong. In fact, after finally having time to look at the project again, I found a number of really bad problems. Some of my calculations were incorrect, which suddenly explained why almost every trade executed on one side of the spread but not the other. I also realized that I should be cointegrating the logarithm of the prices, not the prices themselves. One might wonder how my backtests were showing any alpha generation at all. I guess when your margin of error is wide enough, you can still capture profits while being wrong. After fixing these problems, alpha generation increased dramatically: Sharpe ratios went up and drawdowns went down for most of the fixed strategies.
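To make the log-price point concrete, here's a minimal sketch of the kind of check I mean, using the Engle-Granger cointegration test from statsmodels on log prices. The function name, the inputs, and the OLS-derived hedge ratio are illustrative assumptions, not my actual strategy code.

```python
# A minimal sketch, assuming statsmodels is available and the inputs are
# arrays of raw prices for one pair. Illustrative only, not my strategy code.
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint

def log_price_cointegration(prices_a, prices_b, alpha=0.05):
    """Test one pair for cointegration on log prices and build the spread."""
    log_a = np.log(np.asarray(prices_a, dtype=float))
    log_b = np.log(np.asarray(prices_b, dtype=float))

    # Engle-Granger test: returns the test statistic, p-value, critical values.
    _, p_value, _ = coint(log_a, log_b)

    # Hedge ratio from an OLS fit of log_a on log_b; the residual is the
    # spread the strategy would trade around.
    ols = sm.OLS(log_a, sm.add_constant(log_b)).fit()
    hedge_ratio = ols.params[1]
    spread = log_a - hedge_ratio * log_b

    return p_value < alpha, hedge_ratio, spread
```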

Now that I don't have to study and the system is coming along nicely, I've actually had some time to research more strategies. In the course of looking for them, CPU time became the bottleneck. I can't just run a backtest once; I need to run it many times to estimate where the optimal parameters might lie. Truly optimal parameters are impractical to find, both because real-world conditions differ from the backtest and because of the sheer amount of data that needs to be tested, so I resort to running backtests until I have a set of parameters that are "good enough".
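The kind of "good enough" search I mean is just a coarse sweep over a small grid, keeping anything that clears a Sharpe threshold. In the sketch below, run_backtest() and the parameter names are hypothetical stand-ins for my real backtester.

```python
# A rough sketch of a coarse parameter sweep. run_backtest() and the
# parameter names (entry_z, exit_z, lookback) are hypothetical placeholders.
from itertools import product

def sweep(run_backtest, entry_zs, exit_zs, lookbacks, min_sharpe=1.0):
    good_enough = []
    for entry_z, exit_z, lookback in product(entry_zs, exit_zs, lookbacks):
        result = run_backtest(entry_z=entry_z, exit_z=exit_z, lookback=lookback)
        # Keep every parameter set that clears the threshold rather than
        # hunting for a single "optimal" point.
        if result["sharpe"] >= min_sharpe:
            good_enough.append((result["sharpe"], entry_z, exit_z, lookback))
    return sorted(good_enough, reverse=True)
```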

Because of those practical limitations, I invested some time into making my backtests run faster. Originally, my backtests took about 25 minutes to run over 8 years of 1-second tick data for each pair of stocks. After taking some timing measurements, I found that the cointegration step was taking about 6 and a half minutes. Removing the calls to my database and caching the data in memory brought that step down to 2 and a half minutes. That's a nice 4-minute improvement, but I found a bigger one: my tick data was being stored in plain-text CSV files, and by converting it to binary data files I was able to shave another 7 minutes off the average backtest time. Depending on the dataset I'm testing against, a run now takes about 7 to 15 minutes, which is a very healthy cut from the prior 18-25 minutes.
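For illustration, here's roughly what a CSV-to-binary conversion looks like with numpy fixed-width records. The (timestamp, bid, ask) layout is an assumption for the example, not my actual file format.

```python
# A sketch of converting text tick data to fixed-width binary records with
# numpy. The (timestamp, bid, ask) layout is assumed for illustration.
import numpy as np

TICK_DTYPE = np.dtype([("ts", "i8"), ("bid", "f8"), ("ask", "f8")])

def csv_to_binary(csv_path, bin_path):
    # Parse the text file once, then dump the raw records to disk.
    ticks = np.loadtxt(csv_path, delimiter=",", dtype=TICK_DTYPE)
    ticks.tofile(bin_path)

def load_binary(bin_path):
    # One contiguous read with no per-row text parsing, which is where
    # the time savings comes from.
    return np.fromfile(bin_path, dtype=TICK_DTYPE)
```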

My strategies currently cannot be vectorized for parallel execution within a single backtest. Instead, I run 7 backtests simultaneously, each with a different set of parameters; 7 because that's what gets my CPU to 100% utilization. I've also added a mechanism to queue up more backtests, so less time is wasted waiting for me to get to my computer and start the next tests. Faster turnarounds let me tweak and retest before I forget what I had in my head. This will become more exciting as I become more confident about allowing my system to auto-fire.
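As a sketch, the setup amounts to a fixed pool of 7 worker processes consuming a queue of parameter sets. Here run_backtest() is again a hypothetical stand-in for the real backtester.

```python
# A sketch of running queued backtests across 7 worker processes.
# run_backtest() is a hypothetical stand-in for the real backtester.
from multiprocessing import Pool

def run_backtest(params):
    # Placeholder: run one backtest for one parameter set and return its stats.
    return {"params": params, "sharpe": 0.0}

def run_queued_backtests(param_sets, workers=7):
    # 7 workers because that's what saturates the CPU; queued parameter sets
    # are picked up as workers free up, so nothing waits on me.
    with Pool(processes=workers) as pool:
        return pool.map(run_backtest, param_sets)

if __name__ == "__main__":
    queue = [{"entry_z": z, "exit_z": 0.5} for z in (1.5, 2.0, 2.5)]
    results = run_queued_backtests(queue)
```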
