I’m currently working on a couple of projects:

  • Big Data Interview Prep – a Q&A site I started putting together a while ago to help people preparing to interview for big data roles. It is still very much under development, but I’m adding new questions and answers.
  • Horse Racing Data Warehouse – an ongoing project that consolidates bloodline, sales and race data from various sites into a very large, daily-updated horse racing dataset. It makes extensive use of Scrapy, AWS, Spark, Parquet and Avro. The warehouse currently contains data on over 100K races and well over 1 million horses, which gives me an interesting dataset for testing the latest and greatest big data and machine learning technologies.