I’m working on a couple of projects at the moment:
- Big Data Interview Prep – this is a Q&A site that I started putting together a while ago to help people preparing to interview for big data roles. It is still very much under development, but I’m steadily adding questions and answers.
- Horse Racing Data Warehouse – this is an ongoing project designed to consolidate bloodline, sales and race data from various sites into a very large, daily-updated horse racing dataset. Built with Scrapy, AWS, Spark, Parquet and Avro, the warehouse currently contains data on over 100K races and well over 1 million horses, giving me an interesting dataset for testing the latest big data and machine learning technologies.