Spinning up a Spark Cluster on AWS EC2: Step-by-Step

Previously I walked through running Spark locally for development, but one of the major challenges of learning to use distributed systems is understanding how the various components are installed and how they interact with each other in a production-like environment.

You can use Vagrant or virtual machine images to run a cluster

Scrapy and DynamoDB on AWS

Amazon DynamoDB is a fully managed, proprietary NoSQL database service that is offered as part of the Amazon Web Services portfolio.

If you’ve considered using MongoDB for storing your scraped results and if, like me, you’re doing your scraping from the cloud anyway, then why not make use of

ImportError: No module named settings

So, after scratching my head for a while as to why my shiny new spider wasn’t doing what it was supposed to do, I finally found this issue, which hasn’t been merged in as yet.

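Simple fix: do not call your Scrapy project test! The name collides with Python’s own test package, so the project’s settings module cannot be found.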
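
One way to avoid this kind of clash is to check whether a candidate project name is already importable as a Python module before creating the project. The sketch below is a minimal, assumed approach using the standard library's importlib.util.find_spec; the helper name name_is_taken and the example project names are hypothetical and not part of Scrapy.

```python
import importlib.util


def name_is_taken(project_name: str) -> bool:
    """Return True if a top-level module with this name is already importable.

    Hypothetical helper for illustration; it only inspects the current
    interpreter's import path.
    """
    try:
        return importlib.util.find_spec(project_name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for malformed names or modules without a spec;
        # treat those cases as "not found" in this sketch.
        return False


if __name__ == "__main__":
    # Candidate names are examples only.
    for candidate in ("test", "my_scraper_project"):
        status = "already taken" if name_is_taken(candidate) else "free"
        print(f"{candidate}: {status}")
```

If a name is reported as taken, picking a different project name sidesteps the import conflict entirely.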

