I’ve recently moved all my AWS instances over to Amazon Linux and wanted to write a short update on installing Scrapy, as the process is slightly different from Ubuntu.
Amazon Linux is a distribution that evolved from Red Hat Enterprise Linux (RHEL) and CentOS. It is available for use
Amazon DynamoDB is a fully managed proprietary NoSQL database service that is offered by Amazon.com as part of the Amazon Web Services portfolio.
If you’ve considered using MongoDB for storing your scraped results and, like me, you’re doing your scraping from the cloud anyway, then why not make use of
With the release of Scrapy 1.0, I thought it was time to update this post to reflect the changes, as the older instructions probably won’t work anymore.
Keeping It Free!
See the updated version for installing Scrapy 1.0 and above here.
This post will cover the basics of getting started with Amazon AWS: creating an account, creating an EC2 instance, installing Scrapy and Scrapyd, and finally making sure you do it all for free!
Keeping It Free!
Before you go any
Previously I walked through running Spark locally for development, but one of the major challenges of learning to use distributed systems is understanding how the various components are installed and how they interact with each other in a production-like environment.
You can use Vagrant or virtual machine images to run a cluster
In interesting news this week, MapR has announced the availability of its Hadoop distribution in the AWS Marketplace. All three editions of the MapR Distribution (Community, Enterprise and Enterprise Database Edition) are available.
Community Edition includes the core MapR Data Platform, including MapR File System and MapR Database, along