Installing Scrapy on Amazon Linux

I’ve recently moved all my AWS instances over to Amazon Linux and wanted to write a short update on installing Scrapy, as the process is slightly different from Ubuntu.

Why Amazon Linux?

Amazon Linux is a distribution that evolved from Red Hat Enterprise Linux (RHEL) and CentOS. It is available for use within Amazon EC2: it comes with all the tools needed to interact with Amazon APIs, is optimally configured for the Amazon Web Services ecosystem, and Amazon provides ongoing support and updates.

Prerequisites

  • An AWS account.
  • An IAM Role. I use S3 to store the results of scraping, so I create a Role and assign the AmazonS3FullAccess policy.
  • When you create the instance, assign the IAM Role you created above. This lets you grant the server access to other AWS services without embedding your access key and secret key on the server or in scripts. Note that, at the time of writing, you can only assign an IAM Role while the instance is being created. You can change the Role's permissions afterwards to add new services, but you need to make sure you add the Role when you launch. If you have an existing instance and want to add a Role, the easiest way is to shut it down, take a snapshot and launch again with the new Role.
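
With the Role attached, the AWS CLI on the instance picks up temporary credentials automatically, so scripts need no keys. A minimal sketch, assuming the AWS CLI is available on the instance; the bucket name `my-scrapy-results` and both function names are hypothetical:

```shell
# Hypothetical bucket name; replace with your own.
BUCKET="${BUCKET:-my-scrapy-results}"

# Show which identity the instance is using. With a Role attached,
# the reported ARN should refer to the Role rather than an IAM user.
show_identity() {
    aws sts get-caller-identity
}

# Copy a local results file to S3 using the Role's credentials.
# No access key or secret key appears anywhere in the script.
upload_results() {
    local file="$1"
    aws s3 cp "$file" "s3://${BUCKET}/$(basename "$file")"
}
```

For example, `upload_results items.json` would copy the file to `s3://my-scrapy-results/items.json`.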

If you used one of the smaller instances, such as t2.nano or t2.micro, you’ll need to add some swap space for the install to work; otherwise the lxml compilation will fail.
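
One way to add swap is with a swap file. This is a sketch rather than the only approach; the 1024 MiB default size, the `/swapfile` path and the function name `add_swap` are all assumptions:

```shell
# Create and enable a swap file.
#   $1: size in MiB (default 1024, an assumption)
#   $2: path of the swap file (default /swapfile, an assumption)
add_swap() {
    local size_mb="${1:-1024}"
    local swapfile="${2:-/swapfile}"
    # Allocate the file, restrict its permissions, format it as swap,
    # then enable it. Each step runs only if the previous one succeeded.
    sudo dd if=/dev/zero of="$swapfile" bs=1M count="$size_mb" &&
    sudo chmod 600 "$swapfile" &&
    sudo mkswap "$swapfile" &&
    sudo swapon "$swapfile"
}
```

Call `add_swap 1024` and check the result with `free -m`. Swap enabled this way does not survive a reboot unless you also add an entry to /etc/fstab.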

Then it’s as easy as:
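
A sketch of the install, assuming a yum-based Amazon Linux release with Python 3. The package names are assumptions and may differ between releases, and the function name `install_scrapy` is hypothetical:

```shell
# Install the compiler and headers that lxml and the crypto
# libraries need to build, then install Scrapy for the current user.
# Package names are assumptions; check them against your release.
install_scrapy() {
    sudo yum update -y &&
    sudo yum install -y gcc python3-devel python3-pip \
        libxml2-devel libxslt-devel libffi-devel openssl-devel &&
    python3 -m pip install --user scrapy
}
```

Run `install_scrapy`. Afterwards, `scrapy version` should print the installed version, provided ~/.local/bin is on your PATH.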

At this point you might want to take a snapshot or create an AMI so you can quickly spin up Scrapy instances in the future.
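
Creating the AMI can also be scripted with the AWS CLI. A minimal sketch: the function name, the default image name and the description are hypothetical, and the credentials used need permission to create images, which the S3-only Role above does not grant:

```shell
# Create an AMI from a configured instance.
#   $1: instance ID (required)
#   $2: image name (default is a hypothetical "scrapy-base-<date>")
create_scrapy_ami() {
    local instance_id="$1"
    local name="${2:-scrapy-base-$(date +%Y%m%d)}"
    aws ec2 create-image \
        --instance-id "$instance_id" \
        --name "$name" \
        --description "Amazon Linux with Scrapy installed"
}
```

Call it as `create_scrapy_ami <instance-id>`. By default `create-image` reboots the instance to get a consistent image, so run it when no crawl is in progress.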
