Installing Scrapy on Amazon Linux

I’ve recently moved all my AWS instances over to Amazon Linux and wanted to write a short update on installing Scrapy, as the process is slightly different from Ubuntu.

Why Amazon Linux?

Amazon Linux is a distribution that evolved from Red Hat Enterprise Linux (RHEL) and CentOS. It is available for use

Read More

Data Monetisation – A Poisoned Chalice For Carriers?

The idea that the data flowing across a communication service provider’s (CSP) network could hold intrinsic value is nothing new. The principle of infonomics, that information should be accounted for with the same formality as traditional assets, was first proposed in the late 90s by Doug Laney, vice president and

Read More

Spinning up a Spark Cluster on AWS EC2: Step-by-Step

Previously I walked through running Spark locally for development, but one of the major challenges of learning to use distributed systems is understanding how the various components are installed and interact with each other in a production-like environment.

You can use Vagrant or virtual machine images to run a cluster

Read More

Scrapy and DynamoDB on AWS

Amazon DynamoDB is a fully managed proprietary NoSQL database service that is offered by Amazon.com as part of the Amazon Web Services portfolio.

If you’ve considered using MongoDB for storing your scraped results and if, like me, you’re doing your scraping from the cloud anyway, then why not make use of

Read More

The Rise of Encryption

Encryption. It’s a volatile subject at the moment, especially in the wake of last Friday’s horrific attacks in Paris. The UK government has been campaigning against unbreakable encryption in the run-up to the publication of the Investigatory Powers Bill, which was released last week, and there are new calls

Read More