Posts in Category: big data

Scrapy and DynamoDB on AWS

Amazon DynamoDB is a fully managed proprietary NoSQL database service that is offered by Amazon.com as part of the Amazon Web Services portfolio.

If you’ve considered using MongoDB for storing your scraped results and if, like me, you’re doing your scraping from the cloud anyway, then why not make use of

Read More

MapR on AWS

In interesting news this week, MapR has announced the availability of its Hadoop distribution in the AWS Marketplace. All three editions of the MapR Distribution (Community, Enterprise and Enterprise Database Edition) are available.

Community Edition includes the core MapR Data Platform, including MapR File System and MapR Database, along

Read More

Spinning up a Spark Cluster on AWS EC2: Step-by-Step

Previously I walked through running Spark locally for development, but one of the major challenges of learning to use distributed systems is understanding how the various components are installed and interact with each other in a production-like environment.

You can use Vagrant or virtual machine images to run a cluster

Read More

The Rise of Encryption

Encryption. It’s a volatile subject at the moment, especially in the wake of last Friday’s horrific attacks in Paris. The UK government has been campaigning against unbreakable encryption in the run-up to the publication of the Investigatory Powers Bill, which was released last week, and there are new calls

Read More

Spark 1.1.0 released

Spark continues its rapid release cycle with the first minor update to the 1.x release branch. This release brings operational and performance improvements in Spark core along with significant extensions to Spark’s newest libraries: MLlib and Spark SQL. It also builds out Spark’s Python support and adds new components to the

Read More