Scrapy in a Container – Docker Development Environment


I discovered the joys of Docker a while ago, and whenever I’m trying out some new technology I try to set aside some time to write a Dockerfile so I can build a development environment for it.

This makes it easy to distribute and run development environments across all platforms, AWS included. Write once, run anywhere. Perfect if you’re always spinning up new development environments.

Here’s a Dockerfile for a Scrapy development environment. Everything you need to quickly run a spider on any platform without worrying about Python versions or libxml!

It contains:

- an Ubuntu base image
- Python and the native libraries Scrapy needs (libxml2, libxslt and friends)
- Scrapy itself
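The real Dockerfile is on GitHub (see below), but a minimal sketch, assuming an Ubuntu 14.04 base and a pip install of Scrapy, looks like this (the exact package list in the actual Dockerfile may differ):

    # Start from a stock Ubuntu base image
    FROM ubuntu:14.04

    # Install Python, pip and the native libraries Scrapy builds against
    # (libxml2/libxslt for parsing, OpenSSL for HTTPS)
    RUN apt-get update && apt-get install -y \
        python python-dev python-pip \
        libxml2-dev libxslt1-dev libssl-dev libffi-dev

    # Install Scrapy itself
    RUN pip install scrapy

    # Work out of a directory we can map a host volume onto
    WORKDIR /scrapy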

To build the container, download the latest Dockerfile from GitHub.
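Something like the following will fetch it (the repository path here is a placeholder; substitute the real one from GitHub):

    wget https://raw.githubusercontent.com/<user>/<repo>/master/Dockerfile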

and then run the docker build command, noting the trailing dot.
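Here the scrapy image tag is just a name I’ve picked; the trailing dot tells docker to look for the Dockerfile in the current directory:

    docker build -t scrapy .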

This will download the Ubuntu base image and install Scrapy and friends for you.

To run the container you can use a number of commands. For an interactive shell, use something like the following, which starts the container, maps a local host directory to a directory inside the container, and runs a bash shell.
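For example (the scrapy image tag and the host path are illustrative):

    # map ~/scrapy-projects on the host to /scrapy inside the container
    docker run -i -t -v ~/scrapy-projects:/scrapy scrapy /bin/bash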

Note that this won’t persist the container; to start the container in the background, run it detached.
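A sketch, reusing the names from above (-d detaches the container, while -i -t keep the bash session alive so it doesn’t exit; scrapy-dev is just an example name):

    docker run -d -i -t --name scrapy-dev -v ~/scrapy-projects:/scrapy scrapy /bin/bash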

You will then need to attach to the running container:
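    # attach to the container started above (scrapy-dev is the example name)
    docker attach scrapy-dev

To detach again without stopping the container, use the Ctrl-p Ctrl-q escape sequence rather than exiting the shell.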

 
