This is the first post in a series describing how I’ve used Python to scrape a blog. This first post covers how I set up a virtual environment for Python 3.5 on my Ubuntu 16.04 LTS machine.

But let me start off by sketching out what I actually want to achieve:

Situation: Over a year, I’ve written a weekly blog post on quotes. I’ve used two quotes per post, making a total of 104 quotes.
Problem: I’d like to have a list of those quotes, but I don’t want to manually copy and paste them all into a text document or Excel sheet.
Solution: Use Python to scrape the blog, identify those posts, and extract the quotes.

Excellent. Let’s get started.

The Ministry of Testing Dojo offers a quick guide, Automate Almost Anything with Python. I’m digging into Getting Started, which wants to use Python 3.x. That is fine by me: so far I’ve only used Python 2.x, but it makes sense to switch and try something new. Alas, I’ve got the wrong Python on my machine (I’m on Ubuntu 16.04 LTS):

$ python --version
Python 2.7.12

Or not? Stack Overflow tells me that Python 2.x and Python 3.x can both happily co-exist on my machine. And indeed, they already do:

$ ls -lh /usr/bin/ | grep python
lrwxrwxrwx 1 root root    9 Nov 24 00:52 python -> python2.7
lrwxrwxrwx 1 root root    9 Nov 24 00:52 python2 -> python2.7
-rwxr-xr-x 1 root root 3.4M Dec  4 18:14 python2.7
lrwxrwxrwx 1 root root    9 Dec 21 21:59 python3 -> python3.5
-rwxr-xr-x 2 root root 4.3M Nov 28 16:53 python3.5
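
The same lookup can be done from Python itself. This is just a minimal sketch, not something the guide asks for: `find_interpreters` is a name I made up, and it only checks the three command names seen in the listing above.

```python
import os
import shutil


def find_interpreters(names=("python", "python2", "python3")):
    """Map each command name to the real file it resolves to, or None if absent."""
    found = {}
    for name in names:
        # shutil.which returns the first match on PATH, like the shell's `which`
        path = shutil.which(name)
        # realpath follows symlinks such as python3 -> python3.5
        found[name] = os.path.realpath(path) if path else None
    return found


if __name__ == "__main__":
    for name, target in find_interpreters().items():
        print(name, "->", target or "not found")
```

Note that `shutil.which` only exists in Python 3.3 and later, so this has to be run with `python3`, not with the default `python` on my machine.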

Both are there, only the current default is not the one I want to use. I don’t think I want to change the default (other things might depend on it). Thankfully, Python has a tool called ‘virtualenv’. Virtualenv creates isolated Python environments, each with its own interpreter and libraries, for example one that uses a different Python version than your default. This lets you keep several environments side by side, which is useful if you want a project-specific setup and want to lose the worry of affecting other projects. So let’s use ‘virtualenv’ to create my own context in which to develop my script, following the installation instructions.

First, I need ‘pip’, Python’s package manager:

$ sudo apt install python-pip
...
$ pip --version
pip 8.1.1 from /usr/lib/python2.7/dist-packages (python 2.7)

Hmmmm. This has installed pip for my default context, i.e. Python 2.7. Is this gonna cause me problems later on? I believe not: ‘virtualenv’ should create its own isolated environment, independent of other Pythons, with its own copies of ‘pip’, ‘setuptools’ etc., regardless of which Python or pip was used to install all this in the first place.
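
The `pip --version` line is worth reading closely, because it names the Python that this pip belongs to. If I ever wanted to check that in a script, a small parser might look like the sketch below. The function name `parse_pip_version` is my own invention, and the pattern assumes the output keeps exactly the shape shown above.

```python
import re

# Pattern for lines such as:
#   pip 8.1.1 from /usr/lib/python2.7/dist-packages (python 2.7)
PIP_VERSION_LINE = re.compile(
    r"^pip (?P<pip>\S+) from (?P<location>.+) \(python (?P<python>\d+(?:\.\d+)*)\)$"
)


def parse_pip_version(line):
    """Return pip version, install location and Python version as a dict."""
    match = PIP_VERSION_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised pip --version output: %r" % line)
    return match.groupdict()


info = parse_pip_version("pip 8.1.1 from /usr/lib/python2.7/dist-packages (python 2.7)")
print(info["python"])  # prints 2.7
```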

Cracking on, next I’ll need ‘virtualenv’.

$ sudo apt install virtualenv
...
Setting up python3-virtualenv (15.0.1+ds-3ubuntu1) ...
Setting up virtualenv (15.0.1+ds-3ubuntu1) ...

And now let’s create a virtual environment for me to use my Python 3.5 in. So I’ll create a directory, and call ‘virtualenv’ on it.

$ mkdir /home/ks/dev/python3
$ virtualenv /home/ks/dev/python3/
Running virtualenv with interpreter /usr/bin/python2
New python executable in /home/ks/dev/python3/bin/python2
Also creating executable in /home/ks/dev/python3/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.

Uuuuups. Probably not surprisingly, this looks as if it has now created a virtual environment with Python 2.x in it. Yikes. I guess I’ll be learning how to remove a virtual environment soon. But let’s check first: let’s activate the virtual environment and ask it:

$ source /home/ks/dev/python3/bin/activate
(python3) $ which python
/home/ks/dev/python3/bin/python
(python3) $ python --version
Python 2.7.12
(python3) $ deactivate
$ which python
/usr/bin/python
$ python --version
Python 2.7.12

I activate the virtual environment by running the included ‘activate’ script through the shell built-in ‘source’. This changes my current shell session so that the environment’s ‘bin’ directory comes first on the PATH, as indicated by the ‘(python3)’ prefix on the prompt.
Using the Unix command ‘which’ I can identify the location of the python executable (now inside my virtual environment!) and double-check the version. I use ‘deactivate’ (a function defined in virtualenv’s ‘activate’ shell script) to leave the environment, rerun my checks and, sure enough, I’m back to my system’s default Python.
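
The interpreter itself can also tell whether it is running inside a virtual environment. Below is a rough sketch of the usual check. `in_virtual_environment` is a name I picked for illustration, and the two attributes it looks at depend on the tool and version that built the environment, as the comments explain.

```python
import sys


def in_virtual_environment():
    """Best-effort check for whether this interpreter runs inside a virtual environment."""
    # virtualenv releases before 20 (such as the 15.0.1 installed here)
    # set sys.real_prefix to the original installation prefix
    if hasattr(sys, "real_prefix"):
        return True
    # the standard library's venv module and newer virtualenv releases
    # instead make sys.base_prefix differ from sys.prefix
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print(sys.executable)
    print("virtual environment:", in_virtual_environment())
```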

But how do I get this to use the python I want? Maybe virtualenv knows:

$ virtualenv -h
Usage: virtualenv [OPTIONS] DEST_DIR

Options:
....
-p PYTHON_EXE, --python=PYTHON_EXE
           The Python interpreter to use, e.g.,
           --python=python2.5 will use the python2.5 
           interpreter to create the new environment. 
           The default is the python2 interpreter on 
           your path (e.g. /usr/bin/python2)

Cool. Let’s remove the incorrect virtualenv, and create a new one pointing to the python I want:

$ rm -r /home/ks/dev/python3/
$ virtualenv -p /usr/bin/python3 /home/ks/dev/python3
Already using interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /home/ks/dev/python3/bin/python3
Also creating executable in /home/ks/dev/python3/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.
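
As an aside, Python 3 also ships its own `venv` module in the standard library, which can build an environment without the separate ‘virtualenv’ tool. That is not what I used here; the following is only a sketch of that alternative, with a made-up helper name `create_environment` and a throwaway target directory.

```python
import os
import sys
import tempfile
import venv


def create_environment(target_dir, with_pip=False):
    """Create a virtual environment for the interpreter running this script."""
    # with_pip=True would also bootstrap pip via ensurepip; on Debian and
    # Ubuntu that typically needs the python3-venv package, so this sketch
    # leaves it off by default
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(target_dir)
    # every environment built by venv gets a pyvenv.cfg file at its root
    return os.path.join(target_dir, "pyvenv.cfg")


if __name__ == "__main__":
    # hypothetical throwaway location, not the /home/ks/dev/python3 from this post
    demo_dir = os.path.join(tempfile.mkdtemp(), "demo-env")
    print(create_environment(demo_dir))
    print("built with Python %d.%d" % sys.version_info[:2])
```

An environment built this way always uses the Python that runs the script, so there is no equivalent of the `-p` option to get wrong. The rest of this post sticks with the environment created by `virtualenv -p` above.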

And now let’s check this again:

$ source /home/ks/dev/python3/bin/activate
(python3) $ which python
/home/ks/dev/python3/bin/python
(python3) $ python --version
Python 3.5.2

Success!
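
Since I got the interpreter wrong once already, a guard at the top of the scraping script might save me from repeating the mistake. This is a minimal sketch under my own assumptions: `require_python` is a hypothetical helper, and the minimum of 3.5 simply mirrors the version in my new environment.

```python
import sys


def require_python(minimum=(3, 5), current=None):
    """Raise RuntimeError if the running interpreter is older than `minimum`."""
    if current is None:
        current = tuple(sys.version_info[:3])
    # tuples compare element by element, so (2, 7, 12) < (3, 5) is True
    if tuple(current) < tuple(minimum):
        raise RuntimeError(
            "need Python %s or newer, got %s"
            % (".".join(map(str, minimum)), ".".join(map(str, current)))
        )
    return current


require_python()  # passes inside the new environment (Python 3.5.2)
```

Run with the system default Python 2.7.12 instead, the same call would raise a RuntimeError before any scraping code gets a chance to fail in more confusing ways.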
