Visualizing data with Matplotlib
Overview
Teaching: 20 min
Exercises: 30 minQuestions
How can I plot my data?
How can I save my plot for publishing?
Objectives
Plot simple graphs from data.
Use the pyplot
library from matplotlib
for creating effective visualizations
The mathematician Richard Hamming once said, “The purpose of computing is insight, not numbers,” and
the best way to develop insight is often to visualize data. Visualization deserves an entire
lecture of its own, but we can explore a few features of Python’s matplotlib
library here. While
there is no official plotting library, matplotlib
is the de facto standard. Let’s import it -
import matplotlib.pyplot
Everytime we use a function from this library - say the plot
function we need to pre-pend it with the library name matplotlib.pyplot
. Typing matplotlib.pyplot.plot
many times is quite repetative and can lead easily to typo-mistakes.
Instead we can import the library with a shortened nickname:
import matplotlib.pyplot as plt
Now each time we want to use the plot
function we can call it using plt.plot
instead.
Some IPython Magic
If you’re using an IPython / Jupyter notebook, you’ll need to execute the following command in order for your matplotlib images to appear in the notebook when
show()
is called:%matplotlib inline
The
%
indicates an IPython magic function - a function that is only valid within the notebook environment. Note that you only have to execute this function once per notebook.
First, as in earlier tutorials, let’s read-in the cleaned data and assign the wavelengths and transmittance data to variables:
numpy.loadtxt("./data/transmittance_cleaned.csv")
wavelengths = data[:,0]
ITO_transmittance = data[:,1]
CdS_transmittance = data[:,2]
Si_transmittance = data[:,3]
GaAs_transmittance = data[:,4]
Basic plots can be generated quickly with matplotlib
Let’s take a look at the transmittance data for the ITO:
plt.plot(ITO_transmittance)
plt.show()
Here we have asked
asked matplotlib.pyplot
(which we’ve shortened to plt
) to create and display a line graph of the ITO transmittance values.
At the moment the x-axis has no physical significance; it is an integer range.
Instead we can ask matplotlib.pyplot
to plot a line graph of transmittance vs wavelength.
We can also add a labelled axes and a title.
plt.plot(wavelengths,ITO_transmittance)
plt.xlabel("wavelength (nm) ")
plt.ylabel("Transmittance (%)")
plt.title("ITO transmittance")
plt.show()
We can easily extend this to plot multiple lines. We also each line and create a legend so we can differentiate between each material.
plt.plot(wavelengths,ITO_transmittance,label='ITO')
plt.plot(wavelengths,CdS_transmittance,label='CdS')
plt.plot(wavelengths,Si_transmittance,label='Si')
plt.plot(wavelengths,GaAs_transmittance,label='GaAs')
plt.legend()
plt.xlabel("wavelength (nm) ")
plt.ylabel("Transmittance (%)")
plt.show()
To group similar plots we use a figure and subplots
As an alternative to having multiple lines on a single plot, you can group multiple plots in a single figure using subplots.
This script below uses a number of new commands. The function matplotlib.pyplot.figure()
creates a space into which we will place all of our plots. The parameter figsize
tells Python how big to make this space. Each subplot is placed into the figure using
its add_subplot
method. The add_subplot
method takes 3
parameters. The first denotes how many total rows of subplots there are, the second parameter
refers to the total number of subplot columns, and the final parameter denotes which subplot
your variable is referencing (left-to-right, top-to-bottom). Each subplot is stored in a
different variable (axes1
, axes2
, axes3
, axes4
). Once a subplot is created, the axes can
be titled using the set_xlabel()
command (or set_ylabel()
).
Here are our four plots (one for each material) in a 2x2 arrangement:
import numpy
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(10.0, 10.0))
axes1 = fig.add_subplot(2, 2, 1)
axes2 = fig.add_subplot(2, 2, 2)
axes3 = fig.add_subplot(2, 2, 3)
axes4 = fig.add_subplot(2, 2, 4)
axes1.plot(wavelengths,ITO_transmittance)
axes1.set_ylabel("Transmittance (%)")
axes1.set_title("ITO")
axes2.plot(wavelengths,CdS_transmittance)
axes2.set_title("CdS")
axes2.sharey(axes1)
axes3 .plot(wavelengths,Si_transmittance)
axes3.set_xlabel("wavelength (nm) ")
axes3.set_ylabel("Transmittance (%)")
axes3.set_title("Si")
axes4.plot(wavelengths,GaAs_transmittance)
axes4.set_xlabel("wavelength (nm) ")
axes4.set_title("GaAs")
axes4.sharey(axes3)
fig.tight_layout()
plt.savefig('./group_transmittance.png')
plt.show()
The call to tight_layout
improves the formatting (sub-plot placement) of the figure
The call to savefig
saves our figure to the file group_transmittance.png
.
There are many ways to plot and customise plots using matplotlib
We are only touching the surface of plotting with matplotlib. There are hundreds of different plots, and thousands of ways to customise your plot. A good starting point for exploring the options is by looking online: The official matplotlib gallery and the Python graph gallery have lots of code examples.
Scatter plots
Scatter plots are useful for identifying the association between two variables. For example, we could measure the > speed of a car at equal intervals between 0 and 10 seconds (inclusive) as 2, 4.2, 6.1, 8.3, 9.9 m/s. Scatter plot > this data.
Solution
import matplotlib.pyplot as plt import numpy as np time = np.linspace(0,10,5) speed = np.array([2,4.2,6.1,8.3,11.2]) plt.scatter(time,speed) plt.xlabel("time (s)") plt.ylabel("speed (m/s)")
Plot Scaling
Why do all of our plots stop just short of the upper end of our graph?
Solution
Because matplotlib normally sets x and y axes limits to the min and max of our data (depending on data range)
If we want to change this, we can use the
set_ylim(min, max)
method of each ‘axes’, for example:axes3.set_ylim(0,6)
Update your plotting code to automatically set a more appropriate scale. (Hint: you can make use of the
max
andmin
methods to help.)Solution
# One method axes3.set_ylabel('min') axes3.plot(numpy.min(data, axis=0)) axes3.set_ylim(0,6)
Solution
# A more automated approach min_data = numpy.min(data, axis=0) axes3.set_ylabel('min') axes3.plot(min_data) axes3.set_ylim(numpy.min(min_data), numpy.max(min_data) * 1.1)
Moving Plots Around
Modify the program to display the four plots next to each other instead of in a 2x2 arrangement.
Solution
You need to adjust the following code:
import numpy import matplotlib.pyplot as plt fig = plt.figure(figsize=(20.0, 5.0)) axes1 = fig.add_subplot(1, 4, 1) axes1 = fig.add_subplot(1, 4, 2) axes1 = fig.add_subplot(1, 4, 3) axes1 = fig.add_subplot(1, 4, 4)
Key Points
Use the
pyplot
library frommatplotlib
for creating simple visualizationsBasic plots can be generated quickly with matplotlib
To group similar plots we use a figure and subplots
There are many ways to plot and customise plots using matplotlib