Matplotlib Basemap Tutorial: Plotting global earthquake activity

Update:

This tutorial has been updated and moved to http://introtopython.org/visualization_earthquakes.html.

In the last section of the tutorial, we learned how to plot single points on our maps, and how to plot sets of points on our maps.  We will use both of these techniques to create a map of the most recent earthquakes on the entire planet.

Our Dataset

The US government maintains a live dataset of all earthquakes on the planet over the last seven days, which have a magnitude of 1.0 or greater.  If we examine the first few lines of the  text file of the dataset, we can identify the information that is most relevant to us:

Src,Eqid,Version,Datetime,Lat,Lon,Magnitude,Depth,NST,Region
nc,71897386,0,"Sunday, December  2, 2012 23:02:11 UTC",38.7892,-122.7437,1.2,1.80,19,"Northern California"
ak,10611594,1,"Sunday, December  2, 2012 23:02:00 UTC",63.1368,-151.4235,1.3,16.20, 6,"Central Alaska"
ci,15255025,0,"Sunday, December  2, 2012 22:53:17 UTC",36.0927,-117.8425,2.5,2.50,42,"Central California"

If you are unfamiliar with data that looks like this, it is called a “comma-separated value” text file.  Data is often stored in this format because it is easy for us to process.  We simply read each line of the file into our program, split the line at the commas, and store the data we care about in our program.

For now, we are only interested in the latitude and longitude of each earthquake.  If we look at the first line, it looks like we are interested in the 5th and 6th columns of each line.  In the directory where you save your program files, make a directory called “datasets”.  Save the text file as “earthquake_data.csv” in this new directory.  Now let’s load the file into our program, read the first two lines, and split the values at the commas:

data_file = open('datasets/earthquake_data.csv')

for index, line in enumerate(data_file.readlines()):
    print line.split(',')
    if index == 1:
        break

This is what we see:

['Src', 'Eqid', 'Version', 'Datetime', 'Lat', 'Lon', 'Magnitude', 'Depth', 'NST', 'Region\n']
['hv', '60435606', '1', '"Thursday', ' November 29', ' 2012 06:26:09 UTC"', '19.2325', '-155.3395', '2.0', '3.70', '29', '"Island of Hawaii', ' Hawaii"\n']

The first line is what we expect, with the latitude and longitude in the 5th and 6th elements of the list.  But the commas in the dates throw things off, so for the rest of the file the latitudes will be in the 7th column, and the longitudes will be in the 8th column.  Here is the code to read the rest of the file, and store just this information into two separate latitude and longitude lists:

data_file = open('datasets/earthquake_data.csv')

lats, lons = [], []
for index, line in enumerate(data_file.readlines()):
    if index > 0:
        lats.append(float(line.split(',')[6]))
        lons.append(float(line.split(',')[7]))

We take the 7th and 8th items in each line, and turn them into decimal floats.  Then we add these numbers to our list of latitudes and longitudes.  If you have not seen the enumerate function before, it lets us have easy access to the index of the for loop.  We use this to make sure we are not reading in the first line of the file, which contains the text labels for each column. [Update:  Brandon Rhodes pointed me to Python’s csv module, which makes it easier to work with csv files.  I have added a section at the end of this post about how to use the csv module in this project.]

Using what we learned about plotting a set of points, we can now make a simple plot of these points:

# --- Process Data ---
data_file = open('datasets/earthquake_data.csv')

lats, lons = [], []
for index, line in enumerate(data_file.readlines()):
    if index > 1:
        lats.append(float(line.split(',')[6]))
        lons.append(float(line.split(',')[7]))

# --- Build Map ---
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np

map = Basemap(projection='robin', resolution = 'l', area_thresh = 1000.0,
              lat_0=0, lon_0=-130)
map.drawcoastlines()
map.drawcountries()
map.fillcontinents(color = 'gray')
map.drawmapboundary()
map.drawmeridians(np.arange(0, 360, 30))
map.drawparallels(np.arange(-90, 90, 30))

x,y = map(lons, lats)
map.plot(x, y, 'ro', markersize=6)

plt.show()

Here is the map this code generates:

Worldwide earthquakes of magnitude 1.0 or greater for the last 7 days.

Worldwide earthquakes of magnitude 1.0 or greater for the last 7 days.

This is fun; in less than 30 lines of code we have turned a giant text file into a very informative map!  But there is one fairly obvious improvement we should make.  Let’s try to make the points on a map represent the magnitude of each earthquake.  We start out by reading the magnitudes into a list along with the latitudes and longitudes of each earthquake:

# --- Process Data ---
data_file = open('datasets/earthquake_data.csv')

lats, lons = [], []
magnitudes = []
for index, line in enumerate(data_file.readlines()):
    if index > 0:
        lats.append(float(line.split(',')[6]))
        lons.append(float(line.split(',')[7]))
        magnitudes.append(float(line.split(',')[8]))

# --- Build Map ---
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np

map = Basemap(projection='robin', resolution = 'l', area_thresh = 1000.0,
              lat_0=0, lon_0=-130)
map.drawcoastlines()
map.drawcountries()
map.fillcontinents(color = 'gray')
map.drawmapboundary()
map.drawmeridians(np.arange(0, 360, 30))
map.drawparallels(np.arange(-90, 90, 30))

x,y = map(lons, lats)
map.plot(x, y, 'ro', markersize=6)

plt.show()

Now instead of plotting all of our points at once, we will loop through the points and plot them one at a time.  When we plot each point, we will adjust the dot size according to the magnitude.  Since the magnitudes start at 1.0, we can simply use the magnitude as a scale factor.  To get the marker size, we just multiply the magnitude by the smallest dot we want on our map.  A minimum marker size of 4 seems to work well:

# --- Process Data ---
data_file = open('datasets/earthquake_data.csv')

lats, lons = [], []
magnitudes = []
for index, line in enumerate(data_file.readlines()):
    if index > 0:
        lats.append(float(line.split(',')[6]))
        lons.append(float(line.split(',')[7]))
        magnitudes.append(float(line.split(',')[8]))

# --- Build Map ---
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np

map = Basemap(projection='robin', resolution = 'l', area_thresh = 1000.0,
              lat_0=0, lon_0=-130)
map.drawcoastlines()
map.drawcountries()
map.fillcontinents(color = 'gray')
map.drawmapboundary()
map.drawmeridians(np.arange(0, 360, 30))
map.drawparallels(np.arange(-90, 90, 30))

min_marker_size = 4
for lon, lat, mag in zip(lons, lats, magnitudes):
    x,y = map(lon, lat)
    msize = mag * min_marker_size
    map.plot(x, y, 'ro', markersize=msize)

plt.show()

We define our minimum marker size at 4.  We then loop through all the data points, calculating the marker size for each point by multiplying the magnitude of the earthquake by the minimum marker size we have set.

If you have not used the zip function before, it takes a number of lists, and pulls the one item from each list.  On each loop iteration, we have a matching set of longitude, latitude, and magnitude of each earthquake.

Here is the map that is generated:

Worldwide earthquakes, with marker sizes representing the magnitude of each earthquake.

Worldwide earthquakes, with marker sizes representing the magnitude of each earthquake.

There is one more significant change we can make, to generate a more meaningful visualization.  Let’s use some different colors to represent the magnitudes as well.  We will make small earthquakes green, moderate earthquakes yellow, and significant earthquakes red.  The following function will identify the appropriate color for each earthquake.

# --- Process Data ---
data_file = open('datasets/earthquake_data.csv')

lats, lons = [], []
magnitudes = []
for index, line in enumerate(data_file.readlines()):
    if index > 0:
        lats.append(float(line.split(',')[6]))
        lons.append(float(line.split(',')[7]))
        magnitudes.append(float(line.split(',')[8]))

# --- Build Map ---
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np

map = Basemap(projection='robin', resolution = 'l', area_thresh = 1000.0,
              lat_0=0, lon_0=-130)
map.drawcoastlines()
map.drawcountries()
map.fillcontinents(color = 'gray')
map.drawmapboundary()
map.drawmeridians(np.arange(0, 360, 30))
map.drawparallels(np.arange(-90, 90, 30))

def get_marker_color(magnitude):
    if magnitude < 3.0:
        return ('go')
    elif magnitude < 5.0:         return ('yo')     else:         return ('ro') min_marker_size = 4 for lon, lat, mag in zip(lons, lats, magnitudes):     x,y = map(lon, lat)     msize = mag * min_marker_size     map.plot(x, y, 'ro', markersize=msize) plt.show()

Now we add a line in our plotting loop to grab the appropriate color for each dot.  If you are new to python, make sure your function code is before your plotting code.

# --- Process Data --- data_file = open('datasets/earthquake_data.csv') lats, lons = [], [] magnitudes = [] for index, line in enumerate(data_file.readlines()):     if index > 0:
        lats.append(float(line.split(',')[6]))
        lons.append(float(line.split(',')[7]))
        magnitudes.append(float(line.split(',')[8]))

# --- Build Map ---
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np

map = Basemap(projection='robin', resolution = 'l', area_thresh = 1000.0,
              lat_0=0, lon_0=-130)
map.drawcoastlines()
map.drawcountries()
map.fillcontinents(color = 'gray')
map.drawmapboundary()
map.drawmeridians(np.arange(0, 360, 30))
map.drawparallels(np.arange(-90, 90, 30))

def get_marker_color(magnitude):
    if magnitude < 3.0:
        return ('go')
    elif magnitude < 5.0:
        return ('yo')
    else:
        return ('ro')

min_marker_size = 4
for lon, lat, mag in zip(lons, lats, magnitudes):
    x,y = map(lon, lat)
    msize = mag * min_marker_size
    marker_string = get_marker_color(mag)
    map.plot(x, y, marker_string, markersize=msize)

plt.show()

These changes give us a much more satisfying visualization:

Worldwide earthquakes, with magnitudes represented by both color and size.

Worldwide earthquakes, with magnitudes represented by both color and size.

We can easily see where the most significant earthquakes are happening.  Before we finish, let’s add a title to our map.  Our title needs to include the date range for these earthquakes, which requires us to pull in a little more data  when we parse the raw text:

# --- Process Data ---
data_file = open('datasets/earthquake_data.csv')

lats, lons = [], []
magnitudes = []
timestrings = []
for index, line in enumerate(data_file.readlines()):
    if index > 0:
        lats.append(float(line.split(',')[6]))
        lons.append(float(line.split(',')[7]))
        magnitudes.append(float(line.split(',')[8]))
        timestrings.append(' '.join(line.split(',')[4:6]).strip('"'))

# --- Build Map ---
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np

map = Basemap(projection='robin', resolution = 'l', area_thresh = 1000.0,
              lat_0=0, lon_0=-130)
map.drawcoastlines()
map.drawcountries()
map.fillcontinents(color = 'gray')
map.drawmapboundary()
map.drawmeridians(np.arange(0, 360, 30))
map.drawparallels(np.arange(-90, 90, 30))

def get_marker_color(magnitude):
    if magnitude < 3.0:
        return ('go')
    elif magnitude < 5.0:
        return ('yo')
    else:
        return ('ro')

min_marker_size = 4
for lon, lat, mag in zip(lons, lats, magnitudes):
    x,y = map(lon, lat)
    msize = mag * min_marker_size
    marker_string = get_marker_color(mag)
    map.plot(x, y, marker_string, markersize=msize)

plt.show()

The date and time information is stored in the 5th through 7th items in the list, so we take these items together to make our datestrings.  We use the strip function to take out the double quotes that were included in the original text file.  To make our title, we just use the dates of the first and last earthquake.  Since the file includes the most recent earthquakes first, we need to use the last items as the starting date:

# --- Process Data ---
data_file = open('datasets/earthquake_data.csv')

lats, lons = [], []
magnitudes = []
timestrings = []
for index, line in enumerate(data_file.readlines()):
    if index > 0:
        lats.append(float(line.split(',')[6]))
        lons.append(float(line.split(',')[7]))
        magnitudes.append(float(line.split(',')[8]))
        timestrings.append(' '.join(line.split(',')[4:6]).strip('"'))

# --- Build Map ---
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np

map = Basemap(projection='robin', resolution = 'l', area_thresh = 1000.0,
              lat_0=0, lon_0=-130)
map.drawcoastlines()
map.drawcountries()
map.fillcontinents(color = 'gray')
map.drawmapboundary()
map.drawmeridians(np.arange(0, 360, 30))
map.drawparallels(np.arange(-90, 90, 30))

def get_marker_color(magnitude):
    if magnitude < 3.0:
        return ('go')
    elif magnitude < 5.0:
        return ('yo')
    else:
        return ('ro')

min_marker_size = 4
for lon, lat, mag in zip(lons, lats, magnitudes):
    x,y = map(lon, lat)
    msize = mag * min_marker_size
    marker_string = get_marker_color(mag)
    map.plot(x, y, marker_string, markersize=msize)

title_string = "Earthquakes of Magnitude 1.0 or Greater\n"
title_string += "%s through %s" % (timestrings[-1], timestrings[0])
plt.title(title_string)

plt.show()

This gives us the following map:

Our final map of the world's most recent earthquakes, with an informative title.

Our final map of the world’s most recent earthquakes, with an informative title.

Just to be clear, here is the entire file required to make this map, assuming you have saved the data file as “earthquake_data.csv”.  That file should be saved in a directory called “datasets”, which lives in the same directory as your program file:

# --- Process Data ---
data_file = open('datasets/earthquake_data.csv')

lats, lons = [], []
magnitudes = []
timestrings = []
for index, line in enumerate(data_file.readlines()):
    if index > 0:
        lats.append(float(line.split(',')[6]))
        lons.append(float(line.split(',')[7]))
        magnitudes.append(float(line.split(',')[8]))
        timestrings.append(' '.join(line.split(',')[4:6]).strip('"'))

# --- Build Map ---
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np

map = Basemap(projection='robin', resolution = 'l', area_thresh = 1000.0,
              lat_0=0, lon_0=-130)
map.drawcoastlines()
map.drawcountries()
map.fillcontinents(color = 'gray')
map.drawmapboundary()
map.drawmeridians(np.arange(0, 360, 30))
map.drawparallels(np.arange(-90, 90, 30))

def get_marker_color(magnitude):
    if magnitude < 3.0:
        return ('go')
    elif magnitude < 5.0:
        return ('yo')
    else:
        return ('ro')

# Variable size dots:
#  go through each lat, lon, plot it individually, calculating size dynamically
#  magnitude 1.0 = min dot size; larger quakes scaled by magnitude
min_marker_size = 4
for lon, lat, mag in zip(lons, lats, magnitudes):
    x,y = map(lon, lat)
    msize = mag * min_marker_size
    marker_string = get_marker_color(mag)
    map.plot(x, y, marker_string, markersize=msize)

title_string = "Earthquakes of Magnitude 1.0 or Greater\n"
title_string += "%s through %s" % (timestrings[-1], timestrings[0])
plt.title(title_string)

plt.show()

To get a sense of how flexible the basemap library is, check out how quickly we can change the look and feel of our map.  Comment out the line that colors the continents, and replace it with a call to bluemarble:

# --- Process Data ---
data_file = open('datasets/earthquake_data.csv')

lats, lons = [], []
magnitudes = []
timestrings = []
for index, line in enumerate(data_file.readlines()):
    if index > 0:
        lats.append(float(line.split(',')[6]))
        lons.append(float(line.split(',')[7]))
        magnitudes.append(float(line.split(',')[8]))
        timestrings.append(' '.join(line.split(',')[4:6]).strip('"'))

# --- Build Map ---
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np

map = Basemap(projection='robin', resolution = 'l', area_thresh = 1000.0,
              lat_0=0, lon_0=-130)
map.drawcoastlines()
map.drawcountries()
#map.fillcontinents(color = 'gray')
map.bluemarble()
map.drawmapboundary()
map.drawmeridians(np.arange(0, 360, 30))
map.drawparallels(np.arange(-90, 90, 30))

def get_marker_color(magnitude):
    if magnitude < 3.0:
        return ('go')
    elif magnitude < 5.0:
        return ('yo')
    else:
        return ('ro')

# Variable size dots:
#  go through each lat, lon, plot it individually, calculating size dynamically
#  magnitude 1.0 = min dot size; larger quakes scaled by magnitude
min_marker_size = 4
for lon, lat, mag in zip(lons, lats, magnitudes):
    x,y = map(lon, lat)
    msize = mag * min_marker_size
    marker_string = get_marker_color(mag)
    map.plot(x, y, marker_string, markersize=msize)

title_string = "Earthquakes of Magnitude 1.0 or Greater\n"
title_string += "%s through %s" % (timestrings[-1], timestrings[0])
plt.title(title_string)

plt.show()

This is what you should see, although it may take a long time to run depending on your system’s resources:

The same map, with a different look and feel.

The same map, with a different look and feel.

We could continue to refine this map.  For example, we could read our data from the url instead of from a downloaded text file.  That way our map would always be current.  We could determine each dot’s color on a continuous scale of greens, yellows, and reds.  Now that you have a better sense of how to work with basemap, I hope you enjoy playing with these further refinements as you map the datasets you are most interested in.

We won’t be making any more maps in this tutorial, but I will leave you with one more post about some interesting datasets to look at.

Update: Using Python’s csv module to parse the dataset

In this tutorial, I parsed the dataset manually. We read in each line of the datafile as a string, and then split the string at the commas. We had to be aware of the number of the commas in each time stamp, which is not the nicest way of dealing with csv data. As Brandon Rhodes pointed out in a comment, Python’s standard library includes a module for parsing csv files. The following listing produces the same map we just saw, but uses the csv module to parse the data:

# --- Process Data ---
import csv
filename = 'datasets/earthquake_data.csv'

lats, lons = [], []
magnitudes, timestrings = [], []
with open(filename, 'rb') as f:
    reader = csv.reader(f)
    # ignore header row
    reader.next()
    for row in reader:
        lats.append(float(row[4]))
        lons.append(float(row[5]))
        magnitudes.append(float(row[6]))
        timestrings.append(row[3])

# --- Build Map ---
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np

map = Basemap(projection='robin', resolution = 'l', area_thresh = 1000.0,
              lat_0=0, lon_0=-130)
map.drawcoastlines()
map.drawcountries()
#map.fillcontinents(color = 'gray')
map.bluemarble()
map.drawmapboundary()
map.drawmeridians(np.arange(0, 360, 30))
map.drawparallels(np.arange(-90, 90, 30))

def get_marker_color(magnitude):
    if magnitude < 3.0:
        return ('go')
    elif magnitude < 5.0:
        return ('yo')
    else:
        return ('ro')

# Variable size dots:
#  go through each lat, lon, plot it individually, calculating size dynamically
#  magnitude 1.0 = min dot size; larger quakes scaled by magnitude
min_marker_size = 4
for lon, lat, mag in zip(lons, lats, magnitudes):
    x,y = map(lon, lat)
    msize = mag * min_marker_size
    marker_string = get_marker_color(mag)
    map.plot(x, y, marker_string, markersize=msize)

title_string = "Earthquakes of Magnitude 1.0 or Greater\n"
title_string += "%s through %s" % (timestrings[-1], timestrings[0])
plt.title(title_string)

plt.show()

This looks pretty similar to our manual parsing code, but it is a little more efficient. I had not used Python’s with statement before. This statement ensures that a file is closed properly once it has been read, even if there are errors in processing the file. With our file open, we initialize a csv reader object. The reader object has a next function, which we use to skip over the header row. Then we loop through each row in the data file, and pull out the information we want. You might notice that we don’t have to worry about any string slicing; the csv module figures out which commas are used to separate fields, and which are used in string fields such as the datetime fields.

If you are just playing around with visualization, you may stick with the manual approach to parsing your csv files. But if you want to get better at visualizations, it is well worth your while to become familiar with the csv module. For example, it is possible to use the csv module to read our data directly into a dictionary. In this approach, we could read in the first line to determine the keys for our dictionary, and store the values directly in the dictionary.

Next:  Interesting datasets to explore

Back to Tutorial contents

Advertisements

About ehmatthes

Teacher, hacker, new dad, outdoor guy
This entry was posted in programming and tagged , , , , . Bookmark the permalink.

2 Responses to Matplotlib Basemap Tutorial: Plotting global earthquake activity

  1. Note that Python comes with a built-in “csv” module that knows all of the rules about the way that Excel-compatible CSV files like this use double-quotes to protect commas that would otherwise cause the start of a new field — if you use its “csv.reader” call, then you will not have to do your own fragile splitting and quote-removal. But otherwise, this tutorial is great — thanks so much, I will be using your example to try plotting where on the earth’s surface a satellite is visible!

    • ehmatthes says:

      Thanks for the feedback, Brandon. I am aware of your work in the Python community, so your feedback is particularly encouraging.

      I was not aware of the csv module. I will use this in my next plotting project, and share this improved approach with my students.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s