Author Archives: cmorton222

Link

I was trying to create a figure of a legend (without a plot) and it took me way to long to figure it out. What I wanted was something like this:

legend.png

I initially though this would be really easy using the approach in this stack overflow post

import matplotlib.pyplot as plt

labels = ['Label 1', 'Label 2', 'Label 3', 'Label 4']
colors = []

fig = plt.figure()
fig_legend = plt.figure(figsize=(2, 1.25))
ax = fig.add_subplot(111)
lines = ax.plot(range(2), range(2), range(2), range(2), range(2), range(2), range(2), range(2))
fig_legend.legend(lines, labels, loc='center', frameon=False)
plt.show()

legend.png

This worked well enough for making lines, but what I really wanted were the boxes/patches, like you would get with a bar chart or histogram.  I tried switching to plotting bars:

fig = plt.figure()
fig_legend = plt.figure(figsize=(2, 1.25))
ax = fig.add_subplot(111)
bars = ax.bar(range(4), range(4), color=colors, label=labels)
fig_legend.legend(bars.get_children(), labels, loc='center', frameon=False)

but this resulted in the following “RuntimeError: Can not put single artist in more than one figure“.

I tried as many different combinations of “ax.get_legend_handles_labels()”, “bars.get_children()”, plotting each bar separately, and plotting histograms, but kept getting the same error.

Finally, after far too long, I realized I didn’t need to play anything to get a patch object and could create them manually.  The following code was used to generate the figure at the top.

fig = plt.figure(figsize=(2, 1.25))
patches = [
    mpatches.Patch(color=color, label=label)
    for label, color in zip(labels, colors)]
fig.legend(patches, labels, loc='center', frameon=False)
plt.show()

Adding state outlines to PRISM figures (without using Basemap)

I was showing my earlier PRISM/matplotlib examples to a student and their immediate response was to ask how hard it would be to add state outlines to the map without using Basemap.  A quick Google/stackoverflow search suggested that it shouldn’t be too hard (link, link, link, link), as long as the state outlines were in the same coordinate system/projection as the raster data.

The first thing was to read in the PRISM array and compute the extent from the HDR file in order to set the imshow extent.

import numpy as np

def read_prism_hdr(hdr_path):
    """Read an ESRI BIL HDR file"""
    with open(hdr_path, 'r') as input_f:
    header_list = input_f.readlines()
    return dict(item.strip().split() for item in header_list)

prism_path = 'PRISM_tmean_30yr_normal_4kmM2_annual_bil.bil'
header_dict = read_prism_hdr(prism_path.replace('.bil', '.hdr'))

prism_nodata = float(header_dict['NODATA'])
prism_rows = int(header_dict['NROWS'])
prism_cols = int(header_dict['NCOLS'])

# Assume the cells are square and have the same cellsize in each direction
prism_cs = float(header_dict['XDIM'])
# prism_cs = float(header_dict['YDIM'])

# Build extent as xmin, xmax, ymin, ymax
# ULXMAP and ULYMAP are upper left cell center
prism_extent = [
    float(header_dict['ULXMAP']) - 0.5 * prism_cs,
    float(header_dict['ULXMAP']) - 0.5 * prism_cs + prism_cols * prism_cs,
    float(header_dict['ULYMAP']) + 0.5 * prism_cs - prism_rows * prism_cs,
    float(header_dict['ULYMAP']) + 0.5 * prism_cs
]

# Read in the array
prism_array = np.fromfile(prism_path, dtype=np.float32)
prism_array = prism_array.reshape(prism_rows, prism_cols)
prism_array[prism_array == prism_nodata] = np.nan

My first approach was basically copied from this script.  I used the 1:20,000,000 state shapefile from the US Census Cartographic Boundaries website.   I installed gdal, fiona, descartes, and shapely using the wheel files built by Christoph Gohlke, since none of the modules would install with conda (on my windows computer). I had to use GDAL 1.11.4 instead of 2.0.2, otherwise I got an error when reading the shapefile with fiona.

import os
from collections import defaultdict

from descartes import PolygonPatch
import fiona
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from shapely.geometry import MultiPolygon, Polygon, shape

# Read the polygon geometries from the shapefile
shp_path = 'cb_2014_us_state_20m.shp'
mp = MultiPolygon(
    [shape(pol['geometry']) for pol in fiona.open(shp_path, 'r')])

fig = plt.figure()
ax = fig.add_subplot(111)
ax.imshow(prism_array, cmap='viridis', extent=prism_extent)
patches = [PolygonPatch(p, fc='none', ec='0.1', zorder=1) for p in mp]
ax.add_collection(PatchCollection(patches, match_original=True))
plt.grid()
plt.show()

prism_missing_patches

The problem with this combination of approach and dataset, are that the state features are multi-polygons, but only the first polygon is read if the feature is a multi-polygon.  As you can see, some of the state with more than one polygon are missing (California, Michigan, Florida, etc.).

The following approach will check whether the feature is a polygon or multi-polygon. If it is a multi-polygon, it will read each polygon geometry from the multi-polygon.

geom_list = []
with fiona.open(shp_path, 'r') as cell_f:
    for feature in cell_f:
        ftr_geom = shape(feature['geometry'])

        if ftr_geom.is_empty:
            continue
        elif ftr_geom.geom_type == 'MultiPolygon':
            # Unpack multi-polygons to lists of polygons
            geom_list.extend([g for g in ftr_geom if not g.is_empty])
        elif ftr_geom.geom_type == 'Polygon':
            geom_list.append(ftr_geom)
        else:
            print('Invalid geometry type')
            continue
mp = MultiPolygon(geom_list)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.imshow(prism_array, cmap='viridis', extent=prism_extent)

patches = [PolygonPatch(p, fc='none', ec='0.1', zorder=1) for p in mp]
ax.add_collection(PatchCollection(patches, match_original=True))

plt.grid()
plt.show()

prism_all_patches

I will definitely try this again and see if there is a simpler approach or one that relies on fewer modules.

Reading PRISM BIL arrays without using GDAL

Since it isn’t always easy to get GDAL working with Conda, I decided to try reading the PRISM BIL file using built in Python modules and NumPy.

The first example I found (Plotting USGS BIL DEMs in Python) provided 95% of the steps needed to get this working using the struct module.  I only needed to change the datatype to a float32/short.  I decided to hardcode the width, height, datatype, number of bands, and nodata value for now, but all of these could/should be read from the HDR file.

import struct

prism_path = 'PRISM_tmean_30yr_normal_4kmM2_annual_bil.bil'
prism_cols = 1405
prism_rows = 621
prism_nodata = -9999

bil_f = open(bil_path, 'rb')
bil_data = bil_f.read()
bil_f.close()

# Unpack binary data into a flat tuple z
s = '<%df' % (int(prism_cols * prism_rows),)
z = struct.unpack(s, bil_data)

prism_array = np.zeros((prism_rows, prism_cols), dtype=np.float32)
for r in range(0, prism_rows):
    for c in range(0, prism_cols):
        prism_array[r][c] = float(z[((prism_cols) * r) + c])
        # prism_array.itemset((r, c), float(z[((prism_cols) * r) + c]))

prism_array[prism_array == prism_nodata] = np.nan

I then realized that NumPy should be able to easily (and quickly) read a simple single band binary array file using fromfile().

import numpy as np

prism_path = 'PRISM_tmean_30yr_normal_4kmM2_annual_bil.bil'
prism_cols = 1405
prism_rows = 621
prism_nodata = -9999

prism_array = np.fromfile(bil_path, dtype=np.float32)
prism_array = prism_array.reshape(prism_rows, prism_cols)
prism_array[prism_array == prism_nodata] = np.nan

Might as well read the width, height, and nodata values from the HDF file, since it is avaiable. You could get the datatype also.

import numpy as np

def read_prism_hdr(hdr_path):
    """Read an ESRI BIL HDR file"""
    with open(hdr_path, 'r') as input_f:
        header_list = input_f.readlines()
    return dict(item.strip().split() for item in header_list)

def read_prism_bil(bil_path):
    """Read an array from ESRI BIL raster file"""
    hdr_dict = read_prism_hdr(bil_path.replace('.bil', '.hdr'))
    # For now, only use NROWS, NCOLS, and NODATA
    # Eventually use NBANDS, BYTEORDER, LAYOUT, PIXELTYPE, NBITS

    prism_array = np.fromfile(bil_path, dtype=np.float32)
    prism_array = prism_array.reshape(
        int(hdr_dict['NROWS']), int(hdr_dict['NCOLS']))
    prism_array[prism_array == float(hdr_dict['NODATA'])] = np.nan
    return prism_array

prism_path = 'PRISM_tmean_30yr_normal_4kmM2_annual_bil.bil'
prism_array = read_prism_bil(prism_path)

Plotting PRISM BIL arrays using GDAL & Matplotlib

Now that the PRISM data is being released in a BIL format, I decided I should update my old post on plotting PRISM arrays. First, I tried using ndimage.imread() and matplotlib.imread(), but neither worked with the BIL format. GDAL can handle it though.

prism_tmean_normal

import matplotlib.pyplot as plt
from osgeo import gdal

prism_path = 'PRISM_tmean_30yr_normal_4kmM2_annual_bil.bil'
prism_nodata = -9999

prism_ds = gdal.Open(prism_path)
prism_band = prism_ds.GetRasterBand(1)
prism_array = prism_band.ReadAsArray().astype(np.float32)
# prism_nodata = prism_band.GetNoDataValue()
prism_array[prism_array == prism_nodata] = np.nan
prism_ds = None

plt.imshow(prism_array, cmap='viridis')
plt.show()

I really like the new Viridis colormap!

 

Reversing a matplotlib colormap

I wanted to try to test out the new Matplotlib colormap viridis:

Matplotlib Viridis colormap

Initially, I couldn’t find an easy way to use it since it isn’t in the current version (1.4.3).  I found the underlying data list it in “_cm_listed.py” in the Matplotlib Github listed as “_viridis_data”and built the colormap from the data list:

viridis_cm = LinearSegmentedColormap.from_list('viridis', cm_data)

Here is the figure I was working on with the viridis colormap:
figure2_viridis

Looking at the figure, what I really wanted was the reverse of the colormap. Since I wasn’t using one of the listed Matplotlib colormaps, I couldn’t use the “_r” try to reverse it (i.e. to invert “jet” use “jet_r” instead). I I looked around for awhile for a simple method to reverse the colormap but didn’t find any really helpful stackoverflow posts. Finally, after digging around in the Matplotlib help I noticed the revcmap function. From there it was simply a matter of calling the function on the extracted colormap data.

viridis_r_cm = matplotlib.colors.LinearSegmentedColormap(
    'viridis_r', matplotlib.cm.revcmap(viridis_cm._segmentdata))

The figure still doesn’t look great but it is the reverse:
figure2_viridis_r

Wrap integer values to fixed range

I had a project where I wanted to select a window of days centered on a specific day of year (DOY), but the window had to also be defined by DOY. For example, for DOY 2 (Jan. 2nd), a 7 day window (3 before and 3 after) would go from DOY 364 (Dec. 30th, non-leap years) to DOY 5 (Jan. 5th). This could probably be done using Python datetime objects or some built-in Python module, but I also needed to implement this in JavaScript and on Google Earth Engine also, so I thought it would be fun to come up with a generic algorithm.

It seemed like the modulo operator (%) would be a good place to start. For the range 0-n, x % n for x > 0 will be the correctly wrapped value.

x_list = range(8, 23)
print(x_list)
[8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]
print([x % 10 for x in x_list])
[8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]

In Python, the modulo operator gets the sign of the output from the divisor. So for negative values the modulo will still correctly wrap the value.

x_list = range(-12,3)
print(x_list)
[-12, -11, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2]
print([x % 10 for x in x_list])
[8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]

This doesn’t work in JavaScript though, or any language where the sign of the output comes from the dividend. To show this in Python you can use the to get the sign from the dividend.

import math
x_list = range(-12,3)
print(x_list)
[-12, -11, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2]
print([int(math.fmod(x,10)) for x in x_list])
[-2, -1, 0, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1]

The divisor based modulo operator can be implemented from the dividend based one as:
((x % n) + n) % n

import math
x_list = range(-12,3)
print(x_list)
[-12, -11, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2]
print([int(math.fmod(math.fmod(x,10) + 10, 10)) for x in x_list])
[8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]

This will wrap values to the range 0-n, but what about the range a-b? For this I decided to just shift the range down to 0-n, wrap the values, then shift them back to a-b.
(x - a) % (b - a + 1) + a

The following Python example will wrap the values to the range 2-7. To do the same thing in Earth Engine or JavaScript just means including the extra operations from above.

a = 2
b = 7
x_list = range(-5,12)
print(x_list)
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print([((x - a) % (b - a + 1) + a) for x in x_list])
[7, 2, 3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 7, 2, 3, 4, 5]

Binary erosion without using SciPy ndimage

One of the best parts about Python is all of the great add-on modules that allow you to do very complex operations in just a few lines of code.  That being said, I also greatly enjoy when I can’t use a specific module or function and I am forced to really learn an algorithm in order to recreate it.  I recently needed to do a binary erosion of a NumPy array without using the binary_erosion() function in the SciPy ndimage package (the ArcGIS Python install only includes NumPy and matplotlib).

The following is an example of binary erosion using the SciPy ndimage function

>>> import numpy as np
>>> from scipy import ndimage
>>> input_array = np.ones((7,7), dtype=np.int)
>>> input_array[3,0] = 0
>>> input_array[1,3:5] = 0
>>> input_array
array([[1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 0, 0, 1, 1],
       [1, 1, 1, 1, 1, 1, 1],
       [0, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1]])
>>> structure = np.ones((3,3), dtype=np.int)
>>> ndimage.binary_erosion(input_array, structure)
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 1, 0],
       [0, 0, 1, 1, 1, 1, 0],
       [0, 1, 1, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 0, 0]])

I searched stackoverflow for awhile, but most of the responses were to use ndimage.  Since the algorithm itself is quite simple (set the output pixel to the minimum value of all the pixels in the input pixel’s structure/neighborhood) I decided to start from scratch, instead of borrowing from stackoverflow like I usually do.  Initially, I was going to just iterate over every cell in the input array, select the subset using the structure element as a mask, and then calculate the minimum.  The drawback to this approach is that I would need quite a bit of logic to handle the edge cases because the structure element would also need to be sliced.  Instead, I decided to just pad the input array with zeros so the structure element would fit without modification.

>>> pad_array = np.zeros((9,9), dtype=np.int)
array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0]])
>>> pad_array[1:-1,1:-1] = input_array
>>> pad_array
array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 1, 1, 0],
       [0, 1, 1, 1, 0, 0, 1, 1, 0],
       [0, 1, 1, 1, 1, 1, 1, 1, 0],
       [0, 0, 1, 1, 1, 1, 1, 1, 0],
       [0, 1, 1, 1, 1, 1, 1, 1, 0],
       [0, 1, 1, 1, 1, 1, 1, 1, 0],
       [0, 1, 1, 1, 1, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0]])

Padding with zeros also made it trivial to propagate the edge cells into the array like occurs with ndimage.binary_erosion().  One potential downside of this approach is that I am creating an additional copy of the input array in order to pad it.

Below is the numpy binary erosion function. I am sure that this is a very inefficient approach, but it worked!

def np_binary_erosion(input_array, structure=np.ones((3,3)).astype(np.bool)):
    '''NumPy binary erosion function
    
    No error checking on input array (type)
    No error checking on structure element (# of dimensions, shape, type, etc.)
    
    Args:
    input_array: Binary NumPy array to be eroded. Non-zero (True) elements
        form the subset to be eroded
    structure: Structuring element used for the erosion. Non-zero elements
        are considered True. If no structuring element is provided, an
        element is generated with a square connectivity equal to two 
        (square, not cross).
    Returns:
        binary_erosion: Erosion of the input by the stucturing element
    '''
    rows, cols = input_array.shape
    
    ## Pad output array (binary_erosion) with extra cells around the edge
    ## so that structuring element will fit without wrapping.
    ## A 3x3 structure, will need 1 additional cell around the edge
    ## A 5x5 structure, will need 2 additional cells around the edge
    pad_shape = (
        input_shape[0] + structure_shape[0] - 1, 
        input_shape[1] + structure.shape[1] - 1)
    input_pad_array = np.zeros(pad_shape).astype(np.bool)
    input_pad_array[1:rows+1,1:cols+1] = input_array
    binary_erosion = np.zeros(pad_shape).astype(np.bool)
    
    ## Cast structure element to boolean
    struc_mask = structure.astype(np.bool)
    
    ## Iterate over each cell
    for row in xrange(rows):
        for col in xrange(cols):
            ## The value of the output pixel is the minimum value of all the
            ## pixels in the input pixel's neighborhood.
            binary_erosion[row+1,col+1] = np.min(
                input_pad_array[row:row+3, col:col+3][struc_mask])
    return binary_erosion[1:rows+1,1:cols+1]

And the result using the function

>>> np_binary_erosion(input_array, structure).astype(np.int)
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 1, 0],
       [0, 0, 1, 1, 1, 1, 0],
       [0, 1, 1, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 0, 0]])