OverviewTeaching: 30 min Exercises: 0 minQuestions
What additional capabilities does GeoPandas provide, including data access, plotting and analysis?
How does it integrate with other common Python tools?
How do GeoPandas data objects integrate with analyses of raster data over vector geospatial features?Objectives
Explore additional GeoPandas capabilities in reading from PostGIS and using its plot method.
Learn how to dissolve (aggregate) polygons into larger units, and apply spatial joins across GeoDataFrames, as examples of GeoPandas spatial operators.
Learn how to explore project (CRS) information and reproject.
Plot choropleth maps
Using folium to create interactive maps from a GeoDataFrame
Explore simple but powerful capabilities offered by the rasterstats package to generate summaries and statistics of raster properties over vector features, and explore these via GeoPandas.
We covered the basics of GeoPandas in the previous episode and notebook. Here, we’ll extend that introduction to illustrate additional aspects of GeoPandas and its interactions with other Python libraries, covering fancier mapping, analysis (unitary and binary spatial operators), raster zonal stats + GeoPandas.
Here are the main sections in this episode / notebook:
choroplethmap based on calculated watershed areas
sjoin, of polygons on points
We’ll use these throughout the rest of the tutorial. Note that we’re not using the latest and greatest
Matplotlib, 2.x. Some package dependency is forcing the use of
('1.5.3', '0.20.3', '0.3.0')
Read HydroBASINS “all-levels” (lev00) hierarchical watersheds dataset for the Pacific NW, from Amazon Cloud PostgreSQL/PostGIS database. Watersheds in the dataset are at the finest (highest resolution) “Pfastetter” hierarchical level, level 12.
read_postgis is called as before, except now we’ll apply a SQL filter (server side) to the PostGIS dataset to select only the Pfastetter level-4 watershed with code 7831:
WHERE pfaf_4 = 7831. This is most of Western Washington. Watershed polygons will still be read at their original level 12 resolution.
413 polygon features returned. Let’s examine the attributes for the first feature.
gid 19945 hybas_id 7.00001e+09 next_down 0 next_sink 7.00001e+09 main_bas 7.00001e+09 dist_sink 0 dist_main 0 sub_area 135.4 up_area 135.4 endo 0 coast 0 order 1 sort 19945 pfaf_1 7 pfaf_2 78 pfaf_3 783 pfaf_4 7831 pfaf_5 78310 pfaf_6 783101 pfaf_7 7831010 pfaf_8 78310101 pfaf_9 783101010 pfaf_10 7.83101e+09 pfaf_11 7.83101e+10 pfaf_12 7.83101e+11 polygongeom (POLYGON ((-123.35 46.36666666666669, -123.349... Name: 0, dtype: object
Plot a categorical map with coloring based on the aggregating column
pfaf_7. Watershed boundaries are at the high-resolution Pfastetter level 12.
Apply GeoDataFrame dissolve aggregation method (implemented from lower level
shapely operators) on level-7 Pfastetter codes (
pfaf_7) shown in the plot above. Aggregate attributes, retaining only
polygongeom, of course). This operation results in only 17 polygons, from the original 413.
Let’s examine some of the features.
|0||7831010||(POLYGON ((-123.5722222222222 46.2458333333333...||783101|
|1||7831020||POLYGON ((-123.1791666666666 46.33333333333336...||783102|
|2||7831031||(POLYGON ((-123.9597222222222 46.9666666666667...||783103|
|3||7831032||POLYGON ((-123.8583333333333 47.39583333333336...||783103|
|4||7831033||POLYGON ((-124.3 47.34583333333336, -124.30221...||783103|
Plot the results. Looks like the previous plot, except the polygon boundaries are now the pfaf_7 watersheds (same as the background colors).
Beware of “invalid” geometries!
dissolvemay fail if there are “invalid” geometries. This code is based on a GeoDataFrame examined in the previous, intro notebook. The 6 geometries/points reported are invalid (and are reported by the
is_valid()method). This dissolve statement does work, though.
seas_grp = seas[['oceans', 'geometry']] seas_oceans_diss = seas_grp[seas_grp.geometry.is_valid].dissolve(by='oceans') Ring Self-intersection at or near point 10.407218181818182 54.821390909090908 Self-intersection at or near point -79.365827272727287 76.296645454545455 Ring Self-intersection at or near point 10.979445510225332 54.380555030408686 Ring Self-intersection at or near point 133.61550925464189 -4.3005540903175188 Ring Self-intersection at or near point 121.91067196634913 -5.0593090510592447 Ring Self-intersection at or near point 115.29553592754269 -7.0082630551828515
Partly so we can calculate polygon areas in linear units, not geodetic degrees. But also because that’s the projection used by most state and local governments in Washington.
No need to go to a web site to learn more about what
epsg:2927 is. Use
pyepsg, which issues queries to http://epsg.io web services.
Extract the epsg code from the string returned by
crs['init'], then query epsg
<GeodeticCRS: 4326, WGS 84>
<ProjectedCRS: 2927, NAD83(HARN) / Washington South (ftUS)>
Apply the crs transformation (reprojection) using
Plot the reprojected map. Note that, being in a planar project (not geodetic), the shape looks different compared to the previous map. More “normal”. And the axes are now in
feet relative to some origin.
choroplethmap based on calculated watershed areas
As the projection is in
feet, auto-calculated polygon areas will be in feet2. So let’s convert to miles2 first (why not!). We’ll add a new column to the GeoDataFrame.
|0||7831010||(POLYGON ((863343.6925921033 347890.2242189167...||783101||1375.137396|
|1||7831020||POLYGON ((963803.8027083561 376154.9965688475,...||783102||2107.945774|
|2||7831031||(POLYGON ((776917.1877152125 614568.383332147,...||783103||528.472846|
Now you could get the area of a pfaf_6 watershed via simple Pandas DataFrame
Plot the choloropleth, using
The “fisher_jenks” value segmentation
scheme (using 7 segments, k=7) used is one of the available
pysal.esda.mapclassify.Map_Classifier classifiers from the powerful PySAL package (Python Spatial Analysis Library); GeoPandas can use these classifiers if PySAL is installed, as it is here. To get the list of classifiers, use:
Folium is very cool, specially for use in Jupyter notebooks; or to export into stand-alone HTML.
m.choropleth internally splits the geometry from the other attributes in
hydrobas_ww_p7_wasp, and rejoins them based on the key
key_on uses an attribute reference derived from GeoJSON representations; this is awkward, and hopefully will be simplified in future folium implementations.
This map is interactive, so play with it (zoom and pan). There is a lot more to explore in Folium! This is just a teaser.
sjoin, of polygons on points
We’ll use an old, local snapshot of NANOOS coastal and marine monitoring stations in the Pacific NW, from the NANOOS Visualization System (NVS) Data Explorer. While many stations are moorings on marine waters, some are inshore or in tidal shores and will overlap the watershed boundaries. The point file is in the GeoPackage format, an OGC format implemented in SQLite.
platform_label WADOH_HendrsnBay1 platform_type Fixed Shore Platform name Henderson Bay site, W shore short_name WADOH Henderson Bay lat 47.3612 lon -122.697 state Washington region Puget Sound provider WADOH provider_url http://www.doh.wa.gov provider_type State data_source Decagon data_source_url http://www.decagon.com url http://www.doh.wa.gov/CommunityandEnvironment/... status online status_date 2016-06-24T18:00:00Z nvs_station_url http://nvs.nanoos.org/Explorer?action=oiw:fixe... geometry POINT (-122.6966 47.3612) Name: 193, dtype: object
Points are on the coasts of the Pacific NW (BC, WA, OR) and out in the open ocean.
Apply “inner” spatial join with
sjoin operator from the
gpd.tools package. An inner join will retain only overlapping features. Then plot as a map overlay on top of
hydrobas_ww_p7, categorizing (coloring) each point by the
pfaf_6 watershed it’s in.
We’ll end by mixing features from a GeoDataFrame with a raster, applying zonal statistics using the cool and light weight rasterstats package.
Monthly Juy long-term climatology precipitation. The original monthly time series data are from the PRISM Climate Group; the monthly climatology and Pacific NW clip were created by your truly and Don Setiawan for the BiGCZ project.
Rasterstas uses rasterio to read rasters (and
fiona to read vector datasets), so we’ll first do a quick exploration of rasterio.
<open RasterReader name='../data/prism_precipitation_july_climatology.tif' mode='r'>
Examine the metadata read from the raster file (we can confirm CRS is epsg:4326), then plot the raster.
with_bounds=Trueis supposed to display the X & Y coordinate labels (lat & lon) from the metadata. We don’t know why this isn’t working.
rasterstats package. Can pass a
GeoDataFrame directly (instead of the file path to a GIS file) because it implements our old friend, the
__geo_interface__ method. For the raster, we pass its file path.
zonal_stats returns a geojson with the original properties plus the zonal statistics.
rasterstats requires the geometry column to be named “geometry”. So, we’ll rename it first.
|0||(POLYGON ((-123.5722222222222 46.2458333333333...||783101||7831010||259||49.824776||33.674382||23.438837|
|1||POLYGON ((-123.1791666666666 46.33333333333336...||783102||7831020||404||96.738289||29.443886||15.356521|
While still a fairly young package, GeoPandas offers powerful capabilities.
GeoJSON and the common
__geo_interface__enable convenient and widespread geospatial data object exchange across geospatial packages in Python.