OverviewTeaching: 40 min Exercises: 0 minQuestions
What is GeoPandas?
What functionality and advantages does GeoPandas offer over other Python geospatial tools?
What geospatial storage, analytical and plotting capabilities does it include?
What is its relationship to Pandas?Objectives
Learn to use GeoPandas by reading from common vector geospatial formats (shape files, GeoJSON, etc), PostGIS databases, and from geospatial data generated on the fly.
Learn to leverage Pandas functionality in GeoPandas, for effective, mixed attribute-based and geospatial analyses.
Learn about common geospatial operations, including reprojection, spatial joins, and overlays.
Emilio Mayorga, University of Washington. 2018-9-9
GeoPandas adds a spatial geometry data type to
Pandas and enables spatial operations on these types, using shapely. GeoPandas leverages Pandas together with several core open source geospatial packages and practices to provide a uniquely simple and convenient framework for handling geospatial feature data, operating on both geometries and attributes jointly, and as with Pandas, largely eliminating the need to iterate over features (rows). Also as with Pandas, it adds a very convenient and fine-tuned plotting method, and read/write methods that handle multiple file and “serialization” formats.
shapely, these spatial data types are limited to discrete entities/features and do not address continuously varying rasters or fields.
- While GeoPandas spatial objects can be assigned a Coordinate Reference System (
CRS), operations can not be performed across CRS’s. Plus, geodetic (“unprojected”, lat-lon) CRS are not handled in a special way; the area of a geodetic polygon will be in degrees.
GeoPandas builds on mature, stable and widely used packages (Pandas, shapely, etc). It is being supported more and more as a preferred Python data structure for geospatial vector data.
When should you use GeoPandas?
When it may not be the best tool?
We’ll use these throughout the rest of the tutorial.
('2.2.3', '0.23.4', '0.4.0')
Like a Pandas
GeoSeries is the building block for the more broadly useful and powerful
GeoDataFrame that we’ll focus on in this tutorial. Here we’ll first take a bit of time to examine a
GeoSeries is made up of an index and a GeoPandas
geometry data type. This data type is a shapely.geometry object, and therefore inherits their attributes and methods such as
GeoPandas has six classes of geometric objects, corresponding to the three basic single-entity geometric types and their associated homogeneous collections of multiple entities:
GeoSeries is then a list of geometry objects and their associated index values.
Entries (rows) in a GeoSeries can store different geometry types
GeoPandas does not constrain the geometry column to be of the same geometry type. This can lead to unexpected problems if you’re not careful! Specially if you’re used to thinking of a GIS file format like shape files, which store a single geometry type. Also beware that certain export operations (say, to shape files …) will fail if the list of geometry objects is heterogeneous.
But enough theory! Let’s get our hands dirty (so to speak) with code. We’ll start by illustrating how GeoSeries are constructured.
GeoSeriesfrom a list of
shapely Pointobjects constructed directly from
Though you will rarely need this raw approach!
0 POINT (1 2) 1 POINT (1.5 2.5) 2 POINT (2 3) dtype: object
GeoSeriesfrom a list of
shapely Pointobjects using the
0 POINT (-120 45) 1 POINT (-121.2 46) 2 POINT (-122.9 47.5) dtype: object
A GeoSeries (and a GeoDataFrame) can store a CRS implicitly associated with the geometry column. This is useful as essential spatial metadata and for transformation (reprojection) to another CRS. Let’s assign the CRS.
plot method accepts standard
matplotlib.pyplot style options, and can be tweaked like any other
Let’s get a bit fancier, as a stepping stone to GeoDataFrames. First, we’ll define a simple dictionary of lists, that we’ll use again later.
Note this convenient, compact approach to create a list of
Point shapely objects out of X & Y coordinate lists (an alternate approach is shown in the Advanced notebook):
[<shapely.geometry.point.Point at 0x7fb9438989e8>, <shapely.geometry.point.Point at 0x7fb9438b46d8>, <shapely.geometry.point.Point at 0x7fb9438b4e10>]
We’ll wrap up by creating a GeoSeries where we explicitly define the index values.
a POINT (-120 45) b POINT (-121.2 46) c POINT (-122.9 47.5) dtype: object
- It’s worth noting that a GeoDataFrame can be described as a Feature Collection, where each row is a Feature, a geometry column is defined (thought the name of the column doesn’t have to be “geometry”), and the attribute Properties includes the other columns (the Pandas DataFrame part, if you will).
- More than one column can store geometry objects! We won’t explore this capability in this tutorial.
We’ll build on the GeoSeries examples. Let’s reuse the
data dictionary we defined earlier, this time to create a DataFrame.
Now we use the DataFrame and the “list-of-shapely-Point-objects” approach to create a GeoDataFrame. Note the use of two DataFrame attribute columns, which are effectively just two simple Pandas Series.
There’s nothing new to visualize, but this time we’re using the
plot method from a GeoDataFrame, not from a GeoSeries. They’re not exactly the same thing under the hood.
gpd.read_file is the workhorse for reading GIS files. It leverages the fiona package.
|0||S.Atlantic||1||South Atlantic Ocean||POLYGON ((-67.26025728926088 -59.9309210526315...|
|1||N.Pacific||0||North Pacific Ocean||(POLYGON ((180 66.27034771241199, 180 0, 101.1...|
|2||Southern||3||Southern Ocean||POLYGON ((180 -60, 180 -90, -180 -90, -180 -60...|
|3||Arctic||2||Arctic Ocean||POLYGON ((-100.1196521436255 52.89103112710165...|
|4||Indian||5||Indian Ocean||POLYGON ((19.69705552221351 -59.94160091330382...|
crs was read from the shape file’s
Now we finally plot a real map (or blobs, depending on your aesthetics), from a dataset that’s global-scale and stored in “geographic” (latitude & longitude) coordinates. It’s not the actual ocean shapes defined by coastal boundaries, but bear with me. A colormap has been applied to distinguish the different Oceans.
oceans.shp stores both
Multi-Polygon geometry types (a
Polygon may also be viewed as a
Multi-Polygon with 1 member). We can get at the geometry types and other geometry properties easily.
0 Polygon 1 MultiPolygon 2 Polygon 3 Polygon 4 Polygon 5 MultiPolygon 6 Polygon dtype: object
0 5287.751094 1 11805.894558 2 10822.509589 3 9578.786157 4 9047.879388 5 9640.457926 6 8616.721287 dtype: float64
envelope method returns the bounding box for each polygon. This could be used to create a new spatial column or GeoSeries; directly for plotting; etc.
Does it seem weird that some envelope bounding boxes, such as the North Pacific Ocean, span all longitudes? That’s because they’re Multi-Polygons with edges at the ends of the -180 and +180 degree coordinate range.
“Natural Earth is a public domain map dataset available at 1:10m, 1:50m, and 1:110 million scales. Featuring tightly integrated vector and raster data, with Natural Earth you can make a variety of visually pleasing, well-crafted maps with cartography or GIS software.” A subset comes bundled with GeoPandas and is accessible from the
gpd.datasets module. We’ll use it as a helpful global base layer map.
|0||28400000.0||Asia||Afghanistan||AFG||22270.0||POLYGON ((61.21081709172574 35.65007233330923,...|
|1||12799293.0||Africa||Angola||AGO||110300.0||(POLYGON ((16.32652835456705 -5.87747039146621...|
Its CRS is also EPSG:4326:
Here’s a compact, quick way of using the GeoDataFrame plot method to overlay two GeoDataFrames while customizing the styles for each layer.
We can also compose the plot using conventional
matplotlib steps and options that give us more control.
Time to explore
Let’s stop for a bit to explore on your own, hack with your neighbors, ask questions.
The fact that it’s on an Amazon Cloud is irrelevant. The approach is independent of the location of the database server; it could be on your computer.
First we’ll read the database connection information from a hidden JSON file, to add a level of security and not expose all that information on the github GeoHackWeek repository. This is also a good practice for handling sensitive information.
Open the database connection, returning a connection object:
Now that we’ve used the connection information, we’ll overwrite the
password keys (for security) and print out the dictionary, to give you a look at what needs to be in it:
Finally, the magic: Read in the
world_seas PostGIS dataset (a spatially enabled table in the PostgreSQL database) into a GeoDataFrame, using the opened connection object. Note the use of a simple SQL query string:
select * from world_seas
Close the connection. Clean up after yourself.
Let’s take a look at the GeoDataFrame.
|0||1||Inner Seas off the West Coast of Scotland||18||4283||False||North Atlantic Ocean||(POLYGON ((-6.496945454545455 58.0874909090909...|
|1||2||Mediterranean Sea - Western Basin||28A||4279||False||North Atlantic Ocean||(POLYGON ((12.4308 37.80325454545454, 12.41498...|
|2||3||Mediterranean Sea - Eastern Basin||28B||4280||False||North Atlantic Ocean||(POLYGON ((23.60853636363636 35.60874545454546...|
|3||4||Sea of Marmara||29||3369||False||North Atlantic Ocean||(POLYGON ((26.21790909090909 40.05290909090909...|
|4||5||Black Sea||30||3319||False||North Atlantic Ocean||(POLYGON ((29.04846363636364 41.25555454545454...|
Color the layer based on one column that aggregates individual polygons; using a categorical map, as before, but explicitly selecting the column (
column='oceans') and categorical mapping (
categorical=True); displaying an auto-generated legend, while displaying all polygon boundaries. Each “oceans” entry (ocean basins, actually) contain one or more ‘seas’.
Additional plotting examples
See http://darribas.org/gds17/labs/Lab_02.html for great examples of lots of other cool GeoPandas map plotting tips.
Combine what we’ve learned. A map overlay, using
world as a background layer, and filtering
seas based on an attribute value (from
oceans column) and an auto-derived GeoPandas geometry attribute (
world is in gray, while the filtered
seas is in color.
We won’t go into all details about what’s going on. Suffice it to say that we issue an OGC WFS request for all features from the layer named “oa:goainv” found in a GeoServer instance from NANOOS, requesting the response in
GeoJSON format. Then we use the geojson package to “load” the raw response (a GeoJSON string) into a
geojson feature object (a dictionary-like object).
The “oa:goainv” layer is a global dataset of monitoring sites and cruises where data relevant to ocean acidification are collected. It’s a work in progress from the Global Ocean Acidification Observation Network (GOA-ON); for additional information see the GOA-ON Data Portal.
Let’s examine the general characteristics of this GeoJSON object, including its
__geo_interface__ interface, which we discussed earlier.
<class 'geojson.feature.FeatureCollection'> dict_keys(['type', 'totalFeatures', 'crs', 'features']) 572
Now use the
from_features constructor method to create a GeoDataFrame directly from the
Finally, let’s visualize the data set as a simple map overlay plot; and as an example, display the values for the last feature.
Oceans Indian Ocean additional_organizations agency city comments comments_about_overlaps contact_email contact_name http://www.go-ship.org/Contacts.html country data_url https://cchdo.ucsd.edu/cruise/33RO20180423 department deploy_date depth_range duration frequency geometry POINT (54.5 -30) id 626 latitude -30 line_xy 54.5,-30.0::54.4975,-29.494::54.4988,-28.9747:... location longitude 54.5 method method_documentation organization Global Ocean Ship-Based Hydrographic Investiga... organization_abbreviation GO-SHIP overlaps_with parameters dissolved inorganic carbon; total alkalinity; ... parameters_planned platform_name I07N platform_name_kml 626.kml platform_type RH point_xy 54.5,-30.0 project GO-SHIP seas Indian Ocean sensors source_doc RepeatHydrography track_pt_lat -30.0 track_pt_lon 54.5 type url https://cchdo.ucsd.edu/cruise/33RO20180423 Name: 571, dtype: object
Time to Explore
Let’s stop for a bit to explore on your own, hack with your neighbors, ask questions. Then we’ll transition to the next notebook, on more advanced topics.
Leveraging Pandas and core vector geospatial libraries, Python GeoPandas greatly simplifies the use of vector geospatial data and provides powerful capabilities.