Vector Data Processing using Python Tools

Encodings, Formats and Libraries

Overview

Teaching: 4 min
Exercises: 0 min
Questions
  • What are common ways to encode vector geospatial data in Python, and how do they relate to broader encoding standards?

Objectives
  • Learn about common community standards (broader than Python) for vector data encoding, and how they’re implemented in core Python libraries.

  • Learn to transform from one encoding type to another. Learn the GeoJSON format and exchange encoding interface, including the __geo_interface__ method implemented across libraries.

Standard Geometry/Data Encodings

Standards exist for text (and binary) encoding of the geometry and feature types we saw. These include OGC/ISO Well-Known Text (WKT) and Well-Known Binary (WKB), OGC Geography Markup Language (GML), and GeoJSON (based on JSON, and now an ISO standard). GML is a complex XML-based encoding that we won’t discuss much here. We also won’t touch on WKB in much detail. In this tutorial we’ll focus on WKT and specially GeoJSON.

Well-Known Text (WKT)

WKT text representations of geometric entities are fairly intuitive. Here are some examples of the basic geometric entities we saw earlier (where coordinates are ordered as X Y):

WKT is specially common in interactions with spatially enabled relational databases, and also as readable representations of internal binary encodings.

GeoJSON and __geo_interface__ interface

GeoJSON arose as a community convention leveraging the ubiquity of JSON encoding, specially on the web. Being JSON, it maps easily into Python dictionaries. See http://geojson.org, including examples at http://geojson.org/geojson-spec.html#examples. Its straightforward data structure has made it popular for data exchange in web applications and for passing data between software libraries. But beware that for large data, GeoJSON, being a text-based file format, is a poor choice for geospatial data storage.

The Python __geo_interface__ method has been widely adopted for passing feature data around between code packages as GeoJSON-like objects (basically, Python dictionaries). We’ll explore this method in the next two episodes. To read more about it, see GeoJSON and the geo interface for Python, https://gist.github.com/sgillies/2217756, and Processing vector features in Python. For converting to and from convenient, Pythonic, GeoJSON-like objects, the light-weight geojson package is a good option.

GeoJSON Example of a point feature collection with two features:

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [102.0, 0.5]
      },
      "properties": {
        "prop0": "value0",
        "prop1": 0
      }
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [-90.0, 10.5]
      },
      "properties": {
        "prop0": "value3",
        "prop1": 3
      }
    }
  ]
}

GIS file formats, including Relational Databases (RDBMS)

Standard encodings and libraries for projections, codes

Key Points