Visualizing a building shapes taxonomy - The City Summit 🏙️🗻 project

Final visualisation for 20 different cities around the world.

Back story

While working on new features for the OvertureMaestro library, I started to explore the buildings dataset provided by Overture Maps. Remembering an old project focused on assembling multiple images of women of different ethnicities (see here), a question came to mind: what would the “average” shape of a building in a given city look like?

So, as is my habit, I started on a side project instead of finishing the functionality I was currently working on.

The results of this project can be accessed on Streamlit: https://city-summit.streamlit.app/

You can also run the code locally using this Jupyter notebook.

City Summit project screenshot

Getting the data

Downloading the Overture Maps buildings data is quite easy. I used the OvertureMaestro library, but you can also access the data with the official overturemaps-py library or with a DuckDB SQL query.

from overturemaestro import convert_geometry_to_geodataframe, geocode_to_geometry

buildings = convert_geometry_to_geodataframe(
    "buildings", "building", geometry_filter=geocode_to_geometry("Paris")
)

This code downloads the data locally as a GeoParquet file and returns a GeoDataFrame. You can also get the path to the file instead by using the convert_geometry_to_parquet function.
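
For reference, a minimal sketch of the GeoParquet variant, assuming convert_geometry_to_parquet accepts the same arguments as convert_geometry_to_geodataframe:

from overturemaestro import convert_geometry_to_parquet, geocode_to_geometry

buildings_path = convert_geometry_to_parquet(
    "buildings", "building", geometry_filter=geocode_to_geometry("Paris")
)
print(buildings_path)  # path to the downloaded GeoParquet file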

First tests

To begin with, I knew I had to solve two problems: how to centre all the buildings so that they share the same space, and how to rotate them into a similar orientation so that the building edges run mainly vertically and horizontally.

Translating building vertices

For the first part, I used the Azimuthal Equidistant projection, which preserves distances and azimuths from the projection centre. Each building was projected using its own centroid as the centre.

Example of the Azimuthal Equidistant projection (from Wikipedia).

# Geometry projection to AEQD
import pyproj as proj
from shapely.ops import transform
from shapely import Polygon

# Example building
g = Polygon(
    [
        [17.0251207, 51.0607852],
        [17.0250685, 51.0607947],
        [17.0250616, 51.0607808],
        [17.025115, 51.060771],
        [17.0251207, 51.0607852],
    ]
)

centroid = g.centroid
crs_wgs = proj.Proj("epsg:4326")
aeqd_proj = proj.Proj(
    f"+proj=aeqd +lat_0={centroid.y} +lon_0={centroid.x} +datum=WGS84 +units=m"
)
project = proj.Transformer.from_proj(crs_wgs, aeqd_proj, always_xy=True).transform

geom_proj = transform(project, g)

geom_proj  # notice that the values are close to zero (positive and negative) and are expressed in meters
# POLYGON ((2.0498206612070975 0.2566105251930882, -1.6097078406850456 1.3134800558941384, -2.093439190884303 -0.2328869958116028, 1.6502174436552757 -1.3231316731725968, 2.0498206612070975 0.2566105251930882))

Finding the best building rotation angle

The second part was trickier. How can I determine when a building is “straight”, or oriented the “right” way?

My first idea was to rotate the building through each angle from 0 to 89 degrees, calculate the azimuths of the edges for each rotation, weight them by edge length, and build some kind of score to determine the best rotation angle.
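
I never implemented this brute-force approach, but a rough sketch of the scoring idea (the edge_alignment_score helper and the 1-degree step are purely illustrative), applied to the projected example building from before, could look like this:

import math
from shapely import affinity


def edge_alignment_score(geometry):
    # Sum of edge lengths weighted by how far each edge deviates
    # from the nearest horizontal/vertical axis (in degrees)
    coords = list(geometry.exterior.coords)
    score = 0.0
    for p1, p2 in zip(coords[:-1], coords[1:]):
        length = math.dist(p1, p2)
        azimuth = math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0])) % 90
        score += length * min(azimuth, 90 - azimuth)
    return score


best_angle = min(
    range(90),
    key=lambda angle: edge_alignment_score(
        affinity.rotate(geom_proj, angle=angle, origin="centroid")
    ),
)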

After thinking things through, and after reviewing the Shapely library documentation, I found a simpler solution.

Using the minimum_rotated_rectangle function, I can get the smallest rectangle surrounding a given building, a de facto rotated bounding box. What if I just rotate the building so that this rectangle lines up with the axis-aligned bounding box? The idea was that this configuration should keep the footprint of the building at its smallest, with most edges oriented horizontally and vertically.

The difference between original and rotated building with bounding box and a minimum rotated rectangle.

To get the right angle, I iterated over the points of the resulting rectangle in turn, computed the azimuth between each pair of consecutive points, and rotated the shape by the negative of that value. But this way I obtain 4 candidate angles, forming two pairs aligned with the horizontal and vertical axes. How do I choose one angle from these 4? I decided that I wanted the rotated building to be wider than it is tall (width >= height), and the centroid to sit in the middle or above half the height of the bounding rectangle.

import math
from shapely import affinity


# Rotation logic wrapped in a helper so it can be applied per building
def find_best_rotation(geometry):
    coords = list(geometry.minimum_rotated_rectangle.exterior.coords)

    for i in range(len(coords) - 1):
        p1 = coords[i]
        p2 = coords[i + 1]
        angle = math.degrees(
            math.atan2(p2[1] - p1[1], p2[0] - p1[0])
        )  # https://stackoverflow.com/questions/42258637/how-to-know-the-angle-between-two-points
        rotated_geometry = affinity.rotate(geometry, angle=-angle, origin="centroid")

        minx, miny, maxx, maxy = rotated_geometry.bounds
        width = maxx - minx
        height = maxy - miny

        # Skip: the building is taller than it is wide
        if round(height, 2) > round(width, 2):
            continue

        # Skip: the centroid is in the lower half of the rectangle
        if abs(round(maxy, 2)) > abs(round(miny, 2)):
            continue

        return rotated_geometry

    raise RuntimeError("Rotation not found")
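
A quick usage check on the projected example building from earlier:

rotated = find_best_rotation(geom_proj)

minx, miny, maxx, maxy = rotated.bounds
print(round(maxx - minx, 2), round(maxy - miny, 2))  # width >= height after rotation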

Now that I think about it, it would be possible to choose the right angle of rotation without rotating the building, based on the lengths of the diagonals and the angle between them, but I would rather stick with the current implementation.

Basic visualizations

To start, I used the plot() function from the GeoPandas library to display only the edges of the buildings.

All building shapes outlines visualized.

To make these less cluttered, I divided the set of buildings into small and large groups based on their footprint area. I also added a slight fill colour to the buildings.
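
A minimal sketch of this plotting step, assuming a rotated_buildings GeoSeries (the buildings after projection and rotation) and an arbitrary area threshold:

import matplotlib.pyplot as plt

# Hypothetical threshold separating small and bigger buildings (square meters)
area_threshold = 500

small = rotated_buildings[rotated_buildings.area <= area_threshold]
bigger = rotated_buildings[rotated_buildings.area > area_threshold]

fig, axes = plt.subplots(1, 2, figsize=(20, 10))
for ax, subset, title in zip(axes, (small, bigger), ("Small buildings", "Bigger buildings")):
    # Thin edges with a very light fill to keep the plot readable
    subset.plot(ax=ax, facecolor=(0, 0, 0, 0.02), edgecolor=(0, 0, 0, 0.1), linewidth=0.2)
    ax.set_axis_off()
    ax.set_title(title)
plt.show()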

Small buildings visualization.
Bigger buildings visualization.

I have also experimented with leaving only vertices, without edges.
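
For the vertices-only variant, one straightforward approach (a sketch, not necessarily how it was done in the project) is to scatter all exterior coordinates of the rotated polygons:

import matplotlib.pyplot as plt
import numpy as np

# Collect the exterior vertices of every building polygon
coords = np.concatenate(
    [np.asarray(geom.exterior.coords) for geom in rotated_buildings.geometry]
)

fig, ax = plt.subplots(figsize=(10, 10))
ax.scatter(coords[:, 0], coords[:, 1], s=0.1, alpha=0.1, color="black")
ax.set_aspect("equal")
ax.set_axis_off()
plt.show()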

Bigger buildings vertices only visualization.

Switch to the heightmap

For me, those previous results weren’t satisfactory, so I started experimenting with the rasterio library and its rasterize functionality for converting vector data into raster data.

import numpy as np
from affine import Affine
from rasterio.features import MergeAlg, rasterize

# 1 px will equal 1 meter;
# higher values will increase the number of pixels per meter
resolution = 1

minx, miny, maxx, maxy = geoseries.total_bounds

canvas_width = int(np.ceil(maxx - minx)) * resolution
canvas_height = int(np.ceil(maxy - miny)) * resolution

canvas = rasterize(
    shapes=geoseries,
    fill=0,
    # Add padding of 2 pixels on each side
    out_shape=(canvas_height + 4, canvas_width + 4),
    # Add values together from many shapes
    merge_alg=MergeAlg.add,
    transform=(
        # Add 2 pixels to move away from edge
        Affine.translation(xoff=minx - 2, yoff=miny - 2)
        # Scale buildings to fit bigger canvas
        * Affine.scale(1 / resolution)
    ),
)

heightmap = np.flipud(canvas) # flip canvas vertically

Plotting the heightmap

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

fig, axes = plt.subplots(1, 2, figsize=(20, 10), dpi=300)

heightmap_masked = np.ma.masked_where(heightmap == 0, heightmap)

axes[0].imshow(heightmap_masked)
axes[0].set_axis_off()
axes[0].set_title("Linear colour scale")

axes[1].imshow(heightmap_masked, norm=LogNorm())
axes[1].set_axis_off()
axes[1].set_title("Log colour scale")

plt.show()

Generated heightmap in linear and logarithmic colour scales.

As you can see, the centre of the heightmap is dominated by extremely high values. This makes sense; after all, most buildings in cities have a fairly small footprint. To discover more detail, it is useful to display the values on a logarithmic scale.

3D visualization

Having a heightmap, I began to wonder what it would look like in 3D. Using the Surface plot function from the Plotly library, I was able to generate an interactive visualisation for the calculated data.
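
The most basic version of such a plot is only a few lines (a minimal sketch; the polished layout comes later):

import plotly.graph_objects as go

fig = go.Figure(data=[go.Surface(z=heightmap)])
fig.show()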

Basic 3D surface plot.

Loading the palette

The next step in improving the look was adding the ability to change the colour palette. For this I used the PyPalettes library, which gives easy access to thousands of palettes from various sources.

from pypalettes import load_cmap
import numpy as np

loaded_cm = load_cmap("ag_Sunset", cmap_type="continuous")
# resample to 256 values
interpolated_colours = loaded_cm(np.linspace(0, 1, 256))

# replace the first colour with white (black is another option)
white = np.array([1, 1, 1, 1])
interpolated_colours[:1, :] = white

# transform the palette to a Plotly-compatible format:
# a list of rgb() CSS colour strings
plotly_map = list(
    map(
        lambda c: f"rgb({int(c[0] * 255)}, {int(c[1] * 255)}, {int(c[2] * 255)})",
        interpolated_colours[:, :3]
    )
)

Screenshot from the Python Color Palette Finder website (https://python-graph-gallery.com/color-palette-finder/).

Polishing the visualization

To make the plot prettier, I applied a few additional tweaks: keeping the aspect ratio of the heightmap, applying a logarithmic scale to the heights, hiding the axes and the colour scale, and setting a fixed camera position.

import numpy as np
import plotly.graph_objects as go


# Calculate the ratio to keep proper aspect
y_ratio = heightmap.shape[0] / heightmap.shape[1]

# Applying logarithm on 0 will return -inf
# Replace these values with -1
with np.errstate(divide="ignore"):
    z = np.log(heightmap)
    z[np.isneginf(z)] = -1

# Camera position
x_eye, y_eye, z_eye = 0.8, 0.8, 1.5 

fig = go.Figure(data=[go.Surface(z=z, colorscale=plotly_map)])
fig.update_traces(
    showscale=False,
    contours_z=dict(show=True, usecolormap=True, project_z=False),
)
fig.update_scenes(aspectratio=dict(x=1, z=1, y=y_ratio))
fig.update_layout(
    title=dict(text="<br>Paris", font=dict(size=20)),
    autosize=True,
    scene_camera_eye=dict(x=x_eye, y=y_eye, z=z_eye),
    margin=dict(l=5, r=5, b=5, t=5),
    scene=dict(
        xaxis=dict(visible=False),
        yaxis=dict(visible=False),
        # Clip the z-axis range to start slightly above -1
        zaxis=dict(visible=False, range=[-0.99, None]),
    ),
)

Final visualisation.

Just for comparison, I have also created a visualization for buildings without rotating them.

Summit with buildings left in their original orientation.

Streamlit implementation

I wanted to share this project on the internet, but just showing the visualisations or animations wouldn’t cut it for me. In my open-source endeavours I strongly advocate for exploring the areas you personally know, so I wanted to enable everyone to easily select their home city and create its summit on the spot, without any hassle.

I had never deployed a Python app on the internet before, but I had heard about Streamlit and Gradio. After a quick search, I decided to give Streamlit a shot.

Creating the app

I was pleasantly surprised by how easy it is to create a Streamlit application deployed in the Community Cloud. You log in with your GitHub account, select a template (I went with the Blank one), choose a name / domain, and everything is created for you. Then you can start coding with GitHub Codespaces right away. It was also my first time using Codespaces and I must admit it is a very convenient experience (though I am an avid VS Code user).

Streamlit project creator form.

Coding experience

The Streamlit documentation is really well maintained, with interactive examples. I was able to quickly draft a conditional form for the user with state caching and value validation (mainly handling Nominatim geocoding errors and limiting the maximum area of interest).
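
The real form in the app is more elaborate, but a bare-bones sketch of a Streamlit form with simple validation (the widget labels and the error handling shown here are illustrative only) looks roughly like this:

import streamlit as st
from overturemaestro import geocode_to_geometry

with st.form("city_summit_form"):
    city = st.text_input("City name", value="Paris")
    submitted = st.form_submit_button("Generate summit")

if submitted:
    try:
        geometry = geocode_to_geometry(city)
    except Exception:
        st.error("Could not geocode the provided city name.")
        st.stop()
    st.write(f"Generating the summit for {city}…")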

City Summit parameters form with values validation.

Porting the code from my notebook required some changes, mainly to the progress tracking - I was using the track function from rich.progress, but had to replace it with manual st.progress calls updated in every loop iteration.
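
In practice that means something along these lines (a simplified sketch, with the per-building processing elided):

import streamlit as st

progress_bar = st.progress(0)
total = len(buildings)
for i, geometry in enumerate(buildings.geometry):
    # ... project, rotate and rasterize a single building ...
    progress_bar.progress((i + 1) / total)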

One of the limiting factors was resources. It’s not officially mentioned in the documentation (or I didn’t find it), but I found an answer on the Streamlit Discuss forum saying that a Community Cloud app gets up to 2 CPUs, 1 GB of memory and 50 GB of storage. I had to completely remove multiprocessing from the app to avoid throttling, because Streamlit displays errors if resource usage is maxed out. I even had to make a quick patch to the OvertureMaestro library to allow passing a maximum number of multiprocessing workers.

To speed up the calculations, I opted to switch the geometry projection from Azimuthal Equidistant to the Universal Transverse Mercator (UTM) Coordinate Reference System (CRS), because it can be applied to the whole dataset of buildings at once, and the difference in precision isn’t that important for this project.
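
With GeoPandas this can be done in a single call for the whole dataset; a sketch assuming the buildings GeoDataFrame from earlier:

# Estimate the local UTM zone from the data and reproject all buildings at once
buildings_utm = buildings.to_crs(buildings.estimate_utm_crs())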

Additionally, to lower the memory usage, I’ve implemented batch processing for the downloaded buildings data. OvertureMaestro downloads Overture Maps data into a GeoParquet file. Instead of loading it all into a GeoDataFrame with the read_parquet function, I iterate over Arrow batches with the PyArrow library to keep the memory usage low.
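
A sketch of this batched reading, assuming the GeoParquet path returned by convert_geometry_to_parquet and the standard WKB geometry encoding:

import geopandas as gpd
import pyarrow.parquet as pq
import shapely

parquet_file = pq.ParquetFile(buildings_path)
for batch in parquet_file.iter_batches(batch_size=10_000):
    df = batch.to_pandas()
    # GeoParquet stores geometries as WKB
    batch_gdf = gpd.GeoDataFrame(
        df.drop(columns=["geometry"]),
        geometry=shapely.from_wkb(df["geometry"]),
        crs="EPSG:4326",
    )
    # ... process the batch (project, rotate, rasterize) ...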

City Summit result for the city of London.

Overall, I was happy with how quickly I was able to push a finished app out to the public. The automatic theme adjustment for the Plotly chart with st.plotly_chart was also really handy. All required calculations are saved on disk, so subsequent runs do not have to redo all the steps (until the app is stopped due to inactivity).

I’ve also included the option to download the generated heightmap as a NumPy array, in case anyone wants to experiment with it.
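
Serialising the NumPy array for Streamlit’s download button can be as simple as this sketch:

import io

import numpy as np
import streamlit as st

buffer = io.BytesIO()
np.save(buffer, heightmap)
st.download_button(
    label="Download heightmap (.npy)",
    data=buffer.getvalue(),
    file_name="heightmap.npy",
)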

The code is publicly available on the City Summit 🏙️🗻 GitHub repository.

Summary

What I learned

The main thing I learned during this project was how to write apps with Streamlit.

The data wrangling part (getting all the building shapes into a common space and finding the proper rotation) was a nice mind-puzzle to figure out, but it wasn’t that hard to achieve with the tools currently available in Python.

For the visualization part, I initially tried to use Matplotlib 3D surface plot functions, but the results weren’t satisfactory, so I switched to Plotly.

As an additional feature, I was thinking about transforming the generated heightmap into an STL file for 3D printing purposes, if anyone wishes to do that. I’ve checked some sources on how to transform heightmaps into 3D objects, but decided not to pursue this path for now.

In the end, I am satisfied with the visualisations generated, and with the interactive application I deployed on Streamlit.

Used libraries

Social media mentions

Streamlit:

LinkedIn:

Bluesky: