Partitioning large watersheds into sub-basins with RichDEM
Partitioning large watersheds into sub-basins with RichDEM follows a deterministic, raster-based hydrological routing sequence: condition the DEM, compute flow direction (D8 or D∞), generate a flow accumulation surface, apply a drainage threshold to isolate the stream network, and segment contiguous contributing areas. RichDEM’s C++ backend leverages memory-mapped I/O and OpenMP parallelization, enabling reproducible sub-basin extraction on multi-gigabyte DEMs without proprietary GIS overhead. This pipeline integrates directly into broader Watershed Delineation & Catchment Synchronization workflows, providing a scalable foundation for regional hydrologic modeling, flood routing, and water quality assessment.
Environment & Dependency Matrix
RichDEM’s Python bindings wrap compiled C++ routing algorithms and rely on strict GDAL alignment. Environment mismatches are the primary cause of silent raster corruption or segmentation faults.
| Component | Minimum Version | Configuration Notes |
|---|---|---|
| Python | 3.8–3.11 | Python 3.12+ requires RichDEM ≥0.3.5 or a conda-forge rebuild |
| RichDEM | 0.3.4 | Install via conda install -c conda-forge richdem for precompiled wheels |
| GDAL | 3.4.0 | Python gdal binding version must exactly match the system GDAL library |
| NumPy | 1.21 | Required for array interoperability and threshold masking |
| OS | Linux/macOS/Windows | Windows requires MSVC runtime; conda environments are strongly recommended |
| Parallelism | OpenMP 4.0+ | Set OMP_NUM_THREADS before importing RichDEM to control CPU allocation |
Critical memory note: RichDEM loads rasters into RAM by default. For DEMs exceeding 8–12 GB, enable virtual memory tiling via memmap=True during load or pre-tile with gdal_translate -co TILED=YES -co BLOCKXSIZE=512 -co BLOCKYSIZE=512.
Production Pipeline Code
The following script implements a complete, automated sub-basin partitioning pipeline. It assumes the input DEM has already undergone hydrological conditioning (e.g., sink filling with rd.FillDepressions or external LiDAR preprocessing).
import os
import numpy as np
import rasterio
from rasterio.transform import from_origin
import richdem as rd
import warnings
# Set parallelism BEFORE importing RichDEM
os.environ["OMP_NUM_THREADS"] = "8"
warnings.filterwarnings("ignore", category=FutureWarning)
def partition_subbasins(dem_path, out_dir, accumulation_threshold=1000, method="D8"):
"""
Partition a large watershed into sub-basins using RichDEM.
Parameters
----------
dem_path : str
Path to hydrologically conditioned DEM (GeoTIFF)
out_dir : str
Output directory for intermediate and final rasters
accumulation_threshold : int
Minimum flow accumulation cells to define a stream
method : str
Routing algorithm: 'D8' or 'Dinf'
"""
os.makedirs(out_dir, exist_ok=True)
# 1. Capture spatial reference from the source GeoTIFF, then load the
# elevation grid into a RichDEM array for fast in-memory processing.
print(f"[1/5] Loading DEM: {dem_path}")
with rasterio.open(dem_path) as src:
ref_transform = src.transform
ref_crs = src.crs
dem = rd.LoadGDAL(dem_path, no_data=-9999)
if dem is None:
raise RuntimeError("Failed to load DEM. Verify GDAL driver support and file integrity.")
# 2. Compute flow direction
print(f"[2/5] Computing {method} flow direction...")
routing_method = "Dinf" if method.upper() == "DINF" else "D8"
fd = rd.FlowDirection(dem, method=routing_method)
# 3. Generate flow accumulation
print("[3/5] Computing flow accumulation...")
acc = rd.FlowAccumulation(fd, method=routing_method)
# 4. Threshold to isolate stream network
print(f"[4/5] Extracting streams (threshold >= {accumulation_threshold} cells)...")
streams = (np.asarray(acc) >= accumulation_threshold).astype(np.int32)
# 5. Segment watersheds
print("[5/5] Labeling sub-basins...")
subbasins = rd.Watersheds(fd, streams)
# Export results with preserved geospatial metadata
_export_raster(subbasins, os.path.join(out_dir, "subbasins.tif"), ref_transform, ref_crs)
_export_raster(np.asarray(acc), os.path.join(out_dir, "flow_accum.tif"), ref_transform, ref_crs)
_export_raster(streams, os.path.join(out_dir, "stream_network.tif"), ref_transform, ref_crs)
print("Pipeline complete. Outputs saved to:", out_dir)
def _export_raster(data, out_path, transform, crs):
"""Export an array to GeoTIFF using a captured rasterio transform and CRS."""
arr = np.asarray(data)
with rasterio.open(
out_path,
"w",
driver="GTiff",
height=arr.shape[0],
width=arr.shape[1],
count=1,
dtype=arr.dtype,
crs=crs or "EPSG:4326",
transform=transform,
compress="LZW",
tiled=True,
blockxsize=512,
blockysize=512,
nodata=-9999
) as dst:
dst.write(arr, 1)
Memory & Parallelism Optimization
Large-scale hydrological routing is I/O and memory-bound. To prevent out-of-memory crashes or thread contention:
- Enable memory mapping: Pass
memmap=Truetord.LoadGDAL(). This forces the OS to page raster blocks on demand rather than loading the entire array into physical RAM. - Pre-tile source DEMs: Use
gdal_translatewithTILED=YESandBLOCKXSIZE=512before processing. Tiled GeoTIFFs align with RichDEM’s internal chunking strategy, reducing disk seek overhead. See GDAL GeoTIFF Driver Options for configuration details. - Control OpenMP threads: Set
OMP_NUM_THREADSto match physical cores (not logical threads). Hyperthreading rarely improves raster routing performance and can increase cache thrashing. - Avoid intermediate copies: RichDEM returns
rdArrayobjects that share memory with underlying buffers. Do not call.copy()unless explicitly modifying values; otherwise, you will double memory consumption.
Validation & Integration
Automated partitioning requires post-processing validation to ensure hydrological realism:
- Drainage Density Check: Calculate total stream length per unit area. Values significantly outside regional norms (e.g., 0.5–3.0 km/km² for temperate basins) indicate threshold misalignment or DEM artifacts.
- Gauge Alignment: Overlay USGS/NHD stream networks or field-mapped catchment boundaries. Sub-basin outlets should align with known monitoring stations or confluence points.
- Threshold Sensitivity: The
accumulation_thresholdparameter controls basin granularity. Lower values produce finer, more fragmented sub-basins; higher values merge tributaries into larger units. For methodological guidance on threshold selection and aggregation logic, consult Basin Partitioning Strategies. - Algorithm Choice: D8 routing is computationally efficient and suitable for coarse-resolution DEMs (>10 m). D∞ distributes flow across two downslope cells, reducing artificial channelization but increasing runtime by ~15–20%. Refer to the RichDEM Documentation for algorithm-specific parameters and edge-case handling.
When integrating outputs into distributed hydrologic models (SWAT, HEC-HMS, or custom Python frameworks), ensure CRS consistency, verify that -9999 nodata values are respected during raster math, and maintain a version-controlled record of the threshold and routing method used. This guarantees reproducibility and simplifies catchment synchronization across multi-temporal datasets.