Example n° 8: Extract binary mask from vector shapefile¶
The target use case for this example is:
- Lossy images representing regions of a gridded area for an interesting region with no masks available. Some images may contain data outside of the region of interest.
- An ancillary shapefile contains a record for each image, providing the region of interest (marked in yellow in the following image).
- We want to use the shapefile to embed a binary mask in the images.
For this example we will use the same data of example 7.
Note
On a Windows machine you can set-up a shell with all GDAL Utilities, running directly the file OSGeo4W.bat
under the %TRAINING_ROOT% folder.
- Navigate to the workshop directory
$TRAINING_ROOT/data/user_data/gdal_processing_data
(on Windows%TRAINING_ROOT%\data\user_data\gdal_processing_data
) and find the rgb_data directory.
First, let’s set an environment variable to enable usage of internal masks:
export GDAL_TIFF_INTERNAL_MASK=YES (linux)
or
SET GDAL_TIFF_INTERNAL_MASK=YES (windows)
In this example we are going to do the following processing steps:
- The shapefile geometries is rasterized by filtering on each element of the grid.
- The rasterized mask get attached to the original image.
This time we are going to set up a small python script to extract information from the rasters, and use it in the GDAL command invocations.
Let’s see the processing being used on single GeoTIFF first.
The command is included in a getmask.py
file. It’s minimal, skipping parameters validation and additional controls for the sake of simplicity:
from osgeo import gdal
import os
import subprocess
import sys
shapefile = 'rgb_vectormask.shp'
shape = 'rgb_vectormask'
# Extracting the filename from the arguments
file = str(sys.argv[1])
outfile = file + '.msk.tif'
# files are named like rgb_r1c2.tif, rgb_r2c2.tif, rgb_r4c4.tif...
# so let's extract the rxcy part identifying the position in the grid
grid_id = file[4:8]
src_raster = gdal.Open(file)
# Get the width and height of the raster as well as the bbox
x_size = src_raster.RasterXSize
y_size = src_raster.RasterYSize
bbox_dict = gdal.Info(src_raster, format="json")["cornerCoordinates"]
ur = bbox_dict["upperRight"]
ll = bbox_dict["lowerLeft"]
# This command creates an output tif having same size and bbox of the underlying image
# setting pixel value to 1 on the geometry matching the grid id.
gdal_rasterize_str = "{0} -at -burn 1 -where grid_id='{1}' -l {2} -a_srs EPSG:26913 -ts {3} {4} -te {5} {6} {7} {8} -CO TILED=YES -CO COMPRESS=DEFLATE -CO PHOTOMETRIC=MINISBLACK -CO NBITS=1 -OT BYTE -OF GTIFF {9} {10}"
gdal_rasterize_process = gdal_rasterize_str.format(
"gdal_rasterize",
grid_id,
shape,
x_size,
y_size,
ll[0],
ll[1],
ur[0],
ur[1],
shapefile,
outfile,
)
subprocess.run(gdal_rasterize_process, shell=True)
The command invocation can be integrated in a similar command used in example 7 to produce a GeoTIFF with internal mask.
Windows (you can put this on a vectormask.bat
file in the directory containing the files):
for %%F in (rgb_????.tif) do (
gdalbuildvrt -b 1 %%~nF_b1.vrt %%F
gdalbuildvrt -b 2 %%~nF_b2.vrt %%F
gdalbuildvrt -b 3 %%~nF_b3.vrt %%F
python getmask.py %%F
echo Recompose the 3 bands together with the mask:
gdalbuildvrt -separate %%~nF_final.vrt %%~nF_b1.vrt %%~nF_b2.vrt %%~nF_b3.vrt %%F.msk.tif
echo Write the result
gdal_translate -CO "COMPRESS=JPEG" -CO "PHOTOMETRIC=YCBCR" -CO "JPEG_QUALITY=85" -CO "TILED=YES" -b 1 -b 2 -b 3 -mask 4 %%~nF_final.vrt masked_%%~nF.tif
)
Finally, let’s add the overviews:
Windows (you can put this on a maskedaddo.bat
file in the directory containing the files))
for %%F in (masked*.tif) do (
gdaladdo -r average %%F 2 4 8 16 32
)
Linux:
#!/bin/bash
export GDAL_TIFF_INTERNAL_MASK=YES
FILES="rgb_????.tif"
echo Computing binary mask
for F in $FILES
do
echo Processing $f
A=${F%%.*}_b1.vrt
B=${F%%.*}_b2.vrt
C=${F%%.*}_b3.vrt
gdalbuildvrt -b 1 $A $F
gdalbuildvrt -b 2 $B $F
gdalbuildvrt -b 3 $C $F
python getmask.py $F
gdalbuildvrt -separate ${F%%.*}_final.vrt $A $B $C ${F%%.*}.msk.tif
done
echo Writing composed files
for F in $FILES
do
gdal_translate -CO "COMPRESS=JPEG" -CO "PHOTOMETRIC=YCBCR" -CO "JPEG_QUALITY=85" \
-CO "TILED=YES" -b 1 -b 2 -b 3 -mask 4 ${F%%.*}_final.vrt masked_${F%%.*}.tif
done
echo Adding overviews
FILESOV="masked*.tif"
for F in $FILESOV
do
gdaladdo -r average $F 2 4 8 16 32
done
Once configured in GeoServer with transparent footprint, the result will look as follows:
This kind of processing can be used in similar scenarios:
A set of georeferenced cartographic images (including legends and notes) composing a region/district, plus a shapefile with footprints for each sheet of the map
A set of lossy compressed datasets with shapefile footprint. Compression artifacts at the edges will ruine transparent masking based on pixel values.