Zonal statistics "corrupted size vs. prev_size"

I am using ZonalStatistics with an S2 (NDVI) time series. My output file is missing some polygons, and it seems to happen when no pixel centroid falls inside the polygon.

This is what I see in the logger:

corrupted size vs. prev_size
Aborted (core dumped)

I’m working with OTB 7.0 (Ubuntu, APT build).

It looks like a memory error, but the output file is still readable; nevertheless I lose about 10% of the polygons in the process. I tried both SHP and GPKG.

What is the expected behaviour for polygons that are the same size as, or smaller than, a pixel?

It would be more convenient to write NULL for those values, so the output file is not missing any features. And no core dump…

Hello,

ZonalStatistics uses rasterization to select the pixel values. Yes, in the current implementation only pixels whose centroid falls inside a polygon are used (we rely on the GDAL rasterization algorithm with ALL_TOUCHED disabled).
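
For illustration, here is a minimal sketch of that GDAL mechanism (not the actual ZonalStatistics code; the paths, grid size and extent are placeholders to adapt to your data): a polygon smaller than a pixel burns no pixel with the default centroid rule, but does get burned with ALL_TOUCHED.

```python
from osgeo import gdal, ogr

# Placeholder inputs: any polygon layer and a target grid matching your NDVI image.
vec = ogr.Open("small_polygons.gpkg")
layer = vec.GetLayer()

# In-memory raster with a 10 m grid (adjust the geotransform/size to your image).
drv = gdal.GetDriverByName("MEM")
target = drv.Create("", 100, 100, 1, gdal.GDT_Byte)
target.SetGeoTransform((600000, 10, 0, 4900000, 0, -10))  # example extent
target.SetProjection(layer.GetSpatialRef().ExportToWkt())

# Default behaviour (what ZonalStatistics relies on): only pixels whose
# centroid falls inside a polygon are burned.
gdal.RasterizeLayer(target, [1], layer, burn_values=[1])
print("centroid rule:", int(target.ReadAsArray().sum()), "pixels burned")

# With ALL_TOUCHED, every pixel touched by a polygon is burned, so even
# sub-pixel polygons get at least one pixel.
target.GetRasterBand(1).Fill(0)
gdal.RasterizeLayer(target, [1], layer, burn_values=[1], options=["ALL_TOUCHED=TRUE"])
print("ALL_TOUCHED:", int(target.ReadAsArray().sum()), "pixels burned")
```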

In your case, the current implementation of ZonalStatistics is not well suited because your polygons are smaller than your pixels. The ZonalStatistics application is quite recent, and we might improve it a bit! For instance, by letting the user choose ALL_TOUCHED, or something else (see the proposed workaround below).

In exchange for a quick workaround for your problem, it would be nice if you could provide a minimal example of the bug you found :angel:, e.g. a small subset of your image with some polygons that trigger the issue. For that, you can open an issue on the GitLab instance.

The workaround

Since rasterization is used and you need pixels smaller than your polygons, one trick consists in upsampling your image.
If you like Python, you can build a mini-pipeline that chains OrthoRectification (to change the pixel spacing) and ZonalStatistics in memory (see the OTB Python API doc); a sketch is given below.
Alternatively, you can use gdalwarp or otbcli_OrthoRectification to resample the image to disk before running ZonalStatistics, but that would write huge temporary files and would also be slower.
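
As a rough sketch of that in-memory pipeline (the paths and the 2 m spacing are placeholders, and the exact parameter keys are from memory, so please check them with `otbcli_OrthoRectification -help` and `otbcli_ZonalStatistics -help`):

```python
import otbApplication as otb

# Resample the NDVI stack to a finer grid; parameter keys to verify against
# "otbcli_OrthoRectification -help".
resample = otb.Registry.CreateApplication("OrthoRectification")
resample.SetParameterString("io.in", "ndvi_stack.tif")     # placeholder path
resample.SetParameterString("outputs.mode", "autosize")    # size deduced from the spacing
resample.SetParameterFloat("outputs.spacingx", 2.0)        # arbitrary example: 2 m pixels
resample.SetParameterFloat("outputs.spacingy", -2.0)
resample.Execute()  # in-memory only, nothing written to disk

# Chain the resampled image directly into ZonalStatistics (again, check the
# keys with "otbcli_ZonalStatistics -help").
stats = otb.Registry.CreateApplication("ZonalStatistics")
stats.SetParameterInputImage("in", resample.GetParameterOutputImage("io.out"))
stats.SetParameterString("inzone.vector.in", "polygons.gpkg")   # placeholder path
stats.SetParameterString("out.vector.filename", "stats.gpkg")   # placeholder path
stats.ExecuteAndWriteOutput()
```

Since the two applications are connected in memory, no huge temporary file is written, unlike the gdalwarp variant.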

I hope this workaround helps! And that you will help us improve the app in return :wink:


Hi,
Thanks, Rémi!

I’m in quite a hurry on this one, so I will just ignore those polygons for now. But it looks like, depending on the input file, the output format, or maybe the OTB version, it can also fail for big polygons.
I mean it must be malloc or something else memory-related; it looks almost random, and I don’t always get the same output file length.
I will open an issue (GitHub or GitLab?) with a small dataset.
But before that, maybe I can show you the debug logger output and something like a Valgrind report?

I’m not a C++ dev, but I’d be happy to help on this one.

I hope we can discuss it at the MTD soon :slight_smile:

See you

I just removed polygons smaller than 100 m² and it went well (with GPKG). No core dump, and I only lost 0.5% of my features (because of the pixel-centroid rule, I guess), with no corruption of the output file.
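
For the record, the filtering can be done with something like this (just a sketch assuming geopandas is available; the paths are examples, 100 m² is the threshold I used):

```python
import geopandas as gpd

# Drop polygons smaller than 100 m² before running ZonalStatistics.
# This assumes a metric CRS (e.g. UTM), so that .area is in m².
gdf = gpd.read_file("polygons.gpkg")
gdf = gdf[gdf.geometry.area >= 100]
gdf.to_file("polygons_filtered.gpkg", driver="GPKG")
```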

So it seems the memory problem is triggered not by the pixel-centroid rule itself but by the very small polygons…

Hi Vincent,

This might be a bug in GDAL: we delegate all the rasterization work to it. It would also be interesting to check whether otbcli_Rasterization gives the same error, and whether gdal_rasterize throws any warnings!
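
If you prefer to stay in Python, the GDAL check can be done with the gdal.Rasterize utility (a sketch; the paths, attribute field and resolution are examples to adapt):

```python
from osgeo import gdal

# Rasterize the polygons alone, in attribute mode at 10 m, to see whether
# GDAL itself emits warnings (paths, field name and resolution are examples).
ds = gdal.Rasterize(
    "zones.tif",
    "polygons.gpkg",
    attribute="gid",
    xRes=10,
    yRes=10,
    outputType=gdal.GDT_Int32,
)
ds = None  # close and flush to disk
```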

See you

Hi,

I tried both the OTB and GDAL rasterization tools in attribute mode (on the gid field) with 10 m spacing, including the small polygons, and got no errors or warnings.
Maybe my NDVI stack is corrupted…

I’ll try to reproduce with another image.

Bye