LargeScaleMeanShift - sqlite / gpkg fatal error

Hi OTB user community,

I’ve been having trouble running LargeScaleMeanShift on a small 800 x 600 px region of interest from a 4-band (principal components) image. I’m running otbcli_LargeScaleMeanShift on Mac (OTB 7.2.0). With ESRI Shapefile as the output format, the command works as expected. However, if I change the output vector format to SQLite or GeoPackage, the run ends with a FATAL error. Incidentally, those failed runs also take a long time to fail, whereas the entire successful shp-based run completes in approximately 9 sec. It appears that there is some issue during the update process. Is there a way to make the OGR update succeed from the command line?

I’m testing these formats because I would like to scale up to a larger image whose output would likely exceed the 4GB limit of the Shapefile format. If there is another recommended OGR-compatible vector format that does not have these issues or size limitations, I’d be happy to try that as well.

Below is an extract of the log from each of my tests, beginning at the Vectorization stage. The command is identical in each case except for the output vector format, which I changed from Shapefile to SQLite and GeoPackage respectively.

shp command:

otbcli_LargeScaleMeanShift -in "sh12_vnir_mosaic_rd_elc_rectified_r1_pca.tif?&bands=1:4" -spatialr 0.2 -ranger 15 -minsize 10 -mode vector -mode.vector.out region1_lsms_pca.shp -ram 8192 -progress 1

2021-02-08 10:23:19 (INFO) LargeScaleMeanShift: Vectorization …
2021-02-08 10:23:20 (INFO) LargeScaleMeanShift: Merging polygons across tiles …
2021-02-08 10:23:51 (INFO) LargeScaleMeanShift: Elapsed time: 9.58778 seconds
2021-02-08 10:23:51 (INFO) LargeScaleMeanShift: Final clean-up …

sqlite command:

otbcli_LargeScaleMeanShift -in "sh12_vnir_mosaic_rd_elc_rectified_r1_pca.tif?&bands=1:4" -spatialr 0.2 -ranger 15 -minsize 10 -mode vector -mode.vector.out region1_lsms_pca.sqlite -ram 8192 -progress 1

2021-02-08 06:45:02 (INFO) LargeScaleMeanShift: Vectorization …
2021-02-08 08:41:28 (INFO) LargeScaleMeanShift: Merging polygons across tiles …
2021-02-08 08:41:28 (FATAL) LargeScaleMeanShift: itk::ERROR: Cannot update a feature in the layer <region1_lsms_pca>:

gpkg command:

otbcli_LargeScaleMeanShift -in "sh12_vnir_mosaic_rd_elc_rectified_r1_pca.tif?&bands=1:4" -spatialr 0.2 -ranger 15 -minsize 10 -mode vector -mode.vector.out region1_lsms_pca.gpkg -ram 8192 -progress 1

2021-02-08 10:25:30 (INFO) LargeScaleMeanShift: Vectorization …
Warning 1: A geometry of type POLYGON is inserted into layer region1_lsms_pca of geometry type MULTIPOLYGON, which is not normally allowed by the GeoPackage specification, but the driver will however do it. To create a conformant GeoPackage, if using ogr2ogr, the -nlt option can be used to override the layer geometry type. This warning will no longer be emitted for this combination of layer and feature geometry type.
2021-02-08 11:09:23 (INFO) LargeScaleMeanShift: Merging polygons across tiles …
ERROR 1: Transaction not established
2021-02-08 12:15:25 (FATAL) LargeScaleMeanShift: itk::ERROR: Vectorization(0x7ffe2cf01840): Unable to commit transaction for OGR layer region1_lsms_pca.

Thank you for your kind assistance and any advice that might be offered to remedy this situation with these data formats.

Robert

Dear @rdzur,

I can reproduce the issue using OTB 7.2.0 and OTB 7.0.0 for Ubuntu.
I also tried to generate kml and geojson files, both failing with a different error message.

I noticed that this application is tested only with shapefiles as output. I will investigate further on this issue.
Sincerely.
Julien.

Hi Julien,

Thank you for investigating this issue. I will stand by. By the way, I also tried PCIDSK; however, my OTB install does not appear to have been compiled with that driver. I’m not sure which drivers are available in the OTB install: I’d like to run ogrinfo --formats, but that does not produce any result. In fact, I’m not certain where the OGR utilities are installed in OTB-7.2.0-Darwin64. I can see the gdal utilities under bin/ but no ogr2ogr or ogrinfo.

Thanks again for your investigation of this issue.

Robert
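On locating ogrinfo and ogr2ogr: one way to check whether they exist anywhere in an install is to look on the PATH first and then walk the install folder. The sketch below uses only the Python standard library; the function name `find_tools` is hypothetical, and the folder name is simply the one mentioned in the post, so adjust the path to wherever OTB was unpacked. It only reports where a file with that name exists and says nothing about which OGR drivers were compiled in.

```python
import os
import shutil


def find_tools(names, search_root=None):
    """Return {tool name: path or None}.

    Hypothetical helper (not part of OTB or GDAL): checks the PATH first,
    then walks search_root looking for a file with the same name.
    """
    found = {}
    for name in names:
        path = shutil.which(name)
        if path is None and search_root is not None:
            for dirpath, _dirnames, filenames in os.walk(search_root):
                if name in filenames:
                    path = os.path.join(dirpath, name)
                    break
        found[name] = path
    return found


if __name__ == "__main__":
    # Assumption: the OTB package was unpacked into this folder.
    print(find_tools(["ogrinfo", "ogr2ogr"], search_root="OTB-7.2.0-Darwin64"))
```

A value of None for a tool means no file with that name was found on the PATH or under the given folder.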

The image I used for my test was erroneous. I tried again with a correct input, and KML works fine. You could try to use this format for your larger image.

For the other formats, I opened an issue.

Hi Julien,

I appreciate the feedback on your test with KML. I will try KML with my dataset.

Thank you again for your efforts to work on these issues.

Robert

Hi Julien,

I thought I would report that the KML format worked fine on my small region of interest. At least it completed without reporting an error. One notable difference, however, was that it does not appear to have merged the computed tiles completely nor did it populate the same attributes as the shp file (i.e. nbPixels and other attributes are NULL in the KML output). Despite these issues I attempted to run the data at full scale.

I ran two different tests at full scale and they appear to reveal that I may not fully understand the “memory” parameter.

In my first test I set the memory flag to 62.5% of available system RAM, and in the second test I set it below 40% of available system RAM. This seemed to run OK up to computing stats:

Computing stats on input image …: 100% [**************************************************] (5h 58m 40s)

In both cases the time to run up to the computing stats phase was similar. In both cases the application appears to be consuming much more system memory than the input flag parameter. I killed the first process below and launched the second with a lower memory parameter and with more tiles (by reducing the tile size parameter); however, it still appears to be consuming nearly the maximum resources.

otbcli_LargeScaleMeanShift -in "sh12_vnir_mosaic_rd_elc_rectified_pca_all_nd.tif?&bands=1:4" -spatialr 0.2 -ranger 15 -minsize 10 -mode vector -mode.vector.out sh12_vnir_mosaic_rd_elc_rectified_pca_lsms.kml -ram 80000 -tilesizex 28045 -tilesizey 28045 -progress 1

otbcli_LargeScaleMeanShift -in "sh12_vnir_mosaic_rd_elc_rectified_pca_all_nd.tif?&bands=1:4" -spatialr 0.2 -ranger 15 -minsize 10 -mode vector -mode.vector.out sh12_vnir_mosaic_rd_elc_rectified_pca_lsms.kml -ram 50000 -tilesizex 5000 -tilesizey 5000 -progress 1

Anyway, I’m not sure that this process will even complete with either of these commands, and given the lack of merging I’d probably want to process with fewer tiles. If you have any thoughts about this and my approach, I’d be very interested in your opinion.

Thank you.

Robert

I noticed the same problem. I updated the issue with this information.

This is surprising. I will need to investigate more. I will come back to you.

Julien.

Thank you Julien,

One additional note: I let the process continue, and after a couple of days the machine did in fact run out of resources and rebooted itself. The terminal session was restored Sunday night.

Computing stats on input image …: 100% [**************************************************] (5h 58m 40s)
[Restored Feb 14, 2021 at 3:41:00 AM]

Thank you again for your assistance.

Robert

Dear @rdzur,

The LargeScaleMeanShift application is a composition of several smaller applications. As far as I understand, the ram parameter is only used by a small section of the pipeline (the computation of the image statistics); most of the pipeline is not affected by it. So it seems that the best way to manage memory usage is to set a different tile size (with the tilesizex and tilesizey parameters).

Julien.
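As a rough aid for choosing tilesizex and tilesizey, the raw size of a single tile buffer can be estimated with simple arithmetic. The sketch below is not taken from OTB: the function name `estimate_tile_bytes` is hypothetical, and the default of 4 bytes per pixel (32-bit float) is an assumption about how pixels are held in memory. The pipeline probably holds several such buffers plus segmentation and vector data at the same time, so treat the result as a lower bound per buffer, not a prediction of total usage.

```python
def estimate_tile_bytes(tilesizex, tilesizey, bands, bytes_per_pixel=4):
    """Raw size in bytes of one tile buffer.

    Hypothetical helper, not an OTB API. bytes_per_pixel=4 assumes
    32-bit float storage per band, which is an assumption.
    """
    return tilesizex * tilesizey * bands * bytes_per_pixel


if __name__ == "__main__":
    # Tile sizes from the two full-scale commands in this thread, 4 bands each.
    for size in (28045, 5000):
        gib = estimate_tile_bytes(size, size, bands=4) / 2**30
        print(f"{size} x {size} px, 4 bands: about {gib:.2f} GiB per buffer")
```

Under these assumptions a 28045 x 28045 tile comes to roughly 11.7 GiB per buffer and a 5000 x 5000 tile to roughly 0.37 GiB. The raw tile buffer alone therefore would not account for the memory use reported above at the smaller tile size, which suggests some other stage dominates.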

Thanks Julien,

That’s what I was thinking when I reduced tilesizex and tilesizey. I’ll have to experiment further with those parameters, and I think that makes sense. However, given the lack of merging in the KML output, I was trying to limit the amount of manual merging that smaller tile sizes might require. Anyway, thanks again for the ideas on this.

Robert.