TrainImagesClassifier - Validation? Sample size?

arijeannin · June 26, 2019, 10:00am

Hi everyone,

After a couple of weeks using TrainImagesClassifier, I still don’t understand some points:

1/ We can enter training shapes AND validation shapes, but apparently the number of shapes for each usage must match the number of input rasters?! Is it really not possible to validate using only one shape? (I worked around it using ComputeConfusionMatrix, but it would be great if I could get this at the training stage.)

2/ Whatever integer I pass after ‘-sample.mt’, OTB uses all the pixels. Maybe I did something wrong?

3/ I would now like to use TrainImagesClassifier for training only (no validation), so I set ‘-sample.vtr 0’, which results in an ITK error that stops the computation. Why?

4/ Just for my own knowledge, what is the difference between using the input “ImageStatistics.xml” and not using it?

Thank you for the help, and many thanks for making OTB work so well!
Ari

Cedric · June 26, 2019, 3:36pm

Hello Ari,

1/ It is not possible to use only one shape. Maybe a workaround would be to provide a validation shapefile containing no geometries?

2/ I think the -sample.mt option is not used if -sample.bm=1, which is true by default, so you should also set -sample.bm=0.
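To make the combination of options in 2/ concrete, here is a minimal Python sketch that only assembles the command line (it does not run OTB). The helper name `build_train_command` and the file names are hypothetical, the options are written in the space-separated form used in the question, and other parameters a real run may need (class field, classifier choice, and so on) are left out.

```python
def build_train_command(images, vectors, model_path, max_train_per_class):
    """Assemble a TrainImagesClassifier argument list with -sample.bm disabled.

    Hypothetical helper for illustration only; nothing is executed.
    """
    # The thread notes that one training shape is expected per input raster.
    if len(images) != len(vectors):
        raise ValueError("expected one training vector file per input image")
    cmd = ["otbcli_TrainImagesClassifier"]
    cmd += ["-io.il", *images]    # input rasters
    cmd += ["-io.vd", *vectors]   # training shapes, one per raster
    cmd += ["-sample.bm", "0"]    # disable the default so -sample.mt is honoured
    cmd += ["-sample.mt", str(max_train_per_class)]
    cmd += ["-io.out", model_path]
    return cmd


# File names below are placeholders.
cmd = build_train_command(["scene.tif"], ["training.shp"], "model.txt", 500)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run` on a machine where OTB is installed.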

3/ As far as I know, TrainImagesClassifier will always perform the validation. If you set -sample.vtr=0 the training sample will be used as validation set.

4/ ImageStatistics.xml is a file containing the mean and variance of each band of the input raster. It is used to normalize the input values before training. This is important for some classifiers (in particular those based on distances, like KNN or SVM).
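As a rough illustration of why this matters for distance-based classifiers, here is a small Python sketch of per-band normalization. It assumes the usual "subtract the mean, divide by the standard deviation" form and made-up band statistics; it is not OTB's actual code.

```python
import math


def standardize(pixel, means, variances):
    """Center each band on its mean and scale it by its standard deviation."""
    return [(v - m) / math.sqrt(var) for v, m, var in zip(pixel, means, variances)]


# Made-up statistics for two bands with very different value ranges.
means = [100.0, 0.5]
variances = [400.0, 0.01]

a = [120.0, 0.4]
b = [100.0, 0.6]

# Without normalization the first band dominates the Euclidean distance.
raw_dist = math.dist(a, b)

# After normalization both bands contribute on a comparable scale.
norm_dist = math.dist(standardize(a, means, variances),
                      standardize(b, means, variances))

print(raw_dist, norm_dist)
```

In the raw values, the difference in the second band is almost invisible in the distance, while after normalization it actually weighs more than the first band's difference.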

Hope that helps,
Cédric

arijeannin · June 27, 2019, 12:16pm

That helps a lot!! Thank you!

Regarding 3/, -sample.vtr=0 apparently returns an error. I’m already using my training set for validation with the lowest ratio possible (maybe) (-sample.vtr=0.1).
I now think I should fake some validation shape and not pay attention to the stats returned, as long as I use ComputeConfusionMatrix later on.
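
For reference, the kind of check done afterwards can be sketched in a few lines of Python. This is only an illustration of what a confusion matrix is, with made-up labels and the assumption that rows are reference classes and columns are predicted classes; it is not how ComputeConfusionMatrix is implemented.

```python
from collections import Counter


def confusion_matrix(reference, predicted):
    """Count (reference, predicted) label pairs; rows = reference, columns = predicted."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    labels = sorted(set(reference) | set(predicted))
    counts = Counter(zip(reference, predicted))
    matrix = [[counts[(r, p)] for p in labels] for r in labels]
    return labels, matrix


def overall_accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Made-up labels for six validation samples.
reference = [1, 1, 2, 2, 2, 3]
predicted = [1, 2, 2, 2, 3, 3]

labels, matrix = confusion_matrix(reference, predicted)
accuracy = overall_accuracy(matrix)
print(labels, matrix, accuracy)
```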

Thanks again Cedric