TrainVectorClassifier Error

KSATTIGNON · April 28, 2020, 8:16am

Hello,

I am new to OTB and I’m using it for my Master’s thesis. I have already performed several tasks with it and it performs extremely well! Thanks to the OTB team and to everyone who has contributed to it.
However, since Saturday I have been having difficulties using the TrainVectorClassifier command to train an RF model for crop classification (crop mapping with Sentinel-2 Level 3A, especially irrigated and non-irrigated crops).

When I launch the command, I get this error: (FATAL) TrainVectorClassifier: The field name for feature value_0 has not been found in the vector file.
I am attaching a PDF with screenshots of the classification process. Capture d’écran.pdf (217.0 KB)

My training data contains 165097 polygons for 24 crop classes. But when I disable the testing data option, the TrainVectorClassifier ignores 18 of my classes. Log report.docx (24.3 KB)

A friend advised me to dissolve my training data and try again using TrainImagesClassifier, which I did, and after 6 hours my classification is still at 4%.

What should I do now? Please help me.
Thank you in advance,
KS ATTIGNON

Cedric · April 28, 2020, 1:57pm

Hello,

It seems that your validation data (Testing.shp) does not contain the value_0 field. Is that really the case? The validation data should contain the same fields as the training data.

Cédric

KSATTIGNON · April 28, 2020, 2:05pm

The value_0 field in the training data was created by the sample extraction step.
To have it in the testing data, I will have to run the same procedure on the testing file too. So should I run sample selection and sample extraction on the testing file as well?

Cedric · April 28, 2020, 2:10pm

Yes!

Note that this validation data is not mandatory in TrainVectorClassifier.
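
For reference, here is a minimal sketch of what "the same procedure on the testing file" could look like. It only builds and prints the command lines; it does not run OTB. The file names (ndvi_ndwi_stack.tif, testing_stats.xml, testing_samples.sqlite, training_samples.sqlite, model_rf.txt) and the class field name "code" are placeholders I made up, so adapt them to your data. The 14 bands come from the image described later in this thread.

```python
import shlex


def feature_fields(n_bands, prefix="value_"):
    """Field names written by SampleExtraction with the 'value_' prefix (one per band)."""
    return [f"{prefix}{i}" for i in range(n_bands)]


def sampling_commands(image, polygons, class_field, stats_xml, samples):
    """Command lines (not executed here) that turn polygons into samples with value_N fields."""
    return [
        # 1. Per-class polygon statistics, needed by SampleSelection.
        ["otbcli_PolygonClassStatistics", "-in", image, "-vec", polygons,
         "-field", class_field, "-out", stats_xml],
        # 2. Select sample positions inside the polygons.
        ["otbcli_SampleSelection", "-in", image, "-vec", polygons,
         "-instats", stats_xml, "-field", class_field, "-out", samples],
        # 3. Read the band values at each sample; without -out the
        #    sample file itself is updated with the value_N fields.
        ["otbcli_SampleExtraction", "-in", image, "-vec", samples,
         "-field", class_field,
         "-outfield", "prefix", "-outfield.prefix.name", "value_"],
    ]


def train_command(train_samples, valid_samples, class_field, n_bands, model):
    """TrainVectorClassifier call that uses both training and validation samples."""
    return ["otbcli_TrainVectorClassifier",
            "-io.vd", train_samples,
            "-valid.vd", valid_samples,
            "-cfield", class_field,
            "-feat", *feature_fields(n_bands),
            "-classifier", "rf",
            "-io.out", model]


# Placeholder paths and field name: assumptions, not taken from the log.
testing_cmds = sampling_commands("ndvi_ndwi_stack.tif", "Testing.shp", "code",
                                 "testing_stats.xml", "testing_samples.sqlite")
final_cmd = train_command("training_samples.sqlite", "testing_samples.sqlite",
                          "code", 14, "model_rf.txt")

for cmd in testing_cmds + [final_cmd]:
    print(shlex.join(cmd))
```

The point is that the file passed to -valid.vd must already carry value_0 ... value_13, exactly like the file passed to -io.vd. If you do not need the validation step, dropping -valid.vd avoids the error as well.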

KSATTIGNON · April 28, 2020, 2:17pm

OK, thanks.

But I still have a question: why does TrainVectorClassifier give me only 6 classes instead of 24 and ignore 16 classes, when I have practically more than 10,000 polygons in each crop class?

KSATTIGNON · April 28, 2020, 2:23pm

And is it really necessary to dissolve the training data before running TrainImagesClassifier?

From 1am until now, my TrainImagesClassifier run has only reached 7%. Is that normal?
Is there a way to speed it up?

Thank you for your help.

Cedric · April 28, 2020, 2:58pm

What do you mean by “dissolving the training data”?

Looking at your log report, I notice two things:

First, it looks like your image has some very low values, reported by:

Warning 1: Value -3.40282306073709653e+38 of field value_5 of feature 1069 not successfully written. Possibly due to too larger number with respect to field width

Is that the case? These values will be used in training, and this can ruin the performance of the resulting classifier.

Second, you set the bm parameter to 1. This means that the number of samples per class is at most the number of samples in the least represented class, here 90 samples (class 9), even if you have thousands of samples in other classes. This is not necessarily bad, but be aware of it!
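
To make both points concrete, here is a small plain-Python sketch. It is not OTB code and does not reproduce OTB's internal implementation; it only illustrates the behaviour described above. The threshold of -1e30 and the counts for classes 1 and 2 are invented for the example; only the 90 samples of class 9 and the value in the warning come from the log.

```python
import math


def looks_like_nodata(value, threshold=-1e30):
    """Flag NaN or extremely low values such as the one reported in the warning.

    The threshold is an arbitrary assumption; NDVI/NDWI normally stay within [-1, 1].
    """
    return math.isnan(value) or value <= threshold


def bounded_sample_counts(class_counts):
    """Cap every class at the size of the least represented class (effect of bm = 1)."""
    smallest = min(class_counts.values())
    return {label: min(count, smallest) for label, count in class_counts.items()}


# Value taken from the GDAL warning in the log report.
suspicious = -3.40282306073709653e+38
print(looks_like_nodata(suspicious))   # flagged as no-data
print(looks_like_nodata(0.42))         # a plausible NDVI value, not flagged

# Class 9 has 90 samples (from the log); the other counts are made up.
counts = {"1": 12000, "2": 10500, "9": 90}
print(bounded_sample_counts(counts))
```

With bm set to 1, every class in this example ends up with 90 samples, which is why thousands of polygons in the large classes do not translate into thousands of training samples.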

Regarding processing time, I don’t know... your input data looks huge. What is the size of the input image, and how many bands does it have?

KSATTIGNON · April 28, 2020, 8:07pm

Dissolving the training data means merging all the polygons of each class into a single polygon per class.
I’m using the RPG 2018 data for training, with 165097 polygons. I use 70% of the RPG 2018 data in my area for training and 30% for testing.
I am not working on the whole Sentinel-2 L3A image, but only on a concatenated image of NDVI + NDWI.
My study area is covered by 5 tiles (scenes), so it is a mosaic of 5 tiles between May and October.

The concatenated image is 15 GB with 14 bands (7 NDVI and 7 NDWI).