RF classification

Hello
This semester I am using OTB, instead of ENVI, for my Multispectral Remote Sensing classes. I have already performed several tasks with it and it is performing extremely well! However, I am having some difficulties with classification.
I am trying to classify, using RF, a small subset of a Sentinel-2 image using just a few classes (4), just for testing OTB. I am following the instructions in https://www.orfeo-toolbox.org/CookBook/recipes/pbclassif.html# but at the end I get a black image with all the pixels classified as 20 (the code value of the “baresoil” class). I suspect that it might be due to the fact that I am using just 46 samples (7 for class 10, 10 for class 20, 13 for class 30 and 16 for class 40). I am attaching a PDF with screenshots of the process and also the shapefile used for the classification. Since it is not possible to send the image, I have included a screenshot of the ReadImageInfo output in the PDF file.
Attachments: CLASSIFICATION.rar (5.3 KB), Classification_OTB.docx (3.0 MB)

Thank you in advance,
Ana Navarro

Hello Ana,

Looking at your inputs, I think the statistics file is missing in the ImageClassifier parameters. As the model has been trained on normalized features, it expects normalized pixels as input.
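
To illustrate why a missing statistics file can collapse the whole output into one class, here is a minimal Python sketch. It is not OTB code: the band values and the one-split "model" are invented for illustration, and only the class codes 10 and 20 come from this thread. The idea is that a decision boundary learned on normalized values means nothing when raw pixel values are fed in.

```python
# Toy illustration only: the values and the one-split "model" are invented.
train_values = [400.0, 600.0, 2400.0, 2600.0]  # hypothetical raw band values
train_labels = [10, 10, 20, 20]                # class codes as used in the thread

# Training statistics (what the statistics file would hold for this band).
mean = sum(train_values) / len(train_values)
std = (sum((v - mean) ** 2 for v in train_values) / len(train_values)) ** 0.5

def normalize(value):
    """Standardize a raw value with the training statistics."""
    return (value - mean) / std

# "Train" in normalized space: split midway between the two classes.
highest_10 = max(normalize(v) for v, l in zip(train_values, train_labels) if l == 10)
lowest_20 = min(normalize(v) for v, l in zip(train_values, train_labels) if l == 20)
threshold = (highest_10 + lowest_20) / 2

def predict(feature):
    """Classify one feature value with the split learned above."""
    return 20 if feature > threshold else 10

pixels = [450.0, 700.0, 2300.0, 2500.0]  # hypothetical pixels to classify

# Without the statistics: raw values all fall on the same side of the split.
raw_predictions = [predict(p) for p in pixels]
# With the statistics: pixels are normalized the same way as the training data.
normalized_predictions = [predict(normalize(p)) for p in pixels]

print(raw_predictions)         # [20, 20, 20, 20]
print(normalized_predictions)  # [10, 10, 20, 20]
```

In OTB itself, the statistics XML file is given to ImageClassifier through its `imstat` parameter, so that the pixels are normalized in the same way as the training samples.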

Also note that feature normalization is not required for the random forest algorithm, and the results should be identical with or without performing it (this is not the case for SVM, for example).
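
The reason is that tree-based models only compare feature values against thresholds, and normalization does not change the ordering of the values. A small Python sketch of that argument follows, using a single hand-written split (a "stump") rather than an actual random forest. The data and the `fit_stump` helper are hypothetical and not part of OTB.

```python
def fit_stump(values, labels):
    """Fit a one-split classifier by trying every midpoint threshold."""
    xs = sorted(set(values))
    classes = sorted(set(labels))
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        for left in classes:
            for right in classes:
                # Count training errors for this threshold and label pair.
                errors = sum((left if v <= t else right) != l
                             for v, l in zip(values, labels))
                if best is None or errors < best[0]:
                    best = (errors, t, left, right)
    _, t, left, right = best
    return lambda v: left if v <= t else right

# Hypothetical raw band values and class codes.
values = [400.0, 600.0, 900.0, 2400.0, 2600.0, 3000.0]
labels = [10, 10, 10, 20, 20, 20]

mean = sum(values) / len(values)
std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5

def normalize(value):
    return (value - mean) / std

# One model trained on raw values, one on normalized values.
raw_model = fit_stump(values, labels)
norm_model = fit_stump([normalize(v) for v in values], labels)

pixels = [500.0, 1000.0, 2000.0, 2800.0]
raw_result = [raw_model(p) for p in pixels]
norm_result = [norm_model(normalize(p)) for p in pixels]

print(raw_result == norm_result)  # True
```

The two models agree as long as each one receives inputs prepared the same way as its own training data. A distance-based or margin-based method such as SVM does not have this property, because rescaling changes the relative weight of each band.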

Cédric

Hello Cédric

Thank you very much for your quick reply!
I had already figured it out and I was able to get a classified image to which I applied a lut_mapping_file.

However, I made some modifications to the sequence of steps, replacing “TrainVectorClassifier” with “TrainImagesClassifier”, and observed some differences in the classification results: “TrainVectorClassifier” was not able to classify the least represented class correctly! Although I have not yet performed the validation, from a visual analysis it seems that the results from “TrainImagesClassifier” are more accurate! Why is that?

Thank you.

Best regards,

Ana

TrainImagesClassifier actually calls TrainVectorClassifier under the hood. The application chains:

  • ComputeImagesStatistics
  • PolygonClassStatistics on each input pair of vector file / image file
  • MultiImageSamplingRate (gathers the results of PolygonClassStatistics and determines how many samples should be taken in each input vector file)
  • SampleSelection on each input pair of vector file / image file
  • SampleExtraction on each output of SampleSelection
  • TrainVectorClassifier

This means that the results of the classification depend on how the samples were chosen in SampleSelection.

In your case, the results may be better with TrainImagesClassifier because, by default, this application takes the same number of samples in each class (at most 1000 samples per class), so each class has the same weight in the classification. You used the total strategy in your TrainVectorClassifier processing chain, so classes with fewer elements are also less represented in the classification.
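
As a rough illustration of the difference between the two sampling behaviours, here is a small Python sketch. It is a simplified model, not OTB code: the `samples_per_class` helper, the per-class pixel counts and the requested total of 230 are all hypothetical, and the real SampleSelection application may round or cap the counts differently.

```python
def samples_per_class(available, strategy, total=None, max_per_class=1000):
    """Simplified model of two sampling strategies (not the OTB implementation)."""
    if strategy == "smallest":
        # Every class gets the size of the smallest class, capped per class.
        n = min(min(available.values()), max_per_class)
        return {label: n for label in available}
    if strategy == "total":
        # A fixed total is shared out according to class proportions.
        grand_total = sum(available.values())
        return {label: min(count, round(total * count / grand_total))
                for label, count in available.items()}
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical number of candidate pixels per class code.
available = {10: 70, 20: 100, 30: 130, 40: 160}

balanced = samples_per_class(available, "smallest")
proportional = samples_per_class(available, "total", total=230)

print(balanced)      # {10: 70, 20: 70, 30: 70, 40: 70}
print(proportional)  # {10: 35, 20: 50, 30: 65, 40: 80}
```

With the proportional split, class 10 contributes less than half as many training samples as class 40, which is consistent with the smallest class being classified poorly.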

Cédric