Problem with input k in knn (TrainVectorClassifier)

n-soltanzade · February 22, 2021, 8:33am

Hello
Dear all,

I ran the KNN algorithm in the TrainVectorClassifier on my data.

I found that entering different values of K (the number of nearest neighbors) has no effect on the classification results (evaluated by Cohen’s kappa). The results for any k are always the same as the results for k = 32, which is the default k value in OTB. I compared this with the kNN model in the Weka software, where changing k does change the classification results.

I guess there is a bug in the software.

I would be grateful if anyone can help to solve this problem. Thank you.
Regards

julienosman · February 22, 2021, 9:48am

Dear @n-soltanzade,
Thank you for your message. Changing the value of K should indeed modify the output classification. Could you tell us which version of OTB you use? On which OS? How did you install it? Do you use it from the command line, from QGIS, or otherwise?
Also, could you share with us the input parameters you used, and the logs you obtain?
This should help us understand what happens.
Sincerely.
Julien.

n-soltanzade · February 22, 2021, 10:55am

Thank you very much for your answer. I use both version 7.0.0 and the development version 7.2 on Windows. I run the application through Mapla (the graphical interface).
My input is vector data in shp format (10 columns and about 18,000 rows), and so is the test data. When I run the algorithm with the default value (k = 32), the result is kappa = 0.67. When I try other values of k, the result is still kappa = 0.67.
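
For reference, Cohen's kappa can be computed directly from the reference labels and the predicted labels. The sketch below is a minimal pure-Python version, not OTB's implementation; the function name `cohen_kappa` and the sample labels are made up for illustration.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa between two equally long sequences of class labels."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(reference)
    # Observed agreement: fraction of samples where both labels match.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: sum over classes of the product of the marginals.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        # Degenerate case: both sequences contain one and the same class.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels, for illustration only.
reference = [1, 1, 2, 2]
print(cohen_kappa(reference, [1, 1, 2, 2]))  # 1.0 (perfect agreement)
print(cohen_kappa(reference, [1, 2, 1, 2]))  # 0.0 (agreement at chance level)
```

Comparing the predicted labels of two runs element by element is a stricter check than comparing kappa values, since two different classifications could in principle yield the same kappa.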

n-soltanzade · February 23, 2021, 7:09am

I classified a Sentinel-2 image with 10 bands.
I have uploaded pictures of the input parameters and the logs.
I also tried other values of k, such as k = 100, and the result is the same.

julienosman · February 23, 2021, 8:25am

I can reproduce this behavior. I need to analyze it.

n-soltanzade · February 26, 2021, 6:02am

Hi,
I am wondering if you have found the reason behind this. Thanks

n-soltanzade · April 3, 2021, 5:53pm

Hello,

Dear colleagues, can anyone help in this case? I would be very thankful if anyone could help solve this problem.

Using the KNN algorithm in OTB is the only solution available to me, because it lets me set the parameter value (K) for the classification and use SMOTE data (SampleAugmentation). The other QGIS plugin does not have these features, which are important to me. Any help, advice or guidance would be greatly appreciated. Many thanks, and please excuse my English.

HV321 · April 15, 2021, 6:51am

Hi.

I have encountered the same problem. Changing the k value does not seem to change the output of the model. I used the KNN module in QGIS’s OTB plugin.

@julienosman and @n-soltanzade:

Do you have any suggestions on how to solve the issue?

Thank you!

Cedric · April 16, 2021, 10:26am

Hello @n-soltanzade and @HV321

This is a bug: the number of neighbors is correctly written to the output model, but it is not read back when the model is loaded. As a consequence, the default k (32) is always used. I pushed a fix to the GitLab repository, and the problem should be fixed in upcoming OTB versions.
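
To illustrate the pattern described above (this is not OTB's actual code), here is a minimal Python sketch of a model that saves k correctly but ignores it on load. The JSON file layout and the function names `save_model`, `load_model_buggy` and `load_model_fixed` are hypothetical; only the default of 32 comes from this thread.

```python
import json

DEFAULT_K = 32  # default number of neighbors mentioned in this thread


def save_model(path, k, samples):
    """Write the model to disk, including the user-supplied k."""
    with open(path, "w") as f:
        json.dump({"k": k, "samples": samples}, f)


def load_model_buggy(path):
    """Bug pattern: k is stored in the file but never read back."""
    with open(path) as f:
        data = json.load(f)
    # The stored value data["k"] is ignored, so the default always wins.
    return {"k": DEFAULT_K, "samples": data["samples"]}


def load_model_fixed(path):
    """Fixed pattern: restore the stored k, fall back to the default."""
    with open(path) as f:
        data = json.load(f)
    return {"k": data.get("k", DEFAULT_K), "samples": data["samples"]}
```

With a model saved using k = 100, the buggy loader would still report k = 32, while the fixed loader would report 100. This matches the symptom in this thread: training accepts any k, but classification always behaves as if k = 32.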

In the meantime, if you need to use KNN with a custom number of neighbors, you can use an older version of OTB that uses OpenCV 2 (the bug appeared during the switch to OpenCV 3), for example OTB 6.6.0. You can find it on the archive page (note that OTB 6.6.1 has the bug).

Cédric

HV321 · April 16, 2021, 11:48am

Thank you Cedric