I have a question whose answer I cannot find anywhere in the documentation or in existing forum questions.
I am running a random forest classification with TrainImagesClassifier.
When I set the maximum training/validation sample sizes to unlimited (-1 for sampling.mt/sampling.vt), the maximum sample size is still capped by the smallest class, even with sampling.bm=0. Why is this the case? I thought algorithms like random forest handled highly imbalanced classes well. Is there a way to turn this behaviour off, or am I misinterpreting the output?
Output with sampling.mt=-1, sampling.vt=-1, sampling.bm=0:
2023-08-28 16:25:35 (INFO) TrainImagesClassifier: Sampling strategy : fit the number of samples based on the smallest class
2023-08-28 16:25:35 (INFO) TrainImagesClassifier: Sampling rates for image 1 : className requiredSamples totalSamples rate
1 1906 136419 0.0139717
2 1906 12516 0.152285
3 1906 56893 0.0335015
4 1906 35371 0.053886
5 1906 177884 0.0107148
6 1906 77607 0.0245596
7 1906 1906 1
8 1906 8907 0.213989
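For reference, the rates printed in the log above are simply requiredSamples / totalSamples per class, with requiredSamples fixed at the size of the smallest class (class 7). A quick sketch reproducing those numbers:

```python
# Per-class totals copied from the log above; the smallest class (7)
# caps requiredSamples for every class under the "smallest" strategy.
total_samples = {1: 136419, 2: 12516, 3: 56893, 4: 35371,
                 5: 177884, 6: 77607, 7: 1906, 8: 8907}

required = min(total_samples.values())  # 1906, the size of class 7

# rate = requiredSamples / totalSamples, as printed by the application
rates = {c: required / n for c, n in total_samples.items()}

for c in sorted(rates):
    print(c, required, total_samples[c], round(rates[c], 6))
```

This reproduces, e.g., 0.0139717 for class 1 and a rate of 1 for class 7, confirming that the cap is driven entirely by the rarest class.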
The classification performs well under this constraint in terms of F1-scores and similar metrics, but this sampling strategy makes it hard to compare results with other random forest implementations that do not balance samples this way.
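One way to make such a comparison fair, if the OTB behaviour cannot be disabled, is to replicate the smallest-class sampling externally before training with the other implementation. A minimal sketch, assuming NumPy arrays and any scikit-learn-style classifier (the function name is mine):

```python
import numpy as np

def balance_to_smallest(X, y, rng=None):
    """Downsample every class to the size of the smallest class,
    mimicking the 'fit to smallest class' strategy in the log above."""
    rng = np.random.default_rng(rng)
    counts = np.bincount(y)
    n = counts[counts > 0].min()  # size of the rarest class
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n, replace=False)
        for c in np.unique(y)
    ])
    return X[idx], y[idx]

# Usage with any classifier, e.g. (hypothetical):
# Xb, yb = balance_to_smallest(X, y, rng=0)
# RandomForestClassifier().fit(Xb, yb)
```

Training both implementations on the same balanced subset removes the sampling strategy as a confounding factor when comparing scores.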