Input Vector Data -- Train Vector Classifier

Dear all,

I want to apply Train Vector Classifier to a set of segments (polygons) that do not have attributes (mean, stdev). However, I have a set of polygons (training data) that contain attributes (mean, std) computed from the original image with the zonal statistics tool.

Train Vector Classifier returns an error because the attributes of the two input vector datasets do not match.
So I applied the zonal statistics tool again to all the segments, just to obtain the attributes. But is that acceptable?
Both input datasets will then contain values from the same image.

Maybe I am wrong, but I would like your opinions!

I look forward to your response.



I am not sure I understand what you are trying to achieve. Do you want to use two sets for training, or do you have a training dataset and a validation dataset? Can you provide the parameters used in this application?

Does your vector contain the classes associated with the polygons?

Anyway, I don’t see why using the same image for both sets would be a problem, as long as your polygons don’t overlap (if they do, some of the data will be used more often in training, which could bias the classifier). But that really depends on the problem you are trying to solve. If you are trying to classify this particular image (knowing the classes of a few polygons in this image), then you can use this data (but be careful about how you validate the classification!). If you want to use the classifier on other images, you might want to add some diversity to the training data.
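To see whether training polygons overlap, one option is to count the pixels covered by more than one polygon. The sketch below is a minimal illustration that assumes each polygon has already been rasterized onto the image grid as a boolean mask (the rasterization step is not shown, and `overlap_pixel_count` is a hypothetical helper, not part of OTB).

```python
import numpy as np

def overlap_pixel_count(masks):
    """Count pixels covered by more than one training polygon.

    masks: list of boolean arrays of identical shape, one per polygon,
           e.g. obtained by rasterizing each polygon onto the image grid.
    """
    coverage = np.zeros(masks[0].shape, dtype=np.int32)
    for m in masks:
        coverage += m.astype(np.int32)
    # Pixels with coverage > 1 contribute to several polygons' statistics,
    # so they weigh more than other pixels during training.
    return int(np.count_nonzero(coverage > 1))

# Two 4x4 polygons on a 6x6 grid that share a 2x2 corner.
a = np.zeros((6, 6), dtype=bool)
a[0:4, 0:4] = True
b = np.zeros((6, 6), dtype=bool)
b[2:6, 2:6] = True
print(overlap_pixel_count([a, b]))  # 4
```

A result of 0 means no pixel is shared between training polygons.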

But maybe you should test what works best in your application.



Hello Cedric,

Thanks for your answer. I have these files:

  1. A segmented vector file with polygons, but without attributes such as mean and std.
  2. A training dataset with polygons that I chose from the segmentation file, which contain the classes.
  3. An image with 2 bands, each corresponding to a computed index (e.g. NDVI, NDWI).

My training dataset contains attributes (mean, std) computed with the zonal statistics tool. I do this because I want to base my classification on them.

But when I try to run the Train Vector Classifier algorithm, I get an error because these attributes do not exist in the segmentation file.

I thought of computing them (for all the segments) with the zonal statistics tool, and then training the algorithm.
Is that acceptable, or do I introduce a bias for the polygons that appear in both files? Can I overcome this problem somehow?
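Computing the same statistics for every segment is a deterministic per-polygon operation, so a segment that also appears in the training file gets exactly the same mean and std in both files. The sketch below illustrates this with NumPy. It assumes the polygons have been rasterized into an integer label array (0 = background) and uses hypothetical field names (`mean_b0`, `std_b0`, ...); a real zonal statistics tool may name fields differently or use another std convention, so apply the same tool with the same settings to both files.

```python
import numpy as np

def zonal_stats(image, labels, background=0):
    """Per-segment mean and standard deviation of every band.

    image:  float array, shape (bands, rows, cols); e.g. band 0 = NDVI, band 1 = NDWI
    labels: int array, shape (rows, cols); each pixel holds the ID of the
            segment it falls in, `background` marks pixels outside any segment
    Returns {segment_id: {"mean_b0": ..., "std_b0": ..., "mean_b1": ..., ...}}
    """
    stats = {}
    for seg_id in np.unique(labels):
        if seg_id == background:
            continue
        inside = labels == seg_id  # boolean mask of this segment
        feats = {}
        for b in range(image.shape[0]):
            values = image[b][inside]
            feats[f"mean_b{b}"] = float(values.mean())
            feats[f"std_b{b}"] = float(values.std())  # population std (ddof=0)
        stats[int(seg_id)] = feats
    return stats

# Toy 2-band image and a label raster with two segments (IDs 1 and 2).
image = np.array([[[0.2, 0.4], [0.6, 0.6]],    # band 0 (e.g. NDVI)
                  [[0.1, 0.1], [0.3, 0.5]]])   # band 1 (e.g. NDWI)
labels = np.array([[1, 1], [2, 2]])

# Statistics for the whole segmentation (file 1) ...
all_segments = zonal_stats(image, labels)
# ... and for a training file (file 2) that only contains segment 1.
train_stats = zonal_stats(image, np.where(labels == 1, 1, 0))
print(train_stats[1] == all_segments[1])  # True
```

Since the values are identical, recomputing them for all segments only makes the feature fields consistent; the bias question is about how the polygons are split between training and validation.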



What I don’t understand is how you want to use the segmented vector (1). You could train the classifier with the training dataset (2) alone. Do you want to use (1) as an additional training sample, or as a validation sample? In both cases you need to add the class and feature (mean, std) fields to (1).

If you only want to classify (1), you only need to add the feature fields (mean, std). The classification should then be performed with VectorClassifier.
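The requirement above can be illustrated outside OTB: the training vector needs the feature fields plus the class field, while the vector to classify needs only the same feature fields. The sketch below does not use TrainVectorClassifier or VectorClassifier; it swaps in a simple nearest-centroid classifier written in NumPy purely for illustration, and the field names (`mean_b0`, `class`, ...) and records are hypothetical.

```python
import numpy as np

FEATURES = ["mean_b0", "std_b0", "mean_b1", "std_b1"]  # hypothetical field names

def to_matrix(records, features):
    """Stack the feature fields of each record into an (n, k) array.

    Raises KeyError if any record lacks a feature field, mirroring the
    attribute mismatch that made training fail in this thread.
    """
    missing = [f for f in features if any(f not in r for r in records)]
    if missing:
        raise KeyError(f"missing feature fields: {missing}")
    return np.array([[r[f] for f in features] for r in records], dtype=float)

def train_nearest_centroid(records, features, class_field="class"):
    """Training needs the feature fields AND the class field."""
    X = to_matrix(records, features)
    y = np.array([r[class_field] for r in records])
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def classify(model, records, features):
    """Classification needs only the same feature fields."""
    classes, centroids = model
    X = to_matrix(records, features)
    # Distance from every record to every class centroid, shape (n, n_classes).
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)].tolist()

training = [  # dataset (2): features + class
    {"class": "water", "mean_b0": -0.3, "std_b0": 0.05, "mean_b1": 0.6, "std_b1": 0.04},
    {"class": "water", "mean_b0": -0.2, "std_b0": 0.06, "mean_b1": 0.5, "std_b1": 0.05},
    {"class": "vegetation", "mean_b0": 0.7, "std_b0": 0.08, "mean_b1": -0.4, "std_b1": 0.06},
    {"class": "vegetation", "mean_b0": 0.6, "std_b0": 0.07, "mean_b1": -0.3, "std_b1": 0.05},
]
segments = [  # dataset (1) after adding the feature fields, no class needed
    {"id": 10, "mean_b0": 0.65, "std_b0": 0.07, "mean_b1": -0.35, "std_b1": 0.05},
    {"id": 11, "mean_b0": -0.25, "std_b0": 0.05, "mean_b1": 0.55, "std_b1": 0.05},
]

model = train_nearest_centroid(training, FEATURES)
print(classify(model, segments, FEATURES))  # ['vegetation', 'water']
```

Passing a segment without the feature fields (e.g. `{"id": 12}`) to `classify` raises a `KeyError`, which is the situation before the zonal statistics are added to (1).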

Regarding bias: if you use (1) for validation, make sure its polygons are not the same as the ones used for training (2).
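Because the training polygons (2) were selected from the segmentation (1), one way to follow this advice is to drop the training segments from (1) before validating. The minimal sketch below assumes each polygon carries a segment ID that is preserved when copying polygons from (1) to (2); the helper names are hypothetical.

```python
def validation_ids(all_segment_ids, train_ids):
    """Segment IDs from (1) that can be used for validation.

    Drops every segment that was also selected for the training set (2),
    so that no polygon is used both to train and to validate the classifier.
    """
    train = set(train_ids)
    return [s for s in all_segment_ids if s not in train]

def shared_ids(train_ids, valid_ids):
    """IDs that appear in both sets; should be empty before validating."""
    return sorted(set(train_ids) & set(valid_ids))

all_segments = [1, 2, 3, 4, 5, 6]
training = [2, 5]
valid = validation_ids(all_segments, training)
print(valid)                        # [1, 3, 4, 6]
print(shared_ids(training, valid))  # []
```

The remaining validation segments still need their class field filled in before they can be used to assess the classification.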



Thanks Cedric,
it is clear now.