TrainImageClassifier with raster


We regularly run into issues when training over large areas based on vectors. There are similar posts in the usage section, where the suggested solution is to use gdal_polygonize to convert the tif into a vector file. However, this makes little sense because 1) TrainImageClassifier then converts the vector back to a raster internally at some stage, 2) with a large and fragmented raster, the number of polygons is simply too large and polygonize (when it works) becomes the bottleneck in terms of processing time, and 3) if you simplify the polygons, converting back and forth may modify the dataset.

If needed, I have written some code to train with rasters (with different sampling strategies) in my application, and I can share it. But I think you already have everything needed to do it.




An application to train with image input only, without using intermediate OGR files, could indeed be useful.

Another case where this would be useful is the KMeans algorithm (unsupervised classification). The centroids are computed from points sampled in the image. Internally, this is done by creating a polygon over the image extent with the ImageEnvelope application and then performing sampling and training on this polygon.

It might also be applicable to the machine learning dimensionality reduction framework (TrainDimensionalityReduction)

So you have code that converts a raster into a ListSample and then uses it to train a MachineLearningModel?



This is quite ad hoc, but if it can help, you can find some parts here. The first step is to compute the number of pixels of each class in the training raster. Then I randomly sample from the raster (so I read the image twice, which is maybe not that efficient, but I couldn't find a better approach).
otbCounterPersistentFilter.h (4.9 KB) otbCounterPersistentFilter.txx (1.7 KB) otbExtractPersistentFilter.h (6.6 KB)
I use those filters to extract the values into a std::vector&lt;vnl_matrix&gt; and then convert it to a list sample in the main. I know it would be better to directly return a ListSample, but it was easier for me with the matrix. Another approach would be to write a one-line “dummy image” containing the sampled pixels and use all the pixels of this image for training.

		// Sample related typedefs
		typedef float InputValueType;
		typedef itk::VariableLengthVector<InputValueType> InputSampleType;
		typedef itk::Statistics::ListSample<InputSampleType> InputListSampleType;

		// Target related typedefs
		typedef int TargetValueType;
		typedef itk::FixedArray<TargetValueType, 1> TargetSampleType;
		typedef itk::Statistics::ListSample<TargetSampleType> TargetListSampleType;

		unsigned int sizePixel = inputReader->GetOutput()->GetNumberOfComponentsPerPixel();

		InputListSampleType::Pointer InputListSample = InputListSampleType::New();
		TargetListSampleType::Pointer TargetListSample = TargetListSampleType::New();

		double maxTransmatValue = transMat.max_value();

		for (unsigned int i = 0; i < lMatrix.size(); i++)
		  {
		  if (lMatrix[i].rows() > 1)
		    {
		    // Filling the two training lists
		    for (unsigned int j = 0; j < lMatrix[i].rows(); ++j)
		      {
		      // Random selection for approximately up to maxSizeSample points per class
		      if (bNoWeight || ((rand() / (double)RAND_MAX) <
		          (transMat.get(i, 0) / maxTransmatValue) * (maxSizeSample / (double)lMatrix[i].rows())))
		        {
		        InputSampleType sample;
		        sample.SetSize(sizePixel); // one component per band
		        for (unsigned int k = 0; k < lMatrix[i].columns(); k++)
		          {
		          sample[k] = lMatrix[i].get(j, k);
		          }
		        InputListSample->PushBack(sample);

		        TargetSampleType target;
		        target[0] = i + 1; // class label
		        TargetListSample->PushBack(target);
		        }
		      }
		    }
		  }