Help: artifacts in semantic segmentation output using OTBTF/Keras tutorial

Hello,

I am trying to apply the semantic segmentation method from the OTBTF/Keras tutorial to my own data and I am encountering some issues.
Here is the tutorial I am following: GitHub - remicres/otbtf_keras_tutorial: How to work with OTBTF and Keras / Tensorflow 2.

I am working with a three-channel RGB image and three land cover classes (water, sand and vegetation). I extracted the patches and performed training and inference as in the tutorial, modifying only the number of classes and the number of input sources to fit my data.
While the code runs without errors, the output map shows some artifacts, with a grid-like pattern occurring across the entire image.
Here is a screenshot of the output map:

What may be causing this issue and how can I solve it?

Below is the code I used, adapted from the tutorial to fit my data. My data consist of a three-band RGB image (GE_aoi1.tif) and a terrain truth raster with three classes (tt.tif).

Sampling script (sampling.py):

import pyotb

labels_img = "/data/tt.tif"
vec_train = "/data/vec_train.geojson"
vec_valid = "/data/vec_valid.geojson"
vec_test = "/data/vec_test.geojson"

pyotb.PatchesSelection({
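    # Select patch locations on a regular 64-pixel grid and split them into train / valid / test sets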
    "in": labels_img,
    "grid.step": 64,
    "grid.psize": 64,
    "strategy": "split",
    "strategy.split.trainprop": 0.80,
    "strategy.split.validprop": 0.10,
    "strategy.split.testprop": 0.10,
    "outtrain": vec_train,
    "outvalid": vec_valid,
    "outtest": vec_test
})

import os
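# Tell OTBTF that PatchesExtraction will take two input sources (the RGB image and the labels)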
os.environ["OTB_TF_NSOURCES"] = "2"

ge_img = "/data/GE_aoi1.tif"
out_pth = "/data/output/"

for vec in [vec_train, vec_valid, vec_test]:
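    # Extract 64x64 image and label patches at the selected locations of each split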
    app_extract = pyotb.PatchesExtraction({
        "source1.il": ge_img,
        "source1.patchsizex": 64,
        "source1.patchsizey": 64,
        "source1.nodata": 0,
        "source2.il": labels_img,
        "source2.patchsizex": 64,
        "source2.patchsizey": 64,
        "vec": vec,
        "field": "id"
    })
    name = vec.replace("vec_", "").replace(".geojson", "")
    out_dict = {
        "source1.out": name + "_ge_patches.tif",
        "source2.out": name + "_labels_patches.tif",
    }
    pixel_type = {
        "source1.out": "int16",
        "source2.out": "uint8",
    }
    ext_fname = "gdal:co:COMPRESS=DEFLATE"
    app_extract.write(out_dict, pixel_type=pixel_type, ext_fname=ext_fname)

Training script (training.py):

from mymetrics import FScore
import otbtf
import tensorflow as tf
import argparse

class_nb = 3             # number of classes
inp_key_ge = "input_ge"  # model input ge
tgt_key = "estimated"    # model target

def create_otbtf_dataset(ge, labels):
    return otbtf.DatasetFromPatchesImages(
        filenames_dict={
            "ge": ge,
            "labels": labels
        }
    )

def dataset_preprocessing_fn(sample):
    return {
        inp_key_ge: sample["ge"],
        tgt_key: otbtf.ops.one_hot(labels=sample["labels"], nb_classes=class_nb)
    }

def create_dataset(ge, labels, batch_size=8):
    otbtf_dataset = create_otbtf_dataset(ge, labels)
    return otbtf_dataset.get_tf_dataset(
        batch_size=batch_size,
        preprocessing_fn=dataset_preprocessing_fn,
        targets_keys=[tgt_key]
    )

def conv(inp, depth, name, strides=2):
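    # 3x3 convolution; with the default strides=2 it halves the spatial resolution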
    conv_op = tf.keras.layers.Conv2D(
        filters=depth,
        kernel_size=3,
        strides=strides,
        activation="relu",
        padding="same",
        name=name
    )
    return conv_op(inp)

def tconv(inp, depth, name, activation="relu"):
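    # 3x3 transposed convolution with strides=2, doubling the spatial resolution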
    tconv_op = tf.keras.layers.Conv2DTranspose(
        filters=depth,
        kernel_size=3,
        strides=2,
        activation=activation,
        padding="same",
        name=name
    )
    return tconv_op(inp)

class FCNNModel(otbtf.ModelBase):
    
    def normalize_inputs(self, inputs):
        return {
            inp_key_ge: tf.cast(inputs[inp_key_ge], tf.float32) * 0.01
        }
    
    def get_outputs(self, normalized_inputs):
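        # Encoder: four strided convolutions; decoder: transposed convolutions with skip connections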
        norm_inp_ge = normalized_inputs[inp_key_ge]
        
        cv1 = conv(norm_inp_ge, 16, "conv1")
        cv2 = conv(cv1, 32, "conv2") 
        cv3 = conv(cv2, 64, "conv3")
        cv4 = conv(cv3, 64, "conv4")
        cv1t = tconv(cv4, 64, "conv1t") + cv3
        cv2t = tconv(cv1t, 32, "conv2t") + cv2
        cv3t = tconv(cv2t, 16, "conv3t") + cv1
        cv4t = tconv(cv3t, class_nb, "softmax_layer", "softmax")
        
        argmax_op = otbtf.layers.Argmax(name="argmax_layer")
        
        return {tgt_key: cv4t, "estimated_labels": argmax_op(cv4t)}

def train(params, ds_train, ds_valid, ds_test):
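    # Build and train the model under a distribution strategy (one or several GPUs)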
    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        model = FCNNModel(dataset_element_spec=ds_train.element_spec)
        
        # Precision and recall for each class
        metrics = [
            cls(class_id=class_id)
            for class_id in range(class_nb)
            for cls in [tf.keras.metrics.Precision, tf.keras.metrics.Recall]
        ]
       
        # F1-Score for each class
        metrics += [
            FScore(class_id=class_id, name=f"fscore_cls{class_id}")
            for class_id in range(class_nb)
        ]
       
        model.compile(
            loss={tgt_key: tf.keras.losses.CategoricalCrossentropy()},
            optimizer=tf.keras.optimizers.Adam(params.learning_rate),
            metrics={tgt_key: metrics}
        )
        model.summary()
        save_best_cb = tf.keras.callbacks.ModelCheckpoint(
            params.model_dir,
            mode="min",
            save_best_only=True,
            monitor="val_loss"
        )
        callbacks = [save_best_cb]
        if params.log_dir:
            callbacks.append(tf.keras.callbacks.TensorBoard(log_dir=params.log_dir))
        if params.ckpt_dir:
            ckpt_cb = tf.keras.callbacks.BackupAndRestore(backup_dir=params.ckpt_dir)
            callbacks.append(ckpt_cb)
       
        # Train the model
        model.fit(
            ds_train,
            epochs=params.epochs,
            validation_data=ds_valid,
            callbacks=callbacks
        )
       
        # Final evaluation on the test dataset
        model.load_weights(params.model_dir)
        values = model.evaluate(ds_test, batch_size=params.batch_size)
        for metric_name, value in zip(model.metrics_names, values):
            print(f"{metric_name}: {100*value:.2f}")


parser = argparse.ArgumentParser(description="Train a FCNN model")
parser.add_argument("--model_dir", required=True, help="model directory")
parser.add_argument("--log_dir", help="log directory")
parser.add_argument("--batch_size", type=int, default=4)
parser.add_argument("--learning_rate", type=float, default=0.0002)
parser.add_argument("--epochs", type=int, default=100)
parser.add_argument("--ckpt_dir", help="Directory for checkpoints")
params = parser.parse_args()
tf.get_logger().setLevel('ERROR')

ds_train = create_dataset(
    ["/data/train_ge_patches.tif"],
    ["/data/train_labels_patches.tif"],
)
ds_train = ds_train.shuffle(buffer_size=100)

ds_valid = create_dataset(
    ["/data/valid_ge_patches.tif"],
    ["/data/valid_labels_patches.tif"],
)

ds_test = create_dataset(
    ["/data/test_ge_patches.tif"],
    ["/data/test_labels_patches.tif"],
)

train(params, ds_train, ds_valid, ds_test)

And, finally, the inference script (inference.py):

import pyotb
import argparse

parser = argparse.ArgumentParser(description="Apply the model")
parser.add_argument("--model_dir", required=True, help="model directory")
params = parser.parse_args()

# Generate the classification map
infer = pyotb.TensorflowModelServe(
  n_sources=1,
  source1_il="/data/GE_aoi1.tif",
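  # Receptive field: size of the input patch seen by the model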
  source1_rfieldx=128,
  source1_rfieldy=128,
  source1_placeholder="input_ge",
  model_dir=params.model_dir,
  model_fullyconv=True,
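  # Expression field: size of the output block produced for each input patch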
  output_efieldx=64,
  output_efieldy=64,
  output_names="softmax_layer_crop32"
)

infer.write(
  "/data/map_v1.tif"
)

Thanks in advance for any suggestion,
Emilia

Hello Emilia,

  1. It looks like you are using mymetrics, which is introduced in another tutorial (this one). Am I right?
  2. The output image looks like the network is not trained properly. Can you share a few details about your setup (number of samples, evaluation metrics over the test dataset)?

Thanks

Hello,
Thank you for your reply.
Actually, I solved the grid-like artifact problem by modifying the strides of the convolution and deconvolution operations (setting strides=1 instead of the original strides=2).
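
In case it is useful to others, the change amounts to something like this in the conv/tconv helpers of the training script above (only the stride values differ; the rest of the model is unchanged):

# Same helpers as in training.py, but with strides=1 so that the spatial
# resolution is kept constant (no downsampling / upsampling)
def conv(inp, depth, name, strides=1):
    conv_op = tf.keras.layers.Conv2D(
        filters=depth,
        kernel_size=3,
        strides=strides,
        activation="relu",
        padding="same",
        name=name
    )
    return conv_op(inp)

def tconv(inp, depth, name, activation="relu"):
    tconv_op = tf.keras.layers.Conv2DTranspose(
        filters=depth,
        kernel_size=3,
        strides=1,
        activation=activation,
        padding="same",
        name=name
    )
    return tconv_op(inp)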

Regarding your questions:

  1. Yes, I applied the mymetrics script from that tutorial.
  2. My setup included about 500 training samples and the evaluation metrics used in the tutorial I followed (precision, recall and F-score for each class). Should I modify this? And how can I select an appropriate number of samples? (My image is 500 m x 500 m with a resolution of 0.2 m.)

Thanks,
Emilia

There is no general rule. Generally speaking, in DL the more (good) data you have, the better!

You still have to keep enough samples for testing and validation purposes, and they should be representative enough.

Some people like to create multiple datasets (called “splits”) that let you measure the variability of the metrics. E.g. you create five “splits”, each one being an independent dataset where you have randomly picked a few percent of the samples for training and validation. This way each split is different, which lets you compute the average of each metric over the splits and also measure its variability across them. When you are lucky, you can use a low number of samples in the valid/test datasets and still get a low variability, meaning that your results are steady. Note that the variability can come from your samples (randomly picked) or from the training process (e.g. a too-high learning rate, or an unstable/random training outcome).
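
For illustration only, here is a minimal sketch of this idea (the make_splits helper and the example scores are hypothetical, not part of the tutorial): shuffle the sample indices with one random partition per split, train and evaluate once per split, then aggregate the metrics.

import numpy as np

def make_splits(n_samples, n_splits=5, train_prop=0.8, valid_prop=0.1, seed=42):
    # Each split is an independent random partition of the same sample indices
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_splits):
        idx = rng.permutation(n_samples)
        n_train = int(train_prop * n_samples)
        n_valid = int(valid_prop * n_samples)
        splits.append({
            "train": idx[:n_train],
            "valid": idx[n_train:n_train + n_valid],
            "test": idx[n_train + n_valid:],
        })
    return splits

# After training and evaluating once per split, aggregate the per-split metrics
# (placeholder F-scores here, one value per split):
scores = [0.91, 0.89, 0.93, 0.90, 0.92]
print(f"mean = {np.mean(scores):.3f}, std = {np.std(scores):.3f}")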
