Any idea why SVM isn't separating / missing classes?

Any idea why SVM isn't separating classes? I've also carried out the same analysis using NBC, and it gives me the results I would expect.

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: Predicted list size : 3315

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: ValidationLabeledListSample size : 3315

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: Training performances:

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: Confusion matrix (rows = reference labels, columns = produced labels):
       [1]  [2]  [3]   [4]  [5]
[1]   1020    0    0     8    0
[2]     82    0    0   565    0
[3]    138    0    0    24    0
[4]     49    0    0  1236    0
[5]      0    0    0   193    0

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: Precision of class [1] vs all: 0.791311

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: Recall of class [1] vs all: 0.992218

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: F-score of class [1] vs all: 0.880449

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: Precision of class [2] vs all: 0

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: Recall of class [2] vs all: 0

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: F-score of class [2] vs all: 0

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: Precision of class [3] vs all: 0

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: Recall of class [3] vs all: 0

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: F-score of class [3] vs all: 0

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: Precision of class [4] vs all: 0.610069

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: Recall of class [4] vs all: 0.961868

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: F-score of class [4] vs all: 0.746602

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: Precision of class [5] vs all: 0

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: Recall of class [5] vs all: 0

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: F-score of class [5] vs all: 0

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: Global performance, Kappa index: 0.502801

2023-01-10 11:42:09 (INFO) TrainVectorClassifier: Execution took 1.106 sec
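For anyone checking the numbers: the per-class metrics and the Kappa index above can be reproduced directly from the posted confusion matrix. A minimal sketch in plain Python (the function names are my own, not OTB's):

```python
# Confusion matrix from the SVM run
# (rows = reference labels, columns = produced labels).
svm_cm = [
    [1020, 0, 0,    8, 0],
    [  82, 0, 0,  565, 0],
    [ 138, 0, 0,   24, 0],
    [  49, 0, 0, 1236, 0],
    [   0, 0, 0,  193, 0],
]

def per_class_metrics(cm, k):
    """Precision, recall and F-score of class k (0-based) vs all."""
    predicted_k = sum(row[k] for row in cm)   # column sum: produced as k
    actual_k = sum(cm[k])                     # row sum: truly k
    tp = cm[k][k]
    precision = tp / predicted_k if predicted_k else 0.0
    recall = tp / actual_k if actual_k else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

def kappa(cm):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(len(cm))) / n
    expected = sum(sum(cm[i]) * sum(row[i] for row in cm)
                   for i in range(len(cm))) / n ** 2
    return (observed - expected) / (1 - expected)
```

For class [1] this gives precision 0.791311, recall 0.992218, F-score 0.880449, and the global Kappa works out to 0.502801, matching the log. Note that the SVM never produces labels 2, 3 and 5 (their columns are all zero), which is why their precision/recall/F-score are reported as 0.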

For comparison, this is the output for NBC:

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: Predicted list size : 3315

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: ValidationLabeledListSample size : 3315

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: Training performances:

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: Confusion matrix (rows = reference labels, columns = produced labels):
       [1]  [2]  [3]   [4]  [5]
[1]   1019    8    0     1    0
[2]     31  519    7    90    0
[3]      0    0  162     0    0
[4]      0  202   34  1049    0
[5]      0    0    0     0  193

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: Precision of class [1] vs all: 0.970476

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: Recall of class [1] vs all: 0.991245

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: F-score of class [1] vs all: 0.980751

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: Precision of class [2] vs all: 0.711934

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: Recall of class [2] vs all: 0.802164

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: F-score of class [2] vs all: 0.75436

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: Precision of class [3] vs all: 0.79803

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: Recall of class [3] vs all: 1

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: F-score of class [3] vs all: 0.887671

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: Precision of class [4] vs all: 0.920175

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: Recall of class [4] vs all: 0.816342

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: F-score of class [4] vs all: 0.865155

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: Precision of class [5] vs all: 1

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: Recall of class [5] vs all: 1

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: F-score of class [5] vs all: 1

2023-01-10 11:41:40 (INFO) TrainVectorClassifier: Global performance, Kappa index: 0.843544

Dear @wkcmark,

It is hard to say why from the logs alone. Are you using the same dataset for training and validation? I see that both lists have the same size. Also, class 4 (and also class 2) is over-represented in your training dataset, which could bias the model.
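Two things that often help when an SVM collapses onto the majority classes like this are scaling the features and weighting classes inversely to their frequency, since SVMs are sensitive to both feature ranges and class imbalance. A minimal illustration of the idea outside OTB, using scikit-learn and synthetic clusters whose sizes mimic the class counts in your logs (all names and data here are my own, not the TrainVectorClassifier pipeline):

```python
# Sketch: standardising features and balancing class weights can stop
# an SVM from predicting only the majority classes. Synthetic data,
# scikit-learn rather than OTB -- for illustration only.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Imbalanced but separable clusters; counts mirror the reference
# label totals in the logs (1028 / 647 / 162 / 1285 / 193).
centers = {1: (0, 0), 2: (5, 0), 3: (0, 5), 4: (5, 5), 5: (10, 10)}
counts = {1: 1028, 2: 647, 3: 162, 4: 1285, 5: 193}
X = np.vstack([rng.normal(centers[c], 0.3, size=(n, 2))
               for c, n in counts.items()])
y = np.concatenate([np.full(n, c) for c, n in counts.items()])

clf = make_pipeline(
    StandardScaler(),                            # SVMs are scale-sensitive
    SVC(kernel="rbf", class_weight="balanced"),  # penalise rare-class errors more
)
clf.fit(X, y)
pred = clf.predict(X)
print(sorted(set(pred)))  # all five classes should now be produced
```

If the equivalent options are available in your OTB classifier settings, it may also be worth checking the SVM cost parameter and kernel choice, since an ill-suited default can produce exactly this kind of majority-class collapse.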