The article certainly describes a method to compare segmentation results, but OTB’s implementation cannot compare segmentation results in practice:
Therefore, the application's name is misleading for users who want to run it on real data rather than demo data, and it should rather be named HooverEvaluationSegmentation. Even for that purpose it has severe limitations because of its computational cost (see Very slow HooverCompareSegmentation). I do not know whether these limitations are unavoidable because they are intrinsic to the method, or whether more efficient implementations are possible.