Example usage: RECM clustering#

Here we demonstrate how to use evclust to perform relational evidential clustering (RECM) on the iris dataset. The most common form of relational data is a matrix of dissimilarities, say \(R = [r_{ij}]\), where \(r_{ij}\) is the pairwise dissimilarity (often a distance) between objects \(i\) and \(j\).
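To make the definition of \(R\) concrete, here is a minimal sketch that builds a dissimilarity matrix from a few toy points and checks the properties one usually expects of it (zero diagonal, symmetry, non-negativity). The helper names `dissimilarity_matrix` and `is_valid_dissimilarity` are hypothetical and are not part of evclust; the points are made up for illustration.

```python
import math

def dissimilarity_matrix(points):
    """Return R = [r_ij], the pairwise Euclidean distances between points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def is_valid_dissimilarity(R, tol=1e-12):
    """Check that R has a zero diagonal, is symmetric, and is non-negative."""
    n = len(R)
    return all(
        abs(R[i][i]) <= tol
        and all(R[i][j] >= 0 and abs(R[i][j] - R[j][i]) <= tol for j in range(n))
        for i in range(n)
    )

# Toy data (not taken from iris): three points in the plane.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
R = dissimilarity_matrix(points)

for row in R:
    print(row)
print(is_valid_dissimilarity(R))
```

Only the matrix \(R\) is passed to a relational method; the original coordinates are not needed once the dissimilarities are known.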

In this example, the characteristics of the species are represented as relational data, namely a dissimilarity matrix computed with the Euclidean distance. A sample may belong to several groups simultaneously, which introduces uncertainty in the dissimilarity matrix.

import evclust

print(evclust.__version__)
0.2.1

Imports#

from evclust.recm import recm
from evclust.datasets import load_iris
from evclust.utils import ev_summary, ev_plot, ev_pcaplot

Data#

The package ships with test data. We will use the well-known iris dataset and transform it into relational data by computing a dissimilarity matrix based on the Euclidean distance.

# Load the test data and convert it to relational (dissimilarity) data
from sklearn.metrics.pairwise import euclidean_distances
df = load_iris()
df = df.drop(['species'], axis=1)  # drop the label column
dfr = euclidean_distances(df)      # 150 x 150 matrix of pairwise distances
print(dfr[:5, :5])
[[0.         0.53851648 0.50990195 0.64807407 0.14142136]
 [0.53851648 0.         0.3        0.33166248 0.60827625]
 [0.50990195 0.3        0.         0.24494897 0.50990195]
 [0.64807407 0.33166248 0.24494897 0.         0.64807407]
 [0.14142136 0.60827625 0.50990195 0.64807407 0.        ]]
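As a quick sanity check, the off-diagonal entries printed above can be reproduced by hand. The sketch below assumes that `load_iris` returns the rows in the usual iris order and hard-codes the first three measurements of the classic dataset; it uses only the standard library.

```python
import math

# First three rows of the classic iris data
# (sepal length, sepal width, petal length, petal width), in cm.
# Assumption: evclust's load_iris keeps this usual row order.
rows = [
    (5.1, 3.5, 1.4, 0.2),
    (4.9, 3.0, 1.4, 0.2),
    (4.7, 3.2, 1.3, 0.2),
]

r01 = math.dist(rows[0], rows[1])  # entry (0, 1) of the matrix
r02 = math.dist(rows[0], rows[2])  # entry (0, 2)
r12 = math.dist(rows[1], rows[2])  # entry (1, 2)

print(round(r01, 8), round(r02, 8), round(r12, 8))
```

The three values agree with the corresponding entries of `dfr[:5, :5]` shown above.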

RECM#

# Evidential clustering with c=3
model = recm(D=dfr, c=3, disp=False)

Read and summarize the output#

We can summarize the output of the RECM model to see, for example, the focal sets or the number of outliers.

ev_summary(model)
------ Credal partition ------
3 classes,
150 objects
Generated by recm
Focal sets:
[[0. 0. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [1. 1. 0.]
 [0. 0. 1.]
 [1. 0. 1.]
 [0. 1. 1.]
 [1. 1. 1.]]
Value of the criterion = 31.19
Nonspecificity = 0.12
Number of outliers = 0.00
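The focal-set matrix in the summary can be read row by row: each row is a 0/1 indicator of a subset of the 3 classes, so there are \(2^3 = 8\) rows, and the all-zero first row stands for the empty set. The sketch below reproduces that row order (inferred from the printed output, not from evclust's source) and shows how the plausibility of each single class follows from a mass function. The helper names and the mass values are hypothetical and chosen only for illustration.

```python
def focal_sets(c):
    """Rows are 0/1 indicators of subsets of c classes.

    Row i has a 1 in column k when bit k of i is set, which yields
    the same ordering as the summary printed above.
    """
    return [[(i >> k) & 1 for k in range(c)] for i in range(2 ** c)]

def singleton_plausibility(mass, F):
    """pl(k) = sum of the masses of all focal sets that contain class k."""
    c = len(F[0])
    return [sum(m for m, row in zip(mass, F) if row[k]) for k in range(c)]

F = focal_sets(3)
for row in F:
    print(row)

# Hypothetical mass function for one object, one value per row of F.
# The masses sum to 1; most belief is on class 0, with some mass on
# the set {class 0, class 1} and on the whole frame.
mass = [0.0, 0.6, 0.1, 0.2, 0.0, 0.0, 0.0, 0.1]
pl = singleton_plausibility(mass, F)
print([round(p, 2) for p in pl])
```

Mass placed on a set with more than one class expresses doubt between those classes, which is what the nonspecificity figure in the summary aggregates over all objects.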

Plot the credal partition#

We can now plot the result on two feature axes using the ev_plot function.

ev_plot(x=model, X=df)
[Figure: output of ev_plot showing the credal partition of the iris data]