evclust.catecm#

This module contains the main function for catecm.

A. J. Djiberou Mahamadou, V. Antoine, G. J. Christie and S. Moreno, “Evidential clustering for categorical data,” 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), New Orleans, LA, USA.

Module Contents#

evclust.catecm.catecm(X, c, type='full', alpha=1, beta=2, delta=10, epsi=0.001, maxit=20, disp=True)[source]#

Categorical evidential c-means (catECM) algorithm for evidential clustering of categorical data. catECM uses a new dissimilarity measure and an alternating minimization scheme to obtain a credal partition.

Parameters:#

X (DataFrame):: Data containing only categorical variables, each with more than one modality (distinct value).
c (int):: The number of desired clusters.
alpha (float):: Weighting exponent that penalizes focal sets with many elements (high cardinality). The value of alpha should be > 1 (default: 1).
beta (float):: The fuzziness weighting exponent. The default value is 2.
delta (float):: The distance to the empty set, i.e., if the distance between an object and a cluster is greater than delta, the object is considered an outlier.
type (str):: Type of focal sets (“simple”: empty set, singletons, and Omega; “full”: all 2^c subsets of Omega; “pairs”: empty set, singletons, Omega, and all or selected pairs).
epsi (float):: The stopping criterion, i.e., the algorithm stops when the absolute difference between two consecutive inertia values is less than epsi.
maxit (int):: Maximum number of iterations.
disp (bool):: If True (default), intermediate results are displayed.
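The three values of type differ only in which subsets of Omega are allowed as focal sets. The sketch below enumerates them for a given c to show how quickly the count grows. It is an illustration only: the function name focal_sets, the argument name kind, and the frozenset representation are hypothetical and are not taken from evclust, whose internal ordering and encoding may differ. For "pairs", the sketch includes all pairs rather than a selection.

```python
from itertools import combinations


def focal_sets(c, kind="full"):
    """Enumerate focal sets as frozensets of cluster indices 0..c-1.

    Hypothetical helper for illustration; not part of evclust.
    """
    clusters = range(c)
    omega = frozenset(clusters)
    if kind == "full":
        # All 2^c subsets of Omega, ordered by cardinality.
        return [frozenset(s) for r in range(c + 1)
                for s in combinations(clusters, r)]
    # "simple" and "pairs" both start from the empty set and the singletons.
    sets = [frozenset()] + [frozenset([k]) for k in clusters]
    if kind == "pairs":
        sets += [frozenset(p) for p in combinations(clusters, 2)]
    elif kind != "simple":
        raise ValueError("kind must be 'simple', 'full' or 'pairs'")
    # Add Omega unless it is already present (e.g. c = 2 with pairs).
    if omega not in sets:
        sets.append(omega)
    return sets


for kind in ("simple", "pairs", "full"):
    print(kind, len(focal_sets(4, kind)))
```

With c=4 this gives 6 focal sets for "simple", 12 for "pairs", and 16 for "full", which is why "full" becomes costly as c grows.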

Returns:#

The credal partition (an object of class “credpart”).

Example:#

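# catECM clustering on the UCI soybean (small) data set
import numpy as np
from evclust.catecm import catecm

# Load the raw file as an object array so every value is treated as a category
df = np.loadtxt("https://archive.ics.uci.edu/ml/machine-learning-databases/soybean/soybean-small.data",
                delimiter=",", dtype="O")
# Drop the last column, which holds the class label
soybean = np.delete(df, df.shape[1] - 1, axis=1)
clus = catecm(soybean, c=4, type='full', alpha=1, beta=2, delta=10, epsi=1e-3, disp=True)
# Mass functions of the resulting credal partition
clus['mass']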

References:#

A. J. Djiberou Mahamadou, V. Antoine, G. J. Christie and S. Moreno, “Evidential clustering for categorical data,” 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), New Orleans, LA, USA.

Note

Keywords: clustering, categorical data, credal partition, evidential c-means, belief functions.

Preliminary results on three data sets show that cat-ECM is efficient for the analysis of data sets containing outliers and overlapping clusters. Additional validation work is needed to understand how changes to the various parameters of cat-ECM affect the clustering solution, how these results vary with the number of objects in a data set, and how the performance of cat-ECM compares to closely related categorical clustering methods. Nevertheless, the ability of cat-ECM to handle categorical data makes it highly useful for the analysis of survey data, which are common in, for example, health research and which often contain categorical, discrete and continuous data types.

evclust.catecm.catecm_get_dom_vals_and_size(X)[source]#: Get the feature domains and size.
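As a rough illustration of what "feature domains and size" means for categorical data, the sketch below collects the distinct values of each column and counts them. The function name domain_values_and_sizes and its return layout are assumptions made for this example; the actual return value of catecm_get_dom_vals_and_size is not documented here and may be structured differently.

```python
import numpy as np


def domain_values_and_sizes(X):
    """Return the distinct values of each column and how many there are.

    Hypothetical helper for illustration; not the evclust implementation.
    """
    X = np.asarray(X, dtype=object)
    # One sorted list of modalities per attribute (column).
    dom_vals = [sorted(set(col)) for col in X.T]
    sizes = [len(vals) for vals in dom_vals]
    return dom_vals, sizes


X = np.array([["a", "x"],
              ["b", "x"],
              ["a", "y"]], dtype=object)
dom_vals, sizes = domain_values_and_sizes(X)
print(dom_vals)
print(sizes)

# Each variable is expected to have more than one modality.
print(all(size > 1 for size in sizes))
```

A column with a single modality carries no information for clustering, which is consistent with the requirement stated for X.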

evclust.catecm.catecm_check_params(X)[source]#: Check the correctness of input parameters.

evclust.catecm.catecm_init_centers_singletons(n_attr_doms, f, c, size_attr_doms)[source]#: Initialize the centers of clusters.

evclust.catecm.catecm_update_centers_focalsets_gt_2(c, f, F, w)[source]#: Update the centers of focal sets with size greater than two.

evclust.catecm.catecm_distance_objects_to_centers(F, f, n, size_attr_doms, _dom_vals, X, w)[source]#: Compute the distance between objects and clusters.

evclust.catecm.catecm_get_credal_partition(alpha, beta, delta, n, f, F, dist)[source]#: Compute the credal partition from the distances between objects and cluster centers.
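To make the roles of alpha, beta and delta concrete, here is a minimal sketch of an ECM-style mass update. It assumes an objective of the form sum_j |A_j|^alpha * m_ij^beta * D_ij + delta^2 * m_i0^beta with masses summing to one for each object; under that assumption the minimizing masses are proportional to cost^(-1/(beta-1)). Whether catecm uses the dissimilarity unsquared and delta squared is an assumption of this sketch, and masses_from_distances is a hypothetical name, not the library function.

```python
import numpy as np


def masses_from_distances(dist, card, alpha=1.0, beta=2.0, delta=10.0):
    """ECM-style mass update (illustrative sketch, not the evclust code).

    dist: (n, f-1) dissimilarities to the non-empty focal sets.
    card: (f-1,) cardinalities of those focal sets.
    Returns an (n, f) array of masses; column 0 is the empty set.
    """
    if beta <= 1:
        raise ValueError("beta must be greater than 1")
    dist = np.asarray(dist, dtype=float)
    card = np.asarray(card, dtype=float)
    # Assumed per-set cost: delta^2 for the empty set, |A|^alpha * D otherwise.
    empty_cost = np.full((dist.shape[0], 1), delta ** 2)
    cost = np.hstack([empty_cost, card ** alpha * dist])
    cost = np.maximum(cost, 1e-12)  # guard against zero dissimilarities
    weights = cost ** (-1.0 / (beta - 1.0))
    # Normalize so that the masses of each object sum to one.
    return weights / weights.sum(axis=1, keepdims=True)


# One object, two singletons and their union (cardinality 2).
m = masses_from_distances([[1.0, 1.0, 4.0]], card=[1, 1, 2])
print(m.round(3))
```

In this toy case the two equally distant singletons receive equal mass, the pair receives less because of its larger cardinality and dissimilarity, and the empty set receives very little because delta is large.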

evclust.catecm.catecm_update_centers_singletons(alpha, beta, f, F, c, size_attr_doms, n_attr_doms, _dom_vals, X, credal_p)[source]#: Update the centers of singletons.

evclust.catecm.catecm_cost(F, dist, beta, alpha, delta, credal_p)[source]#: Compute the cost (inertia) from an iteration.
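The cost and the epsi stopping rule can be sketched together. The block below assumes an ECM-style inertia, sum over objects and non-empty focal sets of |A_j|^alpha * m_ij^beta * D_ij plus delta^2 * m_i0^beta for the empty set, with column 0 of the mass matrix holding the empty-set mass. Both the exact form of the cost and the names ecm_style_cost and has_converged are assumptions for illustration, not the signature or body of catecm_cost.

```python
import numpy as np


def ecm_style_cost(mass, dist, card, alpha=1.0, beta=2.0, delta=10.0):
    """Inertia under the assumed ECM-style objective (illustrative sketch).

    mass: (n, f) masses, column 0 is the empty set.
    dist: (n, f-1) dissimilarities to the non-empty focal sets.
    card: (f-1,) cardinalities of the non-empty focal sets.
    """
    mass = np.asarray(mass, dtype=float)
    dist = np.asarray(dist, dtype=float)
    card = np.asarray(card, dtype=float)
    inlier_term = np.sum(card ** alpha * mass[:, 1:] ** beta * dist)
    outlier_term = np.sum(delta ** 2 * mass[:, 0] ** beta)
    return inlier_term + outlier_term


def has_converged(prev_cost, cost, epsi=1e-3):
    """Stop when two consecutive inertia values differ by less than epsi."""
    return abs(prev_cost - cost) < epsi


dist = [[2.0, 5.0]]
# All mass on the first singleton: cost equals its dissimilarity.
print(ecm_style_cost([[0.0, 1.0, 0.0]], dist, card=[1, 1]))
# All mass on the empty set: cost equals delta squared.
print(ecm_style_cost([[1.0, 0.0, 0.0]], dist, card=[1, 1]))
print(has_converged(2.0, 2.0005), has_converged(2.0, 2.1))
```

In an alternating scheme, a check like has_converged would run once per iteration, with maxit acting as an upper bound on the number of iterations.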