API Reference
tdamapper.core
Core tools for creating and analyzing Mapper graphs.
The Mapper algorithm is a method for exploring the shape and structure of high-dimensional datasets by constructing a graph representation called the Mapper graph. The algorithm has three main steps:
Filtering: Apply a lens function (also called filter) to map the data points to a lower-dimensional space, such as a scalar value or a 2D plane.
Covering: Arrange the lens space into overlapping open sets, using a cover algorithm such as uniform intervals or balls.
Clustering: Group the data points in each open set into clusters, using a clustering algorithm such as single-linkage or DBSCAN.
The Mapper graph consists of nodes that represent clusters of data points, and edges that connect overlapping clusters (clusters obtained from different open sets can possibly overlap). For more details on the Mapper algorithm and its applications, see
Gurjeet Singh, Facundo Mémoli and Gunnar Carlsson, “Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition”, Eurographics Symposium on Point-Based Graphics, 2007.
This module provides the class tdamapper.core.MapperAlgorithm, which
encapsulates the algorithm and its parameters. The Mapper graph produced by
this module is a NetworkX graph object.
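The three steps described above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions (a scalar lens, hand-picked open intervals, and a hypothetical `gap_clusters` helper standing in for a real clustering algorithm); it is not how this module is implemented.

```python
from itertools import combinations


def mapper_sketch(X, lens, intervals, cluster):
    """Toy Mapper: filter, cover, cluster, then connect overlapping clusters."""
    # Filtering: one scalar lens value per point.
    y = [lens(x) for x in X]
    nodes = []  # each node is the set of point ids in one cluster
    for lo, hi in intervals:
        # Covering: ids of the points whose lens value lies in the open interval.
        ids = [i for i, v in enumerate(y) if lo < v < hi]
        # Clustering: split the ids of this open set into clusters.
        nodes.extend(set(c) for c in cluster(X, ids))
    # Edges: two nodes are linked when their clusters share a point.
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]
    return nodes, edges


def gap_clusters(X, ids, gap=1.5):
    """Hypothetical 1D clustering: split sorted points where the gap is large."""
    clusters, current = [], []
    for i in sorted(ids, key=lambda i: X[i]):
        if current and X[i] - X[current[-1]] > gap:
            clusters.append(current)
            current = []
        current.append(i)
    if current:
        clusters.append(current)
    return clusters


X = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0]
nodes, edges = mapper_sketch(X, lens=lambda x: x,
                             intervals=[(-1, 2.5), (1.5, 12)],
                             cluster=gap_clusters)
```

Point 2 falls in both intervals, so the cluster it belongs to in the first open set is connected to the cluster it belongs to in the second one.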
- class tdamapper.core.Cover[source]
Bases: ParamsMixin
Abstract interface for cover algorithms.
This is a naive implementation. Subclasses should override the methods of this class to implement more meaningful cover algorithms.
- apply(X)[source]
Covers the dataset with a single open set.
This is a naive implementation that returns a generator producing a single list containing all the ids of the original dataset. This method should be overridden by subclasses to implement more meaningful cover algorithms.
- Parameters:
X (array-like of shape (n, m) or list-like of length n) – A dataset of n points.
- Returns:
A generator of lists of ids.
- Return type:
generator of lists of ints
- class tdamapper.core.FailSafeClustering(clustering=None, verbose=True)[source]
Bases: ParamsMixin
A delegating clustering algorithm that prevents failure.
This class wraps a clustering algorithm and handles any exceptions that may occur during the fitting process. If the clustering algorithm fails, instead of throwing an exception, a single cluster containing all points is returned. This can be useful for robustness and debugging purposes.
- Parameters:
clustering (An estimator compatible with scikit-learn’s clustering interface, typically from sklearn.cluster) – A clustering algorithm to delegate to.
verbose (bool, optional) – A flag to log clustering exceptions. Set to True to enable logging, or False to suppress it. Defaults to True.
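The fallback behaviour described for this class can be pictured with a small wrapper. The sketch below is an assumption-laden illustration, not the library's code: `FailSafeSketch` and `AlwaysFails` are hypothetical names, and the wrapped object is assumed to follow the scikit-learn convention of exposing `labels_` after `fit`.

```python
class FailSafeSketch:
    """Illustrative wrapper (hypothetical, not the library's implementation)."""

    def __init__(self, clustering, verbose=True):
        self.clustering = clustering
        self.verbose = verbose

    def fit(self, X, y=None):
        try:
            self.clustering.fit(X, y)
            self.labels_ = list(self.clustering.labels_)
        except Exception as err:
            if self.verbose:
                print(f"clustering failed, using a single cluster: {err}")
            # Fallback: every point is assigned to the same cluster.
            self.labels_ = [0] * len(X)
        return self


class AlwaysFails:
    """Hypothetical clustering that raises on every call to fit."""

    def fit(self, X, y=None):
        raise ValueError("not enough samples")


labels = FailSafeSketch(AlwaysFails(), verbose=False).fit([[0.0], [1.0], [2.0]]).labels_
```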
- class tdamapper.core.MapperAlgorithm(**kwargs)[source]
Bases: _MapperAlgorithm
DEPRECATED: This class is deprecated and will be removed in a future release. Use tdamapper.learn.MapperAlgorithm.
- class tdamapper.core.Proximity[source]
Bases: Cover
Abstract interface for proximity functions. A proximity function maps each point to a subset of the dataset that contains the point itself. Every proximity function also defines a covering algorithm based on proximity-net, which is implemented in this class.
Proximity functions, implemented as subclasses of this class, are a convenient way to implement open cover algorithms by using the proximity-net construction. Proximity-net is implemented by the method tdamapper.core.Proximity.apply().
Subclasses should override the methods tdamapper.core.Proximity.fit() and tdamapper.core.Proximity.search() of this class to implement more meaningful proximity functions.
- apply(X)[source]
Covers the dataset using proximity-net.
This function applies an iterative algorithm to create the proximity-net. It picks an arbitrary point and forms an open set by calling the proximity function on the chosen point. The points contained in that open set are then marked as covered and skipped in the following steps. The procedure is repeated on the leftover points until every point is covered.
This function returns a generator that yields each element of the proximity-net as a list of ids. The ids are the indices of the points in the original dataset.
- Parameters:
X (array-like of shape (n, m) or list-like of length n) – A dataset of n points.
- Returns:
A generator of lists of ids.
- Return type:
generator of lists of ints
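The iterative procedure described for `apply` can be sketched as a short generator. This is a minimal illustration, assuming that `search(x)` returns the ids of the points near `x` (including `x` itself) and that "arbitrary" simply means the first uncovered point in dataset order; the `ball` helper is hypothetical.

```python
def proximity_net(X, search):
    """Yield lists of ids that together cover every point of X."""
    covered = set()
    for i, x in enumerate(X):
        if i in covered:
            continue  # already inside a previously yielded set
        ids = list(search(x))
        covered.update(ids)
        yield ids


X = [0.0, 0.5, 1.0, 5.0, 5.2]


def ball(x, radius=0.6):
    # Hypothetical proximity function: ids of the points closer than `radius` to x.
    return [j for j, p in enumerate(X) if abs(p - x) < radius]


cover = list(proximity_net(X, ball))
```

Only uncovered points are used as centers, but each yielded set may still contain points that were covered earlier, which is what produces the overlaps between sets.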
- fit(X)[source]
Train internal parameters.
This is a naive implementation that should be overridden by subclasses to implement more meaningful proximity functions.
- Parameters:
X (array-like of shape (n, m) or list-like of length n) – A dataset of n points.
- Returns:
The object itself.
- Return type:
self
- search(x)[source]
Return a list of neighbors for the query point.
This is a naive implementation that returns all the points in the dataset as neighbors. This method should be overridden by subclasses to implement more meaningful proximity functions.
- Parameters:
x (Any) – A query point for which we want to find neighbors.
- Returns:
A list containing all the indices of the points in the dataset.
- Return type:
list[int]
- class tdamapper.core.TrivialClustering[source]
Bases: ParamsMixin
A clustering algorithm that returns a single cluster.
This class implements a trivial clustering algorithm that assigns all data points to the same cluster. It can be used as an argument of the class tdamapper.core.MapperAlgorithm to skip clustering in the construction of the Mapper graph.
- class tdamapper.core.TrivialCover[source]
Bases: Cover
Cover algorithm that covers data with a single subset containing the whole dataset.
This class creates a single open set that contains all the points in the dataset.
- tdamapper.core.aggregate_graph(X, graph, agg)[source]
Apply an aggregation function to the nodes of a graph.
This function takes a dataset and a graph, and computes an aggregation value for each node of the graph, based on the data points that are associated with that node. The aggregation function can be any callable that takes a list of values and returns a single value, such as sum, mean, max, min, etc.
The function returns a dictionary that maps each node of the graph to its aggregation value. The keys of the dictionary are the nodes of the graph, and the values are the aggregation values.
- Parameters:
X (array-like of shape (n, m) or list-like of length n) – A dataset of n points.
graph (networkx.Graph) – The graph to apply the aggregation function to.
agg (Callable) – The aggregation function to use.
- Returns:
A dictionary of node-aggregation pairs.
- Return type:
dict
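The aggregation described above amounts to one dictionary comprehension. The sketch below replaces the graph with a plain dict, assuming that each node lists the ids of its points (as the ‘ids’ node attribute of a Mapper graph does); `aggregate_sketch` and `node_ids` are hypothetical names.

```python
from statistics import mean


def aggregate_sketch(X, node_ids, agg):
    """Map each node to agg() of the data points listed under that node."""
    return {node: agg([X[i] for i in ids]) for node, ids in node_ids.items()}


X = [1.0, 2.0, 3.0, 10.0]
# Hypothetical stand-in for the 'ids' attribute of each graph node.
node_ids = {0: [0, 1], 1: [1, 2, 3]}
result = aggregate_sketch(X, node_ids, mean)
```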
- tdamapper.core.mapper_connected_components(X, y, cover, clustering, n_jobs=1)[source]
Identify the connected components of the Mapper graph.
A connected component is a maximal set of nodes that are reachable from each other by following the edges. This function assigns a unique integer label to each point in the dataset, based on the connected component of the Mapper graph that it belongs to.
This function uses a union-find data structure to efficiently keep track of the connected components as it scans the points of the dataset. This approach should be faster than computing the Mapper graph by first calling tdamapper.core.mapper_graph() and then calling networkx.connected_components() on it.
- Parameters:
X (array-like of shape (n, m) or list-like of length n) – A dataset of n points.
y (array-like of shape (n, k) or list-like of length n) – Lens values for the n points of the dataset.
cover (A class compatible with tdamapper.core.Cover) – A cover algorithm.
clustering (An estimator compatible with scikit-learn’s clustering interface, typically from sklearn.cluster) – The clustering algorithm to apply to each subset of the dataset.
n_jobs (int) – The maximum number of parallel clustering jobs. This parameter is passed to the constructor of joblib.Parallel. Defaults to 1.
- Returns:
A list of labels. The label at position i identifies the connected component of the point at position i in the dataset.
- Return type:
list[int]
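The union-find idea can be sketched as follows. The sketch assumes the per-point node labels are already available (a list of node labels for each point, each list non-empty) and uses the smallest node label of a component as its identifier; both choices are assumptions of this example, not a description of the library's internals.

```python
def component_labels(node_labels):
    """node_labels[i] lists the Mapper nodes that contain point i."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    # Nodes that share a point belong to the same connected component.
    for labels in node_labels:
        for other in labels[1:]:
            root_a, root_b = find(labels[0]), find(other)
            if root_a != root_b:
                parent[max(root_a, root_b)] = min(root_a, root_b)
    # Each point takes the representative of its first node.
    return [find(labels[0]) for labels in node_labels]


components = component_labels([[0], [0, 1], [1], [2], [2, 3]])
chain = component_labels([[0, 1], [1, 2], [3]])
```

No graph is built here: a single pass over the points is enough, because every point that lies in two nodes merges their components.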
- tdamapper.core.mapper_graph(X, y, cover, clustering, n_jobs=1)[source]
Create the Mapper graph.
This function first identifies the unique cluster labels that each point of the dataset belongs to. These labels are used to identify the nodes of the Mapper graph. Then the edges of the Mapper graph are created by checking if any two nodes share some points in their corresponding clusters.
This function returns the Mapper graph as an object of type networkx.Graph. Each node has an attribute ‘size’ that stores the number of points contained in its corresponding cluster, and an attribute ‘ids’ that stores the indices of the points in the dataset that are contained in the cluster.
- Parameters:
X (array-like of shape (n, m) or list-like of length n) – A dataset of n points.
y (array-like of shape (n, k) or list-like of length n) – Lens values for the n points of the dataset.
cover (A class compatible with tdamapper.core.Cover) – A cover algorithm.
clustering (An estimator compatible with scikit-learn’s clustering interface, typically from sklearn.cluster) – The clustering algorithm to apply to each subset of the dataset.
n_jobs (int) – The maximum number of parallel clustering jobs. This parameter is passed to the constructor of joblib.Parallel. Defaults to 1.
- Returns:
The Mapper graph.
- Return type:
networkx.Graph
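How nodes and edges follow from the per-point labels can be shown with plain dictionaries. This is a hedged sketch that avoids NetworkX on purpose; `graph_from_labels` is a hypothetical helper, and only the ‘size’ and ‘ids’ attribute names are taken from the description above.

```python
from itertools import combinations


def graph_from_labels(node_labels):
    """Return (nodes, edges) built from the per-point node labels."""
    nodes = {}
    edges = set()
    for point_id, labels in enumerate(node_labels):
        for label in labels:
            node = nodes.setdefault(label, {"size": 0, "ids": []})
            node["size"] += 1
            node["ids"].append(point_id)
        # A point lying in two nodes means their clusters intersect.
        edges.update(combinations(sorted(labels), 2))
    return nodes, sorted(edges)


nodes, edges = graph_from_labels([[0], [0, 1], [1], [2]])
```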
- tdamapper.core.mapper_labels(X, y, cover, clustering, n_jobs=1)[source]
Identify the nodes of the Mapper graph.
The function first covers the lens space with overlapping sets, using the cover algorithm provided. Then, for each set, it clusters the points of the dataset that have lens values within that set, using the clustering algorithm provided. The clusters are then labeled with unique integers, starting from zero for each set. The function then adds an offset to the cluster labels, such that the labels are distinct across all sets. The offset is equal to the maximum label of the previous set plus one.
The function returns a list of node labels for each point in the dataset. The list at position i contains the labels of the nodes that the point at position i belongs to. The labels are sorted in ascending order, and there are no duplicates. If i < j, the labels at position i are strictly less than those at position j.
- Parameters:
X (array-like of shape (n, m) or list-like of length n) – A dataset of n points.
y (array-like of shape (n, k) or list-like of length n) – Lens values for the n points of the dataset.
cover (A class compatible with tdamapper.core.Cover) – A cover algorithm.
clustering (An estimator compatible with scikit-learn’s clustering interface, typically from sklearn.cluster) – A clustering algorithm.
n_jobs (int) – The maximum number of parallel clustering jobs. This parameter is passed to the constructor of joblib.Parallel. Defaults to 1.
- Returns:
A list of node labels for each point in the dataset.
- Return type:
list[list[int]]
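The offset scheme described above can be sketched independently of any clustering library. In this illustration `cover_sets` and `local_labels` are hypothetical inputs standing in for the output of the cover and of the per-set clustering, and noise labels (such as -1) are not handled.

```python
def offset_labels(n_points, cover_sets, local_labels):
    """Merge per-set cluster labels into globally distinct node labels.

    cover_sets[s] lists the point ids of open set s, and local_labels[s][j]
    is the cluster label (starting from zero) of the point cover_sets[s][j].
    """
    node_labels = [[] for _ in range(n_points)]
    offset = 0
    for ids, labels in zip(cover_sets, local_labels):
        for point_id, label in zip(ids, labels):
            node_labels[point_id].append(offset + label)
        # The next set starts right after the largest label used so far.
        if labels:
            offset += max(labels) + 1
    return node_labels


cover_sets = [[0, 1, 2], [2, 3, 4]]
local_labels = [[0, 0, 1], [0, 1, 1]]
node_labels = offset_labels(5, cover_sets, local_labels)
```

Point 2 lies in both open sets, so it receives two node labels, one from each set.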
tdamapper.cover
Open cover construction for the Mapper algorithm.
An open cover is a collection of open subsets of a dataset whose union spans the whole dataset. Unlike clustering, open subsets do not need to be disjoint. Indeed, the overlaps of the open subsets define the edges of the Mapper graph.
- class tdamapper.cover.BallCover(radius=1.0, metric='euclidean', metric_params=None, kind='flat', leaf_capacity=1, leaf_radius=None, pivoting=None)[source]
Bases: Proximity
Cover algorithm based on ball proximity function, which covers data with open balls of fixed radius.
An open ball is a set of points within a specified distance from a center point. This class maps each point to its corresponding open ball with a fixed radius centered on the point itself.
- Parameters:
radius (float) – The radius of the open balls. Must be a positive value. Defaults to 1.0.
metric (str or callable) – The metric used to define the distance between points. Accepts any value compatible with tdamapper.utils.metrics.get_metric. Defaults to ‘euclidean’.
metric_params (dict, optional) – Additional parameters for the metric function, to be passed to tdamapper.utils.metrics.get_metric. Defaults to None.
kind (str) – Specifies whether to use a flat or a hierarchical vantage point tree. Acceptable values are ‘flat’ or ‘hierarchical’. Defaults to ‘flat’.
leaf_capacity (int) – The maximum number of points in a leaf node of the vantage point tree. Must be a positive value. Defaults to 1.
leaf_radius (float, optional) – The radius of the leaf nodes. If not specified, it defaults to the value of radius. Must be a positive value. Defaults to None.
pivoting (str or callable, optional) – The method used for pivoting in the vantage point tree. Acceptable values are None, ‘random’, or ‘furthest’. Defaults to None.
- fit(X)[source]
Train internal parameters.
This method creates a vptree on the dataset in order to perform fast range queries in the tdamapper.cover.BallCover.search() method.
- Parameters:
X (array-like of shape (n, m) or list-like of length n) – A dataset of n points.
- Returns:
The object itself.
- Return type:
self
- class tdamapper.cover.BaseCubicalCover(n_intervals=1, overlap_frac=None, kind='flat', leaf_capacity=1, leaf_radius=None, pivoting=None)[source]
Bases: object
- fit(X)[source]
Train internal parameters.
This method builds an internal tdamapper.cover.BallCover attribute that allows efficient queries of the dataset.
- Parameters:
X (array-like of shape (n, m) or list-like of length n) – A dataset of n points.
- Returns:
The object itself.
- Return type:
self
- search(x)[source]
Return a list of neighbors for the query point.
This method takes a target point as input and returns the hypercube whose center is closest to the target point.
- Parameters:
x (Any) – A query point for which we want to find neighbors.
- Returns:
The indices of the neighbors contained in the dataset.
- Return type:
list[int]
- class tdamapper.cover.CubicalCover(n_intervals=1, overlap_frac=None, algorithm='proximity', kind='flat', leaf_capacity=1, leaf_radius=None, pivoting=None)[source]
Bases: Cover
Wrapper class for cubical cover algorithms, which cover data with open hypercubes of uniform size and overlap. This class delegates its methods to either tdamapper.cover.StandardCubicalCover or tdamapper.cover.ProximityCubicalCover, based on the algorithm parameter.
A hypercube is a multidimensional generalization of a square or a cube. The size and overlap of the hypercubes are determined by the number of intervals and the overlap fraction parameters.
- Parameters:
n_intervals (int) – The number of intervals to use for each dimension. Must be positive and less than or equal to the length of the dataset. Defaults to 1.
overlap_frac (float) – The fraction of overlap between adjacent intervals on each dimension, must be in the range (0.0, 0.5]. If not specified, the overlap_frac is computed such that the volume of the overlap within each hypercube is half the total volume. Defaults to None.
algorithm (str) – Specifies whether to use standard cubical cover, as in tdamapper.cover.StandardCubicalCover, or proximity cubical cover, as in tdamapper.cover.ProximityCubicalCover. Acceptable values are ‘standard’ or ‘proximity’. Defaults to ‘proximity’.
kind (str) – Specifies whether to use a flat or a hierarchical vantage point tree. Acceptable values are ‘flat’ or ‘hierarchical’. Defaults to ‘flat’.
leaf_capacity (int) – The maximum number of points in a leaf node of the vantage point tree. Must be a positive value. Defaults to 1.
leaf_radius (float, optional) – The radius of the leaf nodes. If not specified, it defaults to the value of radius. Must be a positive value. Defaults to None.
pivoting (str or callable, optional) – The method used for pivoting in the vantage point tree. Acceptable values are None, ‘random’, or ‘furthest’. Defaults to None.
- apply(X)[source]
Covers the dataset using hypercubes.
This method delegates to the apply method of the internal cubical cover used.
- Parameters:
X (array-like of shape (n, m) or list-like of length n) – A dataset of n points.
- Returns:
A generator of lists of ids.
- Return type:
generator of lists of ints
- fit(X)[source]
Train internal parameters.
This method delegates to the fit() method of the internal cubical cover used.
- Parameters:
X (array-like of shape (n, m) or list-like of length n) – A dataset of n points.
- Returns:
The object itself.
- Return type:
self
- search(x)[source]
Return a list of neighbors for the query point.
This method delegates to the search method of the internal cubical cover used.
- Parameters:
x (Any) – A query point for which we want to find neighbors.
- Returns:
The indices of the neighbors contained in the dataset.
- Return type:
list[int]
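In one dimension a cubical cover reduces to overlapping intervals, which makes the roles of n_intervals and overlap_frac easier to see. The sketch below shows one possible reading of those parameters: overlap_frac is taken to be the share of an interval's length that it has in common with one adjacent interval. The library's exact construction may differ, and `overlapping_intervals` and `interval_ids` are hypothetical helpers.

```python
def overlapping_intervals(lo, hi, n_intervals, overlap_frac):
    """Cover [lo, hi] with n_intervals open intervals of equal length.

    Assumption of this sketch: overlap_frac is the share of each interval's
    length that it has in common with one adjacent interval.
    """
    width = (hi - lo) / n_intervals
    # Widen each base interval on both sides so that adjacent ones overlap.
    delta = overlap_frac * width / (2.0 * (1.0 - overlap_frac))
    return [(lo + k * width - delta, lo + (k + 1) * width + delta)
            for k in range(n_intervals)]


def interval_ids(values, interval):
    """Ids of the values lying strictly inside the open interval."""
    a, b = interval
    return [i for i, v in enumerate(values) if a < v < b]


values = [0.0, 1.5, 2.5, 4.0]
intervals = overlapping_intervals(0.0, 4.0, n_intervals=2, overlap_frac=0.5)
cover = [interval_ids(values, interval) for interval in intervals]
```

Under this reading, an overlap fraction above 0.5 would make non-adjacent intervals intersect, which is consistent with the documented range (0.0, 0.5].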
- class tdamapper.cover.KNNCover(neighbors=1, metric='euclidean', metric_params=None, kind='flat', leaf_capacity=None, leaf_radius=0.0, pivoting=None)[source]
Bases: Proximity
Cover algorithm based on KNN proximity function, which covers data using k-nearest neighbors (KNN).
This class maps each point to the set of the k nearest neighbors to the point itself.
- Parameters:
neighbors (int) – The number of neighbors to use for the KNN Proximity function, must be positive and less than the length of the dataset. Defaults to 1.
metric (str or callable) – The metric used to define the distance between points. Accepts any value compatible with tdamapper.utils.metrics.get_metric. Defaults to ‘euclidean’.
metric_params (dict, optional) – Additional parameters for the metric function, to be passed to tdamapper.utils.metrics.get_metric. Defaults to None.
kind (str) – Specifies whether to use a flat or a hierarchical vantage point tree. Acceptable values are ‘flat’ or ‘hierarchical’. Defaults to ‘flat’.
leaf_capacity (int, optional) – The maximum number of points in a leaf node of the vantage point tree. If not specified, it defaults to the value of neighbors. Must be a positive value. Defaults to None.
leaf_radius (float) – The radius of the leaf nodes. Must be a non-negative value. Defaults to 0.0.
pivoting (str or callable, optional) – The method used for pivoting in the vantage point tree. Acceptable values are None, ‘random’, or ‘furthest’. Defaults to None.
- fit(X)[source]
Train internal parameters.
This method creates a vptree on the dataset in order to perform fast KNN queries in the tdamapper.cover.KNNCover.search() method.
- Parameters:
X (array-like of shape (n, m) or list-like of length n) – A dataset of n points.
- Returns:
The object itself.
- Return type:
self
- search(x)[source]
Return a list of neighbors for the query point.
This method queries the internal vptree in order to perform fast KNN queries.
- Parameters:
x (Any) – A query point for which we want to find neighbors.
- Returns:
The indices of the neighbors contained in the dataset.
- Return type:
list[int]
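What a KNN query returns can be checked against a brute-force version. The sketch below assumes Euclidean distance and scans the whole dataset, so it shows the expected result rather than the vptree-based search used by this class; `knn_ids` is a hypothetical helper.

```python
import heapq
import math


def knn_ids(X, x, k):
    """Ids of the k points of X closest to x (Euclidean, brute force)."""
    return heapq.nsmallest(k, range(len(X)), key=lambda i: math.dist(X[i], x))


X = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
neighbors = knn_ids(X, (0.0, 0.0), k=2)
```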
- class tdamapper.cover.ProximityCubicalCover(n_intervals=1, overlap_frac=None, kind='flat', leaf_capacity=1, leaf_radius=None, pivoting=None)[source]
Bases: BaseCubicalCover, Proximity
Cover algorithm based on the cubical proximity function, which covers data with open hypercubes of uniform size and overlap. The cubical cover is obtained by selecting a subset of all the hypercubes that intersect the dataset using proximity-net (see tdamapper.core.Proximity). For an open cover containing all the hypercubes intersecting the dataset, use tdamapper.cover.StandardCubicalCover.
A hypercube is a multidimensional generalization of a square or a cube. The size and overlap of the hypercubes are determined by the number of intervals and the overlap fraction parameters. This class maps each point to the hypercube with the nearest center.
- Parameters:
n_intervals (int) – The number of intervals to use for each dimension. Must be positive and less than or equal to the length of the dataset. Defaults to 1.
overlap_frac (float) – The fraction of overlap between adjacent intervals on each dimension, must be in the range (0.0, 0.5]. If not specified, the overlap_frac is computed such that the volume of the overlap within each hypercube is half the total volume. Defaults to None.
kind (str) – Specifies whether to use a flat or a hierarchical vantage point tree. Acceptable values are ‘flat’ or ‘hierarchical’. Defaults to ‘flat’.
leaf_capacity (int) – The maximum number of points in a leaf node of the vantage point tree. Must be a positive value. Defaults to 1.
leaf_radius (float, optional) – The radius of the leaf nodes. If not specified, it defaults to the value of radius. Must be a positive value. Defaults to None.
pivoting (str or callable, optional) – The method used for pivoting in the vantage point tree. Acceptable values are None, ‘random’, or ‘furthest’. Defaults to None.
- class tdamapper.cover.StandardCubicalCover(n_intervals=1, overlap_frac=None, kind='flat', leaf_capacity=1, leaf_radius=None, pivoting=None)[source]
Bases: BaseCubicalCover, Cover
Cover algorithm based on the standard open cover, which covers data with open hypercubes of uniform size and overlap. The standard cover is obtained by selecting all the hypercubes that intersect the dataset.
A hypercube is a multidimensional generalization of a square or a cube. The size and overlap of the hypercubes are determined by the number of intervals and the overlap fraction parameters. This class maps each point to the hypercube with the nearest center.
- Parameters:
n_intervals (int) – The number of intervals to use for each dimension. Must be positive and less than or equal to the length of the dataset. Defaults to 1.
overlap_frac (float) – The fraction of overlap between adjacent intervals on each dimension, must be in the range (0.0, 0.5]. If not specified, the overlap_frac is computed such that the volume of the overlap within each hypercube is half the total volume. Defaults to None.
kind (str) – Specifies whether to use a flat or a hierarchical vantage point tree. Acceptable values are ‘flat’ or ‘hierarchical’. Defaults to ‘flat’.
leaf_capacity (int) – The maximum number of points in a leaf node of the vantage point tree. Must be a positive value. Defaults to 1.
leaf_radius (float, optional) – The radius of the leaf nodes. If not specified, it defaults to the value of radius. Must be a positive value. Defaults to None.
pivoting (str or callable, optional) – The method used for pivoting in the vantage point tree. Acceptable values are None, ‘random’, or ‘furthest’. Defaults to None.
- apply(X)[source]
Covers the dataset using landmarks.
This function yields all the hypercubes intersecting the dataset.
This function returns a generator that yields each element of the open cover as a list of ids. The ids are the indices of the points in the original dataset.
- Parameters:
X (array-like of shape (n, m) or list-like of length n) – A dataset of n points.
- Returns:
A generator of lists of ids.
- Return type:
generator of lists of ints
tdamapper.learn
This module provides classes based on the Mapper algorithm, a technique from topological data analysis (TDA) for extracting insights from complex data. Each class is designed to be compatible with scikit-learn’s estimator APIs, ensuring seamless integration with existing machine learning pipelines.
Users can leverage these classes to explore high-dimensional data, visualize relationships, and uncover meaningful structures in a manner that aligns with scikit-learn’s conventions for estimators.
- class tdamapper.learn.MapperAlgorithm(cover=None, clustering=None, failsafe=True, verbose=True, n_jobs=1)[source]
Bases: _MapperAlgorithm
A class for creating and analyzing Mapper graphs.
This class provides two methods, fit() and fit_transform(). Once fitted, the Mapper graph is stored in the attribute graph_ as a networkx.Graph object.
This class adopts the same interface as scikit-learn’s estimators for ease and consistency of use. However, it is not a proper scikit-learn estimator, as it does not validate the input in the way a scikit-learn estimator is required to. This class can work with datasets whose elements are arbitrary objects when feasible for the supplied parameters.
- Parameters:
cover (A class compatible with tdamapper.core.Cover) – A cover algorithm. If no cover is specified, tdamapper.core.TrivialCover is used, which produces a single open set containing the whole dataset. Defaults to None.
clustering (An estimator compatible with scikit-learn’s clustering interface, typically from sklearn.cluster) – The clustering algorithm to apply to each subset of the dataset. If no clustering is specified, tdamapper.core.TrivialClustering is used, which produces a single cluster for each subset. Defaults to None.
failsafe (bool, optional) – A flag that is used to prevent failures. If True, the clustering object is wrapped by tdamapper.core.FailSafeClustering. Defaults to True.
verbose (bool, optional) – A flag that is used for logging, supplied to tdamapper.core.FailSafeClustering. If True, clustering failures are logged. Set to False to suppress these messages. Defaults to True.
n_jobs (int) – The maximum number of parallel clustering jobs. This parameter is passed to the constructor of joblib.Parallel. Defaults to 1.
- fit(X, y=None)[source]
Create the Mapper graph and store it for later use.
This method stores the result of tdamapper.core.mapper_graph() in the attribute graph_ and returns a reference to the calling object.
- Parameters:
X (array-like of shape (n, m) or list-like of length n) – A dataset of n points.
y (array-like of shape (n, k) or list-like of length n) – Lens values for the n points of the dataset.
- Returns:
The object itself.
- fit_transform(X, y)[source]
Create the Mapper graph.
This method is equivalent to calling tdamapper.core.mapper_graph().
- Parameters:
X (array-like of shape (n, m) or list-like of length n) – A dataset of n points.
y (array-like of shape (n, k) or list-like of length n) – Lens values for the n points of the dataset.
- Returns:
The Mapper graph.
- Return type:
networkx.Graph
- class tdamapper.learn.MapperClustering(cover=None, clustering=None, n_jobs=1)[source]
Bases: _MapperClustering
A clustering algorithm based on the Mapper graph.
The Mapper algorithm constructs a graph from a dataset, where each node represents a cluster of points and each edge represents an overlap between clusters. Each point in the dataset belongs to one or more nodes in the graph. These nodes are therefore all connected and share the same connected component in the Mapper graph. This class builds clusters of points according to their connected component in the Mapper graph.
This class does not compute the Mapper graph but calls the function tdamapper.core.mapper_connected_components(), which is faster.
- Parameters:
cover (A class compatible with tdamapper.core.Cover) – A cover algorithm.
clustering (A class compatible with scikit-learn estimators from sklearn.cluster) – The clustering algorithm to apply to each subset of the dataset.
n_jobs (int) – The maximum number of parallel clustering jobs. This parameter is passed to the constructor of joblib.Parallel. Defaults to 1.
tdamapper.clustering
Clustering tools based on the Mapper algorithm.
- class tdamapper.clustering.FailSafeClustering(**kwargs)[source]
Bases: FailSafeClustering
DEPRECATED: This class is deprecated and will be removed in a future release. Use tdamapper.core.FailSafeClustering.
- class tdamapper.clustering.MapperClustering(**kwargs)[source]
Bases: _MapperClustering
DEPRECATED: This class is deprecated and will be removed in a future release. Use tdamapper.learn.MapperClustering.
- class tdamapper.clustering.TrivialClustering(**kwargs)[source]
Bases: TrivialClustering
DEPRECATED: This class is deprecated and will be removed in a future release. Use tdamapper.core.TrivialClustering.
tdamapper.utils.metrics
Utilities for computing metrics.
This module provides functions to calculate various distance metrics. A metric, or distance function, is a function that maps two points to a double value, representing the “distance” between them. For a function to qualify as a valid metric, it must satisfy the following properties:
- Symmetry: The distance between two points is the same regardless of the order, i.e.: \(d(x, y) = d(y, x)\) for all x and y.
- Positivity: The distance between two distinct points is always positive, i.e.: \(d(x, y) > 0\) for all distinct x and y, and \(d(x, x) = 0\) for every x.
- Triangle inequality: The distance between two points is less than or equal to the sum of the distances from a third point, i.e.: \(d(x, z) \leq d(x, y) + d(y, z)\) for all points x, y, z.
Supported distance metrics include:
- Euclidean: The square root of the sum of squared differences between the components of vectors.
- Minkowski: A generalization of the Euclidean and Chebyshev distances, parameterized by an order p.
- Chebyshev: The maximum absolute difference between the components of vectors.
- Cosine: A distance on unit vectors based on cosine similarity.
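The metric properties stated in this module description can be tested numerically on a finite sample of points. The following sketch is a brute-force check with a small tolerance; `is_metric_on` is a hypothetical helper, and passing the check on a sample does not prove that a function is a metric in general.

```python
from itertools import product
import math


def is_metric_on(points, d, tol=1e-12):
    """Check symmetry, positivity and the triangle inequality on `points`."""
    for x, y in product(points, repeat=2):
        if abs(d(x, y) - d(y, x)) > tol:
            return False  # symmetry fails
        if (x == y) != (d(x, y) <= tol):
            return False  # positivity fails
    return all(d(x, z) <= d(x, y) + d(y, z) + tol
               for x, y, z in product(points, repeat=3))


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
euclidean_ok = is_metric_on(points, math.dist)
# The squared Euclidean distance breaks the triangle inequality on this sample.
squared_ok = is_metric_on(points, lambda x, y: math.dist(x, y) ** 2)
```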
- tdamapper.utils.metrics.chebyshev()[source]
Return the Chebyshev distance function for vectors.
The Chebyshev distance is defined as the maximum absolute difference between the components of the vectors.
- Returns:
The Chebyshev distance function.
- Return type:
callable
- tdamapper.utils.metrics.cosine()[source]
Return the cosine distance function for vectors.
The cosine similarity between the input vectors ranges from -1.0 to 1.0.
- A value of 1.0 indicates that the vectors are in the same direction.
- A value of 0.0 indicates orthogonality (the vectors are perpendicular).
- A value of -1.0 indicates that the vectors are diametrically opposed.
The cosine distance is derived from the cosine similarity \(s\) and is defined as: \(d(x, y) = \sqrt{2 \cdot (1 - s(x, y))}\)
This definition ensures that the cosine distance satisfies the triangle inequality on unit vectors.
- Returns:
The cosine distance function.
- Return type:
callable
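The formula above can be evaluated directly. For unit vectors it coincides with the Euclidean distance, since \(\|x - y\|^2 = 2 - 2\, x \cdot y\), which is why the triangle inequality holds there. The sketch below is a standalone illustration with a hypothetical `cosine_distance` helper, not the function returned by this module.

```python
import math


def cosine_distance(x, y):
    """sqrt(2 * (1 - s)), where s is the cosine similarity of x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    similarity = dot / (norm_x * norm_y)
    # Clamp to guard against a similarity rounded slightly above 1.0.
    return math.sqrt(max(0.0, 2.0 * (1.0 - similarity)))


same = cosine_distance((1.0, 0.0), (2.0, 0.0))
orthogonal = cosine_distance((1.0, 0.0), (0.0, 3.0))
opposite = cosine_distance((1.0, 0.0), (-4.0, 0.0))
```

The three calls correspond to similarity 1.0, 0.0 and -1.0, giving distances 0, \(\sqrt{2}\) and 2 respectively.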
- tdamapper.utils.metrics.euclidean()[source]
Return the Euclidean distance function for vectors.
The Euclidean distance is defined as the square root of the sum of the squared differences between the components of the vectors.
- Returns:
The Euclidean distance function.
- Return type:
callable
- tdamapper.utils.metrics.get_metric(metric, **kwargs)[source]
Return a distance function based on the specified string or callable.
- Parameters:
metric (str or callable) – The metric to use. If a callable function is provided, it is returned directly. Otherwise, predefined metric names returned by get_supported_metrics() are supported.
kwargs (dict) – Additional keyword arguments (e.g., ‘p’ for Minkowski distance).
- Returns:
The selected distance metric function.
- Return type:
callable
- Raises:
ValueError – If an invalid metric string is provided.
- tdamapper.utils.metrics.get_supported_metrics()[source]
Return a list of supported metric names.
- Returns:
A list of supported metric names.
- Return type:
list of str
- tdamapper.utils.metrics.manhattan()[source]
Return the Manhattan distance function for vectors.
The Manhattan distance is defined as the sum of the absolute differences between the components of the vectors.
- Returns:
The Manhattan distance function.
- Return type:
callable
- tdamapper.utils.metrics.minkowski(p)[source]
Return the Minkowski distance function for order p on vectors.
The Minkowski distance is a generalization of the Euclidean and Chebyshev distances. When p = 1, it is equivalent to the Manhattan distance, and when p = 2, it is equivalent to the Euclidean distance. When p is infinite, it is equivalent to the Chebyshev distance.
- Parameters:
p (int) – The order of the Minkowski distance.
- Returns:
The Minkowski distance function.
- Return type:
callable
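The special cases mentioned above can be verified with a direct implementation of the formula. This is a minimal sketch with a hypothetical `minkowski_distance` helper; treating an infinite p as the Chebyshev case is a choice of this example, and whether the library accepts an infinite order is not stated here.

```python
import math


def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two vectors of equal length."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)  # limit case: Chebyshev distance
    return sum(d ** p for d in diffs) ** (1.0 / p)


x, y = (0.0, 0.0), (3.0, 4.0)
order_1 = minkowski_distance(x, y, 1)           # Manhattan: 3 + 4
order_2 = minkowski_distance(x, y, 2)           # Euclidean: sqrt(9 + 16)
order_inf = minkowski_distance(x, y, math.inf)  # Chebyshev: max(3, 4)
```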
tdamapper.utils.vptree
A module for fast knn and range searches, depending only on a given metric.
- class tdamapper.utils.vptree.VPTree(X, metric='euclidean', metric_params=None, kind='flat', leaf_capacity=1, leaf_radius=0.0, pivoting=None)[source]
Bases: object
A Vantage Point Tree, or vp-tree, for fast range queries and knn queries.
- Parameters:
X (array-like of shape (n, m) or list-like of length n) – A dataset of n points.
metric (str or callable) – The metric used to define the distance between points. Accepts any value compatible with tdamapper.utils.metrics.get_metric. Defaults to ‘euclidean’.
metric_params (dict, optional) – Additional parameters for the metric function, to be passed to tdamapper.utils.metrics.get_metric. Defaults to None.
kind (str) – Specifies whether to use a flat or a hierarchical vantage point tree. Acceptable values are ‘flat’ or ‘hierarchical’. Defaults to ‘flat’.
leaf_capacity (int) – The maximum number of points in a leaf node of the vantage point tree. Must be a positive value. Defaults to 1.
leaf_radius (float) – The radius of the leaf nodes. Must be a non-negative value. Defaults to 0.0.
pivoting (str or callable, optional) – The method used for pivoting in the vantage point tree. Acceptable values are None, ‘random’, or ‘furthest’. Defaults to None.
- ball_search(point, eps, inclusive=True)[source]
Perform a ball search in the Vantage Point Tree.
This method searches for all points within a specified radius from a given point.
- Parameters:
point (object, list, or array-like) – The query point from which to search for neighbors.
eps (float) – The radius within which to search for neighbors. Must be positive.
inclusive (bool) – Whether to include points exactly at the distance eps from point. Defaults to True.
- Returns:
A list of points within the specified radius from the given query point.
- Return type:
list
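To illustrate how a vantage point tree can answer a ball search using only a metric, here is a minimal sketch. It is not the VPTree implementation: the nested-dict layout, the choice of the first point as vantage point, and the function names build_vptree and ball_search are all assumptions made for the example. The pruning relies only on the triangle inequality.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def build_vptree(points, metric):
    """Build a vp-tree as nested dicts (an illustrative layout)."""
    if not points:
        return None
    vp, rest = points[0], points[1:]  # first point acts as vantage point
    node = {"vp": vp, "radius": 0.0, "inside": None, "outside": None}
    if rest:
        dists = [metric(vp, q) for q in rest]
        radius = sorted(dists)[len(dists) // 2]  # median-like split radius
        inside = [q for q, d in zip(rest, dists) if d <= radius]
        outside = [q for q, d in zip(rest, dists) if d > radius]
        node["radius"] = radius
        node["inside"] = build_vptree(inside, metric)
        node["outside"] = build_vptree(outside, metric)
    return node


def ball_search(node, point, eps, metric, inclusive=True):
    """Return all stored points within distance eps of point."""
    if node is None:
        return []
    d = metric(point, node["vp"])
    found = [node["vp"]] if d < eps or (inclusive and d == eps) else []
    # Points q inside satisfy dist(point, q) >= d - radius.
    if d - eps <= node["radius"]:
        found += ball_search(node["inside"], point, eps, metric, inclusive)
    # Points q outside satisfy dist(point, q) > radius - d.
    if d + eps >= node["radius"]:
        found += ball_search(node["outside"], point, eps, metric, inclusive)
    return found


points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
tree = build_vptree(points, euclidean)
# Points at distance exactly eps are kept only when inclusive is True.
closed = sorted(ball_search(tree, (2.0,), 1.0, euclidean, inclusive=True))
opened = sorted(ball_search(tree, (2.0,), 1.0, euclidean, inclusive=False))
```

With the query point (2.0,) and eps = 1.0, the points (1.0,) and (3.0,) lie exactly on the boundary, so they appear in closed but not in opened.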
- knn_search(point, k)[source]
Perform a k-nearest neighbors search in the Vantage Point Tree.
This method searches for the k-nearest neighbors to a given query point.
- Parameters:
point (object, list, or array-like) – The point from which to search for nearest neighbors.
k (int) – The number of nearest neighbors to search for. Must be positive.
- Returns:
A list of the k-nearest neighbors to the given query point.
- Return type:
list
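For checking results on small datasets, a brute-force baseline that computes every distance is a useful reference point. The sketch below is such a baseline, not the tree-based search itself: knn_brute_force is a hypothetical helper, and returning the neighbors ordered from nearest to farthest is a choice made here rather than documented behaviour of knn_search.

```python
import heapq
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_brute_force(points, query, k, metric):
    """Return the k points closest to query, nearest first."""
    if k <= 0:
        raise ValueError("k must be positive")
    # nsmallest evaluates the metric once per point and keeps the k best.
    return heapq.nsmallest(k, points, key=lambda q: metric(query, q))


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.0, 2.0)]
nearest = knn_brute_force(points, (0.0, 0.0), 2, euclidean)
```

A vp-tree aims to return the same set of neighbors while skipping subtrees that the triangle inequality rules out, so it avoids evaluating the metric on every point.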
tdamapper.plot
This module provides functionalities to visualize the Mapper graph.
- class tdamapper.plot.MapperLayoutInteractive(**kwargs)[source]
Bases: object
DEPRECATED: This class is deprecated and will be removed in a future release. Use tdamapper.plot.MapperPlot.
Class for generating and visualizing the Mapper graph.
This class creates a metric embedding of the Mapper graph in 2D or 3D and converts it into a Plotly figure suitable for interactive display.
- Parameters:
graph (networkx.Graph, required) – The precomputed Mapper graph to be embedded. This can be obtained by calling tdamapper.core.mapper_graph() or tdamapper.core.MapperAlgorithm.fit_transform().
dim (int) – The dimension of the graph embedding (2 or 3).
seed (int, optional (default: 42)) – The random seed used to construct the graph embedding.
iterations (int, optional (default: 50)) – The number of iterations used to construct the graph embedding.
colors (array-like of shape (n,) or list-like of size n) – An array of values that determine the color of each node in the graph, useful for highlighting different features of the data.
agg (Callable, optional) – A function used to aggregate the colors array over the points within a single node. The final color of each node is obtained by mapping the aggregated value with the colormap cmap. Defaults to numpy.nanmean.
title (str, optional) – The title to be displayed alongside the figure.
width (int, optional (default: 512)) – The desired width of the figure in pixels.
height (int, optional (default: 512)) – The desired height of the figure in pixels.
cmap (str, optional) – The name of a colormap used to map color data values, aggregated by agg, to actual RGBA colors.
- plot()[source]
Plot the Mapper graph.
- Returns:
An interactive Plotly figure that can be displayed on screen and in notebooks. For 3D embeddings, the figure requires a WebGL context to be shown.
- Return type:
plotly.graph_objects.Figure
- update(seed=None, iterations=None, colors=None, agg=None, title=None, width=None, height=None, cmap=None)[source]
Update the figure.
This method modifies the figure returned by the plot method. After calling this method, the figure reflects the supplied parameters.
- Parameters:
seed (int, optional) – The random seed used to construct the graph embedding.
iterations (int, optional) – The number of iterations used to construct the graph embedding.
colors (array-like of shape (n,) or list-like of size n) – An array of values that determine the color of each node in the graph, useful for highlighting different features of the data.
agg (Callable, optional) – A function used to aggregate the colors data when multiple points are mapped to a single node. The final color of each node is obtained by mapping the aggregated value with the colormap cmap. Defaults to None.
title (str, optional) – The title to be displayed alongside the figure.
width (int, optional) – The desired width of the figure in pixels.
height (int, optional) – The desired height of the figure in pixels.
cmap (str, optional) – The name of a colormap used to map color data values, aggregated by agg, to actual RGBA colors.
- class tdamapper.plot.MapperLayoutStatic(**kwargs)[source]
Bases:
object
DEPRECATED: This class is deprecated and will be removed in a future release. Use tdamapper.plot.MapperPlot.
Class for generating and visualizing the Mapper graph.
This class creates a metric embedding of the Mapper graph in 2D and converts it into a matplotlib figure suitable for static display.
- Parameters:
graph (networkx.Graph, required) – The precomputed Mapper graph to be embedded. This can be obtained by calling tdamapper.core.mapper_graph() or tdamapper.core.MapperAlgorithm.fit_transform().
dim (int) – The dimension of the graph embedding (only 2 is supported, for compatibility).
seed (int, optional (default: 42)) – The random seed used to construct the graph embedding.
iterations (int, optional (default: 50)) – The number of iterations used to construct the graph embedding.
colors (array-like of shape (n,) or list-like of size n) – An array of values that determine the color of each node in the graph, useful for highlighting different features of the data.
agg (Callable, optional) – A function used to aggregate the colors array over the points within a single node. The final color of each node is obtained by mapping the aggregated value with the colormap cmap. Defaults to numpy.nanmean.
title (str, optional) – The title to be displayed alongside the figure.
width (int, optional (default: 512)) – The desired width of the figure in pixels.
height (int, optional (default: 512)) – The desired height of the figure in pixels.
cmap (str, optional) – The name of a colormap used to map color data values, aggregated by agg, to actual RGBA colors.
- class tdamapper.plot.MapperPlot(graph, dim, iterations=50, seed=None)[source]
Bases: object
Class for generating and visualizing the Mapper graph.
This class creates a metric embedding of the Mapper graph in 2D or 3D and converts it into a plot.
- Parameters:
graph (networkx.Graph, required) – The precomputed Mapper graph to be embedded. This can be obtained by calling tdamapper.core.mapper_graph() or tdamapper.core.MapperAlgorithm.fit_transform().
dim (int) – The dimension of the graph embedding (2 or 3).
iterations (int, optional) – The number of iterations used to construct the graph embedding. Defaults to 50.
seed (int, optional) – The random seed used to construct the graph embedding. Defaults to None.
- plot_matplotlib(colors, agg=<function nanmean>, title=None, width=512, height=512, cmap='jet')[source]
Draw a static plot using Matplotlib.
- Parameters:
colors (array-like of shape (n,) or list-like of size n) – An array of values that determine the color of each node in the graph, useful for highlighting different features of the data.
agg (Callable, optional) – A function used to aggregate the colors array over the points within a single node. The final color of each node is obtained by mapping the aggregated value with the colormap cmap. Defaults to numpy.nanmean.
title (str, optional) – The title to be displayed alongside the figure.
width (int, optional) – The desired width of the figure in pixels. Defaults to 512.
height (int, optional) – The desired height of the figure in pixels. Defaults to 512.
cmap (str, optional) – The name of a colormap used to map colors data values, aggregated by agg, to actual RGBA colors. Defaults to ‘jet’.
- Returns:
A static matplotlib figure that can be displayed on screen and in notebooks.
- Return type:
matplotlib.figure.Figure, matplotlib.axes.Axes
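The role of colors and agg can be illustrated without any plotting library. In the sketch below, each node is associated with the ids of the data points it contains, and agg reduces the per-point color values to one value per node. The plain dict node_ids, the helper names, and the pure-Python nanmean are assumptions standing in for the Mapper graph's node membership and for numpy.nanmean. They are not the library's internals.

```python
import math


def nanmean(values):
    """Mean of the values that are not NaN (NaN if none remain)."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


def aggregate_node_colors(node_ids, colors, agg=nanmean):
    """Map each node to agg applied to the colors of its points."""
    return {
        node: agg([colors[i] for i in ids])
        for node, ids in node_ids.items()
    }


# One color value per data point; point 2 has a missing value.
colors = [0.0, 1.0, math.nan, 3.0, 5.0]
# Hypothetical node membership: overlapping nodes share point 2.
node_ids = {0: [0, 1, 2], 1: [2, 3, 4]}

node_colors = aggregate_node_colors(node_ids, colors)
# Node 0 averages 0.0 and 1.0; node 1 averages 3.0 and 5.0.
```

The aggregated values are what the colormap cmap would then translate into actual colors. Passing a different agg, such as max, changes which feature of each node's points is highlighted.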
- plot_plotly(colors, agg=<function nanmean>, title=None, width=512, height=512, cmap='jet')[source]
Draw an interactive plot using Plotly.
- Parameters:
colors (array-like of shape (n,) or list-like of size n) – An array of values that determine the color of each node in the graph, useful for highlighting different features of the data.
agg (Callable, optional) – A function used to aggregate the colors array over the points within a single node. The final color of each node is obtained by mapping the aggregated value with the colormap cmap. Defaults to numpy.nanmean.
title (str, optional) – The title to be displayed alongside the figure.
width (int, optional) – The desired width of the figure in pixels. Defaults to 512.
height (int, optional) – The desired height of the figure in pixels. Defaults to 512.
cmap (str, optional) – The name of a colormap used to map colors data values, aggregated by agg, to actual RGBA colors. Defaults to ‘jet’.
- Returns:
An interactive Plotly figure that can be displayed on screen and in notebooks. For 3D embeddings, the figure requires a WebGL context to be shown.
- Return type:
plotly.graph_objects.Figure
- plot_plotly_update(fig, colors=None, agg=None, title=None, width=None, height=None, cmap=None)[source]
Draw an interactive plot using Plotly on a previously rendered figure.
This is typically faster than calling MapperPlot.plot_plotly on a new set of parameters.
- Parameters:
fig (plotly.graph_objects.Figure) – A Plotly Figure object obtained by calling the method MapperPlot.plot_plotly.
colors (array-like of shape (n,) or list-like of size n, optional) – An array of values that determine the color of each node in the graph, useful for highlighting different features of the data. Defaults to None.
agg (Callable, optional) – A function used to aggregate the colors array over the points within a single node. The final color of each node is obtained by mapping the aggregated value with the colormap cmap. Defaults to None.
title (str, optional) – The title to be displayed alongside the figure. Defaults to None.
width (int, optional) – The desired width of the figure in pixels. Defaults to None.
height (int, optional) – The desired height of the figure in pixels. Defaults to None.
cmap (str, optional) – The name of a colormap used to map colors data values, aggregated by agg, to actual RGBA colors. Defaults to None.
- Returns:
An interactive Plotly figure that can be displayed on screen and in notebooks. For 3D embeddings, the figure requires a WebGL context to be shown.
- Return type:
plotly.graph_objects.Figure
- plot_pyvis(output_file, colors, agg=<function nanmean>, title=None, width=512, height=512, cmap='jet')[source]
Draw an interactive HTML plot using PyVis.
- Parameters:
output_file (str) – The path where the HTML file is written.
colors (array-like of shape (n,) or list-like of size n) – An array of values that determine the color of each node in the graph, useful for highlighting different features of the data.
agg (Callable, optional) – A function used to aggregate the colors array over the points within a single node. The final color of each node is obtained by mapping the aggregated value with the colormap cmap. Defaults to numpy.nanmean.
title (str, optional) – The title to be displayed alongside the figure. Defaults to None.
width (int, optional) – The desired width of the figure in pixels. Defaults to 512.
height (int, optional) – The desired height of the figure in pixels. Defaults to 512.
cmap (str, optional) – The name of a colormap used to map colors data values, aggregated by agg, to actual RGBA colors. Defaults to ‘jet’.