_cluster.py module

class wrappers._cluster.Cluster(infer_obj, configs_cluster, cluster_file_name=None)

Bases: object

A class for performing clustering on the latent space representation of data from a PREFFECT inference object.

Parameters:
  • infer_obj (Inference) -- The PREFFECT Inference instance to cluster. It is available from the PREFFECT object as preffect_object.inference_dict[inference_key].

  • configs_cluster (dict) -- Configuration settings for the clustering run, including its operational parameters.

  • cluster_file_name (str, optional) -- Optional name for the cluster file. If not provided, it is taken from the configs_cluster dictionary.
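
The fallback described for cluster_file_name can be sketched as below. This is a minimal illustration, not the PREFFECT implementation: the helper name resolve_cluster_file_name and the 'cluster_file_name' dictionary key are assumptions made for the example.

```python
def resolve_cluster_file_name(configs_cluster, cluster_file_name=None):
    """Return the explicit name if one was given, else fall back to the config dict.

    The 'cluster_file_name' key is an assumption made for this sketch.
    """
    if cluster_file_name is not None:
        return cluster_file_name
    return configs_cluster["cluster_file_name"]


# Hypothetical configuration dictionary, for illustration only.
configs = {"cluster_file_name": "latent_clusters"}

print(resolve_cluster_file_name(configs))            # name taken from the config
print(resolve_cluster_file_name(configs, "my_run"))  # explicit name takes precedence
```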

cluster_counts(color_by='leiden', cluster_omega=False, umap_nneighbors=10, cluster_aim=5)

Extract the estimated counts (the mean, mu, of the negative binomial (NB) fitted to each gene-sample pair) from the Inference object, apply Leiden clustering (targeting at most cluster_aim clusters, 5 by default), and visualize the results using UMAP.

This method:

  1. Retrieves an AnnData object containing the estimated counts for each gene-sample pair.

  2. Constructs a neighborhood graph and computes a UMAP embedding.

  3. Iteratively reduces the Leiden resolution until cluster_aim or fewer clusters are obtained (or a minimum resolution is reached).

  4. Plots UMAP projections colored either by the specified column (color_by) or, if available, by additional attributes such as batch or subtype.

Parameters:
  • color_by (str, optional) -- Column name in adata.obs by which to color the UMAP plot. Defaults to 'leiden'.

  • cluster_omega (bool, optional) -- Whether to cluster the omega parameter. Defaults to False.

  • umap_nneighbors (int, optional) -- Number of neighbors to use for UMAP embedding. Defaults to 10.

  • cluster_aim (int, optional) -- Target number of clusters to aim for during Leiden clustering. Defaults to 5.
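
The resolution search in step 3 can be sketched independently of any clustering library. The example below is only an illustration under stated assumptions: the starting resolution, step size, and minimum resolution are invented for the sketch, and cluster_fn stands in for a Leiden call that returns one label per sample.

```python
def reduce_resolution_until(cluster_fn, cluster_aim=5, start_resolution=1.0,
                            step=0.1, min_resolution=0.1):
    """Lower the resolution until at most cluster_aim clusters are found.

    cluster_fn(resolution) must return one label per sample. The start, step,
    and minimum values are assumptions, not PREFFECT defaults.
    """
    resolution = start_resolution
    labels = cluster_fn(resolution)
    # Stop when the target is met or the next step would drop below the minimum.
    while (len(set(labels)) > cluster_aim
           and resolution - step >= min_resolution - 1e-9):
        resolution = round(resolution - step, 10)  # avoid float drift
        labels = cluster_fn(resolution)
    return resolution, labels


def toy_cluster(resolution, n_samples=20):
    """Toy stand-in for Leiden: higher resolution gives more clusters."""
    k = max(1, round(resolution * 10))
    return [i % k for i in range(n_samples)]


final_resolution, final_labels = reduce_resolution_until(toy_cluster, cluster_aim=5)
print(final_resolution, len(set(final_labels)))
```

If the minimum resolution is reached first, the function returns the last clustering even though it still has more than cluster_aim clusters, which mirrors the "or a minimum resolution is reached" clause above.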

cluster_latent_space(color_by='leiden', umap_nneighbors=10, cluster_aim=5)

Extract the latent representation of the data from the parent Inference object, apply Leiden clustering (targeting at most cluster_aim clusters, 5 by default), and visualize the results using UMAP.

This method:

  1. Retrieves an AnnData object containing the latent space representation.

  2. Constructs a neighborhood graph and computes a UMAP embedding.

  3. Iteratively reduces the Leiden resolution until cluster_aim or fewer clusters are obtained (or a minimum resolution is reached).

  4. Plots UMAP projections colored either by the specified column (color_by) or, if present, by additional attributes such as batch or subtype.

Parameters:
  • color_by (str, optional) -- Column name in adata.obs by which to color the UMAP plot. Defaults to 'leiden'.

  • umap_nneighbors (int, optional) -- Number of neighbors to use for UMAP embedding. Defaults to 10.

  • cluster_aim (int, optional) -- Target number of clusters to aim for during Leiden clustering. Defaults to 5.
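
One possible reading of the coloring rule in step 4 is that the plot always uses color_by and adds one panel for each extra attribute found in adata.obs. The sketch below follows that reading. The helper name umap_color_columns and the fixed list of extra attributes ('batch', 'subtype') are assumptions, and the actual method may choose columns differently.

```python
def umap_color_columns(obs_columns, color_by="leiden", extras=("batch", "subtype")):
    """Pick the adata.obs columns to color UMAP panels by.

    obs_columns is the collection of column names available in adata.obs.
    The extras tuple is an assumption made for this sketch.
    """
    if color_by not in obs_columns:
        raise KeyError(f"'{color_by}' is not a column of adata.obs")
    colors = [color_by]
    for name in extras:
        # Add an extra attribute only if it exists and is not already listed.
        if name in obs_columns and name not in colors:
            colors.append(name)
    return colors


print(umap_color_columns(["leiden", "batch"]))
print(umap_color_columns(["leiden"]))
```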

cluster_true_counts(color_by='leiden', umap_nneighbors=10, cluster_aim=5)

Extract the true counts from the Inference object, apply Leiden clustering (targeting at most cluster_aim clusters, 5 by default), and visualize the results using UMAP.

This method:

  1. Retrieves an AnnData object containing the true counts for each gene-sample pair.

  2. Constructs a neighborhood graph and computes a UMAP embedding.

  3. Iteratively reduces the Leiden resolution until cluster_aim or fewer clusters are obtained (or a minimum resolution is reached).

  4. Plots UMAP projections colored either by the specified column (color_by) or, if available, by additional attributes such as batch or subtype.

Parameters:
  • color_by (str, optional) -- Column name in adata.obs by which to color the UMAP plot. Defaults to 'leiden'.

  • umap_nneighbors (int, optional) -- Number of neighbors to use for UMAP embedding. Defaults to 10.

  • cluster_aim (int, optional) -- Target number of clusters to aim for during Leiden clustering. Defaults to 5.

register_cluster()

Register the current cluster instance with the parent Inference object.

This method checks if a cluster with the same name as self.cluster_file_name is already registered in self.parent.clusters. If not found, it deep-copies the current cluster and stores it under self.cluster_file_name.

Raises:

PreffectError -- If the cluster name already exists and overwrite permission is set to False.
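
The registration logic described above can be sketched with stand-in classes. Nothing here is imported from PREFFECT: PreffectError, ParentStub, and ClusterStub are hypothetical stand-ins, and the overwrite attribute (and the choice to replace an existing entry when it is True) is an assumption, since the source only states what happens when overwriting is not permitted.

```python
import copy


class PreffectError(Exception):
    """Stand-in for PREFFECT's own exception class (assumed, not imported)."""


class ParentStub:
    """Stand-in for the parent Inference object; only holds the registry."""

    def __init__(self):
        self.clusters = {}


class ClusterStub:
    """Minimal stand-in for Cluster, keeping only what registration needs."""

    def __init__(self, parent, cluster_file_name, overwrite=False):
        self.parent = parent
        self.cluster_file_name = cluster_file_name
        self.overwrite = overwrite  # assumed location of the overwrite permission

    def register_cluster(self):
        name = self.cluster_file_name
        if name in self.parent.clusters and not self.overwrite:
            raise PreffectError(f"Cluster '{name}' is already registered.")
        # Store a deep copy so later changes to self do not alter the registry.
        self.parent.clusters[name] = copy.deepcopy(self)


parent = ParentStub()
cluster = ClusterStub(parent, "run1")
cluster.register_cluster()
print(sorted(parent.clusters))
```

Calling register_cluster() a second time on the same stub raises PreffectError, because the name is already present and overwrite is False.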