shannonca.dimred.reduce(X, n_comps=50, iters=1, nbhds=None, nbhd_size=15, metric='euclidean', model='wilcoxon', keep_scores=False, keep_loadings=False, keep_all_iters=False, verbose=False, n_tests='auto', seed=10, chunk_size=None, **kwargs)

Compute an SCA reduction of the input data.

  • X (numpy.ndarray | scipy.spmatrix) – (num_cells)x(num_genes) array or sparse matrix to be dimensionality-reduced.

  • n_comps (int) – Desired dimensionality of the reduction. Default 50

  • iters (int) – Number of iterations of SCA. More iterations usually strengthen the signal, which stabilizes after around 3-5 iterations. Default 1.

  • nbhd_size (int) – Size of neighborhoods used to assess the local expression of a gene. Should be smaller than the smallest subpopulation; default is 15.

  • model (str) – Model used to test for local enrichment of genes, used to compute information scores. One of [“wilcoxon”, “binomial”, “ttest”], default “wilcoxon” (recommended).

  • nbhds (numpy.ndarray | list) – Optional. If the k-neighborhoods of the points have already been determined, they can be specified here as a (num_cells)x(k) array or list. Otherwise, they will be computed from the PCA embedding. Default None.

  • metric (str) – Metric used to compute k-nearest neighbor graphs for SCA score computation. Default “euclidean”. See sklearn.neighbors.DistanceMetric for list of choices.

  • keep_scores (bool) – If True, keep and return the information score matrix. Default False.

  • keep_loadings (bool) – If True, returns loadings of each gene in each metagene as a dense matrix. Default False.

  • verbose (bool) – If True, print progress. Default False

  • n_tests (str | int) – Effective number of independent genes per cell, used for FWER multiple testing correction. Set to “auto” to determine it automatically by bootstrapping. Default “auto”.

  • kwargs – Other arguments to be passed to the chosen scorer
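
As an illustration of the (num_cells)x(k) layout expected by nbhds, the following is a minimal sketch that builds such an array by brute-force Euclidean nearest-neighbor search in NumPy. The helper name knn_neighborhoods and the random stand-in embedding are hypothetical and not part of shannonca. Whether a cell should be included in its own neighborhood is not stated here; this sketch assumes it is excluded.

```python
import numpy as np

def knn_neighborhoods(embedding, k=15):
    """Return a (num_cells, k) integer array of nearest-neighbor indices.

    Hypothetical helper, not part of shannonca. Uses O(num_cells^2) memory,
    so it is only suitable for small inputs.
    """
    embedding = np.asarray(embedding, dtype=float)
    # Pairwise squared Euclidean distances: ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = (embedding ** 2).sum(axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * embedding @ embedding.T
    # Assumption of this sketch: a cell is not a member of its own neighborhood
    np.fill_diagonal(dists, np.inf)
    # Sort each row by distance and keep the k closest indices, nearest first
    return np.argsort(dists, axis=1)[:, :k]

rng = np.random.default_rng(10)
embedding = rng.normal(size=(200, 20))  # stand-in for a PCA embedding
nbhds = knn_neighborhoods(embedding, k=15)
```

An array built this way could then be passed as nbhds=nbhds, with k playing the role that nbhd_size plays when neighborhoods are computed internally.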


Returns a (num_cells)x(n_comps) array of reduced features if keep_scores and keep_loadings are both False. Otherwise, returns a dictionary with keys ‘reduction’, ‘scores’ and/or ‘loadings’.

Return type:

numpy.ndarray | dict
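
Because the return type depends on keep_scores and keep_loadings, calling code may want to normalize it. The sketch below shows one way to do so; unpack_reduction is a hypothetical helper, not part of shannonca, and it assumes only the dictionary keys listed above.

```python
def unpack_reduction(result):
    """Split a reduce() return value into (reduction, scores, loadings).

    Hypothetical helper, not part of shannonca. Entries that were not
    requested are returned as None.
    """
    if isinstance(result, dict):
        # 'scores' and 'loadings' are present only when they were requested
        return result["reduction"], result.get("scores"), result.get("loadings")
    # Plain array case: neither scores nor loadings were kept
    return result, None, None
```

Assuming shannonca is installed, this could be used as `reduction, scores, loadings = unpack_reduction(reduce(X, n_comps=50, keep_scores=True))`.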