
I want to subset an AnnData object by cluster, but I can't work out how to do it.

I am running the scVelo pipeline, and in it I ran the tl.louvain function to cluster the cells with the Louvain algorithm. I got around 32 clusters, of which clusters 2 and 4 are of interest, and I need to run the rest of the pipeline on these clusters only. (I started from a loom file that I read into scVelo, so I now have an AnnData object.)

I tried adata.obs["louvain"], which gave me the cluster assignments, but I need to create a new AnnData object containing only these 2 clusters and process it further.

Please help me subset the AnnData object. Any help is highly appreciated. (I am very new to this and finding it difficult to follow.)

StupidWolf
sidrah maryam
  • Please give a short reproducible example of your code, with a small example data frame. That makes the question much easier to understand and greatly improves your chances of getting help. – Elias Sep 16 '20 at 09:00
  • @Elias The data was originally in a loom file with an observation layer that has columns. After Louvain clustering I got a new column with the cluster names, and I wanted to select specific clusters from that column. This helped: mask3 = (adata.obs["louvain"] == "1") | (adata.obs["louvain"] == "2"); final3 = adata[mask3].copy() – sidrah maryam Oct 15 '20 at 09:12
  • @StupidWolf Sure, I will keep that in mind next time. – sidrah maryam Oct 15 '20 at 09:14

2 Answers


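If your adata.obs has a "louvain" column, as I'd expect after running tl.louvain, you can subset with adata[adata.obs["louvain"] == "2"] to obtain a single cluster, or with adata[adata.obs["louvain"].isin(["2", "4"])] to obtain clusters 2 and 4. Note that the cluster labels are strings, and that indexing returns a view of the original object; append .copy() if you need an independent AnnData object.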

puermaris
  • Thank you so much for the help. Yes, that solved it too. This also did the job: mask3 = (adata.obs["louvain"] == "1") | (adata.obs["louvain"] == "2"); final3 = adata[mask3].copy() – sidrah maryam Oct 15 '20 at 09:13

Feel free to use this function I wrote for my work.

from anndata import AnnData
import numpy as np

def cluster_sampled(adata: AnnData, clusters: list, n_samples: int) -> AnnData:
    """Sample n_samples randomly from each louvain cluster from the provided clusters

    Parameters
    ----------
    adata
        AnnData object
    clusters
        List of clusters to sample from
    n_samples
        Number of samples to take from each cluster

    Returns
    -------
    AnnData
        Annotated data matrix with sampled cells from the clusters
    """
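    sampled_indices = []
    # Keep only the requested clusters; .copy() detaches the subset from the original object.
    adata_cluster_sampled = adata[adata.obs["louvain"].isin(clusters), :].copy()
    # groupby(...).indices maps each cluster label to the positional indices of its cells.
    for indices in adata_cluster_sampled.obs.groupby("louvain", observed=True).indices.values():
        # Sampling without replacement raises ValueError if a cluster has fewer than n_samples cells.
        sampled_indices.append(np.random.choice(indices, n_samples, replace=False))
    return adata_cluster_sampled[np.concatenate(sampled_indices)]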