I'm not sure the title was well chosen, sorry about that. If this has already been covered, please point me to it; I couldn't find it. For an analysis I am doing, I am working in JupyterLab, mainly with scanpy. I want to see the number of cells that co-express certain genes in each cluster of a Leiden clustering. So far I have been using the pandas crosstab function, which gives me the number for each cluster. However, I have two conditions, and I am struggling to separate the samples to get the cell counts for each condition.

The code I am using to get the total cell number, which works fine:

pd.crosstab(adata_proc.obs['leiden_r05'], adata_proc.obs['CoEx'])

The code where I am struggling to get the numbers per sample. I know that aggfunc = ','.join is not the correct way, but it illustrates the problem:

pd.crosstab(adata_proc.obs['leiden_r05'], adata_proc.obs['CoEx'], adata_proc.obs['sample'], aggfunc = ','.join)

This gives me the names of the conditions in the table, but that is not what I want. I want the cell counts for the 2 conditions. How is this possible? Maybe there is a way to do this with a separate function?

  • Please provide the input dataset as text, and the matching expected output as text. – mozway Dec 19 '21 at 10:37
  • What do you mean by dataset as text? The output would be a table for CoEx = True, with one column per sample (WT, KO) and one row per leiden_r05 cluster, e.g. cluster 0: 150, 50; cluster 1: 70, 80; etc. – Greenline Dec 19 '21 at 11:11
  • Please read [how to make reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – mozway Dec 19 '21 at 11:12
  • Thanks for the suggestion, unfortunately this isn't working for me. – Greenline Dec 19 '21 at 12:15
  • Found a solution... Unfortunately not in the same table but for now it is doing the job. This displays the cell count for the KO samples. `pd.crosstab(adata_proc[np.in1d(adata_proc.obs['sample'], ['KO'])].obs['leiden_r05'], adata_proc[np.in1d(adata_proc.obs['sample'], ['KO'])].obs['CoEx'])` – Greenline Dec 19 '21 at 13:26

1 Answer

Edit: Using crosstab, you'll need to add the 'CoEx' column to the index, and use 'sample' as the columns:

pd.crosstab(index=[adata_proc.obs['leiden_r05'],adata_proc.obs['CoEx']], columns=[adata_proc.obs['sample']])
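As a rough illustration of what this crosstab call returns, here is a small sketch on a made-up DataFrame standing in for adata_proc.obs. The toy values and the variable names obs, counts and coex_only are assumptions for the example only; the column names are the ones from the question.

```python
import pandas as pd

# Hypothetical stand-in for adata_proc.obs (values invented for illustration)
obs = pd.DataFrame({
    "leiden_r05": ["0", "0", "0", "0", "1", "1", "1", "1"],
    "CoEx":       [True, True, True, False, True, False, False, True],
    "sample":     ["WT", "WT", "KO", "KO", "KO", "KO", "WT", "WT"],
})

# Rows: (cluster, co-expression status); columns: one per sample
counts = pd.crosstab(index=[obs["leiden_r05"], obs["CoEx"]],
                     columns=obs["sample"])
print(counts)

# If only co-expressing cells matter, filter first and cross cluster with sample
coex = obs[obs["CoEx"]]
coex_only = pd.crosstab(coex["leiden_r05"], coex["sample"])
print(coex_only)
```

The second table has one row per cluster and one column per condition, which seems to be the layout described in the comments under the question.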

I suggest using the .groupby function:

adata_proc.obs.groupby(['leiden_r05','CoEx'])["sample"].value_counts()

Another option (a bit of an abuse) is the pivot_table interface. In your case it would be:

pd.pivot_table(adata_proc.obs, index=["leiden_r05"], columns=["CoEx","sample"],values='barcode',  aggfunc=len, fill_value=0)

*The 'values' argument is here only to reduce the number of columns; it is an artifact of using a method that does not quite fit the task.
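For comparison, the groupby route can be reshaped into the same wide layout by counting group sizes and unstacking the sample level. This is only a sketch on invented data; obs, counts and flat are hypothetical names, and the toy columns are plain strings and booleans. With categorical columns, as scanpy typically stores leiden labels, passing observed=True to groupby may be needed to avoid rows for unobserved combinations.

```python
import pandas as pd

# Hypothetical stand-in for adata_proc.obs (values invented for illustration)
obs = pd.DataFrame({
    "leiden_r05": ["0", "0", "0", "0", "1", "1", "1", "1"],
    "CoEx":       [True, True, True, False, True, False, False, True],
    "sample":     ["WT", "WT", "KO", "KO", "KO", "KO", "WT", "WT"],
})

# Count cells per (cluster, co-expression status, sample),
# then move 'sample' into the columns; missing combinations become 0
counts = (
    obs.groupby(["leiden_r05", "CoEx", "sample"])
       .size()
       .unstack("sample", fill_value=0)
)
print(counts)

# Flatten the index for easier filtering or export
flat = counts.reset_index()
print(flat[flat["CoEx"] == True])
```

Counting with size() avoids the need for a placeholder values column, since no particular column is being aggregated.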

  • The co-expression column just gives true or false for each cell. With that information you should be able to show it for WT and KO condition. I am not sure where you get the `values = 'barcode'` ? – Greenline Dec 20 '21 at 15:11
  • I've edited my answer. If I understand correctly, you want to know, for each cluster and for each co-expression status, how many cells are KO and how many are WT (but I may have misunderstood). The item in the values parameter really doesn't matter here, as the aggregation function is just used to count the rows. – YotamW Constantini Dec 21 '21 at 21:02
  • Thanks for the effort, and sorry for the late response. If I run the code I get an error: ValueError: Grouper for 'leiden_r05' not 1-dimensional – Greenline Dec 22 '21 at 10:22
  • According to this question - could it be that you have two 'leiden_r05' columns? https://stackoverflow.com/questions/43298192/valueerror-grouper-for-something-not-1-dimensional – YotamW Constantini Dec 23 '21 at 12:02
  • Final edit, I hope this finds you well! – YotamW Constantini Dec 28 '21 at 16:33
  • Did it work? I tried running myself on scanpy objects and it works nicely for me – YotamW Constantini Jan 04 '22 at 07:16
  • Sorry for the late response, this was due to Christmas and New Year. Thank you so much! It works now. – Greenline Jan 05 '22 at 11:56
  • Very happy to hear @Greenline, you are invited to mark the answer as helpful :) – YotamW Constantini Feb 01 '22 at 16:49