
I have a concatenated single-cell RNAseq anndata with

obs: 'Age', 'EPNsubtype', 'Region', 'Subclass', 
'Taxonomy_group', 'Tissue', 'batch', 'pheno', 'sample', 
'subtype', 'treatment', 'n_genes', 'percent_mito', 
'n_counts', 'leiden'

I want to create another obs column, 'Sex', whose value depends on the 'sample'.

I know I can create a new obs with

adata.obs["sex"] = "female"

but how would I do it for particular sample categories and not the entire set?

Thanks!

StupidWolf
gmcr1

1 Answer


The adata.obs (and adata.var) attribute of an AnnData object is a pandas.DataFrame, so you can manipulate it like any other DataFrame.

For example, imagine that adata.obs describes three cells labeled AACT, AACG and AACC, and records the Age and the Tissue for each. The dataframe then has three index entries, one per cell label, and two columns, Age and Tissue.

adata.obs
  Index   Age  Tissue
  AACT    26   Lung
  AACG    40   Lung
  AACC    34   Lung

Now, as you said, typing adata.obs['sex'] = 'female' creates a new column called sex that holds the string "female" for every row of the dataframe.

adata.obs
  Index   Age  Tissue  sex
  AACT    26   Lung    female
  AACG    40   Lung    female
  AACC    34   Lung    female

Now imagine that the cells AACC and AACG actually come from a "male" patient. You could write:

male_patients = ['AACC', 'AACG']
adata.obs.loc[male_patients, 'sex'] = 'male'

which would result in:

adata.obs
  Index   Age  Tissue  sex
  AACT    26   Lung    female
  AACG    40   Lung    male
  AACC    34   Lung    male
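
The steps above can be reproduced with a small standalone sketch. It assumes only pandas and uses a plain DataFrame named obs as a stand-in for adata.obs (the name obs and the toy values are for illustration, not taken from a real dataset):

```python
import pandas as pd

# Stand-in for adata.obs: a plain DataFrame indexed by cell label.
obs = pd.DataFrame(
    {"Age": [26, 40, 34], "Tissue": ["Lung", "Lung", "Lung"]},
    index=["AACT", "AACG", "AACC"],
)

# A scalar assignment fills the new column for every cell.
obs["sex"] = "female"

# .loc[row_labels, column_label] overwrites only the selected cells.
male_patients = ["AACC", "AACG"]
obs.loc[male_patients, "sex"] = "male"

print(obs)
```

With a real AnnData object, the same two assignments should work directly on adata.obs.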

Note that I used the .loc[] indexer to access specific elements of the dataframe by index label (['AACC', 'AACG']) and column name ('sex').
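
Since the question asks about samples rather than individual cells, the same idea can be driven by the 'sample' column instead of the index. Below is a minimal sketch under stated assumptions: obs again stands in for adata.obs, the sample names S1, S2, S3 and the sample_to_sex mapping are hypothetical, and sex_mapped is just an illustrative column name.

```python
import pandas as pd

# Stand-in for adata.obs with a 'sample' column (sample names are hypothetical).
obs = pd.DataFrame(
    {"sample": ["S1", "S1", "S2", "S3"]},
    index=["AACT", "AACG", "AACC", "AAGT"],
)

# Option 1: start from a default, then overwrite the rows whose sample
# is in a given list, using a boolean mask as the row selector.
obs["sex"] = "female"
is_male_sample = obs["sample"].isin(["S2", "S3"])
obs.loc[is_male_sample, "sex"] = "male"

# Option 2: describe every sample once in a dict and map the column.
# Samples missing from the dict would end up as NaN.
sample_to_sex = {"S1": "female", "S2": "male", "S3": "male"}  # hypothetical
obs["sex_mapped"] = obs["sample"].map(sample_to_sex)

print(obs)
```

The mapping approach is probably the easier one to maintain with many samples, because each sample is listed exactly once and any sample you forgot shows up as a missing value rather than silently keeping the default.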

I suggest following a tutorial to learn how to work with pandas DataFrames in Python (example: (link)).

DoRemy95