The adata.obs (or adata.var) attribute of an AnnData object is a pandas.DataFrame, so you can use it as such.
For example, imagine that adata.obs contains information on the cells labeled AACT, AACG and AACC, and that the dataframe records the Age and the Tissue of each cell. The dataframe will then have three index entries, which correspond to the cell labels, and two columns, Age and Tissue.
adata.obs
Index Age Tissue
AACT 26 Lung
AACG 40 Lung
AACC 34 Lung
Now, like you said, if you type adata.obs['sex'] = 'female', it will create a new column called sex with the string "female" for every row of the dataframe.
adata.obs
Index Age Tissue sex
AACT 26 Lung female
AACG 40 Lung female
AACC 34 Lung female
Imagine that the cells AACC and AACG actually come from a "male" patient. You could write:
male_patients = ['AACC', 'AACG']
adata.obs.loc[male_patients, 'sex'] = 'male'
which would result in:
adata.obs
Index Age Tissue sex
AACT 26 Lung female
AACG 40 Lung male
AACC 34 Lung male
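The steps above can be reproduced on a plain pandas DataFrame standing in for adata.obs. This is only a minimal sketch: the variable name obs is mine, and with a real AnnData object you would use adata.obs in its place.

```python
import pandas as pd

# Stand-in for adata.obs (assumption: a real AnnData object would supply this).
# One row per cell, indexed by the cell label.
obs = pd.DataFrame(
    {"Age": [26, 40, 34], "Tissue": ["Lung", "Lung", "Lung"]},
    index=["AACT", "AACG", "AACC"],
)

# Assigning a single string creates the column and fills every row with it.
obs["sex"] = "female"

# Overwrite the value only for the selected cells.
male_patients = ["AACC", "AACG"]
obs.loc[male_patients, "sex"] = "male"

print(obs)
```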
Note that I have used the .loc[] indexer to access specific elements of the dataframe by index label (['AACC', 'AACG']) and column name ('sex').
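The row position of .loc[] also accepts a single label or a boolean mask, which can be handy when you do not want to list the cells by hand. A short sketch on a stand-in dataframe (the name obs and the Age > 30 condition are only illustrative, not something from your data):

```python
import pandas as pd

# Stand-in for adata.obs (assumption: illustrative values only).
obs = pd.DataFrame(
    {"Age": [26, 40, 34], "sex": ["female", "female", "female"]},
    index=["AACT", "AACG", "AACC"],
)

# One row label and one column label: returns a single value.
age_of_aact = obs.loc["AACT", "Age"]

# Boolean mask as the row selector: update every row matching a condition.
older_than_30 = obs["Age"] > 30
obs.loc[older_than_30, "sex"] = "male"

print(age_of_aact)
print(obs)
```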
I suggest you follow a tutorial to learn how to work with pandas DataFrames (example: (link)).