
I have an AnnData object that has two columns: one with barcodes and another with cell types, like this:

barcodes cell_type

AAACGAACAGGATGTG-1 anterior pharynx

AAACGAAGTTAGGAGC-1 epithelium, ductal cells

AAACGAAGTTAGGAGC-1 NaN

In order to filter out cell types that I do not want, I am using the following command:

adata = adata[adata.obs['cell_type'] != 'leukocytes']

However, I want to get rid of the NaN values as well.

I have tried the following options, neither of which worked:

adata = adata[adata.obs['cell_type'] != 'NaN']


adata = adata[adata.obs['cell_type'] != np.nan]

I then used:

scATAC_adata_raw.obs.dropna(how="any")

which did the filtering but did not save it in the AnnData object.

Could you help me filter the NaN values out of the AnnData object? Thanks a lot.

Progman

2 Answers


How about NumPy's nan_to_num()?

This function replaces NaN (Not a Number) values with 0.

If you only want to drop the NaN values, you could use dropna().
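
One thing to keep in mind with dropna(): by default it returns a new DataFrame and leaves the original untouched, which is probably why the result does not show up in the AnnData object. Here is a minimal sketch using a plain pandas DataFrame as a stand-in for adata.obs. The third barcode and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for adata.obs; the third barcode is invented.
obs = pd.DataFrame(
    {"cell_type": ["anterior pharynx", "epithelium, ductal cells", np.nan]},
    index=["AAACGAACAGGATGTG-1", "AAACGAAGTTAGGAGC-1", "AAACGAATCTTGCATG-1"],
)

# dropna() returns a NEW DataFrame; `obs` itself is not modified.
cleaned = obs.dropna(how="any")

rows_before = len(obs)     # still contains the NaN row
rows_after = len(cleaned)  # NaN row is gone only in the copy

# Boolean mask of the rows that survived, aligned with the original index.
keep = obs.index.isin(cleaned.index)
```

With a real AnnData object, a mask like `keep` would have to be used to subset the object itself (and the result assigned back to the variable), since editing a copy of obs does not change the AnnData.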

  • My goal is to drop the NaN, so I used the second option that you suggested. The issue is that when I use it with this command "scATAC_adata.obs.dropna(how="any")" and then I check the AnnData object with "scATAC_adata", the NaN filtering was not saved and the object has the exact same obs and vars as before the filtering. – Maria Pereira May 01 '23 at 13:10
  • When I work with dataframes for machine learning or deep learning, I never delete a column that contains NaN; I replace the NaN values with 0. Everyone does this in the most professional courses, because deleting a value from the data frame can cause problems in your model. –  May 01 '23 at 22:55
  • See [this](https://stackoverflow.com/questions/13413590/how-to-drop-rows-of-pandas-dataframe-whose-value-in-a-certain-column-is-nan/13413845#13413845), it would be helpful. –  May 01 '23 at 23:04

A simple way to distinguish NaN from other values is that NaN != NaN is true (at least in Python). This is also why adata.obs['cell_type'] != np.nan is always true: nothing is equal to NaN, not even NaN itself. So you can always get a mask of the entries that are not NaN by comparing the column with itself:

adata = adata[adata.obs['cell_type'] == adata.obs['cell_type']]

This will filter out all NaN values. As somebody else suggested, you could also use the pandas functionality here:

adata = adata[adata.obs['cell_type'].notna()]
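
To see the difference between these comparisons, here is a small sketch on a plain pandas Series standing in for adata.obs['cell_type']. The values and variable names are illustrative, and the Series is given object dtype explicitly; a categorical column may be worth checking separately.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for adata.obs['cell_type'].
cell_type = pd.Series(["anterior pharynx", "leukocytes", np.nan], dtype=object)

# Neither attempt from the question removes anything: both masks are all True.
ne_string = cell_type != "NaN"   # NaN is not the string 'NaN'
ne_nan = cell_type != np.nan     # nothing equals NaN, so "!=" is always True

# Both approaches from this answer are False only at the NaN entry.
self_eq = cell_type == cell_type
not_na = cell_type.notna()

# Combine with the original filter to drop 'leukocytes' and NaN in one mask.
keep = cell_type.notna() & (cell_type != "leukocytes")
```

A combined mask like `keep` could then be applied the same way as above, e.g. adata = adata[mask], with the mask built from adata.obs['cell_type'].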