Remove NaN values from AnnData object

Question

I have an AnnData object that has two columns: one with barcodes and another with cell types, like this:

barcodes cell_type

AAACGAACAGGATGTG-1 anterior pharynx

AAACGAAGTTAGGAGC-1 epithelium, ductal cells

AAACGAAGTTAGGAGC-1 NaN

In order to filter out cell types that I do not want, I am using the following command:

adata = adata[adata.obs['cell_type'] != 'leukocytes']

However, I want to get rid of the NaN values as well.

I have tried the following options, neither of which worked:

adata = adata[adata.obs['cell_type'] != 'NaN']


adata = adata[adata.obs['cell_type'] != np.nan]

I then used:

scATAC_adata_raw.obs.dropna(how="any")

which did the filtering but did not save it in the AnnData object.

Could you help me filter the NaN values out of the AnnData object? Thanks a lot.

@Progman I am using Python. – Maria Pereira May 01 '23 at 12:08

score 0 · Answer 1 · 2023-05-01T13:02:14.347

How about nan_to_num()?

This function replaces NaN (Not a Number) values with 0.

If you only want to drop the NaN values, you can use dropna().
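
One thing to keep in mind with dropna() is that, by default, it returns a new DataFrame and leaves the original untouched, which may be why the call appears to have no effect. The following is a minimal pandas-only sketch; the `obs` DataFrame and its cell names are made up here as a stand-in for adata.obs, and no AnnData object is involved.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for adata.obs; the index labels are invented.
obs = pd.DataFrame(
    {"cell_type": ["anterior pharynx", np.nan, "leukocytes"]},
    index=["cell_0", "cell_1", "cell_2"],
)

# dropna() returns a new DataFrame; obs itself still contains the NaN row.
cleaned = obs.dropna(how="any")
print(len(obs), len(cleaned))

# Boolean mask over the original rows, marking the ones that survived dropna().
keep = obs.index.isin(cleaned.index)
print(keep)
```

A mask like `keep` could then presumably be used to subset the AnnData object itself, in the same way as the boolean indexing in the question (adata = adata[keep]); that step is an assumption and is not exercised in the sketch.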

edited May 01 '23 at 13:02

answered May 01 '23 at 12:53

My goal is to drop the NaN, so I used the second option that you suggested. The issue is that when I use it with this command "scATAC_adata.obs.dropna(how="any")" and then I check the AnnData object with "scATAC_adata", the NaN filtering was not saved and the object has the exact same obs and vars as before the filtering. – Maria Pereira May 01 '23 at 13:10
When I work with data frames for machine learning or deep learning, I never delete entries that are NaN; I replace them with 0. Most professional courses do this, because deleting values from the data frame can cause problems in your model. – May 01 '23 at 22:55
See [this](https://stackoverflow.com/questions/13413590/how-to-drop-rows-of-pandas-dataframe-whose-value-in-a-certain-column-is-nan/13413845#13413845), it would be helpful. – May 01 '23 at 23:04

score 0 · Answer 2 · answered Jun 13 '23 at 13:29

A simple way to distinguish NaN from other values is that NaN != NaN evaluates to True (at least in Python). This is also why adata.obs['cell_type'] != np.nan is True for every entry: nothing is equal to NaN, not even NaN itself. So you can always get a mask of the entries that are not NaN by comparing the column with itself:

adata = adata[adata.obs['cell_type'] == adata.obs['cell_type']]

This will filter out all NaN values. As somebody else suggested, you could also use the pandas functionality here:

adata = adata[adata.obs['cell_type'].notna()]
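
The behaviour of these comparisons can be checked on a small pandas column. This is a minimal sketch under stated assumptions: the `obs` DataFrame, its cell names, and the object dtype of the column are made up for illustration and stand in for adata.obs; an AnnData object is not used here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for adata.obs; the index labels are invented.
obs = pd.DataFrame(
    {
        "cell_type": pd.Series(
            ["anterior pharynx", "epithelium, ductal cells", np.nan, "leukocytes"],
            dtype=object,
        ).to_numpy()
    },
    index=["cell_0", "cell_1", "cell_2", "cell_3"],
)
col = obs["cell_type"]

always_true = col != np.nan   # True everywhere, so it filters nothing
self_equal = col == col       # False only where the value is NaN
not_missing = col.notna()     # the same mask, via pandas

# Combine the NaN mask with the cell type filter from the question.
keep = not_missing & (col != "leukocytes")
filtered = obs[keep]
print(filtered)
```

Combining both conditions into one mask means the object only has to be subset once; whether to do this in one step or two is a matter of taste.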
