How can I remove other data points based on the condition in multiple other columns in pandas?

Question

I do not think this is a duplicate of this question, as that question requires adding new data. I do not need to add new data; I need to select existing data based on a series of conditions.

I have a dataframe which looks like this:

signal,vaccine_dosage,vaccine_brand
10,0,Na
15,1,AZ
20,2,PF
30,3,AZ
10,0,Na
20,2,AZ
20,2,AZ

I need a new dataframe with only the 2 dosage signals from AZ. In R, I can do something like this:

file_input <- read.csv(file.choose())
two_dosage <- as.data.frame(
     as.numeric(ifelse(file_input$vaccine_dosage == 2 & 
     file_input$vaccine_brand == 'AZ', file_input$signal, NA)))

Which would give a dataframe like this:

signal
Na
Na
Na
Na
Na
20
20

I need to recreate this with Pandas, but I don't really know where to begin. How would you recreate this?

Do you want to write the output as a new dataframe, or add an output column to the existing dataframe? — udaykumar gajavalli, Aug 16 '22 at 12:48

jezrael · Answer 1 · 2022-08-16T06:52:43.150

-1

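Read the file with read_csv, build a boolean mask (converting vaccine_dosage with to_numeric), and apply it with numpy.where:

import numpy as np
import pandas as pd

df = pd.read_csv(file)  # file: path to your CSV

# True only where the dosage is 2 and the brand is AZ
mask = pd.to_numeric(df['vaccine_dosage']).eq(2) & df['vaccine_brand'].eq('AZ')
df['signal'] = np.where(mask, df['signal'], np.nan)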

Or with DataFrame.loc:

df.loc[~mask, 'signal'] = np.nan
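
To see the mask at work on the question's sample data, here is a minimal self-contained sketch. It builds the result as a new one-column dataframe instead of overwriting signal in place. The io.StringIO input stands in for the real file, the names two_dosage, via_numpy and only_matches are illustrative, and Series.where is an alternative that is not part of the answer above.

```python
import io

import numpy as np
import pandas as pd

# Sample data copied from the question; io.StringIO stands in for the real file.
csv_text = """signal,vaccine_dosage,vaccine_brand
10,0,Na
15,1,AZ
20,2,PF
30,3,AZ
10,0,Na
20,2,AZ
20,2,AZ
"""
df = pd.read_csv(io.StringIO(csv_text))

# True only where the dosage is 2 and the brand is AZ.
mask = pd.to_numeric(df["vaccine_dosage"]).eq(2) & df["vaccine_brand"].eq("AZ")

# New one-column frame; df itself is left untouched.
# Series.where keeps values where mask is True and puts NaN elsewhere.
two_dosage = df["signal"].where(mask).to_frame()

# The same result built with numpy.where, as in the answer.
via_numpy = pd.DataFrame({"signal": np.where(mask, df["signal"], np.nan)})

# If the non-matching rows should be dropped rather than set to NaN:
only_matches = df.loc[mask, ["signal"]]

print(two_dosage)
```

Because NaN is a float, the signal column becomes float64 in the first two variants; only_matches keeps the original integer dtype since no NaN is introduced.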

edited Aug 16 '22 at 06:52

answered Aug 16 '22 at 06:46

