
I do not think this is a duplicate of this question, as that question requires adding new data. I do not need to add new data; I need to select existing data based on a series of conditions.

I have a dataframe which looks like this:

signal,vaccine_dosage,vaccine_brand
10,0,Na
15,1,AZ
20,2,PF
30,3,AZ
10,0,Na
20,2,AZ
20,2,AZ
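To experiment without a file on disk, the sample rows above can be loaded from a string. This is a minimal sketch; the name CSV_TEXT is a hypothetical placeholder, and io.StringIO simply lets read_csv treat the string as a file. Note that "Na" is not one of pandas' default missing-value strings, so it is read as the literal text "Na".

```python
import io

import pandas as pd

# Hypothetical placeholder: the sample rows from the question as CSV text.
CSV_TEXT = """signal,vaccine_dosage,vaccine_brand
10,0,Na
15,1,AZ
20,2,PF
30,3,AZ
10,0,Na
20,2,AZ
20,2,AZ
"""

# io.StringIO wraps the string in a file-like object that read_csv accepts.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df)
```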

I need a new dataframe that keeps the signal only for rows with dosage 2 and brand AZ, with every other row set to missing. In R, I can do something like this:

file_input <- read.csv(file.choose())
two_dosage <- data.frame(
     signal = as.numeric(ifelse(file_input$vaccine_dosage == 2 & 
     file_input$vaccine_brand == 'AZ', file_input$signal, NA)))

This gives a dataframe like this:

signal
NA
NA
NA
NA
NA
20
20

I need to recreate this with pandas, but I don't know where to begin. How would you do this?

Deez

1 Answer


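Use read_csv, build a boolean mask (to_numeric converts vaccine_dosage to numbers so the comparison with 2 works even if the column was read as text), and apply it with numpy.where: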

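import numpy as np
import pandas as pd

df = pd.read_csv(file)  # file: path to your CSV
# True only for rows with dosage 2 AND brand AZ
mask = pd.to_numeric(df['vaccine_dosage']).eq(2) & df['vaccine_brand'].eq('AZ')
# keep signal where mask is True, NaN everywhere else
df['signal'] = np.where(mask, df['signal'], np.nan)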
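A self-contained sketch of the same numpy.where approach, run on the question's sample data. Unlike the snippet above, it writes the result into a new one-column dataframe (the hypothetical name two_dosage mirrors the R variable) instead of overwriting df['signal'], which matches the question's "new dataframe" wording. CSV_TEXT is a hypothetical stand-in for the real file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV file: the question's sample rows.
CSV_TEXT = """signal,vaccine_dosage,vaccine_brand
10,0,Na
15,1,AZ
20,2,PF
30,3,AZ
10,0,Na
20,2,AZ
20,2,AZ
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Boolean Series: True only where dosage is 2 AND brand is AZ.
mask = pd.to_numeric(df['vaccine_dosage']).eq(2) & df['vaccine_brand'].eq('AZ')

# np.where picks signal where mask is True and NaN elsewhere; wrapping it in
# a new DataFrame leaves df untouched (like the R data.frame result).
two_dosage = pd.DataFrame({'signal': np.where(mask, df['signal'], np.nan)})
print(two_dosage)
```

Because NaN is a float, the resulting signal column is float64 (20.0 rather than 20), much as as.numeric gives a double in R.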

Alternatively, use DataFrame.loc to set signal to NaN wherever the mask is False:

df.loc[~mask, 'signal'] = np.nan
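Another option, not part of the answer above, is Series.where, which keeps values where the condition is True and fills the rest with NaN; to_frame() then yields a new one-column dataframe without modifying df. The sketch also shows plain boolean indexing for the case where you want only the matching rows rather than NaN placeholders. CSV_TEXT, two_dosage and only_matches are hypothetical names.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: the question's sample rows.
CSV_TEXT = """signal,vaccine_dosage,vaccine_brand
10,0,Na
15,1,AZ
20,2,PF
30,3,AZ
10,0,Na
20,2,AZ
20,2,AZ
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

mask = pd.to_numeric(df['vaccine_dosage']).eq(2) & df['vaccine_brand'].eq('AZ')

# Series.where keeps signal where mask is True and fills NaN elsewhere;
# to_frame() turns the Series into a one-column dataframe named 'signal'.
two_dosage = df['signal'].where(mask).to_frame()

# Boolean indexing drops non-matching rows entirely (original index kept).
only_matches = df.loc[mask, 'signal']
print(two_dosage)
print(only_matches)
```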
jezrael