For a problem, I need to restrict a DataFrame to two columns, filter them for the desired values, and drop NaN values so that a correlation can be computed. I know the result should have a certain number of rows, but I am getting more rows than I should.
import numpy as np
import pandas as pd
from scipy import stats
df = pd.read_csv("assets/NISPUF17.csv")
# restrict to the wanted columns
wantedcolumns = df[["HAD_CPOX", "P_NUMVRC"]]
# filter columns
had_cpox = wantedcolumns[(wantedcolumns["HAD_CPOX"] >= 1) & (wantedcolumns["HAD_CPOX"] <= 2)]
cpox_vax = wantedcolumns[wantedcolumns["P_NUMVRC"] >= 1.0]
# drop missing values
wantedcolumns.dropna()
# print number of rows
print(len(wantedcolumns))
# correlation between the two columns
corr, pval = stats.pearsonr(wantedcolumns["HAD_CPOX"], wantedcolumns["P_NUMVRC"])
What I have tried: wantedcolumns[had_cpox & cpox_vax], which raises a traceback.
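
For reference, here is a minimal sketch of how the two filters might be combined. It assumes the goal is to keep only rows that satisfy both conditions at once. The small DataFrame is made-up stand-in data (not the real NISPUF17.csv), and the name filtered is hypothetical. Two things differ from the code above: had_cpox and cpox_vax are built as boolean masks rather than as already-filtered DataFrames, and the result of dropna() is assigned, since dropna() returns a new DataFrame instead of modifying wantedcolumns in place.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up stand-in for assets/NISPUF17.csv, only so the sketch runs on its own.
df = pd.DataFrame({
    "HAD_CPOX": [1, 2, 2, 77, 1, 2, np.nan, 2],
    "P_NUMVRC": [1.0, 2.0, 1.0, 1.0, np.nan, 0.0, 1.0, 3.0],
})

wantedcolumns = df[["HAD_CPOX", "P_NUMVRC"]]

# Boolean masks (Series of True/False), not filtered DataFrames.
had_cpox = wantedcolumns["HAD_CPOX"].between(1, 2)  # inclusive on both ends
cpox_vax = wantedcolumns["P_NUMVRC"] >= 1.0

# Combine the masks with &, index once, and keep the result of dropna().
filtered = wantedcolumns[had_cpox & cpox_vax].dropna()

print(len(filtered))
corr, pval = stats.pearsonr(filtered["HAD_CPOX"], filtered["P_NUMVRC"])
print(corr, pval)
```

In this sketch the comparisons already evaluate to False for NaN, so the masks exclude rows with missing values before dropna() runs; the dropna() call is kept only as a safeguard.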