
For a problem, I need to restrict a dataframe to two columns, filter them for desired values, and drop NaN values so that a correlation can be found. I know how many rows the result should have, and I am getting more rows than I should.

numpy and pandas are imported; the correlation call below also assumes `from scipy import stats`.

df = pd.read_csv("assets/NISPUF17.csv")

#restricting to wanted columns
wantedcolumns = df[["HAD_CPOX","P_NUMVRC"]]

#filter columns

had_cpox = wantedcolumns[(wantedcolumns["HAD_CPOX"] >=1)&(wantedcolumns["HAD_CPOX"]<=2)]
cpox_vax = wantedcolumns[wantedcolumns["P_NUMVRC"] >=1.0]

#drop missing values
wantedcolumns.dropna()

#print number of rows
print(len(wantedcolumns))

corr, pval = stats.pearsonr(column_1, column_2)  # placeholders for the two columns to correlate

What I have tried: `wantedcolumns[had_cpox & cpox_vax]`, which raises a traceback.
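For reference, here is a minimal sketch of the approach I think I am aiming for: combining all three conditions on the same frame and assigning the result of `dropna()` (which is not in-place by default). The column names are taken from my code above; the data here is fake, standing in for `assets/NISPUF17.csv`, since I cannot share the real file.

```python
import pandas as pd
from scipy import stats

# Fake data standing in for NISPUF17.csv (real file not shown)
df = pd.DataFrame({
    "HAD_CPOX": [1, 2, 77, 1, 2, 1],
    "P_NUMVRC": [1.0, 2.0, 1.0, None, 1.0, 0.0],
})

# restrict to the two wanted columns
wantedcolumns = df[["HAD_CPOX", "P_NUMVRC"]]

# apply all three conditions to the same frame, then drop NaN rows;
# dropna() returns a new frame, so its result must be assigned
filtered = wantedcolumns[
    (wantedcolumns["HAD_CPOX"] >= 1)
    & (wantedcolumns["HAD_CPOX"] <= 2)
    & (wantedcolumns["P_NUMVRC"] >= 1.0)
].dropna()

print(len(filtered))  # number of rows passing every filter

corr, pval = stats.pearsonr(filtered["HAD_CPOX"], filtered["P_NUMVRC"])
```

Is assigning the combined filter like this the right way to avoid getting extra rows?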

  • What's the problem with your current code? What happens when you run it, and what did you expect to happen instead? Any errors? See [ask]. – Robert Jun 08 '22 at 15:45
  • Which part are you having trouble with? Limiting the number of columns? Filtering? Or dropping NaN's? Please ask one question per question - please read [mre]. It helps us if we don't have to guess or create example data to recreate your problem or test our solutions. Please always include a minimal example of any data being operated on, it can even be fake data - [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – wwii Jun 08 '22 at 16:07
  • Always include the complete Traceback - formatted as code. – wwii Jun 08 '22 at 16:09
  • Not sure what the problem is, but by looking at your code maybe the problem is in dropna step. Try using wantedcolumns.dropna(inplace=True) or wantedcolumns=wantedcolumns.dropna() – nogmos Jun 09 '22 at 11:14
  • Provide us a small sample of your dataframe with "df[:10].to_dict()" (check it contains NaN values), then copy/paste the result in your question. – Drakax Jun 09 '22 at 22:33

0 Answers