I have a dataframe, PORResult, of daily temperatures in which each row is a year and each column is a day (121 rows x 365 columns). I also have an array, Percentile_90, holding a threshold temperature for each day (length 365). For every day of every year in PORResult, I want to check whether the value is higher than that day's value in Percentile_90, and store the results in a new dataframe, Count (121 rows x 365 columns). Count starts out full of zeros; wherever the daily value in PORResult is greater than the daily value in Percentile_90, I want to change the corresponding value in Count to 1.

This is what I'm starting with:

for i in range(len(PORResult)):
    if PORResult.loc[i] > Percentile_90[i]:
        CountResult[i]+=1

But when I try this I get KeyError: 0. What else can I try?

Velocibadgery
Megan Martin
    For one thing, if you're using a numeric index you want `iloc`, not `loc`, which is for label-based indexing. For another, this can be done with broadcasting, so you don't have to loop through the dataframe. It would help to see a small sample of your input df and array, plus your expected output, as a minimal reproducible example so we can better understand and replicate your issue. See also [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – G. Anderson Nov 05 '21 at 19:05
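
To illustrate the `loc`/`iloc` distinction mentioned in the comment, here is a small sketch. It assumes the row index holds year labels such as 1900 and 1901; the question does not show the index, so this is only one plausible explanation for the `KeyError: 0`. The frame `df` and its values are made up.

```python
import pandas as pd

# Hypothetical frame whose row labels are years, not 0, 1, 2, ...
df = pd.DataFrame(
    {"day1": [10.0, 11.0], "day2": [12.0, 14.0]},
    index=[1900, 1901],
)

first_by_position = df.iloc[0]   # first row, selected by integer position
first_by_label = df.loc[1900]    # the same row, selected by its index label

try:
    df.loc[0]                    # no row carries the label 0
    raised = False
except KeyError:
    raised = True                # label lookup fails with KeyError
```

With a default 0, 1, 2, ... index the two accessors would happen to agree, which is why the error only shows up once the index carries other labels.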

1 Answer

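(Edited:) Depending on your data structure, I think

CountResult = PORResult.gt(Percentile_90, axis=1).astype(int)

should do the trick. Because Percentile_90 holds one value per day, axis=1 matches it against the 365 day columns; axis=0 would try to match it against the 121 year rows instead. Generally, the toolset pandas provides makes for-looping over a dataframe unnecessary, and such loops are far slower than the vectorized equivalent.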
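
As a rough sketch of the vectorized comparison on a tiny stand-in (3 years x 4 days instead of 121 x 365): the values below are invented, and Percentile_90 is assumed to be a 1-D NumPy array with one threshold per day column. Since the thresholds run along the columns in this layout, the sketch passes axis=1.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 3 years (rows) x 4 days (columns).
PORResult = pd.DataFrame(
    [[10.0, 12.0, 15.0, 9.0],
     [11.0, 14.0, 13.0, 8.0],
     [9.0, 16.0, 17.0, 7.0]],
    index=[1900, 1901, 1902],
)

# One threshold per day, so the length matches the number of columns.
Percentile_90 = np.array([10.5, 15.0, 14.0, 8.5])

# axis=1 lines the thresholds up with the day columns; the comparison
# is then applied to every year (row) at once, with no explicit loop.
CountResult = PORResult.gt(Percentile_90, axis=1).astype(int)

# Optional follow-up: number of days above the threshold in each year.
exceedances_per_year = CountResult.sum(axis=1)
```

CountResult has the same shape and index as PORResult, so there is no need to create a frame of zeros first.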

Joshua Voskamp