The code I have works, but it is very slow. I have a large dataframe of daily data and a smaller dataframe that holds the same data averaged into weekly/monthly/yearly intervals. I want to move the change in direction ("Turning Point") from the yearly dataframe (tempTimeScale) to the daily dataframe, placing it on the day within that year when the change occurred rather than at the start/end of the year.

Is there a way to make it run faster?

import numpy as np
import pandas as pd

d = {"Turning Point Up": [10, np.nan, np.nan, 17, np.nan]}
dailyData = pd.DataFrame(data=d)

y = {"Turning Point Up": [17]}
tempTimeScale = pd.DataFrame(data=y)
tempTimeScale

def align(additive):
  for indexD, rowD in dailyData.iterrows():
    for indexY, rowY in tempTimeScale.iterrows():
      if rowD["Turning Point Up"]==rowY["Turning Point Up"]:
        dailyData.at[indexD,"Turning Point Up Y"]=rowY["Turning Point Up"]

o = {"Turning Point Up": [10, np.nan, np.nan, 17, np.nan], "Turning Point Up Y": [np.nan, np.nan, np.nan, 17, np.nan]}
exampleoutput = pd.DataFrame(data=o)
exampleoutput
  • `iterrows` is slow and should be avoided because it has to convert each row to a pandas `Series`. – Kraigolas Jun 16 '21 at 13:45
  • [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – It_is_Chris Jun 16 '21 at 13:45
  • Show a chunk of each of your dataframes. It will help to [create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). And read [How do I ask a good question?](https://stackoverflow.com/help/how-to-ask). – aneroid Jun 16 '21 at 14:10

2 Answers

Try:

m = dailyData["Turning Point Up"].isin(tempTimeScale["Turning Point Up"])
dailyData["Turning Point Up Y"] = dailyData.loc[m, "Turning Point Up"]
print(dailyData)

Prints:

   Turning Point Up  Turning Point Up Y
0              10.0                 NaN
1               NaN                 NaN
2               NaN                 NaN
3              17.0                17.0
4               NaN                 NaN
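
The same masking step can also be written with `Series.where`, which keeps values where the mask is True and fills NaN everywhere else. This is only a sketch that reuses the toy frames from the question; it should give the same result as the `.loc` assignment above, and whether it is faster on real data would need to be measured.

```python
import numpy as np
import pandas as pd

# Toy frames copied from the question.
dailyData = pd.DataFrame({"Turning Point Up": [10, np.nan, np.nan, 17, np.nan]})
tempTimeScale = pd.DataFrame({"Turning Point Up": [17]})

# Boolean mask: True where the daily value also appears in the yearly frame.
m = dailyData["Turning Point Up"].isin(tempTimeScale["Turning Point Up"])

# where() keeps the value where the mask is True and writes NaN elsewhere.
dailyData["Turning Point Up Y"] = dailyData["Turning Point Up"].where(m)
print(dailyData)
```

Both forms replace the nested `iterrows` loops with a single vectorized membership test, which is where the speed-up comes from.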
Andrej Kesely

Based on your expected output of:

   Turning Point Up  Turning Point Up Y
0              10.0                 NaN
1               NaN                 NaN
2               NaN                 NaN
3              17.0                17.0
4               NaN                 NaN

you're setting `Turning Point Up Y` to the same value as `Turning Point Up` when there's a match, and NaN otherwise. Is this what you actually want, or do you want some sort of "indicator" showing which values appear in both dataframes? If it's the latter, use the `isin` call from @Andrej's answer directly to set a boolean flag:

dailyData["Turning Point Up Y"] = dailyData["Turning Point Up"].isin(tempTimeScale["Turning Point Up"])

Result:

   Turning Point Up  Turning Point Up Y
0              10.0               False
1               NaN               False
2               NaN               False
3              17.0                True
4               NaN               False
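
One caveat with the flag version, sketched below under an assumption the question does not state: if the yearly frame itself contains missing values, `isin` can mark the NaN rows in the daily frame as matches, whereas the original `==` comparison never matches NaN. The yearly frame with a NaN in it is hypothetical and only there to illustrate the point; dropping missing values from the lookup first avoids the issue.

```python
import numpy as np
import pandas as pd

dailyData = pd.DataFrame({"Turning Point Up": [10, np.nan, np.nan, 17, np.nan]})
# Hypothetical yearly frame that also holds a missing value (not in the question).
tempTimeScale = pd.DataFrame({"Turning Point Up": [17, np.nan]})

# Drop NaN from the lookup values so missing daily rows are not flagged as matches.
lookup = tempTimeScale["Turning Point Up"].dropna()
dailyData["Turning Point Up Y"] = dailyData["Turning Point Up"].isin(lookup)
print(dailyData)
```

With the toy data from the question (no NaN in the yearly frame) the `dropna()` call changes nothing, so it is safe to leave in.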
aneroid