
I want to keep only the hourly temperature values that are greater than the threshold for their day, and replace all other values with NaN.

For example, take the following pandas Series:

hours = pd.date_range("2018-01-01", periods=120, freq="H")
temperature = pd.Series(range(len(hours)), index=hours)

days = pd.date_range("2018-01-01", periods=5, freq="d")
daily_treshold = pd.Series([5,10,6,25,30], index=days)

Now I want to replace with NaN the first-day hourly values that are less than 5, the second-day values that are less than 10, and so on.

How can I achieve this using pandas groupby and apply? Thanks.

Jithu
Maybe if you use `groupby()`, you could use `zip()` to work with every group separately, i.e. `for group, temp in zip(groups, [5,10,6,25,30]): ...`, and then you could try `group[ group["temperature"] < temp ] = temp` – furas Mar 04 '21 at 11:54

1 Answer


Here is an easy-to-understand double-loop version that does what you want. pandas.Series.items() (called iteritems() before pandas 2.0, where it was removed) returns (index, value) tuples of the Series:

import numpy as np
import pandas as pd

hours = pd.date_range("2018-01-01", periods=120, freq="H")
# Use a float dtype so that NaN can be stored without changing the dtype
temperature = pd.Series(range(len(hours)), index=hours, dtype=float)

days = pd.date_range("2018-01-01", periods=5, freq="d")
daily_treshold = pd.Series([5,10,6,25,30], index=days)

for day_index, treshold in daily_treshold.items():
    for hour_index, temp in temperature.items():
        # Only compare an hourly value with the threshold of its own day
        if day_index.date() == hour_index.date():
            if temp < treshold:
                temperature[hour_index] = np.nan

print(temperature)
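As a possible alternative to the double loop, the thresholds can be lined up with the hourly index first and the comparison done in one step. This is only a sketch under some assumptions: every date in temperature must also appear in daily_treshold, and the names hour_days, hourly_treshold and masked are made up for illustration.

```python
import pandas as pd

hours = pd.date_range("2018-01-01", periods=120, freq="h")
temperature = pd.Series(range(len(hours)), index=hours, dtype=float)

days = pd.date_range("2018-01-01", periods=5, freq="D")
daily_treshold = pd.Series([5, 10, 6, 25, 30], index=days)

# Midnight timestamp of the day each hourly reading belongs to
hour_days = temperature.index.normalize()

# Look up the threshold of each reading's day, then attach the hourly index
hourly_treshold = pd.Series(
    daily_treshold.reindex(hour_days).to_numpy(),
    index=temperature.index,
)

# Keep readings that reach their day's threshold; the rest become NaN
masked = temperature.where(temperature >= hourly_treshold)
print(masked.head(8))
```

Series.where keeps the values where the condition is True and fills the others with NaN, which matches the `temp < treshold` test in the loop.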

pandas.Series.apply() passes only the values to the function, not the index, so the function cannot see the timestamp of each value. Also, temperature is indexed by hour while daily_treshold is indexed by day, so the timestamps have to be converted before they can be compared. For convenience, I change temperature to a pandas.DataFrame.

Here is code showing how to use apply on temperature:

import numpy as np
import pandas as pd

hours = pd.date_range("2018-01-01", periods=120, freq="H")

# temperature = pd.Series(range(len(hours)), index=hours)
temperature = pd.DataFrame({'hour': hours,
                            'temp': range(len(hours))})

days = pd.date_range("2018-01-01", periods=5, freq="d")
daily_treshold = pd.Series([5,10,6,25,30], index=days)


def apply_replace(row, daily_treshold):
    treshold = daily_treshold[row['hour'].strftime('%Y-%m-%d')]

    if row['temp'] < treshold:
        return np.nan
    else:
        return row['temp']

temperature['after_replace'] = temperature.apply(apply_replace, axis=1, args=(daily_treshold,))
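
Since the question asks about groupby and apply, here is a rough sketch of that route as well. It assumes temperature is still the original hourly Series and that each day has an entry in daily_treshold. The helper mask_day and the names by_day and masked_by_day are hypothetical. Inside groupby(...).apply(), each group carries its group key in .name, which is used here to look up the threshold.

```python
import pandas as pd

hours = pd.date_range("2018-01-01", periods=120, freq="h")
temperature = pd.Series(range(len(hours)), index=hours, dtype=float)

days = pd.date_range("2018-01-01", periods=5, freq="D")
daily_treshold = pd.Series([5, 10, 6, 25, 30], index=days)


def mask_day(day_values):
    # day_values.name is the group key: the midnight timestamp of that day
    treshold = daily_treshold[day_values.name]
    # Values below the day's threshold are replaced with NaN
    return day_values.where(day_values >= treshold)


# Group the hourly readings by calendar day; group_keys=False keeps the
# original hourly index on the result instead of adding the day as a level
by_day = temperature.groupby(temperature.index.normalize(), group_keys=False)
masked_by_day = by_day.apply(mask_day)
print(masked_by_day.head(8))
```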
Ynjxsjmh