0

I have a huge dataframe where I have to identify and mark rows as "Actuals" & "Forecast" depending upon the current date.

Sample df:

Date        Description
2021-01-01  A
2021-01-02  B
2021-01-03  B
2021-01-04  B
2021-01-05  A
2021-01-06  A
2021-01-07  A
2021-01-08  B
2021-01-09  B
2021-01-10  A

I am able to filter the dataframe based on the below code:

today = dt.date.today().strftime("%Y-%m-%d")
df[(df['Date'] < today)]

But what I actually want is to keep all rows and add a column based on the current date and segregate the rows as actuals (if they are before or on the current date) and forecast (if they are after the current date)

For example, if the current date is 5th Jan 2021 then:

Expected df:

Date        Description     Data_Residue
2021-01-01  A               Actuals
2021-01-02  B               Actuals
2021-01-03  B               Actuals
2021-01-04  B               Actuals
2021-01-05  A               Actuals
2021-01-06  A               Forecast
2021-01-07  A               Forecast
2021-01-08  B               Forecast
2021-01-09  B               Forecast
2021-01-10  A               Forecast

Any suggestions are appreciated. I have a huge dataframe so I was hoping not to use IF STATEMENT.

RSM
  • 645
  • 10
  • 25

2 Answers2

3

Use np.where:

df['Data_Residue'] = np.where(df['Date'] <= '2021-01-05', 'Actuals', 'Forecast')
print(df)

# Output
         Date Description Data_Residue
0  2021-01-01           A      Actuals
1  2021-01-02           B      Actuals
2  2021-01-03           B      Actuals
3  2021-01-04           B      Actuals
4  2021-01-05           A      Actuals
5  2021-01-06           A     Forecast
6  2021-01-07           A     Forecast
7  2021-01-08           B     Forecast
8  2021-01-09           B     Forecast
9  2021-01-10           A     Forecast
Corralien
  • 109,409
  • 8
  • 28
  • 52
1

This is how I would do it. apply() method in pandas is more efficient than a loop over the dataframe so hopefully it will be fast enough for your case:

df["Data_residue"] = df.apply(lambda row: "Actuals" if row['date'] < today else "Forecast", axis=1)

Please let me know if that answer you request.

Apply can be used to apply any function over rows or columns to your DataFrame. Here I used a lambda function as it is a quite simple task to perform on each row but you can also define a proper function.
You can find more info on here on apply()

Ssayan
  • 938
  • 5
  • 12