
I am experimenting with "flagging" some data with a 1 or 0 in a separate df column based on a condition, but could use some tips...

EDIT: this question is NOT about looking up data in a dataframe; it is looking for a way to modify values in the dataframe for each row based on row conditions.

Made up data:

import pandas as pd
import numpy as np


rows,cols = 8760,3
data = np.random.rand(rows,cols) 
tidx = pd.date_range('2019-01-01', periods=rows, freq='1min')  # 'T' is a deprecated alias for 'min' in newer pandas
df = pd.DataFrame(data, columns=['cooling_sig','heating_sig','economizer_sig'], index=tidx)

These are some extra parameters and columns for my application:

# params for air handling unit (ahu)
ahu_min_oa = .2

# make columns out of params
df['ahu_min_oa'] = ahu_min_oa
df['heating_mode'] = 0
df['econ_mode'] = 0
df['econ+mech_cooling'] = 0
df['mech_cooling'] = 0

Below is a function to process the data, but it doesn't work. Any better practices would be greatly appreciated, other than hammering through each row of the dataframe. I am trying to "flag" a mode with a value of 1 based on a condition. For example, for each row in the data, heating_mode would be True or 1 if heating_sig is greater than zero.

def data_preprocess(dataframe):
    
    for index, row in dataframe.iterrows():
        
        # OS1, the AHU is heating
        if row.heating_sig > 0:
            row['heating_mode'] = 1

        # OS2, the AHU is using free cooling only
        if row.economizer_sig > row.ahu_min_oa and row.cooling_sig == 0:
            row['econ_mode'] = 1

        # OS3, the AHU is using free and mechanical cooling
        if row.economizer_sig > row.ahu_min_oa and row.cooling_sig > 0:
            row['econ+mech_cooling'] = 1

        # OS4, the AHU is using mechanical cooling only
        if row.economizer_sig <= row.ahu_min_oa and row.cooling_sig > 0:
            row['mech_cooling'] = 1

        return dataframe

Sorry, this is probably a somewhat strange application and question, but thanks for any tips. My attempt at flagging the data isn't working: every value_counts() call below shows only zeros.

df['heating_mode'].value_counts()
df['mech_cooling'].value_counts()
df['econ_mode'].value_counts()
df['econ+mech_cooling'].value_counts()
bbartling
  • Does this answer your question? [Updating value in iterrow for pandas](https://stackoverflow.com/questions/25478528/updating-value-in-iterrow-for-pandas) – ImSo3K Nov 15 '21 at 15:15
  • The real question is why do you need iterrows? It will likely be more efficient to use vectorized code – mozway Nov 15 '21 at 15:17
  • Would you have any tips to look at other than iterrows? Not a lot of wisdom here ! – bbartling Nov 15 '21 at 15:18
  • For example `df['heating_mode'] = df['heating_sig'].gt(0).astype(int)`. – Quang Hoang Nov 15 '21 at 15:19
  • Then `df['econ_mode'] = df['economizer_sig'].gt(df['ahu_min_oa']) & df['cooling_sig'].eq(0)`, and so on... – Quang Hoang Nov 15 '21 at 15:20

2 Answers


You don't need to (and shouldn't) iterate over your DataFrame.

Instead, try:

df.loc[df["heating_sig"].gt(0), "heating_mode"] = 1
df.loc[df["economizer_sig"].gt(df["ahu_min_oa"]) & df["cooling_sig"].eq(0), "econ_mode"] = 1
df.loc[df["economizer_sig"].gt(df["ahu_min_oa"]) & df["cooling_sig"].gt(0), "econ+mech_cooling"] = 1
df.loc[df["economizer_sig"].le(df["ahu_min_oa"]) & df["cooling_sig"].gt(0), "mech_cooling"] = 1
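
As a variation on the same idea (a minimal sketch, not part of the original answer), each flag can also be built directly from a boolean mask and cast to int, so the columns do not have to be pre-filled with zeros. The small frame and its values below are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hand-made sample data (illustrative values, not from the question)
df = pd.DataFrame(
    {
        "cooling_sig": [0.0, 0.5, 0.7, 0.0],
        "heating_sig": [0.3, 0.0, 0.0, 0.0],
        "economizer_sig": [0.1, 0.6, 0.1, 0.9],
    }
)
df["ahu_min_oa"] = 0.2

# Reusable boolean masks, one value per row
econ_open = df["economizer_sig"] > df["ahu_min_oa"]
econ_closed = df["economizer_sig"] <= df["ahu_min_oa"]
cooling_on = df["cooling_sig"] > 0
cooling_off = df["cooling_sig"] == 0

# Cast each mask to int to get 1/0 flags
df["heating_mode"] = (df["heating_sig"] > 0).astype(int)
df["econ_mode"] = (econ_open & cooling_off).astype(int)
df["econ+mech_cooling"] = (econ_open & cooling_on).astype(int)
df["mech_cooling"] = (econ_closed & cooling_on).astype(int)

print(df[["heating_mode", "econ_mode", "econ+mech_cooling", "mech_cooling"]])
```

Naming the masks once keeps the four conditions readable and avoids repeating the same comparison in several lines.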
not_speshal

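There might be more efficient ways of doing the same thing, but if you really need to use iterrows(), try the approach below. It differs from your function in two ways: it writes through dataframe.at[index, ...] instead of assigning to row, which iterrows() can hand back as a copy, and it moves return dataframe out of the loop so that every row is processed rather than only the first.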

def data_preprocess(dataframe):
    for index, row in dataframe.iterrows():
        # OS1, the AHU is heating
        if row.heating_sig > 0:
            dataframe.at[index, 'heating_mode'] = 1

        # OS2, the AHU is using free cooling only
        if row.economizer_sig > row.ahu_min_oa and row.cooling_sig == 0:
            dataframe.at[index, 'econ_mode'] = 1

        # OS3, the AHU is using free and mechanical cooling
        if row.economizer_sig > row.ahu_min_oa and row.cooling_sig > 0:
            dataframe.at[index, 'econ+mech_cooling'] = 1

        # OS4, the AHU is using mechanical cooling only
        if row.economizer_sig <= row.ahu_min_oa and row.cooling_sig > 0:
            dataframe.at[index, 'mech_cooling'] = 1

    return dataframe
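
To see why assigning to `row` had no effect, here is a minimal sketch on a made-up two-row frame (the data is hypothetical; only the column names come from the question). When the columns have mixed dtypes, as they do here with float signals and int flags, `iterrows()` builds a new Series for each row, so writing to it never reaches the DataFrame, whereas `DataFrame.at` writes to the frame itself.

```python
import pandas as pd

# Float signal column plus int flag column, like the question's frame
df = pd.DataFrame({"heating_sig": [0.3, 0.0], "heating_mode": [0, 0]})

# Attempt 1: assign to the row object yielded by iterrows()
for index, row in df.iterrows():
    if row["heating_sig"] > 0:
        row["heating_mode"] = 1  # changes only the temporary Series

after_row_assignment = df["heating_mode"].tolist()

# Attempt 2: write back to the DataFrame by label
for index, row in df.iterrows():
    if row["heating_sig"] > 0:
        df.at[index, "heating_mode"] = 1

after_at_assignment = df["heating_mode"].tolist()

print(after_row_assignment)  # flags unchanged
print(after_at_assignment)   # first row flagged
```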
Sadman Sakib