Efficient way to conditionally modify columns in a pandas dataframe, row by row

Question

I have a dataframe that looks like this:

length      code1    code2    code3
4            0         1        1
8            1         1        0
7            1         0        0

I want to write a function that checks the value in length. If the value is >= 7, I want to add 1 to the value present in code2 and code3. What is the best way to do this? So far, I have:

def char_count_pred(df):
    
    
    if df.length >= 7:
           df.code2 += 1
           df.code3 += 1

    return df


master_df = char_count_pred(master_df)

I understand I need to build a loop to iterate over each row, but I am confused about the most efficient way to loop through the rows and perform operations on multiple columns.

edit

When trying the solutions below, I get the same errors:

When I try the script as is....


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2889             try:
-> 2890                 return self._engine.get_loc(key)
   2891             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

KeyError: True

When I assign the result of the script to a variable...


  File "<ipython-input-140-9f2f40a5bb96>", line 3
    df = df.loc[df.length>=7]+=1
                                                                                   ^
SyntaxError: invalid syntax

score 3 · Answer 1 · answered Mar 10 '20 at 19:59

df.loc[df['length'] >= 7, ['code2', 'code3']] += 1
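As a minimal sketch of how this one-liner behaves, the snippet below rebuilds the sample frame from the question and applies the increment to code2 and code3, the two columns the question asks about. The variable name `mask` is only illustrative. Note that `+=` is a statement that modifies `df` in place, so it goes on its own line rather than on the right-hand side of an assignment.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "length": [4, 8, 7],
        "code1": [0, 1, 1],
        "code2": [1, 1, 0],
        "code3": [1, 0, 0],
    }
)

# Boolean mask: True for rows where length is at least 7
mask = df["length"] >= 7

# A single .loc call selects rows by mask and columns by label,
# then adds 1 to the selected cells in place
df.loc[mask, ["code2", "code3"]] += 1

print(df)
```

After this runs, the rows with length 8 and 7 have code2 equal to 2 and 1, and code3 equal to 1 and 1; the row with length 4 is unchanged.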

Chris

This solution throws the errors listed above, any ideas? – connor449 Mar 10 '20 at 20:35

score 2 · Answer 2 · answered Mar 10 '20 at 19:59

df.loc[df['length']>=7, 'code2':] += 1

Use .loc to select the rows where length is greater than or equal to 7, then select the columns from code2 onward and add 1.
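To see which columns the label slice picks up, here is a small sketch on the sample frame from the question. It assumes code2 and code3 are the last two columns, as in the question; the name `trailing_cols` is only illustrative. If more columns are added after code3 later, the slice would include them too.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "length": [4, 8, 7],
        "code1": [0, 1, 1],
        "code2": [1, 1, 0],
        "code3": [1, 0, 0],
    }
)

# A label slice in .loc runs from 'code2' through the last column, inclusive
trailing_cols = df.loc[:, "code2":].columns.tolist()
print(trailing_cols)  # ['code2', 'code3']

# Add 1 to those columns, only for rows where length >= 7
df.loc[df["length"] >= 7, "code2":] += 1
print(df)
```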

Ben Pap

coco18 · Accepted Answer · 2020-03-11T05:51:17.183

1

I hope this helps.

EDIT
This is everything I am using. Definition of your dataframe:

import pandas as pd

df = pd.DataFrame(columns=["length","code1","code2","code3"],
                  data=[[4,0,1,1],
                        [8,1,1,0],
                        [7,1,0,0]])

Definition of the function:

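def char_count_pred(df):
    for col in ["code2", "code3"]:
        # Mask on 'length' (not on the column itself) and write through a single .loc call
        df.loc[df["length"] >= 7, col] += 1
char_count_pred(df)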

Everything works for me; I don't know where the problem is.
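If the function is called the way the question does it (master_df = char_count_pred(master_df)), it has to return the frame, otherwise master_df ends up as None. The following is a sketch of that variant, not the only way to write it; the `threshold` and `cols` parameters are hypothetical additions for illustration and do not appear in the question.

```python
import pandas as pd


def char_count_pred(df, threshold=7, cols=("code2", "code3")):
    """Return a copy of df with `cols` incremented where length >= threshold."""
    out = df.copy()  # leave the caller's frame untouched
    out.loc[out["length"] >= threshold, list(cols)] += 1
    return out


# Sample data copied from the question
original = pd.DataFrame(
    columns=["length", "code1", "code2", "code3"],
    data=[[4, 0, 1, 1], [8, 1, 1, 0], [7, 1, 0, 0]],
)

master_df = char_count_pred(original)
print(master_df)
```

Working on a copy is a design choice: the input frame stays as it was, and the caller decides whether to overwrite it with the returned result.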

edited Mar 11 '20 at 05:51

answered Mar 10 '20 at 20:03

This solution throws the errors listed above, any ideas? – connor449 Mar 10 '20 at 20:36
got it working. FYI on line 3 of your function, the section [col] needs to be ['length']. Thanks! Accepting your solution. – connor449 Mar 11 '20 at 16:17

Thomas Kimber · Answer 4 · 2020-03-10T20:07:15.643

You could use apply, which lets you modify a column based on an arbitrary function. Note that a row-wise apply (axis=1) calls the function once per row in Python, so on large frames it is usually slower than a vectorized .loc assignment.

There's an answer here you could take a look at, or try something like this as a template for your specific use case:

master_df["code2"] = master_df.apply(lambda x : x["code2"] + 1 if x["length"] >= 7 else x["code2"], axis=1)

This updates your "code2" field by applying a function (in this case an anonymous lambda function, but it could equally be a named function as per your def). The only limitation is that it's simpler if each function targets a single column at a time.

There are methods for updating multiple columns at once, but it might be simpler to start off by updating a single column at a time.
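One way to cover both columns while still updating a single column at a time is to loop over the column names and reuse the same row-wise apply. This is only a sketch on the sample data from the question; the loop and the `c=col` default argument are illustrative choices, not something the question prescribes.

```python
import pandas as pd

# Sample data copied from the question
master_df = pd.DataFrame(
    {
        "length": [4, 8, 7],
        "code1": [0, 1, 1],
        "code2": [1, 1, 0],
        "code3": [1, 0, 0],
    }
)

for col in ["code2", "code3"]:
    # c=col binds the current column name for this iteration's lambda
    master_df[col] = master_df.apply(
        lambda row, c=col: row[c] + 1 if row["length"] >= 7 else row[c],
        axis=1,
    )

print(master_df)
```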
