I have a dataframe that looks like this:

length      code1    code2    code3
4            0         1        1
8            1         1        0
7            1         0        0

I want to write a function that checks the value in length. If the value is >= 7, I want to add 1 to the value present in code2 and code3. What is the best way to do this? So far, I have:

def char_count_pred(df):
    
    
    if df.length >= 7:
           df.code2 += 1
           df.code3 += 1

    return df


master_df = char_count_pred(master_df)

I understand I need to build a loop to iterate over each row, but I am confused about the most efficient way to loop through rows and perform tasks on multiple columns.

edit

When trying the solutions below, I get the same errors:

When I try the script as is....


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2889             try:
-> 2890                 return self._engine.get_loc(key)
   2891             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

KeyError: True

When I set the script to = a variable...


  File "<ipython-input-140-9f2f40a5bb96>", line 3
    df = df.loc[df.length>=7]+=1
                                                                                   ^
SyntaxError: invalid syntax
connor449

4 Answers

3
df.loc[df.length >= 7, ['code2', 'code3']] += 1
Chris
2
df.loc[df['length']>=7, 'code2':] += 1

Use .loc to select the rows where length is greater than or equal to 7, then select the columns from code2 onward and add 1 to them.
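As a minimal sketch of the approach above, assuming the sample data from the question (the names `df` and `mask` are only illustrative):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "length": [4, 8, 7],
        "code1": [0, 1, 1],
        "code2": [1, 1, 0],
        "code3": [1, 0, 0],
    }
)

# Boolean mask: True for every row whose length is at least 7
mask = df["length"] >= 7

# The label slice "code2": selects code2 and every column after it
# (here code2 and code3); only the masked rows are incremented
df.loc[mask, "code2":] += 1

print(df)
```

The slice depends on column order, so if other columns might follow code3, listing the columns explicitly (`["code2", "code3"]`) is probably the safer choice.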

Ben Pap
1

I hope this will help:

EDIT
This is everything I am using. Definition of your dataframe:

import pandas as pd

df = pd.DataFrame(columns=["length","code1","code2","code3"],
                  data=[[4,0,1,1],
                        [8,1,1,0],
                        [7,1,0,0]])

Definition of the function:

def char_count_pred(df):
    # Add 1 to code2 and code3 in every row where length is at least 7
    df.loc[df["length"] >= 7, ["code2", "code3"]] += 1
    return df

df = char_count_pred(df)

Everything works for me; I don't know where the problem is.

coco18
0

You could use apply, which is a flexible way of modifying your columns based on some function, although a row-wise apply (axis=1) is generally slower than a vectorized .loc assignment on large dataframes.

There's an answer here you could take a look at, or try something like this as a template for your specific use case:

master_df["code2"] = master_df.apply(lambda x : x["code2"] + 1 if x["length"] >= 7 else x["code2"], axis=1)

This updates your "code2" field by applying a function to each row (in this case an anonymous lambda function, but it could equally be a named function as per your def). The only limitation is that it's simpler if those functions target a single column at a time.

There are methods for generating results that update multiple columns at once, but it might be simpler to start off by updating a single column at a time.
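One possible way to update both columns in a single apply is sketched below. It assumes the sample data from the question, and `bump_codes` is a hypothetical helper name, not something from the original post:

```python
import pandas as pd

# Sample data copied from the question
master_df = pd.DataFrame(
    {
        "length": [4, 8, 7],
        "code1": [0, 1, 1],
        "code2": [1, 1, 0],
        "code3": [1, 0, 0],
    }
)

def bump_codes(row):
    # Hypothetical helper: returns the new code2/code3 values for one row
    bump = 1 if row["length"] >= 7 else 0
    return pd.Series({"code2": row["code2"] + bump, "code3": row["code3"] + bump})

# Because the function returns a Series per row, apply yields a DataFrame
# with columns code2 and code3, which is assigned back in one step
master_df[["code2", "code3"]] = master_df.apply(bump_codes, axis=1)

print(master_df)
```

This keeps the per-row logic in a named function, at the cost of calling Python code once per row.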

Thomas Kimber