3

I have a pandas dataframe that has 2 columns. I want to loop through it's rows and based on a string from column 2 I would like to add a string in a newly created 3th column. I tried:

for i in df.index:
    if df.ix[i]['Column2']==variable1:
        df['Column3'] = variable2
    elif df.ix[i]['Column2']==variable3:
        df['Column3'] = variable4

print(df)

But the resulting dataframe has in column 3 only Variable2.

Any ideas how else I could do this?

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
Andrei Cozma
  • 950
  • 3
  • 9
  • 14
  • You can use `for i, r in df.iterrows()` to get `i` as the index and `r` as a Series of row values. This makes writing the if, elif conditions easier `if r.Column2==variable1:` – Little Bobby Tables Oct 05 '16 at 13:31
  • 1
    I have just realised you main issue here is that you are setting the entire column Column3 as equal to either variable2 and variable4 for ALL `i`. You get only variable2 as variable1 is the last element in Column2. In other words just use `df.ix[i, 'Column3'] = variable2` and `df.ix[i, 'Column3'] = variable4`. – Little Bobby Tables Oct 05 '16 at 13:39

3 Answers3

3

You can also try this (if you want to keep the for loop you use) :

new_column = []

for i in df.index:
    if df.ix[i]['Column2']==variable1:
        new_column.append(variable2)
    elif df.ix[i]['Column2']==variable3:
        new_column.append(variable4)
    else : #if both conditions not verified
        new_column.append(other_variable)

df['Column3'] = new_column
MMF
  • 5,750
  • 3
  • 16
  • 20
2

I think you can use double numpy.where, what is faster as loop:

df['Column3'] = np.where(df['Column2']==variable1, variable2, 
                np.where(df['Column2']==variable3, variable4))

And if need add variable if both conditions are False:

df['Column3'] = np.where(df['Column2']==variable1, variable2, 
                np.where(df['Column2']==variable3, variable4, variable5))

Sample:

df = pd.DataFrame({'Column2':[1,2,4,3]})
print (df)
   Column2
0        1
1        2
2        4
3        3

variable1 = 1
variable2 = 2
variable3 = 3
variable4 = 4
variable5 = 5

df['Column3'] = np.where(df['Column2']==variable1, variable2, 
                np.where(df['Column2']==variable3, variable4, variable5))

print (df)
   Column2  Column3
0        1        2
1        2        5
2        4        5
3        3        4

Another solution, thanks Jon Clements:

df['Column4'] = df.Column2.map({variable1: variable2, variable3:variable4}).fillna(variable5)
print (df)
   Column2  Column3  Column4
0        1        2      2.0
1        2        5      5.0
2        4        5      5.0
3        3        4      4.0
Community
  • 1
  • 1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Or possibly: `df.Column2.map({1: 'foo', 3: 'bar'}).fillna('')` - then you've got your conditions (non-nested in a one-op dict) and the `fillna` as a default value – Jon Clements Oct 05 '16 at 12:35
  • Yes, only converting to `float` is not nice, but it works nice if variables are `float` or `string`. If `int`, only `.astype(int)` is necessary. Thanks you. – jezrael Oct 05 '16 at 12:48
  • thanks Jezrael it did work very well. If I am not too cheeky and this might be very difficult, but how can I now replace variable5 with either variable 2 or 4 equitably taking into account that not both variable currently have the same number of rows. In your example if Column 4 would be – Andrei Cozma Oct 05 '16 at 14:35
  • 2.0 5.0 5.0 4.0 4.0 4.0. In this case the 2 5's should become 2's – Andrei Cozma Oct 05 '16 at 14:35
  • @Andei Cozma - I am off my PC. So I think you can ask another question. Small advice check [this](http://stackoverflow.com/a/20159305/2901002) and dont forget add desired output. – jezrael Oct 05 '16 at 15:28
2

Firstly, there is no need to loop through each and every index, just use pandas built in boolean indexing. First line here, we gather all of the values in Column2 that are the same as variable1 and set the same row in Column3 to be variable2

df.ix[df.Column2==variable1, 'Column3'] = variable2
df.ix[df.Column2==variable3, 'Column3'] = variable4

A simple example would be

import pandas as pd

df = pd.DataFrame({'Animal':['dog', 'fish', 'fish', 'dog']})
print(df)

    Animal
0   dog
1   fish
2   fish
3   dog

df.ix[df.Animal=='dog', 'Colour'] = 'brown'
df.ix[df.Animal=='fish', 'Colour'] = 'silver'
print(df)

    Animal  Colour
0   dog     brown
1   fish    silver
2   fish    silver
3   dog     brown

The above method can be build on very easily using multiple conditions like & and | to boolean index.

df = pd.DataFrame({'Animal':['dog', 'fish', 'fish', 'dog'], 'Age': [1, 3, 2, 10]})
print(df)

   Age Animal
0    1    dog
1    3   fish
2    2   fish
3   10    dog

df.ix[(df.Animal=='dog') & (df.Age > 8), 'Colour'] = 'grey' # old dogs go grey
df.ix[(df.Animal=='dog') & (df.Age <= 8), 'Colour'] = 'brown'
df.ix[df.Animal=='fish', 'Colour'] = 'silver'
print(df)

   Age Animal  Colour
0    1    dog   brown
1    3   fish  silver
2    2   fish  silver
3   10    dog    grey
Little Bobby Tables
  • 4,466
  • 4
  • 29
  • 46