0

I have the following data frame

enter image description here

df = pd.DataFrame([['1','aa', 'eee', 'text 1', 'text 1'], ['1', 'aa', 'fff', 'text 2'], ['1', 'aa', 'ggg', 'text 3'],
               ['2','aa', 'eee', 'text 4', 'text 4'], ['2', 'aa', 'fff', 'text 5'], ['2', 'aa', 'ggg', 'text 6'],
               ['3','bb', 'hhh', 'text 7', 'text 7'], ['3', 'bb', 'jjj', 'text 8'], ['3', 'bb', 'kkk', 'text 9'],
               ['3', 'bb', 'mmm', 'text 10'], ['4','bb', 'hhh', 'text 11', 'text 11'], ['4', 'bb', 'jjj', 'text 12'],
               ['4', 'bb', 'kkk', 'text 13'], ['4', 'bb', 'mmm', 'text 14'], ['5','aa', 'eee', 'text 15', 'text 15'],
               ['5', 'aa', 'fff', 'text 16'], ['5', 'aa', 'ggg', 'text 17']], columns=['foo', 'bar','name_input','string', 'Feature 1'])

Now, I need to add a new column based on three conditions and value must be used from two rows.

The conditions to follow is

  • if bar = aa and name_input = fff and name_input = ggg , then new value in column should be text 2 + text 3

In the end, I am aiming to have my output as following,

enter image description here

df = pd.DataFrame([['1', 'aa', 'eee', 'text 1', 'text 1', 'text 2 + text 3'], ['1', 'aa', 'fff', 'text 2'], ['1', 'aa', 'ggg', 'text 3'],
               ['2', 'aa', 'eee', 'text 4', 'text 4', 'text 5 + text 6'], ['2', 'aa', 'fff', 'text 5'], ['2', 'aa', 'ggg', 'text 6'],
               ['3', 'bb', 'hhh', 'text 7', 'text 7', 'text 8 + text 9'], ['3', 'bb', 'jjj', 'text 8'], ['3', 'bb', 'kkk', 'text 9'],
               ['3', 'bb', 'mmm', 'text 10'], ['4', 'bb', 'hhh', 'text 11', 'text 11', 'text 12 + text 13'], ['4', 'bb', 'jjj', 'text 12'],
               ['4', 'bb', 'kkk', 'text 13'], ['4', 'bb', 'mmm', 'text 14'], ['5','aa', 'eee', 'text 15', 'text 15', 'text 16 + text 17'],
               ['5', 'aa', 'fff', 'text 16'], ['5', 'aa', 'ggg', 'text 17']], columns=['foo', 'bar', 'name_input', 'string', 'Feature 1', 'Feature 2'])

I tried using the below:

df_merge1.loc[(df_merge1['bar'] == 'aa') & (df_merge1['name_input'] == 'fff') & (df_merge1['name_input'] == 'ggg'), 'Feature 2'] = df_merge1['string'].values[1] + df_merge1['string'].values[2]

However, I am not able to populate the values from the string column to the new column. Seems like im missing something.

Any help is much appreciated!

Shri
  • 51
  • 3
  • 11

1 Answers1

1

Use pandas shift to get the next row values. Numpy.where does the conditional selection

condition1 = ((df.bar=="aa") &
              (df.name_input.shift(-1)=="fff") &
              (df.name_input.shift(-2)=="ggg")
             )

condition2 = ((df.bar=="bb") & 
              (df.name_input.shift(-1)=="jjj") &
              (df.name_input.shift(-2)=="kkk"))

outcome = df.string.shift(-1) + '+' + df.string.shift(-2)

df.assign(feature2 = np.where(condition1|condition2, outcome,''))

    foo bar name_input  string  Feature 1   feature2
0   1   aa     eee      text 1  text 1  text 2+text 3
1   1   aa     fff      text 2  None    
2   1   aa     ggg      text 3  None    
3   2   aa     eee      text 4  text 4  text 5+text 6
4   2   aa     fff      text 5  None    
5   2   aa     ggg      text 6  None    
6   3   bb     hhh      text 7  text 7  text 8+text 9
7   3   bb     jjj      text 8  None    
8   3   bb     kkk      text 9  None    
9   3   bb     mmm      text 10 None    
10  4   bb     hhh      text 11 text 11 text 12+text 13
11  4   bb     jjj      text 12 None    
12  4   bb     kkk      text 13 None    
13  4   bb     mmm      text 14 None    
14  5   aa     eee      text 15 text 15 text 16+text 17
15  5   aa     fff      text 16 None    
16  5   aa     ggg      text 17 None    
sammywemmy
  • 27,093
  • 4
  • 17
  • 31
  • Thank you. It worked. A follow up question - how do i drop rows with "None". I tried "df.dropna(axis=0, how='any')". It didnt work – Shri Jan 24 '20 at 11:54
  • Do accept if it is the right answer for you. pandas dropna does the trick, https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html df.dropna(how='any',axis=0). – sammywemmy Jan 24 '20 at 11:57
  • it worked! df.dropna(how='any',axis=0, inplace= True) Thank you. – Shri Jan 24 '20 at 12:06