0

Given a dataframe whose columns all contain text, I want to create a new column consisting of the concatenation of all the columns for each row (as in the picture).

The dataframe of this example is given by:

    import pandas as pd
    df = pd.DataFrame({'A': ['hello word', 'how are u doing'], 'B': ['hey!', 'im doing great'], 'C': ['lol', 'lmao']})

This can be done with df['Joined'] = df['A'] + df['B'] + df['C'], but I want it to work for any number of columns. I solved the problem in two different ways:

1.

    cols = list(df.columns)  # capture the columns before adding 'Joined'
    df['Joined'] = ''
    for col in cols:
        df['Joined'] += ' ' + df[col]

2.

    l = list()
    for index, row in df.iterrows():
        l.append(' '.join([x for x in row]))
    df['Joined'] = l

Is there a more elegant and more efficient way to do this?

Román

2 Answers

1

Try:

df['Joined'] = df.apply(' '.join, axis=1)
print(df)

# Output
                 A               B     C                               Joined
0       hello word            hey!   lol                  hello word hey! lol
1  how are u doing  im doing great  lmao  how are u doing im doing great lmao
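Since the question also asks about efficiency, here is a hedged sketch of a vectorized alternative that avoids a Python-level call per row. It uses Series.str.cat, which accepts a DataFrame of additional string columns. It assumes at least two columns, all of them strings with no missing values; it has not been benchmarked against the apply version here.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['hello word', 'how are u doing'],
    'B': ['hey!', 'im doing great'],
    'C': ['lol', 'lmao'],
})

# Capture the column names before 'Joined' is added to the frame.
cols = list(df.columns)

# Start from the first column and append the remaining ones, space-separated.
# Assumption: every column holds strings and there are at least two columns.
df['Joined'] = df[cols[0]].str.cat(df[cols[1:]], sep=' ')

print(df['Joined'].tolist())
# ['hello word hey! lol', 'how are u doing im doing great lmao']
```

If some cells may be missing, str.cat returns a missing value for that row unless its na_rep argument is set.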

Update

To make the column 'Joined' a list of the row's values instead of their concatenation:

df['Joined'] = df.apply(list, axis=1)
print(df)

# Output
                 A               B     C                                   Joined
0       hello word            hey!   lol                  [hello word, hey!, lol]
1  how are u doing  im doing great  lmao  [how are u doing, im doing great, lmao]
Corralien
  • If I want the column 'Joined' to be a list of the words instead of the concatenation, is there a way like yours in one step? Obviously this solves the problem: `df['Joined'] = df.apply(' '.join, axis=1)` followed by `df['Joined'] = df['Joined'].apply(lambda x: x.split())`. But I want it in one apply if possible. – Román Jan 02 '22 at 02:38
  • I noticed that after applying list over axis=1 I should tokenize my words. This is what I did:

    ```
    def ldaparser(df):
        data_words = df.apply(list, axis=1)
        data_words = data_words.apply(lambda corpus: [word for sentence in corpus for word in sentence.split()])
        return data_words
    ```

    – Román Jan 05 '22 at 15:32
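For the one-step tokenization asked about in the comments, a minimal sketch is to join and split inside the same function passed to apply. It assumes every cell is a string and that splitting on whitespace is an acceptable tokenizer; the variable name tokens is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['hello word', 'how are u doing'],
    'B': ['hey!', 'im doing great'],
    'C': ['lol', 'lmao'],
})

# Join each row into one string, then split it on whitespace,
# so a single apply produces a list of words per row.
tokens = df.apply(lambda row: ' '.join(row).split(), axis=1)

print(tokens.tolist())
# [['hello', 'word', 'hey!', 'lol'],
#  ['how', 'are', 'u', 'doing', 'im', 'doing', 'great', 'lmao']]
```

The result is a Series of lists, so it can be assigned directly with df['Joined'] = tokens.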
0

You can do this with a lambda function:

df['joined'] = df.apply(lambda r: ' '.join(str(r[col]) for col in df.columns), axis=1)
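The str() call in this lambda matters when a column is not text, because ' '.join raises a TypeError on non-string items. A minimal sketch of the same idea, assuming a hypothetical frame named mixed with one numeric column (not part of the question), converts the whole frame with astype(str) before joining:

```python
import pandas as pd

# Hypothetical frame with a numeric column, used only for illustration.
mixed = pd.DataFrame({'A': ['hello', 'bye'], 'B': [1, 2]})

# Convert every cell to its string form, then join each row with spaces.
mixed['joined'] = mixed.astype(str).apply(' '.join, axis=1)

print(mixed['joined'].tolist())
# ['hello 1', 'bye 2']
```

Note that missing values become the literal text 'nan' under astype(str), so they may need to be filled or dropped first.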
ekrall