I have a pandas DataFrame as shown below. The frame has many more columns, but they are not relevant to this task.

id    pos      value       sente
1     a         I           21
2     b         have        21
3     b         a           21
4     a         cat         21
5     d         !           21
1     a         My          22
2     a         cat         22
3     b         is          22
4     a         cute        22
5     d         .           22

I now want to group all rows that share the same sente value and join the words in value to form one sentence per group, collected in a list. The output should look something like this (a list of strings separated by commas):

["I have a cat!", "My cat is cute."]

I suppose the first step is to use groupby("sente")

fill = (df.groupby("sente").apply(lambda df: df["value"].values)).reset_index().rename(columns={0: "content"})

fill = [word for word in fill["content"]]

However doing so I get this output:

print(fill):

[array(['I','have','a','cat','!'],dtype=object), array(['My','cat','is','cute','.'],dtype=object)]

Is there any way to join all the words of a sentence into a single string, instead of keeping each word as a separate string, and to remove the array and dtype part?

Mi.

1 Answer

You need to join all values except the last one with a space, and then append the last value without a space:

L = (df.groupby("sente")['value']
       .apply(lambda x: ' '.join(x.iloc[:-1]) + x.iloc[-1])
       .tolist())
print (L)
['I have a cat!', 'My cat is cute.']

Otherwise you get an unnecessary space before ! and .:

print (df.groupby("sente")['value'].apply(' '.join).tolist())
['I have a cat !', 'My cat is cute .']
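
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is rebuilt from the sample in the question, and `join_sentence` is just an illustrative helper name, not something from pandas. It assumes the last token of every group is the closing punctuation mark; if a sentence ends in a normal word, that word would be glued to the previous one.

```python
import pandas as pd

# Sample frame rebuilt from the question (only the columns shown there).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "pos": ["a", "b", "b", "a", "d", "a", "a", "b", "a", "d"],
    "value": ["I", "have", "a", "cat", "!", "My", "cat", "is", "cute", "."],
    "sente": [21] * 5 + [22] * 5,
})


def join_sentence(words):
    # Join every token except the last with single spaces, then attach the
    # last token (assumed to be punctuation) without a leading space.
    return " ".join(words.iloc[:-1]) + words.iloc[-1]


# One string per sentence id, collected into a plain Python list.
sentences = df.groupby("sente")["value"].apply(join_sentence).tolist()
print(sentences)
```

Because `' '.join(...)` returns an ordinary Python string, the result contains no `array(...)` or `dtype=object` wrappers.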
jezrael
  • This seems to work. Thank you! Certain values in the value column have quotation marks. Is there a way to get these as strings? – Mi. May 03 '18 at 12:03
  • Is it possible to remove them first with strip, like `df['value'] = df['value'].str.strip('"“')` ? – jezrael May 03 '18 at 12:05
  • It works if `"value"` -> `value`, but if a quotation mark is a separate value, then `strip` removes it completely. – jezrael May 03 '18 at 12:07
  • Yes, this did work. Is there any way to apply to it multiple columns (not only ["values"]) in one go? What should I change for that? – Mi. May 07 '18 at 13:47
  • @ThelMi - You can try changing `apply` to `agg` and omitting `['value']`. – jezrael May 07 '18 at 15:00
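
The two follow-ups from the comments (stripping quotation marks, then joining several columns in one go) could be combined roughly as below. This is only a sketch under assumptions: the `lemma` column and the sample tokens are made up for illustration, and `join_tokens` is a hypothetical helper name. Instead of omitting the column selection entirely, it selects a list of text columns, since `' '.join` would fail on numeric columns such as `id`.

```python
import pandas as pd

# Hypothetical sample: "lemma" is an invented second text column.
df = pd.DataFrame({
    "value": ['"I"', "sleep", ".", "Cats", "“purr”", "!"],
    "lemma": ["i", "sleep", ".", "cat", "purr", "!"],
    "sente": [21, 21, 21, 22, 22, 22],
})

# Remove straight and curly double quotes from both ends of each token.
df["value"] = df["value"].str.strip('"“”')


def join_tokens(tokens):
    # Same idea as in the answer: space-join all but the last token,
    # then append the last token (assumed punctuation) directly.
    return " ".join(tokens.iloc[:-1]) + tokens.iloc[-1]


# agg calls join_tokens once per group for each selected column.
joined = df.groupby("sente")[["value", "lemma"]].agg(join_tokens)
print(joined)
```

Note that a token consisting only of a quotation mark becomes an empty string after `strip`, which leaves a doubled space in the joined sentence, so such rows may need to be filtered out first.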