41

I need to combine multiple rows into a single row, that would be simple concat with space

    View of my dataframe:
  tempx        value
0  picture1         1.5
1  picture555       1.5
2  picture255       1.5
3  picture365       1.5
4  picture112       1.5

I want the dataframe to be converted like this: (space separated) tempx values

  Expected output:
  tempx                                                       value
  0     picture1 picture555 picture255 picture365 picture112  1.5

  or
  as a python dict
  {1.5:{picture1 picture555 picture255 picture365 picture112}}

What I have tried :

 df_test['tempx']=df_test['tempx'].str.cat(sep=' ')

this works but it combines the rows in all the columns like this:

      tempx        value
0  picture1 picture555 picture255 picture365 picture112 1.5
1  picture1 picture555 picture255 picture365 picture112 1.5
2  picture1 picture555 picture255 picture365 picture112 1.5
3  picture1 picture555 picture255 picture365 picture112 1.5
4  picture1 picture555 picture255 picture365 picture112 1.5

Is there any elegant solution?

Sandeep Raikar
  • 491
  • 1
  • 6
  • 13
  • also if there is a solution to conditionally combine based on value column – Sandeep Raikar Apr 03 '16 at 23:56
  • What is your expected output, can you edit and example into your question? Do you want to "group by" the value column, so you join the picture names for within each value? – Marius Apr 04 '16 at 00:48
  • I have applied grouby using pandas, next step I would like to do is to have a single row for each value attribute. please check the expected output – Sandeep Raikar Apr 04 '16 at 02:31

1 Answers1

83

You can use groupby and apply function join :

print df.groupby('value')['tempx'].apply(' '.join).reset_index()
   value                                              tempx
0    1.5  picture1 picture555 picture255 picture365 pict...
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 3
    @jezrael hi, is there a way to merge more than one column? instead of `tempx` i want to merge also more columns how to do that? I am trying `df.groupby('value')['tempx','second_column','third_column'].apply(' '.join).reset_index()` but I am receiving only groupped names of columns – sygneto Nov 13 '19 at 07:17
  • 10
    @sygneto - Use `df.groupby('value')['tempx','second_column','third_column'].agg(' '.join).reset_index()` – jezrael Nov 13 '19 at 07:19
  • 1
    thank you, i forgot again about .agg ^^, good to have you here – sygneto Nov 13 '19 at 07:21
  • for me, the call for multiple columns raises the `FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.` and does not include the second column, am I doing something wrong? – Ivo Feb 24 '21 at 09:52
  • 3
    @Ivo use `[]` like `df.groupby('value')[['tempx','second_column','third_column']].agg(' '.join).reset_index()` – jezrael Feb 24 '21 at 11:19
  • @jezrael After applying df.groupby(), I am not able fetch all the columns in output. Basically I want output like groupby_column, merge_column, column_2, column_3 – Prafulla Nov 16 '21 at 09:27
  • @Prafulla - depends of data, need multiple agreagation? Need aggregate first values for another columns? Best create new question - [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – jezrael Nov 16 '21 at 09:29