Grouping strings in a pandas DataFrame

Question

I have the following dataframe with information from weather stations:

      import pandas as pd
      import numpy as np

      df = pd.DataFrame({'Code Weather Station': ['1024', '1024', '1024', '2089', 
                                                  '2089', '2089', '8974'], 
                         'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                                             'speedometer', 'Pluviometer', 'speedometer', 
                                             'Pluviometer']})

I would like to group the instruments from each of the weather stations.

I tried to use groupby, along with the sum() function, as follows:

      df_New = df.groupby('Code Weather Station', as_index=False)['Instrumentation'].sum()

The grouping works as expected, but I would like spaces between the instrument names.

      print(df_New)

      Code Weather Station  Instrumentation
            1024             Pluviometer-Analogspeedometerincidence-sun
            2089             speedometerPluviometerspeedometer
            8974             Pluviometer

I would like the output to be:

      Code Weather Station  Instrumentation
            1024             Pluviometer-Analog speedometer incidence-sun
            2089             speedometer Pluviometer speedometer
            8974             Pluviometer

Thank you.

Try `df.groupby('Code Weather Station')['Instrumentation'].apply(lambda x: ' '.join(x))` – Partha Mandal, May 22 '20 at 12:34
Does this answer your question? [Concatenate strings from several rows using Pandas groupby](https://stackoverflow.com/questions/27298178/concatenate-strings-from-several-rows-using-pandas-groupby) – Partha Mandal, May 22 '20 at 12:36
I tried `df_New = df.groupby('Code Weather Station', as_index=False)['Instrumentation'].apply(lambda x: ' '.join(x))`, but the result is not a DataFrame. Do you have any suggestions? – Jane Borges, May 22 '20 at 12:46
I also tried `df_New = pd.DataFrame(df.groupby('Code Weather Station')['Instrumentation'].apply(lambda x: ' '.join(x)))`, but then the station code ends up in the index, so selecting it by column name is awkward. – Jane Borges, May 22 '20 at 12:48

score 1 · Accepted Answer · answered May 22 '20 at 12:54

Oh! Add a `reset_index()` call, like this:

      df.groupby('Code Weather Station')['Instrumentation'].apply(lambda x: ' '.join(x)).reset_index()
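For reference, here is a minimal self-contained sketch of this approach using the data from the question. The names `joined` and `df_new` are only illustrative, and `' '.join` is passed directly in place of the lambda, which should behave the same way.

```python
import pandas as pd

df = pd.DataFrame({
    'Code Weather Station': ['1024', '1024', '1024', '2089', '2089', '2089', '8974'],
    'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                        'speedometer', 'Pluviometer', 'speedometer', 'Pluviometer'],
})

# Join each station's instruments with a single space. The result is a
# Series whose index holds the station codes.
joined = df.groupby('Code Weather Station')['Instrumentation'].apply(' '.join)

# reset_index() moves the station codes out of the index into a regular
# column, giving a two-column DataFrame.
df_new = joined.reset_index()
print(df_new)
```

After `reset_index()`, both `df_new['Code Weather Station']` and `df_new['Instrumentation']` can be selected by name.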

Partha Mandal


score 0 · Answer 2 · tuhinsharma121 · 2020-05-22T13:47:56.773

You should avoid `apply`, as it's inefficient. You can try this instead:

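      import pandas as pd

      df = pd.DataFrame({'Code Weather Station': ['1024', '1024', '1024', '2089',
                                                  '2089', '2089', '8974'],
                         'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                                             'speedometer', 'Pluviometer', 'speedometer',
                                             'Pluviometer']})

      def process(x):
          # Join all instrument names in one group with single spaces.
          return " ".join(x)

      # Aggregate 'Instrumentation' per station; the (name, function) tuple
      # labels the aggregated column 'Instrumentation'.
      df_new = df.groupby('Code Weather Station').agg({
              'Instrumentation': [('Instrumentation', process)]
          })
      # The result has two-level column labels; drop the outer level.
      df_new.columns = df_new.columns.droplevel()
      # Note: the station code stays in the index; call reset_index()
      # if it should be a regular column.
      print(df_new)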
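As an alternative sketch (not the answer's original code), the same aggregation can be written with named aggregation, available since pandas 0.25. This avoids the two-level columns and the `droplevel()` step, and `as_index=False` keeps the station code as a regular column, which is what the question asked for.

```python
import pandas as pd

df = pd.DataFrame({
    'Code Weather Station': ['1024', '1024', '1024', '2089', '2089', '2089', '8974'],
    'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                        'speedometer', 'Pluviometer', 'speedometer', 'Pluviometer'],
})

# Named aggregation: output_column=(input_column, function).
# as_index=False keeps 'Code Weather Station' as an ordinary column.
df_new = df.groupby('Code Weather Station', as_index=False).agg(
    Instrumentation=('Instrumentation', ' '.join)
)
print(df_new)
```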

edited May 22 '20 at 13:47

answered May 22 '20 at 12:56

tuhinsharma121


`.agg` is more efficient when you use the Cython-optimized built-in functions, AFAIK. How is it more efficient for custom functions? Any links you can share? – Partha Mandal May 22 '20 at 13:09
Yeah, true. It's always recommended to avoid `apply` because it's just a Python for loop; instead use `map`, which is a vectorized implementation and much faster than `apply`. `agg` uses `map` internally (you can check the pandas GitHub repository). There are situations where `apply` cannot be avoided (e.g. handling multiple columns at the same time), but for a single column there is no reason to use `apply`. Hope this helps. – tuhinsharma121 May 22 '20 at 13:35
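Performance claims like the ones in this exchange are easiest to settle by measuring. The sketch below checks that `apply` and `agg` give the same result for a custom join and then times both with `timeit`. The synthetic frame, the column names `key` and `val`, and the sizes are arbitrary assumptions; timings will vary by machine and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (sizes are arbitrary): 100,000 rows over up to 1,000 groups.
rng = np.random.default_rng(0)
n_rows, n_groups = 100_000, 1_000
df = pd.DataFrame({
    'key': rng.integers(0, n_groups, size=n_rows).astype(str),
    'val': 'x',
})

def with_apply():
    # Custom Python function passed through apply.
    return df.groupby('key')['val'].apply(' '.join)

def with_agg():
    # The same custom Python function passed through agg.
    return df.groupby('key')['val'].agg(' '.join)

# Both spellings should produce the same joined strings per group.
assert with_apply().tolist() == with_agg().tolist()

# Time each variant; the printed numbers are measurements, not guarantees.
for fn in (with_apply, with_agg):
    seconds = timeit.timeit(fn, number=5)
    print(f'{fn.__name__}: {seconds / 5:.4f} s per call')
```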
