
I have the following dataframe with information from weather stations:

      import pandas as pd
      import numpy as np

      df = pd.DataFrame({'Code Weather Station': ['1024', '1024', '1024', '2089', 
                                                  '2089', '2089', '8974'], 
                         'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                                             'speedometer', 'Pluviometer', 'speedometer', 
                                             'Pluviometer']})

I would like to group the instruments by weather station, so that each station's instruments end up in a single row.

I tried using groupby together with the sum() function, as follows:

      df_New = df.groupby('Code Weather Station', as_index=False)['Instrumentation'].sum()

The instruments are grouped as expected, but their names are concatenated with no separator between them:

      print(df_New)

      Code Weather Station  Instrumentation
            1024             Pluviometer-Analogspeedometerincidence-sun
            2089             speedometerPluviometerspeedometer
            8974             Pluviometer

I would like the output to be:

      Code Weather Station  Instrumentation
            1024             Pluviometer-Analog speedometer incidence-sun
            2089             speedometer Pluviometer speedometer
            8974             Pluviometer

Thank you.

Jane Borges
  • Try `df.groupby('Code Weather Station')['Instrumentation'].apply(lambda x: ' '.join(x))` – Partha Mandal May 22 '20 at 12:34
  • Does this answer your question? [Concatenate strings from several rows using Pandas groupby](https://stackoverflow.com/questions/27298178/concatenate-strings-from-several-rows-using-pandas-groupby) – Partha Mandal May 22 '20 at 12:36
  • I tried `df_New = df.groupby('Code Weather Station', as_index=False)['Instrumentation'].apply(lambda x: ' '.join(x))`, but the result is not a DataFrame. Do you have any suggestions? – Jane Borges May 22 '20 at 12:46
  • I also tried `df_New = pd.DataFrame(df.groupby('Code Weather Station')['Instrumentation'].apply(lambda x: ' '.join(x)))`, but then the station code ends up as the index rather than as a column, which makes selecting it by column name awkward. – Jane Borges May 22 '20 at 12:48
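A minimal sketch addressing the comments above, assuming the sample data from the question: passing ' '.join to agg through a {column: function} dict on a groupby with as_index=False returns a DataFrame whose first column is the station code, so neither apply nor wrapping the result in pd.DataFrame(...) is needed.

```python
import pandas as pd

df = pd.DataFrame({'Code Weather Station': ['1024', '1024', '1024', '2089',
                                            '2089', '2089', '8974'],
                   'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                                       'speedometer', 'Pluviometer', 'speedometer',
                                       'Pluviometer']})

# as_index=False keeps the group key as a regular column;
# ' '.join is called once per station on that station's instrument names
df_New = df.groupby('Code Weather Station', as_index=False).agg({'Instrumentation': ' '.join})
print(type(df_New).__name__)  # DataFrame
print(df_New)
```

Within each station, the names keep the order in which they appear in the original rows.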

2 Answers


Oh! Just chain a reset_index(), which turns the group key back into a regular column:

df.groupby('Code Weather Station')['Instrumentation'].apply(lambda x: ' '.join(x)).reset_index()
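The one-liner above, spelled out step by step on the question's sample data (a sketch; the intermediate name `joined` is only for illustration). The apply call yields a Series indexed by station code, and reset_index() moves that index back into a column, which gives the DataFrame asked for in the comments.

```python
import pandas as pd

df = pd.DataFrame({'Code Weather Station': ['1024', '1024', '1024', '2089',
                                            '2089', '2089', '8974'],
                   'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                                       'speedometer', 'Pluviometer', 'speedometer',
                                       'Pluviometer']})

# One space-separated string per station; the station code becomes the index
joined = df.groupby('Code Weather Station')['Instrumentation'].apply(lambda x: ' '.join(x))
print(joined.index.name)  # Code Weather Station

# reset_index() turns that index back into a column and returns a DataFrame
df_New = joined.reset_index()
print(df_New)
```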

Partha Mandal

You should avoid apply, as it is inefficient. You can try this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Code Weather Station': ['1024', '1024', '1024', '2089', 
                                          '2089', '2089', '8974'], 
                 'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                                     'speedometer', 'Pluviometer', 'speedometer', 
                                     'Pluviometer']})

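def process(x):
    # Join all instrument names of one station, separated by spaces
    return " ".join(x)

# The (name, function) tuple names the output column 'Instrumentation',
# nested under the source column 'Instrumentation' (two column levels)
df_new = df.groupby('Code Weather Station').agg({
        'Instrumentation': [('Instrumentation', process)]
    })
# Drop the outer column level, leaving a single 'Instrumentation' column
df_new.columns = df_new.columns.droplevel()
print(df_new)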
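The two-level columns and the droplevel() step can be avoided with pandas' named aggregation (keyword arguments to agg, available since pandas 0.25). A sketch reusing the process function from the answer; the reset_index() at the end is an addition so the station code comes back as a column, matching the layout the question asks for.

```python
import pandas as pd

df = pd.DataFrame({'Code Weather Station': ['1024', '1024', '1024', '2089',
                                            '2089', '2089', '8974'],
                   'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                                       'speedometer', 'Pluviometer', 'speedometer',
                                       'Pluviometer']})


def process(x):
    # Join all instrument names of one station, separated by spaces
    return " ".join(x)


# Named aggregation: output_column=(source_column, function)
df_new = (df.groupby('Code Weather Station')
            .agg(Instrumentation=('Instrumentation', process))
            .reset_index())
print(df_new)
```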
tuhinsharma121
  • `.agg` is more efficient when you use Cython-optimized built-in functions, AFAIK. How is it more efficient for custom functions? Do you have any links you can share? – Partha Mandal May 22 '20 at 13:09
  • Yeah, true. It's always recommended to avoid `apply` because it's just a Python for loop; instead use `map`, which is a vectorized implementation and much faster than `apply`. `agg` uses `map` internally (you can check the pandas GitHub repository). But there are situations where `apply` cannot be avoided (e.g. handling multiple columns at the same time). For a single column, though, there is no need to use `apply`. Hope this helps. – tuhinsharma121 May 22 '20 at 13:35
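To check the efficiency claims in these comments on your own setup, here is a rough benchmark sketch. With a Python callable such as ' '.join, both apply and agg call that function once per group at the Python level, so any difference comes from pandas' per-group overhead and may vary with the pandas version and the data. The synthetic data below (row count, number of stations) is a made-up assumption, and no particular timings are claimed.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 50,000 rows spread over up to 1,000 stations
rng = np.random.default_rng(0)
n_rows = 50_000
big = pd.DataFrame({
    'Code Weather Station': rng.integers(0, 1_000, n_rows).astype(str),
    'Instrumentation': rng.choice(['Pluviometer', 'speedometer', 'incidence-sun'], n_rows),
})


def with_apply(frame):
    return frame.groupby('Code Weather Station')['Instrumentation'].apply(lambda x: ' '.join(x))


def with_agg(frame):
    return frame.groupby('Code Weather Station')['Instrumentation'].agg(' '.join)


# Both approaches must produce the same strings before timing them
res_apply, res_agg = with_apply(big), with_agg(big)
assert res_apply.index.tolist() == res_agg.index.tolist()
assert res_apply.tolist() == res_agg.tolist()

for name, fn in [('apply', with_apply), ('agg', with_agg)]:
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(f'{name}: {seconds:.3f} s for 3 runs')
```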