1

I have the following Dataframe: Original Dataframe I want the following output: output Dataframe

I have tried using groupby on "Container" column (and sum and other columns) but it only gives the first row as output.

I am very new to python and pandas. and not sure if am doing it correct. Some of the answer of slimier questions are too advanced for me to understand. I am just wondering if i can get the output with just 2/3 lines of coding.

syedmfk
  • 111
  • 1
  • 1
  • 4

4 Answers4

0

I believe you could groupby and sum like below. The dropna will drop the NaN/empty values in your DataFrame.

df.dropna().groupby(['Container']).sum()
Darren Christopher
  • 3,893
  • 4
  • 20
  • 37
0
import pandas as pd

d =   [['CXRU',195, 1650,65,23000],
       ['BEAU',140, 26, 3, 575], 
       ['NaN', 140, 60 , 8, 1153]]
df=pd.DataFrame(mylist,columns=['Container','SB No', 'Pkgs', 'CBM','Weight'])
df

enter image description here

sel= df['Container']!='NaN'
df[sel]

enter image description here

roadrunner66
  • 7,772
  • 4
  • 32
  • 38
0

import pandas as pd

df = pd.DataFrame({'id':['aaa', 'aaa', 'bbb', 'ccc', 'bbb', 'NaN', 'NaN', 'aaa', 'NaN'], 'values':[1,2,3,4,5,6,7,8,9]})

df

for i in range(len(df)):

if df.iloc[i,0] == "NaN":

    df.iloc[i,0] = df.iloc[i-1,0]

df.groupby('id').sum()

0

Expected result exactly as the one you showed as "Output Dataframe": first "NaN" values in "Container" column of your Original Dataframe must replaced with the corresponding immediate upper value. I added more "NaN" values to exemplify:

Original DataFrame:

df
       Container   SB No  Pkgs  CBM  Weight
257  CXRU1219452  195375  1650   65   23000
259  BEAU4883430  140801    26    3     575
260          NaN  140868    60    8    1153
261          NaN  140824    11    1     197
262          NaN  140851   253   32    4793
263          NaN  140645    14    1     278
264          NaN  140723     5    0      71
265          NaN  140741     1    0      22
266          NaN  140768     5    0      93
268  SZLU9366565  189355  1800   65   23000
259  ZBCD1234567  100000   100   10    1000
260          NaN  100000   100   10    1000
261          NaN  100000   100   10    1000
262          NaN  100000   100   10    1000

Use "fillna" function with method "ffill" as suggested by [https://stackoverflow.com/a/27905350/6057650][1]

Then you will get "Container" column without "NaN" values:

df=df.fillna(method='ffill')
df    
       Container   SB No  Pkgs  CBM  Weight
257  CXRU1219452  195375  1650   65   23000
259  BEAU4883430  140801    26    3     575
260  BEAU4883430  140868    60    8    1153
261  BEAU4883430  140824    11    1     197
262  BEAU4883430  140851   253   32    4793
263  BEAU4883430  140645    14    1     278
264  BEAU4883430  140723     5    0      71
265  BEAU4883430  140741     1    0      22
266  BEAU4883430  140768     5    0      93
268  SZLU9366565  189355  1800   65   23000
259  ZBCD1234567  100000   100   10    1000
260  ZBCD1234567  100000   100   10    1000
261  ZBCD1234567  100000   100   10    1000
262  ZBCD1234567  100000   100   10    1000

Now you can get the expected "Output DataFrame" using groupby:

df.groupby(['Container']).sum()

               SB No  Pkgs  CBM  Weight
Container                              
BEAU4883430  1126221   375   45    7182
CXRU1219452   195375  1650   65   23000
SZLU9366565   189355  1800   65   23000
ZBCD1234567   400000   400   40    4000
Khristhian
  • 26
  • 5