1

I recently asked about how to merge Dataframes, the thing is i need this issue in adition to what i previously asked. (Previous question: Python pandas, build a dataframe from 2 dataframes with these properties )

i'm in need to solve this issue.

I need to build a whole dataframe from two dataframes, but include only certain info from a second Dataframe if required.

EXAMPLE:

DF1:

        MATERIAL_N°    Description    DATE DUE
    0   123123300      Lightbulb X    01/05/2018
    1   220466         Lightbulb Y    04/04/2018
    2   220000         Lightbulb Z    07/07/2018
    3   1241241        Lightbulb A    02/01/2019
    4   7775447        Lightbulb B    02/01/2019

DF2:

            BG GROUP    MATERIAL N° TRANSIT TIME   QUANTITY
        0   9001        123123300   45D            40
        1   9002        220466      30D            50
        2   9004        220000      30D            120
        3   9003        44124       20D            110
        4   9000        2512222     15D            170
        5   9002        1241241     40D            10
        6   9001        123123300   45D            60

EXPECTED RESULT DF3:

    MATERIAL N° Description     DATE DUE    BG GROUP TRANSIT TIME  INCOMING
0   123123300   Lightbulb X     01/05/2018  9001.0        45D      100
1   220466      Lightbulb Y     04/04/2018  9002.0        30D      50
2   220000      Lightbulb Z     07/07/2018  9004.0        30D      120
3   1241241     Lightbulb A     02/01/2019  9002.0        40D      10
4   7775447     Lightbulb B     02/01/2019  NaN           NaN      NaN

The result i expect to get is the sum of incoming loads of products acording to DF2 after the merge of these two dataframes (Bear in mind that i am adding quantities for each material number)

I've tried via functions outside the DF or pandas, but im sure there is a much more simple way to deal with this that Pandas module covers.

PD: This community is awesome, i really appreciate its feedback, it really helps with my personal projects.

Kudos

Javilg
  • 11
  • 1

1 Answers1

0

You can do the left merge then re-create the quantity columns using transform and then drop duplicates

df3 = df1.merge(df2, how = 'left')
df3['QUANTITY'] = df3.groupby('MATERIAL N°').QUANTITY.transform('sum')
df3.drop_duplicates(subset = 'MATERIAL N°', keep = 'last', inplace = True)

You get

    MATERIAL N° Description DATE DUE BG GROUP TRANSIT TIME  QUANTITY
1   123123300   Lightbulb X 01/05/2018  9001.0    45D       100.0
2   220466      Lightbulb Y 04/04/2018  9002.0    30D       50.0
3   220000      Lightbulb Z 07/07/2018  9004.0    30D       120.0
4   1241241     Lightbulb A 02/01/2019  9002.0    40D       10.0
5   7775447     Lightbulb B 02/01/2019  NaN       NaN       NaN
Vaishali
  • 37,545
  • 5
  • 58
  • 86
  • Thanks again Vaishali for answering my doubts. If you were in Buenos Aires, id totally employ you haha. Cheers – Javilg Sep 23 '17 at 00:05
  • Stack overflow way of appreciating one's effort is by upvoting and accepting the answer:) – Vaishali Sep 23 '17 at 01:15