1

I have two datasets, df1 and df2, where:

I would like to perform an 'inner-join' on the date and name columns however, I wish to re-order some of the columns as well.

df1

name    freeG   totalG  sku      date
a       4       10      hi       10/10/2020
b       10      20      hi       10/10/2020
c       8       10      hello    10/10/2020
d       1       5       hello    10/10/2020
e       1       5       hey      10/10/2020
f       2       5       hey      10/10/2020
g       1       5       hey      10/10/2020

df2

total   freeS   usedS   name   date  
10      5       5       a      10/10/2020
20      10      10      b      10/10/2020
30      5       25      c      10/10/2020
40      10      30      d      10/10/2020
50      10      40      e      10/10/2020
10      5       5       f      10/10/2020
10      5       5       g      10/10/2020

Desired Outcome

date           name freeG   totalG  sku     usedS   freeS   total  
10/10/2020      a   4       10      hi      5       5       10
10/10/2020      b   10      20      hi      10      10      20
10/10/2020      c   8       10      hello   25      5       30
10/10/2020      d   1       5       hello   30      10      40
10/10/2020      e   1       5       hey     40      10      50
10/10/2020      f   2       5       hey     5       5       10
10/10/2020      g   1       5       hey     5       5       10

This is what I am doing

import pandas as pd
import numpy as np


new = pd.concat([df1, df2], axis=1, join='inner', inner_on=['date','name'])

//change order of columns

new1 = new[['date', 'name', 'freeG', 'totalG', 'sku', 'usedS', 'freeS', 'total']]

However, I am not able to join using the above syntax. Any suggestion is appreciated

Lynn
  • 4,292
  • 5
  • 21
  • 44
  • 1
    " I am not able to join using the above syntax" please describe what is wrong with your current attempt. Is the output wrong? Is there an error? Detail will help us know how to help you better – G. Anderson Dec 01 '20 at 22:15
  • sure - inner_on = syntax is not correct. I am researching this now – Lynn Dec 01 '20 at 22:17
  • 1
    Rather than using concat if you want to join, why not just use [dataframe.join()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html)? – G. Anderson Dec 01 '20 at 22:18
  • ok thank you @G.Anderson - I will try this - The documentation has only one column place to join on I believe. Could I add more than one? – Lynn Dec 01 '20 at 22:22
  • 1
    Also helpful: [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – G. Anderson Dec 01 '20 at 22:22

1 Answers1

2
new = pd.merge(df1, df2, how='inner',on=['name', 'date'])#merge
new['total']=new.freeS.add(new.usedS)#compute  total

new1 = new[['date', 'name', 'freeG', 'totalG', 'sku', 'usedS', 'freeS', 'total']]   # align columns
new1
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
wwnde
  • 26,119
  • 6
  • 18
  • 32
  • 1
    This works thank you @wwnde - I was not sure of the correct syntax to use when joining. I was using 'inner_on' . Thank you – Lynn Dec 01 '20 at 22:25
  • Oh Hi @Wwnde now I am getting this error: SyntaxError: unexpected EOF while parsing - maybe I need double quotes? its pointing to this part: on=['name', 'date']) – Lynn Dec 02 '20 at 06:51
  • I figured it out- I removed the brackets. I was only joining on one column just now. I did: join= pd.merge(df1, df2 how='inner', on= 'Name') - perhaps when you join on more than one column you use []? thank you – Lynn Dec 02 '20 at 06:58