
I'm coming from R and trying to replicate dplyr with the dfply package in Python. I need some help.

I have two questions:

  1. How do I join two datasets if the join columns have different names?
  2. Is there a way to join on more than one column? According to the documentation, I can join on only one column.

https://github.com/kieferk/dfply#joining

I like the dfply package, but it is missing some critical functionality. Thanks for your help. Alternatively, please point me to any other Python packages similar to R's dplyr.

www
Murali

1 Answer


The dfply package is built on the great pandas package in Python. Its documentation mostly serves to point you toward the underlying functionality. If you go to its GitHub repo and open the join.py file, you can see that the various joins are implemented on top of the pandas `df.merge` function.
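Since the joins delegate to `df.merge`, it can help to see the plain pandas call for the "different column names" case. This is a minimal sketch assuming only that pandas is installed; the frames and the column names `A_c1`, `B_c1`, `val_a`, `val_b` are hypothetical example data, not anything taken from dfply.

```python
import pandas as pd

# Hypothetical data: the join key is called A_c1 in df and B_c1 in other.
df = pd.DataFrame({"A_c1": [1, 2, 3], "val_a": ["x", "y", "z"]})
other = pd.DataFrame({"B_c1": [2, 3, 4], "val_b": ["p", "q", "r"]})

# Inner join matching df.A_c1 against other.B_c1.
# With differently named keys, pandas keeps both key columns in the result.
result = df.merge(other, how="inner", left_on="A_c1", right_on="B_c1")
print(result)
```

Only the keys 2 and 3 appear in both frames, so the result has two rows.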

So, to answer your questions (hopefully it is not too late):

  1. How do I join two datasets if the join columns have different names?

    df >> inner_join(other, by=('A_c1', 'B_c1'))
    
  2. Is there a way to join on more than one column? According to the documentation, I can join on only one column.

    df >> inner_join(other, by=[('A_c1', 'B_c1'), ('A_c2', 'B_c2')])
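To illustrate how a `by` argument shaped like the ones above can map onto `df.merge`, here is a sketch. The helper `by_to_merge_keys` is hypothetical: it only approximates the idea that a tuple means (left column, right column) and a list means several key pairs, and it is not dfply's actual code. The data and column names are made up for the example; only pandas is assumed.

```python
import pandas as pd


def by_to_merge_keys(by):
    """Hypothetical helper: turn a dfply-style `by` into merge's left_on/right_on.

    - a string: the same column name on both sides
    - a tuple (left, right): one key pair with different names
    - a list of strings and/or tuples: several key pairs
    """
    if isinstance(by, tuple):
        left, right = by
        return [left], [right]
    if isinstance(by, str):
        return [by], [by]
    # Normalise every entry to a (left, right) pair, then split the pairs.
    pairs = [b if isinstance(b, tuple) else (b, b) for b in by]
    left_on = [left for left, _ in pairs]
    right_on = [right for _, right in pairs]
    return left_on, right_on


# Hypothetical data with two differently named key columns per frame.
df = pd.DataFrame({"A_c1": [1, 1, 2], "A_c2": ["a", "b", "a"], "v": [10, 20, 30]})
other = pd.DataFrame({"B_c1": [1, 2, 2], "B_c2": ["b", "a", "c"], "w": [100, 200, 300]})

left_on, right_on = by_to_merge_keys([("A_c1", "B_c1"), ("A_c2", "B_c2")])
# Inner join that requires both key pairs to match at the same time.
result = df.merge(other, how="inner", left_on=left_on, right_on=right_on)
print(result)
```

Only the combinations (1, "b") and (2, "a") exist in both frames, so two rows survive the join.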
    

One thing I should add: at the time of this writing (Oct 2018), you have to install the develop version of the package, which has the multi-column join functionality.

PaulDong