-1

I have a data frame like as below

Company_Candidate_Dataframe

Candidate_Detail_Dataframe

Merged Dataframe

I am trying to get Merged DataFrame

I am able to merge Company_Candidate_Dataframe and Candidate_Detail_Dataframe. But that is not what I exactly need.

I also thought of creating a dictionary from Company_Candidate_Dataframe and feed to Candidate_Detail_Dataframe.apply(some lambda)

What should be a better way of doing this?

Edit: Data as text

    Candidate Google Netflix Microsoft Amazon Paypal

    John 0 0 1 0 0
    Mary 1 0 0 0 0
    Ravenshaw 0 1 0 0 0
    Lohan 1 0 0 0 0
    Nile 0 0 0 1 0

-------------------------

Candidate Id Address Score

John 1 ABC 13
Mary 2 XYZ 24
Ravenshaw 3 A1B1 90
Lohan 4 Lqw 102
Nile  5 Arou 23
tausif
  • 672
  • 1
  • 6
  • 15
  • 1
    please paste in your data as text into your question – gyx-hh Oct 02 '18 at 14:10
  • Can a candidate work at multiple companies? – ALollz Oct 02 '18 at 14:19
  • @gyx-hh added text data. Thanks for asking edits. – tausif Oct 02 '18 at 18:09
  • @ALollz in this case, No. However, it becomes interesting for othere data set if it has duplicate entries for a probable index? – tausif Oct 02 '18 at 18:10
  • @PankajJoshi Thanks Pankaj. The solution you suggested works. Please see my comment to your anwser. – tausif Oct 02 '18 at 18:15
  • @PankajJoshi It becomes interesting if we have a duplicate candidate. I don't think idxmax handles that. – tausif Oct 02 '18 at 19:01
  • You are right, it will not work it assumes only one company. In case of duplicate we will have to use df.max(axis=1) and then np.where(df.values == rowmax[:,None]) so obtain all. – quest Oct 02 '18 at 19:31

1 Answers1

4

Here is how I will go:

# Creating the dataframes
df1 = pd.DataFrame({'Candidate':['John', 'Mary', 'Ravenshaw', 'Lohan', 'Nile'], 'Google':[0,1,0,1,0], 'Netflix':[0,0,1,0,0], 'Microsoft':[1,0,0,0,0], 'Amazon':[0,0,0, 0, 1],'Paypal':[0,0,0,0,0]})
df1.set_index('Candidate', inplace=True)

df2 = pd.DataFrame({'Id':range(1,6), 'Address':['ABC', 'XYZ', 'A1B1', 'Lqw', 'Arou'], 'Score':[13,24,90,102,23], 'Candidate':['John', 'Mary', 'Ravenshaw', 'Lohan', 'Nile']})
df2.set_index('Candidate', inplace=True)

# Obtaining the company dataframe
df3 = df1.idxmax(axis=1)
df3.name = 'Company'

# Joining
df2.join(df3)

Hope this helps.

To make the task easy for guys answering I would request you to please post minimal working code.

quest
  • 3,576
  • 2
  • 16
  • 26
  • This works. Thanks. I am new to this whole thing and read interesting stuff about 'idxmax' https://stackoverflow.com/questions/10202570/pandas-dataframe-find-row-where-values-for-column-is-maximal. Also, I got stumbled on unPivot and melt in the Pandas. I am going through https://stackoverflow.com/questions/18259067/unpivot-pandas-data. Will update the question with working solution based out of these methods. – tausif Oct 02 '18 at 18:12
  • Python Data Science Handbook by JVP is an excellent resource. https://www.amazon.com/Python-Data-Science-Handbook-Essential/dp/1491912057/ref=sr_1_3?ie=UTF8&qid=1538504365&sr=8-3&keywords=python+data+science+handbook – quest Oct 02 '18 at 18:21