1

I have 3 df like df1, df2, df3 and they read a .csv file. I want to combine df1 & df2 and then have the result do a lookup on df3 and pull the values that are the same in a certain key that's both in df3 and result. So basically a vlookup.

This is how I concatenated df1&df2:

result = pd.concat(frames,keys=['data1','data2'])

I performed the merge like this:

result = pd.merge(result, df3, left_on='ProducerID', right_index=True, how='left', sort=False)

But my results are these:

        Key_x   otherColumns    Key_y   otherColumns
L   0   5763460 NaN             NaN     NaN
L   1   5926058 NaN             NaN     NaN
L   2   5052233 NaN             NaN     NaN
L   3   5526565 NaN             NaN     NaN
L   4   5059073 NaN             NaN     NaN
L   5   5975975 NaN             NaN     NaN

I should mention, key is not in the index. I abbreviated the data. key is just a column in the middle of my data.

And these are snippets of what my data files look like:

df1                 
Column 1    Column 2    Column 3    Column n…   Key     Column …n
data1       data        data        data        5763460 data
data2       data        data        data        5926058 data
data1       data        data        data        5052233 data

df2                 
Column 1    Column 2    Column 3    Column n…   Key     Column …n
data1       data        data        data        5763460 data
data1       data        data        data        5926058 data
data2       data        data        data        5052233 data

df3                 
Key     Column 1    Column 2            
5763460 data        data            
5926058 data        data            
5052233 data        data

result                  
Column 1    Column 2    Column 3    Column n…   Key     Column …n
data1       data        data        data        5763460 data
data1       data        data        data        5926058 data
data2       data        data        data        5052233 data

So result will look just like df1 just larger and grouped by the contents of Column 1 which are data1 and data2

rubito
  • 327
  • 4
  • 15
  • 1
    [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) ;) – MaxU - stand with Ukraine Sep 29 '16 at 18:37
  • @MaxU how about now?? :3 – rubito Sep 29 '16 at 20:05
  • It's still not very clear... What are `['genericName','otherName']`? `df1 == df2`, so if people would use them for answering your question they would have 3 duplicates as the result of `pd.concat(...)` and so on... The key word in my first comment was - `reproducible`. You should try your code against your sample DFs and you didn't specify your desired data set ... ;) – MaxU - stand with Ukraine Sep 29 '16 at 20:11
  • Those are just placeholders, I'm using them there to group by those names. So I end up with a datafile grouped by name1 and then name2. I edited the question to reflect this comment – rubito Sep 29 '16 at 20:13
  • Usually if posters don't provide reproducible input data sets and their desired data set, then it ends with the following comment coming from the Original Poster (OP) - it's not what I need / was expecting to get / etc... Look, people will invest their time in order to help you to find an answer, so put some effort into reproducible samples / examples. No offends, OK ? ;) – MaxU - stand with Ukraine Sep 29 '16 at 20:17
  • I got you man! I hope I didn't sound like I'm complaining. I guess I was just trying to make it a brief question. I thought maybe I was missing some syntax and didn't think it was gonna be more involved than that. – rubito Sep 29 '16 at 20:20
  • looking at your resulting DF - you don't seem to have matching rows in your "right" DF (in the `pd.merge()`) hence NaNs – MaxU - stand with Ukraine Sep 29 '16 at 20:23
  • But that's the thing, I know for a fact that the key is in both the `result` and `df3` dataframes. So I should be able to pull the values that are in `df3['column1']`. I thought this was how you do a vlookup in pandas? Maybe I'm going about this the wrong way – rubito Sep 29 '16 at 20:31
  • 1
    Maybe... We would need to see a __reproducible__ data sets in order to give you a qualified answer. Good luck! – MaxU - stand with Ukraine Sep 29 '16 at 20:35

0 Answers0