I have two excel databases and they each have one shared column identifier; they both have SKU product identifiers. I am trying to find a way to match the SKU numbers from database 1 to database 2 and then print out the product information that is in database 1. So, for example, if database 1 contains:
SKU Price Name
0 SKU-001 54 John
1 SKU-002 6 Mike
2 SKU-003 5 Dan
And if database 2 contains:
SKU Price Name
0 SKU-004 50 Dave
1 SKU-003 23 Ted
2 SKU-050 43 Kevin
I am trying to find the SKU in each database that matches. Both databases contain SKU-003 and therefore I am trying to combine that row with SKU-003. I have used the merge database in pandas:
pd.merge(db1, db2, on='SKU')
and the output:
SKU Price_x Name_x Price_y Name_y
0 SKU-003 23 Ted 5 Dan
This is correct. However, when I try it with the 2 databases in question it is not working correctly. The first database has 3715 rows and the second has about 2800. The second database has the actual live/active listings. The first database has all of the SKU's that I had ever listed. I need to match the active listings in Database 2 with the active listings in Database 1. I can do that with the SKU because both databases have that matching characteristic. Both databases have different columns besides that identifier (the SKU). I do not think the merge database works for this as I have tried:
pd.merge(db3, db4, on='SKU', how='left')
the output from that code, for example, shows a book that is no longer active and not in the database 2. It should not show up because I only need active listings from the one database to match up with the listings in the other database-via SKU. I would show the output but there are 75 columns and 3715 rows in that database.
Is there a way to match up the SKU from database 2 to database 1 and show only the data/rows from the database that contains all of the listings?
Also, is there a way to delete NaN rows?
Also, when I merge databases, the column merged on does not show up in the output (in this case the SKU). How can I show that information as well for each row?