0

I have two excel databases and they each have one shared column identifier; they both have SKU product identifiers. I am trying to find a way to match the SKU numbers from database 1 to database 2 and then print out the product information that is in database 1. So, for example, if database 1 contains:

           SKU  Price  Name
    0  SKU-001  54     John
    1  SKU-002  6      Mike
    2  SKU-003  5      Dan

And if database 2 contains:

          SKU  Price   Name
    0  SKU-004     50   Dave
    1  SKU-003     23   Ted
    2  SKU-050     43   Kevin

I am trying to find the SKU in each database that matches. Both databases contain SKU-003 and therefore I am trying to combine that row with SKU-003. I have used the merge database in pandas:

 pd.merge(db1, db2, on='SKU')

 and the output:

     SKU    Price_x Name_x  Price_y Name_y
0   SKU-003   23        Ted  5       Dan

This is correct. However, when I try it with the 2 databases in question it is not working correctly. The first database has 3715 rows and the second has about 2800. The second database has the actual live/active listings. The first database has all of the SKU's that I had ever listed. I need to match the active listings in Database 2 with the active listings in Database 1. I can do that with the SKU because both databases have that matching characteristic. Both databases have different columns besides that identifier (the SKU). I do not think the merge database works for this as I have tried:

     pd.merge(db3, db4, on='SKU', how='left')

the output from that code, for example, shows a book that is no longer active and not in the database 2. It should not show up because I only need active listings from the one database to match up with the listings in the other database-via SKU. I would show the output but there are 75 columns and 3715 rows in that database.

Is there a way to match up the SKU from database 2 to database 1 and show only the data/rows from the database that contains all of the listings?

Also, is there a way to delete NaN rows?

Also, when I merge databases, the column merged on does not show up in the output (in this case the SKU). How can I show that information as well for each row?

jumpman23
  • 19
  • 5
  • Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – Naveed Oct 03 '22 at 12:29

0 Answers0