1

I'm trying to create a program that needs to search a csv file for matching values in another csv file.

Here is what I have so far:

import pandas as pd
import numpy as np

listings = pd.read_csv("data/listings.csv")
inventoryValue = pd.read_csv("data/inventoryValue.csv")

#get rid of rows with empty values in column 'Item Number'

listings['Item Number'].replace('', np.nan, inplace=True)
listings.dropna(subset=['Item Number'], inplace=True)

#get rid of rows with empty values in column 'AvgCost'

inventoryValue['Avg Cost'].replace('', np.nan, inplace=True)
inventoryValue.dropna(subset=['Avg Cost'], inplace=True)


#here how can I search for all of the rows in inventoryValue[Item Number] based on Listings[Item Number]

So basically I need to use Item Number column in listings to find rows with matching Item Number in inventoryValue, from there I can get the columns I need in Inventory Value and save the file.

Any help is much appreciated!

Jonathan Zier
  • 150
  • 2
  • 14

1 Answers1

2

I believe what you want can be achieved using isin. This method is used to filter data frames by selecting rows with having a particular value in a particular column.

In your case, you can create a list that contains all the unique values of Listings['Item Number'], and then check which of the elements are present in inventoryValue['Item Number'], and return back a reduced dataframe:

my_list = listings['Item Number'].unique().tolist()
new_inventoryValue = inventoryValue[inventoryValue['Item Number'].isin(my_list)]

Which will return back a smaller dataframe (row-wise), with all the columns, but your 'Iterm Number'' column will have only the elements in my_list.

sophocles
  • 13,593
  • 3
  • 14
  • 33
  • Yes that seems to be what I need! Do you have any idea why new_inventoryValue has around 2000 less rows than the original listings.csv? – Jonathan Zier Feb 04 '21 at 17:02
  • That is a question that can potentially have many answers. One thing you could do is check the common elements in the 2 columns, if there are many common elements, then the resulting df should be a bit smaller, if there are many differences, then the resulting df should have much less rows. You can get the common elements using: ```[i for i in inventoryValue['Item Number'].unique() if i in listings['Item Number'].unique()]``` – sophocles Feb 04 '21 at 17:05
  • That makes sense, what confuses me is that inventoryValue.csv should contain all of the Item Numbers as the listings.csv, so the resulting frame should be the same as listings. Edit: there are also no duplicates – Jonathan Zier Feb 04 '21 at 17:12
  • Yes exactly, if they contain the same item numbers the resulting dataframe should be the same. Can you post a sample of your data so that I can have a look? I am afraid I can't help more without looking at it. Check for inconsistencies in spelling, extra spaces, etc..? – sophocles Feb 04 '21 at 17:16
  • I found the problem in the data, so all seems to be working smooth now :) Thank you very much for your help. I'm more familiar with javascript and php, I figured that this project with csv's would be ideal to get started with python. – Jonathan Zier Feb 04 '21 at 17:26
  • I also transitioned from R to Python a year ago, I know the feeling. Glad I could help. Happy learning. – sophocles Feb 04 '21 at 17:28
  • Do you know how I can have new_inventoryValue include the rest of the columns from listings? – Jonathan Zier Feb 04 '21 at 18:39
  • I think you mean merge the 2 datasets together, as they were separated before. Check out this link: https://stackoverflow.com/questions/53645882/pandas-merging-101. It should be enough to follow and get your desired outcome. If you can't, let me know and I'll try to help you out. Thanks. – sophocles Feb 04 '21 at 19:02