
I have a series containing zip codes, like:

zip_codes = pd.Series(['10001', '1020', '98068'])

Now I have to compare it against a master table containing all the US zip codes and create a Boolean series stating whether a match is found or not.

zip_master = pd.DataFrame([['98292', 'Lake Ketchum'], ['98068', 'Roslyn'], ['99013', 99013]], columns=['Zip Code', 'City Name'])

Is there a vectorised way to do this? I looked into the Series string methods, but could not figure out whether they are the right thing to use.

EDIT 1: As per the comments, we can use the Series method `isin`.
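For reference, a minimal sketch of the `isin` approach suggested in the comments, using the data from above. It assumes both sides hold zip codes as strings; if one side were integers, nothing would match.

```python
import pandas as pd

zip_codes = pd.Series(['10001', '1020', '98068'])
zip_master = pd.DataFrame(
    [['98292', 'Lake Ketchum'], ['98068', 'Roslyn'], ['99013', 99013]],
    columns=['Zip Code', 'City Name'],
)

# True where the zip code appears in the master table's 'Zip Code' column
found = zip_codes.isin(zip_master['Zip Code'])
print(found.tolist())  # [False, False, True]
```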

So my main initial question is answered. I would like to extend this question a little further: is it possible to do partial string matching in a vectorised way? Say I have a series containing city names and I want to match it against the City Name column of the zip master. I have seen string matching used for this, but it is not vectorised. Is there an efficient vectorised method for it?

Should I use some other technique, like caching or a database, to get this done?
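As a starting point, here is a sketch of what stays fully vectorised: normalising case and whitespace on both sides and then reusing `isin`. The city values and the `normalise` helper are hypothetical, made up for illustration. This only handles formatting differences; it does not catch misspellings.

```python
import pandas as pd

# Hypothetical user input and master column, for illustration only
cities = pd.Series(['roslyn ', 'LAKE KETCHUM', 'New Jersy'])
master_cities = pd.Series(['Lake Ketchum', 'Roslyn', 'New Jersey'])

def normalise(s):
    # Cast to string, trim surrounding whitespace, and lower-case
    return s.astype(str).str.strip().str.lower()

matched = normalise(cities).isin(normalise(master_cities))
print(matched.tolist())  # [True, True, False]
```

The misspelled 'New Jersy' is still reported as unmatched, so real fuzzy matching needs something beyond this.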

Mithun Manohar
  • You can check `isin`. – BENY Oct 08 '18 at 03:18
  • `zip_codes.isin(zip_master['Zip Code'])` – Alexander Oct 08 '18 at 03:20
  • Sorry for my ignorance. This is exactly what I want! Do you have any suggestions for partial string matching? Say I have a series containing city names and I want to match it against the City Name column of the zip master. Is there an efficient vectorised method for it? I have seen fuzzy matching being used; is there a vectorised implementation of it in pandas? – Mithun Manohar Oct 08 '18 at 03:30
  • @NisseEngström I have fixed it – Mithun Manohar Oct 08 '18 at 03:38
  • For the second requirement, something like `zip_master.loc[zip_master['City Name'] == 'Roslyn']` – Karn Kumar Oct 08 '18 at 04:08
  • or `zip_master[zip_master['City Name'].str.contains("Roslyn")==True]` – Karn Kumar Oct 08 '18 at 04:14
  • @pygo It would not match for 'New Jersy' if master zip record has 'New Jersey'. My data is coming from a user form, and may contain misspellings. – Mithun Manohar Oct 08 '18 at 05:08
  • You can use a regex, something like `str.match(r'^Ne.*y$')`, which matches names starting with "Ne" and ending in "y", such as "New Jersey" or "New Jersy". I have not tested this, but that is the idea. – Karn Kumar Oct 08 '18 at 05:12
  • Yeah. That will work if I have only a few comparisons to make. But I have thousands of rows that need to be compared against the master, and a hard-coded regex approach might not be the best solution. – Mithun Manohar Oct 08 '18 at 05:14
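
One possible direction for the misspelling case raised in the comments, sketched with the standard library's `difflib.get_close_matches`. This is not a vectorised pandas method: the fuzzy comparison still runs in Python, but only once per distinct input value, and `Series.map` then broadcasts the cached result to every row. The city names, the `closest` helper, and the 0.8 cutoff are assumptions chosen for illustration.

```python
import difflib
import pandas as pd

# Hypothetical master list and user input, for illustration only
master_cities = ['Lake Ketchum', 'Roslyn', 'New Jersey']
cities = pd.Series(['New Jersy', 'Roslyn', 'Roslyn', 'Xyzzy'])

def closest(name, choices=master_cities, cutoff=0.8):
    # Best master entry at or above the similarity cutoff, else None
    hits = difflib.get_close_matches(name, choices, n=1, cutoff=cutoff)
    return hits[0] if hits else None

# Run the slow comparison once per distinct value, then map it onto all rows
lookup = {name: closest(name) for name in cities.unique()}
best = cities.map(lookup)
matched = best.notna()
print(matched.tolist())  # [True, True, True, False]
```

With thousands of rows but far fewer distinct spellings, the dictionary acts as the cache; the cutoff would need tuning against real data.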

0 Answers