I have a list of about 50 store names and would like to find out whether it contains different variations of the same store name. I tried to use Fuzzy, but I deleted the code because it was not working well. The data does not have any duplicates, but I would like to learn how to check for them in case it did. What code can I use to check for both exact duplicates and variations of the same name?

This is the code I used to list the distinct store names.

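import sqldf

# Select each distinct store name once, sorted alphabetically.
# `df` is the existing DataFrame that holds the store_name column.
q = """
    SELECT DISTINCT store_name
    FROM df
    ORDER BY store_name ASC
"""

unique_sn = sqldf.run(q)
# The result has a single column, so it can be printed as is.
print(unique_sn)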
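`SELECT DISTINCT` only merges names that match character for character, so entries that differ in case or spacing still count as different. The following is a minimal sketch, using only the standard library, of how exact duplicates could be detected after a simple normalization step. The function names `normalize` and `find_exact_duplicates` are hypothetical, and the second and fourth sample entries are invented variants, not rows from the data.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase, trim, and make spacing (also around hyphens) uniform."""
    name = name.strip().lower()
    name = re.sub(r"\s*-\s*", " - ", name)  # "3374 -Shops" -> "3374 - shops"
    return re.sub(r"\s+", " ", name)        # collapse repeated whitespace


def find_exact_duplicates(names):
    """Group raw names that become identical after normalization."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize(raw)].append(raw)
    # Keep only the groups that contain more than one raw spelling.
    return {key: raws for key, raws in groups.items() if len(raws) > 1}


# The second and fourth entries are made-up variants for illustration only.
sample = [
    "3358 - Denver, CO (XF)",
    "3358 -  denver, CO (XF)",
    "59032 - Sugarhouse  - Salt Lake City, UT (XF)",
    "59032 - Sugarhouse - Salt Lake City, UT (XF) ",
    "3356 - Boulder, CO (XF)",
]

duplicates = find_exact_duplicates(sample)
for key, raws in duplicates.items():
    print(key, "->", raws)
```

With a DataFrame, the same idea could be applied by passing `df["store_name"].tolist()` to `find_exact_duplicates`.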

Here is the list.

                                           store_name
0                         3351 - Albuquerque, NM (XF)
1                            3352 - Lakewood, CO (XF)
2                    3353 - Colorado Springs, CO (XF)
3                            3354 - Thornton, CO (XF)
4                          3355 - Las Cruces, NM (XF)
5                             3356 - Boulder, CO (XF)
6                          3357 - Centennial, CO (XF)
7                              3358 - Denver, CO (XF)
8                            3359 - Loveland, CO (XF)
9                              3360 - Arvada, CO (XF)
10                           3361 - Longmont, CO (XF)
11                             3362 - Pueblo, CO (XF)
12                       3363 - Fort Collins, CO (XF)
13  3364 - Barnes Marketplace - Colorado Springs, ...
14         3365 - Gardens on Havana - Aurora, CO (XF)
15    3367 - Animas Valley Mall - Farmington, NM (XF)
16          3368 - Prairie Center - Brighton, CO (XF)
17          3369 - Plaza Santa Fe - Santa Fe, NM (XF)
18  3370 - Promenade at Castle Rock - Castle Rock,...
19               3371 - Crown Point - Parker, CO (XF)
20   3372 - The Shops at NorthCreek - Denver, CO (XF)
21  3373 - Orchard Town Center - Westminster, CO (XF)
22  3374 -Shops at Walnut Creek -Westminster, CO (XF)
23                               3403 - Park City, UT
24                               3453 - Orem, UT (XF)
25                     3454 - Tucson - River, AZ (XF)
26                             3455 - Draper, UT (XF)
27                            3456 - Layton2, UT (XF)
28                     3457 - Salt Lake City, UT (XF)
29             3458 - Academy Square - Logan, UT (XF)
30         3459 - Arizona Pavilions - Tucson, AZ (XF)
31       3460 - Jordan Landing - West Jordan, UT (XF)
32             3461 - Fashion Plaza - Murray, UT (XF)
33        3463 - Summit Place - Silverthorne, CO (XF)
34  3464 - Glenwood Meadows - Glenwood Springs, CO...
35       59000 - Southwest Plaza - Littleton, CO (XF)
36   59001 - Applewood Village - Wheat Ridge, CO (XF)
37                 59002 - Greeley - Greeley, CO (XF)
38     59003 - Northfield Stapleton - Denver, CO (XF)
39  59008 - Southglenn/Cherry Hills - Greenwood Vi...
40             59009 - South Aurora - Aurora, CO (XF)
41  59011 - River Point at Sheridan - Sheridan, CO...
42  59031 - Hunter's Crossing - American Fork, UT ...
43      59032 - Sugarhouse  - Salt Lake City, UT (XF)
44  59033 -Mountain View Village -  Riverton, UT (XF)
45  59034 - Highbury Centre - West Valley City, UT...
46            59038 -  Diamond Plaza - Ogden, UT (XF)
47  59046 - Broadmoor Towne Center - Colorado Spri...
48                   59055 - Albuquerque, NM (Uptown)
49          59056 - Albuquerque, NM (Cottonwood) (BP)
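For variations rather than exact duplicates, one possible approach is to compare every pair of names with a similarity score. The sketch below uses `difflib.SequenceMatcher` from the standard library instead of a dedicated fuzzy-matching package. It rests on several assumptions: the leading store number and any trailing parenthesized tags such as `(XF)` are stripped before comparing, the 0.85 threshold is an arbitrary starting point, the helper names are hypothetical, and the `9999 - Colorado Spgs` entry is invented for illustration.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def core_name(name: str) -> str:
    """Drop the leading store number and trailing "(...)" tags, then normalize."""
    name = re.sub(r"^\s*\d+\s*-\s*", "", name)       # leading "3358 - "
    name = re.sub(r"(\s*\([^)]*\))+\s*$", "", name)  # trailing "(XF)", "(BP)"
    name = re.sub(r"\s*-\s*", " - ", name)           # uniform hyphen spacing
    return re.sub(r"\s+", " ", name).strip().lower()


def similar_pairs(names, threshold=0.85):
    """Return (score, a, b) for each pair whose similarity is >= threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, core_name(a), core_name(b)).ratio()
        if score >= threshold:
            pairs.append((round(score, 2), a, b))
    return sorted(pairs, reverse=True)  # most similar pairs first


sample = [
    "3353 - Colorado Springs, CO (XF)",
    "3358 - Denver, CO (XF)",
    "3351 - Albuquerque, NM (XF)",
    "59055 - Albuquerque, NM (Uptown)",
    "3457 - Salt Lake City, UT (XF)",
    "9999 - Colorado Spgs, CO (XF)",  # made-up variant, not in the real data
]

pairs = similar_pairs(sample)
for score, a, b in pairs:
    print(score, "|", a, "|", b)
```

A flagged pair is only a candidate for manual review. Because the parenthesized tags are stripped, the two Albuquerque entries come out as identical even though they may be separate stores. With about 50 names there are only 1,225 pairs, so comparing all of them is cheap.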
Alan Siu
  • What are you asking? How to find fuzzy duplicates or how to verify them? Does your data set have confirmed duplicates that you can use as a test case? Please clarify. – Woodford Nov 23 '22 at 20:01
  • How were you using fuzzy? Posting that might be more constructive. It'd also help to make a [mre] with minimal data and your desired output. See also [How to make good reproducible pandas examples](/q/20109391/4518341). BTW, welcome to Stack Overflow! Check out the [tour], and [ask] if you want more tips, like how to write a good title. – wjandrea Nov 23 '22 at 20:08
  • The data has no confirmed duplicates. However, I am trying to find code that shows how to check for duplicates. Also, I really didn't know how to use fuzzy, so any suggestions would be great! – Alan Siu Nov 23 '22 at 21:07
  • @Woodford I forgot to tag you. – Alan Siu Nov 24 '22 at 06:24

0 Answers