Ignore this question. It is completely different from the question I actually needed to ask. To the people who already answered, I'm sorry. Hopefully this will still help someone in the future.

Read the new thread here: Opening files found from os.listdir() and comparing lines inside?

I'm running os.listdir() to get a list of files, and then trying to check whether two different files have similar names. How would I go about this?

The code is currently this:

import os

config_dir = "/etc/netctl/"

profiles = os.listdir(config_dir)
for i in profiles:
    if os.path.isfile(os.path.join(config_dir, i)):
        # Always true: this compares each name with itself
        if i in i:
            print("True")

I'm not sure what to use to check for similarities in the names. I know "if i in i" just compares a name with itself, but I don't know how to keep hold of the previous name so I can compare against it.

I also tried:

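i2 = ""
profiles = os.listdir(config_dir)
for i in profiles:
    if os.path.isfile(os.path.join(config_dir, i)):
        # i2 only ever holds the first file name found,
        # so every later name is compared against that one alone
        if i2 == "":
            i2 = i
            print(i2)
        elif i2 == i:
            continue
        if i2 in i:
            print("true")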

I think I might be overthinking this, though. This is the output of os.listdir:

['hooks', 'interfaces', 'examples', 'ddwrt', 'MomAndKids_wifiz', 'backups', 'MomAndKids']

The files are ddwrt, MomAndKids_wifiz, and MomAndKids; the other entries are directories. I want it to detect that the names "MomAndKids" and "MomAndKids_wifiz" are similar, and then return True.

Cody Dostal
  • "similar" or "one contains the other" and should it be case insensitive? – Inbar Rose May 21 '13 at 08:23
  • Well, this is where it gets complicated. Basically, people can manually make their own profiles for this program and name it whatever they want (so, in essence, they could name it asdfasdfasf) but my program will always generate $NetworkSSID_wifiz. So I just realized this may not work fully. However, in every file, there is "ESSID='$NetworkSSID'". So I would actually need to open the file and compare those lines for each individual file. Editing main question to reflect that. – Cody Dostal May 21 '13 at 08:32
  • This is a drastic change of the question, completely invalidating all existing answers. Please revert the edit and ask a new question. – user4815162342 May 21 '13 at 08:36
  • You can also *delete* a question if no longer relevant... – user4815162342 May 21 '13 at 08:45
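
The ESSID comparison described in the comments above could be sketched roughly as follows. This is only a minimal sketch: the helper names `read_essid` and `group_by_essid` are hypothetical, and the line format `ESSID='$NetworkSSID'` is assumed from the comment rather than checked against real netctl profiles.

```python
import os
import re
from collections import defaultdict

# Assumed line format, taken from the comment above: ESSID='NetworkSSID'
# (the quotes are treated as optional here).
ESSID_RE = re.compile(r"""^\s*ESSID=['"]?(.*?)['"]?\s*$""")


def read_essid(file_path):
    """Return the ESSID value from a profile file, or None if there is no such line."""
    with open(file_path) as handle:
        for line in handle:
            match = ESSID_RE.match(line)
            if match:
                return match.group(1)
    return None


def group_by_essid(config_dir):
    """Map each ESSID to the sorted list of file names in config_dir that declare it."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(config_dir)):
        full_path = os.path.join(config_dir, name)
        # Skip sub-directories such as 'hooks' or 'examples'.
        if os.path.isfile(full_path):
            essid = read_essid(full_path)
            if essid is not None:
                groups[essid].append(name)
    return dict(groups)
```

Calling `group_by_essid("/etc/netctl")` would then return a dictionary in which any ESSID with more than one file name has duplicate profiles, regardless of what the files are called.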

1 Answer

This should do it:

from difflib import SequenceMatcher
from glob import glob
from os import path

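config_dir = '/etc/netctl'
# 90%. Note that "MomAndKids" vs "MomAndKids_wifiz" only scores about 0.77,
# so lower this (e.g. to 0.75) to catch that pair.
min_ratio = 0.90

# Match '/*' rather than '/*.*': the profiles in the question have no
# extension, so '*.*' would skip them. Directories are filtered out.
files = [f for f in glob(config_dir + '/*') if path.isfile(f)]
profiles = dict((i, {'full_path': v, 'matches': [], 'file_name': path.splitext(path.split(v)[-1])[0]}) for (i, v) in enumerate(files))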

for K, V in profiles.items():
    sm = SequenceMatcher(a=V['file_name'], b='')
    for k, v in profiles.items():
        if K == k or k in V['matches']:
            continue
        sm.set_seq2(v['file_name'])
        if sm.ratio() > min_ratio:
            V['matches'].append(k)
            v['matches'].append(K)

# display the output: each profile's index, followed by its details and matches
for k, v in profiles.items():
    print(k, v)
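
To see how the threshold behaves on the names from the question, here is a minimal sketch that scores each pair with SequenceMatcher and, for comparison, tries a plain substring test. The helper names `similar_pairs` and `contains_pairs` and the 0.75 cutoff are assumptions made for illustration; they are not part of the code above.

```python
from difflib import SequenceMatcher
from itertools import combinations

# File names from the question's os.listdir() output (directories left out).
names = ['ddwrt', 'MomAndKids_wifiz', 'MomAndKids']


def similar_pairs(names, min_ratio):
    """Return (a, b, ratio) for every pair whose ratio is at least min_ratio."""
    pairs = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(a=a, b=b).ratio()
        if ratio >= min_ratio:
            pairs.append((a, b, ratio))
    return pairs


def contains_pairs(names):
    """Return (short, long) pairs where the shorter name is a substring of the longer."""
    pairs = []
    for a, b in combinations(names, 2):
        short, long_ = sorted((a, b), key=len)
        if short in long_:
            pairs.append((short, long_))
    return pairs


print(similar_pairs(names, 0.90))  # empty: no pair reaches 90%
print(similar_pairs(names, 0.75))  # the MomAndKids pair, ratio 20/26
print(contains_pairs(names))
```

The ratio is twice the number of matching characters divided by the combined length, so "MomAndKids" (10 characters) against "MomAndKids_wifiz" (16 characters) gives 20/26, roughly 0.77. If the generated names always take the form of the original name plus a suffix, the substring test may be the simpler choice.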
Inbar Rose