How to extract extensions and save them without repetition?

Question

I have this code:

    x.write("FolderName\t\"%s\"\nPackName\t\"%s\"\n"% (pakedfolder, filename))
    x.write("\nList\t%s\n{\n" % (comptype))
    for root, dirs, files in os.walk("ymir work"):
        for file in files:
            file = path.splitext(file)[1]
            x.write("\t\"%s\"\n" %(file))
    x.write("}\n")
    x.write("\nList\tFilelist\n{\n")

which generates exactly what I want, but the problem is that the output contains repeated entries like this:

".dds"
".mse"
".mse"
".mse"
".mse"
".mse"
".mse"
".dds"
".dds"
".dds"

Possible duplicate of [Fastest way to uniqify a list in Python](http://stackoverflow.com/questions/2527405/fastest-way-to-uniqify-a-list-in-python) – ivan_pozdeev, Aug 08 '15 at 15:08
That didn't work. I got output like this: ("['s', 'd', '.']") – Huşȁm Đȁßabñęh, Aug 08 '15 at 15:44
OP, you need to use set on a list, not on a single string value... – Ayy, Aug 08 '15 at 16:21
I don't want to open a new topic, but can you tell me how to get the first dir name in a path like this: "locale\de\ui\costume_bg.jpg"? – Huşȁm Đȁßabñęh, Aug 08 '15 at 21:44
What exactly do you want to get? "locale\" or just a dirname? Check the following functions: os.path.basename(), os.path.dirname(), os.path.split(). – Dalen, Aug 10 '15 at 23:45
Thank you for your comment. I don't need to get the first directory in a path anymore, but I'll save your answer for future use. By the way, I wanted just locale. My problem was how the function would work if the path is longer: sometimes the path is like "locale\de\ui\costume_bg.jpg", sometimes like "ymir work\ui\pattern\thinboardcircle\thinboard_corner_leftbottom_circle.tga", and sometimes it is a very short path with just two directories. – Huşȁm Đȁßabñęh, Aug 11 '15 at 00:53
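For the side question in the comments about taking the first directory of a path of any length, here is a minimal sketch. It assumes the paths are relative and backslash-separated, as in the examples quoted above; `first_dir` is a hypothetical helper name, not something from the original code.

```python
from pathlib import PureWindowsPath


def first_dir(path_str):
    """Return the first component of a relative, backslash-separated path.

    Hypothetical helper: PureWindowsPath only parses the string, it never
    touches the filesystem, so this works on any operating system.
    """
    parts = PureWindowsPath(path_str).parts
    return parts[0] if parts else ""


print(first_dir(r"locale\de\ui\costume_bg.jpg"))  # locale
print(first_dir(r"ymir work\ui\pattern\thinboardcircle\thinboard_corner_leftbottom_circle.tga"))  # ymir work
```

The depth of the path does not matter because only the first element of `parts` is used. For a bare file name with no directory, the sketch returns the file name itself, so that case may need separate handling.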

score 1 · Accepted Answer · edited Aug 11 '15 at 10:41

There are several possible approaches to this problem.

Most people (on SO as well) agree that using a dict is the right way.

See stevieb's answer here, for example. :D

Some would argue that a set() is the more convenient and natural choice, but most tests I have seen, and my own, show that using a dict() is slightly faster. I don't know of a definitive explanation for this, and it may differ from one Python version to another.
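For comparison, the set() variant mentioned above could look roughly like this. It is a sketch on a made-up list of file names (`names` is an assumption, not data from the question), and it also shows the pitfall from the question's comments, where set() was applied to a single string.

```python
import os

# Hypothetical file names standing in for what os.walk() would yield.
names = ["a.dds", "b.mse", "c.MSE", "d.dds", "README"]

seen = set()       # extensions already emitted
unique_exts = []   # kept separately so the first-seen order is preserved

for name in names:
    ext = os.path.splitext(name)[1].lower()
    if ext and ext not in seen:
        seen.add(ext)
        unique_exts.append(ext)

print(unique_exts)  # ['.dds', '.mse']

# Pitfall: set() on ONE string splits it into its characters.
print(sorted(set(".dds")))  # ['.', 'd', 's']
```

The second print reproduces the kind of output reported in the comments: the set has to be built from the whole collection of extensions, not from each extension string.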

Dictionaries and sets use hashing to look up data, so a membership test takes O(1) time on average, which makes them faster than lists. To check whether an item is in a list, Python has to iterate over it, and in the worst case the number of iterations grows with the length of the list (O(n)).
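One way to see the difference described above is a small timeit comparison. This is only a sketch: the container size and repeat count are arbitrary assumptions, and the absolute timings depend on the machine and the Python version, so no numbers are claimed here.

```python
import timeit

n = 10_000  # arbitrary size chosen for illustration
items_list = [".ext%d" % i for i in range(n)]
items_set = set(items_list)
items_dict = dict.fromkeys(items_list)

# Worst case for the list: the item we look for is the last one.
needle = ".ext%d" % (n - 1)

t_list = timeit.timeit(lambda: needle in items_list, number=200)
t_set = timeit.timeit(lambda: needle in items_set, number=200)
t_dict = timeit.timeit(lambda: needle in items_dict, number=200)

print("list: %.6f s" % t_list)
print("set:  %.6f s" % t_set)
print("dict: %.6f s" % t_dict)
```

With only a handful of distinct extensions, as in the question, the difference is unlikely to be noticeable; it matters when the collection of seen items grows large.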

To learn more on the subject, I suggest you examine the related questions, especially the one mentioned as a possible duplicate.

So, I agree with stevieb and propose the following code:

import os
from os import path

# x is the output file already opened in the question's code.
chkdict = {}  # Extensions already written; used to check whether an extension has been processed
setdef = chkdict.setdefault  # Binding the method to a local name skips repeated attribute lookups, which may improve performance a little
# Recurse through the directory tree:
for root, dirs, files in os.walk("ymir work"):
    # Loop through all files in the directory currently being examined:
    for file in files:
        ext = path.splitext(file)  # (name, extension) pair
        # If the file has no extension (this includes names like ".bashrc" or ".ds_store"), ignore it; otherwise write the extension to x:
        if ext[0] and ext[1]: ext = ext[1].lower()
        else: continue
        if ext not in chkdict:
            # D.setdefault(k[, d]) does: D.get(k, d), also set D[k] = d if k not in D
            # You can use dict.setdefault(k, k) as done here,
            # or write ext separately and then do: chkdict[ext] = None
            # The second variant might even be faster, as setdefault() checks for existence again,
            # but to be certain you should run a timeit test
            x.write("\t\"%s\"\n" % setdef(ext, ext))
            #x.write("\t\"%s\"\n" % ext)
            #chkdict[ext] = None
del chkdict  # If you're not inside a function, it is better to free the memory as soon as you no longer need the data stored there

I use this algorithm on large amounts of data and it performs very well.
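The same idea can also be wrapped in a function so that it can be tried without the rest of the script. The following is a sketch, not the code above verbatim: `unique_extensions` is a hypothetical name, it uses the "assign None separately" variant instead of setdefault(), and the demo builds a throwaway directory with made-up file names.

```python
import os
import tempfile


def unique_extensions(top):
    """Yield each distinct lowercase extension found under `top` once."""
    seen = {}  # dict used purely for fast membership tests
    for _root, _dirs, files in os.walk(top):
        for name in files:
            stem, ext = os.path.splitext(name)
            if not (stem and ext):
                continue  # no extension, or a dotfile such as ".bashrc"
            ext = ext.lower()
            if ext not in seen:
                seen[ext] = None
                yield ext


# Demo on a temporary directory (file names are made up for illustration).
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["a.dds", "b.MSE", "c.mse", "README", ".bashrc"]:
        open(os.path.join(demo_dir, name), "w").close()
    for ext in sorted(unique_extensions(demo_dir)):
        print("\t\"%s\"" % ext)
```

In the question's script, the loop body would become `x.write("\t\"%s\"\n" % ext)` for each `ext` yielded by `unique_extensions("ymir work")`. The demo sorts the result only because the order in which os.walk() lists files within a directory is not guaranteed.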

answered Aug 08 '15 at 16:18 by Dalen · edited Aug 11 '15 at 10:41 by Bulat

It helps to have some explanation in addition to the code. – Bulat Aug 08 '15 at 19:13
@Bulat : If your comment means there is not enough information, then I can only explain that I extracted setdefault from the dictionary instance to speed up access to the function, and that I used it to add an extension to the dict while returning that same extension so it can be written out. See help(dict.setdefault). What more is there to add? The code is neat and clean and it does what is required with pretty good performance. – Dalen Aug 08 '15 at 22:00
http://meta.stackoverflow.com/questions/256359/flag-try-this-code-answers-as-very-low-quality?lq=1 – Bulat Aug 08 '15 at 22:53
The general idea on SO, as I understand it, is that answers with some explanation of what exactly solves the problem and why are more helpful than just code. You would get more credit if you explained your solution. – Bulat Aug 08 '15 at 22:57
@Bulat : You're free to add more info. Didn't I explain enough? stevieb already explained the use of a dictionary, so why repeat that? As you can see, the OP accepted my answer, and I think he understands how it works, or else he would have asked for more explanation. I am not here in pursuit of reputation; I want to help people and to receive help if I need some. – Dalen Aug 10 '15 at 21:58
I am glad that the solution worked; just think of people who see this answer without explanation. I don't think it is as helpful as it could be. – Bulat Aug 10 '15 at 22:02
Please edit my post or tell me what exactly you wish me to explain more and I'll do it. I was in a hurry and saw that stevieb had explained dicts, so I thought that was enough. I thought that the stuff with dict.setdefault() might be a problem and decided to edit the post later. But as you asked, I explained it in a comment. You're free to downvote me if you think that I am a stubborn donkey; I won't get angry. – Dalen Aug 10 '15 at 22:03
No, it is not about downvoting. I am reviewing answers, and the guideline of the review process is to let you know that an answer can be improved, which I did. It is entirely up to you what you do with this, nothing personal ) – Bulat Aug 10 '15 at 22:05
That is much better ) Now I understand the direction of your answer. – Bulat Aug 11 '15 at 10:44

score 0 · Answer 2 · answered Aug 08 '15 at 16:03

Use a dict. Here's an example...

files = ['this.txt', 'that.txt', 'file.dat', 'a.dat', 'b.exe']

exts = {}

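for file in files:
    # split('.')[1] assumes each name contains exactly one dot; os.path.splitext() is safer for arbitrary names
    exts[file.split('.')[1]] = 1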

for ext in exts:
    print(ext)

Output (the order may differ; dicts were unordered before Python 3.7):

dat
txt
exe
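The example above works for names with exactly one dot. As a hedged sketch of how the same dict idea behaves with os.path.splitext() on less regular names (the `names` list is made up for illustration):

```python
import os

# Hypothetical names: two dots, no dot at all, and a dotfile.
names = ["this.txt", "archive.tar.gz", "README", ".bashrc", "that.txt"]

for name in names:
    # splitext() never raises; it returns '' when there is no extension.
    print(name, "->", repr(os.path.splitext(name)[1]))

# Same dict-based deduplication, skipping names without an extension.
exts = {}
for name in names:
    ext = os.path.splitext(name)[1]
    if ext:
        exts[ext] = 1

print(list(exts))  # ['.txt', '.gz'] on Python 3.7+
```

Note that splitext() keeps the leading dot and returns only the last extension, so "archive.tar.gz" contributes ".gz". With `file.split('.')[1]`, "README" would raise an IndexError and "archive.tar.gz" would yield "tar".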

answered Aug 08 '15 at 16:03 by stevieb

Hey, no problem! As long as your problem gets solved, we all learn new ways to do things. That's at least my objective in belonging to sites such as this ;) – stevieb Aug 08 '15 at 22:09
