How to sort unique words in order of appearance?

Question

restart = True
while restart == True:
    option = input("Would you like to compress or decompress this file?\nIf you would like to compress type c \nIf you would like to decompress type d.\n").lower()

    if option == 'c':

        text = input("Please type the text you would like to compress.\n")
        text = text.split()
        for count,word in enumerate(text):

            if text.count(word) < 2:
                order.append (max(order)+1)

            else:
                order.append (text.index(word)+1)



        print (uniqueWords)
        print (order)
        break
    elif option == 'd':
        pass

    else:
        print("Sorry that was not an option")

For part of my assignment I need to identify unique words and send them to a text file. I understand how to write text to a text file, but I do not understand how to order the output so that, for the input "the world of the flowers is a small world to be in", the text file contains:

the,world,of,flowers,is,a,small,to,be,in 

1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10

The top line states the unique words and the second line shows the order of the words, so that the text can later be decompressed. I have no issue with the decompression or with the numbers, only with getting the unique words in order of appearance. Any assistance would be much appreciated!

So you have problems with writing the text file or ordering the words? — Tenzin, Jan 10 '17 at 19:10
if you understand how to write text to a file what are you actually having trouble with? — Tadhg McDonald-Jensen, Jan 10 '17 at 19:12
no, literally just the sorting so I can write it to the text file in order because at the moment it is appearing like {'flowers', 'small', 'be', 'of', 'world', 'in', 'the', 'to', 'a', 'is'} [1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10] — Sam, Jan 10 '17 at 19:13
but I need it to be {'the','world','of','flowers','is','a','small','to','be','in'} instead of some random order — Sam, Jan 10 '17 at 19:15
yh sorry @RolandSmith this is my first question on this platform so I am not that sure on how it works, but I'll be sure to change my question — Sam, Jan 10 '17 at 19:23
@Sam If you want to present your solution (after the feedback through the answers) please post it as an answer instead of updating your question (I've rolled it back, I hope I haven't thrown important other changes away). Otherwise it's also perfectly acceptable to just accept the most helpful answer. — MSeifert, Jan 10 '17 at 19:32

score 2 · Answer 1 · answered Jan 10 '17 at 19:13


text = "the world of the flowers is a small world to be in"
words = text.split()
unique_ordered = []
for word in words:
    if word not in unique_ordered:
        unique_ordered.append(word)
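
The same loop can also build the position list the question asks for, still using only two lists (the restriction the asker mentions in a later comment). This is a minimal sketch rather than part of the original answer; the name `positions` is an assumption introduced here for illustration:

```python
text = "the world of the flowers is a small world to be in"
words = text.split()

unique_ordered = []  # unique words, in order of first appearance
positions = []       # hypothetical name: 1-based position of each word in unique_ordered

for word in words:
    if word not in unique_ordered:
        unique_ordered.append(word)
    # list.index returns the position of the first (and here only) match
    positions.append(unique_ordered.index(word) + 1)

print(",".join(unique_ordered))  # the,world,of,flowers,is,a,small,to,be,in
print(positions)                 # [1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10]
```

Both `in` and `list.index` scan the list, so this is quadratic in the number of unique words. That is fine for a sentence, but a dictionary or set is the better choice for long texts.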

rofls

score 1 · Shijo · Answer 2 · 2017-01-10T19:24:12.553

from collections import OrderedDict
text = "the world of the flowers is a small world to be in"
words = text.split()
print(list(OrderedDict.fromkeys(words)))

output

['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']
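
On Python 3.7 and later, a plain `dict` also preserves insertion order, so the `OrderedDict` import is optional there. A sketch assuming such an interpreter (on older versions, keep `OrderedDict`):

```python
text = "the world of the flowers is a small world to be in"
words = text.split()

# dict.fromkeys keeps one key per distinct word, in order of first insertion.
# Insertion order is a language guarantee from Python 3.7 onwards.
unique_words = list(dict.fromkeys(words))
print(unique_words)
```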

edited Jan 10 '17 at 19:24

answered Jan 10 '17 at 19:16


This will not work. First of all, dictionaries don't preserve or store order. You'd need an ordered dict for that. Note that your output does not match the requested output, which is "the,world,of,flowers,is,a,small,to,be,in" – rofls Jan 10 '17 at 19:18

Thanks for mentioning that, just corrected the answer :) – Shijo Jan 10 '17 at 19:23

MSeifert · Answer 3 · 2017-01-10T19:29:40.787

That's an interesting problem. It can be solved using a dictionary that stores the index of each word's first occurrence and doubles as a check for whether the word was already encountered:

string = "the world of the flowers is a small world to be in"

dct = {}
words = []
indices = []
idx = 1
for substring in string.split():
    # Check if you've seen it already.
    if substring in dct:
        # Already seen it, so append the index of the first occurrence
        indices.append(dct[substring])
    else:
        # Add it to the dictionary with the index and just append the word and index
        dct[substring] = idx
        words.append(substring)
        indices.append(idx)
        idx += 1


>>> print(words)
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']
>>> print(indices)
[1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10]

If you don't want the indices there are also some external modules that have such a function to get the unique words in order of appearance:

>>> from iteration_utilities import unique_everseen
>>> list(unique_everseen(string.split()))
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']

>>> from more_itertools import unique_everseen
>>> list(unique_everseen(string.split()))
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']

>>> from toolz import unique
>>> list(unique(string.split()))
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']

thanks for the help however I used a dictionary in another iteration of when I tried this code however my teacher at school asked me to just use 2 lists instead of using a dictionary, I assume because it is not necessary for GCSE, but thanks for the help nonetheless however in the dictionary iteration I had to use (sorted(zip(dictionary.values(),dictionary.keys())) in order to sort it — Sam, Jan 10 '17 at 19:30

score 0 · Answer 4 · edited May 23 '17 at 12:33

To remove the duplicate entries from a list whilst preserving the order, you may check the answers to "How do you remove duplicates from a list whilst preserving order?". For example:

my_sentence = "the world of the flowers is a small world to be in"
wordlist = my_sentence.split()

# Accepted approach in linked post 
def get_ordered_unique(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

unique_list = get_ordered_unique(wordlist)
# where `unique_list` holds:
#     ['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']

Then, to get the position of each word, you may use list.index() in a list comprehension:

>>> [unique_list.index(word)+1 for word in wordlist]
[1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10]
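
Putting the two lists into the two-line text file described in the question could look like the sketch below. The file name `compressed.txt`, the temporary directory, and the comma-separated layout are assumptions for illustration. The round trip at the end is only there to check that the stored data is enough to rebuild the sentence, and it assumes no word contains a comma:

```python
import os
import tempfile

my_sentence = "the world of the flowers is a small world to be in"
wordlist = my_sentence.split()

# Unique words in order of first appearance, plus 1-based positions.
unique_list = []
for word in wordlist:
    if word not in unique_list:
        unique_list.append(word)
positions = [unique_list.index(word) + 1 for word in wordlist]

# Hypothetical output file: line 1 holds the words, line 2 the positions.
path = os.path.join(tempfile.mkdtemp(), "compressed.txt")
with open(path, "w") as f:
    f.write(",".join(unique_list) + "\n")
    f.write(",".join(str(p) for p in positions) + "\n")

# Read the file back and rebuild the sentence from the two lines.
with open(path) as f:
    stored_words = f.readline().strip().split(",")
    stored_positions = [int(p) for p in f.readline().strip().split(",")]

restored = " ".join(stored_words[p - 1] for p in stored_positions)
print(restored == my_sentence)  # True
```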
