restart = True
while restart:
    option = input("Would you like to compress or decompress this file?\nIf you would like to compress type c \nIf you would like to decompress type d.\n").lower()

    if option == 'c':
        text = input("Please type the text you would like to compress.\n")
        text = text.split()
        uniqueWords = set(text)  # a set: duplicates are removed, but order is not kept
        order = []
        for word in text:
            if text.count(word) < 2:
                # word occurs once: give it the next free number
                order.append(max(order, default=0) + 1)
            else:
                # repeated word: use the position of its first occurrence in the text
                order.append(text.index(word) + 1)

        print(uniqueWords)
        print(order)
        break
    elif option == 'd':
        pass
    else:
        print("Sorry that was not an option")

For part of my assignment I need to identify the unique words in a sentence and write them to a text file. I understand how to write text to a text file, but I do not understand how to order this code so that it produces the following in the text file (if I were to input "the world of the flowers is a small world to be in"):

the,world,of,flowers,is,a,small,to,be,in 

1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10

The top line lists the unique words and the second line gives the position of each word, so that the text can later be decompressed. I have no issue with the decompression or with the numbers; my only problem is getting the unique words in the order they first appear. Any assistance would be much appreciated!

DSM
Sam
  • So you have problems with writing the text file or ordering the words? – Tenzin Jan 10 '17 at 19:10
  • if you understand how to write text to a file what are you actually having trouble with? – Tadhg McDonald-Jensen Jan 10 '17 at 19:12
  • no, literally just the sorting so I can write it to the text file in order because at the moment it is appearing like {'flowers', 'small', 'be', 'of', 'world', 'in', 'the', 'to', 'a', 'is'} [1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10] – Sam Jan 10 '17 at 19:13
  • but I need it to be {'the','world','of','flowers','is','a','small','to','be','in'} instead of some random order – Sam Jan 10 '17 at 19:15
  • The curly brackets `{}` denote a `set`. Sets are unordered. – Patrick Haugh Jan 10 '17 at 19:16
  • @Sam Please edit your question to address comments. – Roland Smith Jan 10 '17 at 19:20
  • yh sorry @RolandSmith this is my first question on this platform so I am not that sure on how it works, but I'll be sure to change my question – Sam Jan 10 '17 at 19:23
  • @Sam If you want to present your solution (after the feedback through the answers) please post it as an answer instead of updating your question (I've rolled it back, I hope I haven't thrown important other changes away). Otherwise it's also perfectly acceptable to just accept the most helpful answer. – MSeifert Jan 10 '17 at 19:32
  • ok will do @MSeifert sorry again! – Sam Jan 10 '17 at 19:34

4 Answers

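text = "the world of the flowers is a small world to be in"
words = text.split()
unique_ordered = []
for word in words:
    # keep only the first occurrence; a list preserves insertion order
    if word not in unique_ordered:
        unique_ordered.append(word)
print(unique_ordered)
# ['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']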
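Since the asker's teacher wants two plain lists rather than a dictionary, the loop above can be extended to build the position list at the same time and to write both lines in the format from the question. This is a minimal sketch; the function names `compress` and `write_compressed` and the file name `compressed.txt` are hypothetical.

```python
def compress(text):
    """Return (unique_words, positions) using only two lists (hypothetical helper)."""
    words = text.split()
    unique_words = []
    positions = []
    for word in words:
        if word not in unique_words:
            # first time this word is seen: remember it in order of appearance
            unique_words.append(word)
        # 1-based position of the word in the unique list
        positions.append(unique_words.index(word) + 1)
    return unique_words, positions


def write_compressed(path, unique_words, positions):
    """Write the two lines in the format shown in the question (hypothetical helper)."""
    with open(path, "w") as f:
        f.write(",".join(unique_words) + "\n")
        f.write(", ".join(str(p) for p in positions) + "\n")


unique_words, positions = compress("the world of the flowers is a small world to be in")
print(",".join(unique_words))                 # the,world,of,flowers,is,a,small,to,be,in
print(", ".join(str(p) for p in positions))   # 1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10
# To save to disk (example file name):
# write_compressed("compressed.txt", unique_words, positions)
```

Because `unique_words` only grows when a new word appears, `unique_words.index(word)` always gives the number assigned at the word's first appearance.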
rofls
from collections import OrderedDict
text = "the world of the flowers is a small world to be in"
words = text.split()
print(list(OrderedDict.fromkeys(words)))

output

['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']
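On Python 3.7 and later, where the language guarantees that a plain `dict` keeps insertion order, the same idea works without importing `OrderedDict`. This sketch assumes Python 3.7+ and also joins the result with commas to produce the line format the question asks for.

```python
text = "the world of the flowers is a small world to be in"
words = text.split()

# Since Python 3.7 plain dicts keep insertion order, so dict.fromkeys
# drops duplicates while keeping each word at its first appearance.
unique_words = list(dict.fromkeys(words))

# Join with commas to match the requested output format
print(",".join(unique_words))  # the,world,of,flowers,is,a,small,to,be,in
```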
Shijo
  • This will not work. First of all, dictionaries don't preserve or store order. You'd need an ordered dict for that. Note that your output does not match the requested output, which is "the,world,of,flowers,is,a,small,to,be,in" – rofls Jan 10 '17 at 19:18
  • Thanks for mentioning that, just corrected the answer :) – Shijo Jan 10 '17 at 19:23

That's an interesting problem. It can be solved using a dictionary that stores the index of each word's first occurrence and lets you check whether the word was already encountered:

string = "the world of the flowers is a small world to be in"

dct = {}
words = []
indices = []
idx = 1
for substring in string.split():
    # Check if you've seen it already.
    if substring in dct:
        # Already seen it, so append the index of its first occurrence
        indices.append(dct[substring])
    else:
        # Add it to the dictionary with the index and just append the word and index
        dct[substring] = idx
        words.append(substring)
        indices.append(idx)
        idx += 1


>>> print(words)
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']
>>> print(indices)
[1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10]
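As a quick check that `words` and `indices` really are enough to decompress the text later, the sentence can be rebuilt by looking each 1-based index up in the word list. This is a sketch; `decompress` is a hypothetical name, and the two lists are copied from the output shown above.

```python
def decompress(words, indices):
    """Rebuild the sentence from unique words and 1-based indices (hypothetical helper)."""
    # Each index points into `words`; subtract 1 because the numbering starts at 1
    return " ".join(words[i - 1] for i in indices)


words = ['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']
indices = [1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10]
print(decompress(words, indices))  # the world of the flowers is a small world to be in
```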

If you don't need the indices, some external modules also provide a function that returns the unique words in order of appearance:

>>> from iteration_utilities import unique_everseen
>>> list(unique_everseen(string.split()))
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']

>>> from more_itertools import unique_everseen
>>> list(unique_everseen(string.split()))
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']

>>> from toolz import unique
>>> list(unique(string.split()))
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']
MSeifert
  • thanks for the help however I used a dictionary in another iteration of when I tried this code however my teacher at school asked me to just use 2 lists instead of using a dictionary, I assume because it is not necessary for GCSE, but thanks for the help nonetheless however in the dictionary iteration I had to use (sorted(zip(dictionary.values(),dictionary.keys())) in order to sort it – Sam Jan 10 '17 at 19:30
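The `sorted(zip(dictionary.values(), dictionary.keys()))` trick mentioned in that comment works because each word's first-appearance index is unique, so sorting the `(index, word)` pairs orders the words by first appearance no matter how the dict iterates. A minimal sketch, where `first_index` is a hypothetical dict built the same way as `dct` in the answer:

```python
sentence = "the world of the flowers is a small world to be in"

# word -> 1-based index of its first occurrence (same idea as `dct` in the answer)
first_index = {}
for word in sentence.split():
    if word not in first_index:
        first_index[word] = len(first_index) + 1

# Sorting (index, word) tuples compares the index first, and the indices are
# all distinct, so the words come out in order of first appearance.
pairs = sorted(zip(first_index.values(), first_index.keys()))
unique_words = [word for _, word in pairs]
print(unique_words)
```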

To remove the duplicate entries from the list whilst preserving the order, you may check the answers to "How do you remove duplicates from a list whilst preserving order?". For example:

my_sentence = "the world of the flowers is a small world to be in"
wordlist = my_sentence.split()

# Accepted approach in linked post 
def get_ordered_unique(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

unique_list = get_ordered_unique(wordlist)
# where `unique_list` holds:
#     ['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']
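The one-liner relies on `seen_add(x)` returning `None` (falsy), so `x in seen or seen_add(x)` is false exactly when `x` is new, and adds it to `seen` as a side effect. An equivalent, more explicit version of the same set-based approach might look like this; `get_ordered_unique_explicit` is a hypothetical name.

```python
def get_ordered_unique_explicit(seq):
    """Same result as get_ordered_unique, written as a plain loop (hypothetical helper)."""
    seen = set()      # fast membership test
    result = []       # keeps the order of first appearance
    for x in seq:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result


wordlist = "the world of the flowers is a small world to be in".split()
print(get_ordered_unique_explicit(wordlist))
```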

Then, to get the position of each word, you may use `list.index()` in a list comprehension:

>>> [unique_list.index(word)+1 for word in wordlist]
[1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10]
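`list.index()` scans the list from the start on every call, so for long texts the comprehension does roughly quadratic work. Precomputing a mapping from each unique word to its position gives the same numbers with one dict lookup per word. This sketch assumes Python 3.7+ (for the order-preserving `dict.fromkeys`), and the name `position` is hypothetical.

```python
my_sentence = "the world of the flowers is a small world to be in"
wordlist = my_sentence.split()
unique_list = list(dict.fromkeys(wordlist))  # ordered unique words (Python 3.7+)

# Map each unique word to its 1-based position once...
position = {word: i for i, word in enumerate(unique_list, start=1)}
# ...so each lookup is a dict access instead of a list.index() scan
indices = [position[word] for word in wordlist]
print(indices)  # [1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10]
```

For a sentence-sized input like the one in the question, either version is fine; the dict only matters for large texts.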
Moinuddin Quadri