Dictionary and position list back to sentence

Question

I've managed to get my program to store a sentence or two into a dictionary and at the same time create a word position list.

What I need to do now is recreate the original sentence from just the dictionary and the position list. I've done lots of searches, but the results I'm getting are either not what I need or are too confusing and beyond me.

Any help would be much appreciated, thanks.

Here is my code so far:

sentence = ("This Sentence is a very, very good sentence. Did you like my very good sentence?")           

print ('This is the sentence:', sentence)       

punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]         

for punct in punctuation:                    

    sentence = sentence.replace(punct," %s" % punct)            

print ('This is the sentence with spaces before the punctuations:', sentence)         

words_list = sentence.split()           

print ('A list of the words in the sentence:', words_list)         

dictionary = {}             

word_pos_list = []      

counter = 0                

for word in words_list:                     

    if word not in dictionary:              
        counter += 1                        
        dictionary[word] = counter          

    word_pos_list.append(dictionary[word])      

print ('The positions of the words in the sentence are:', word_pos_list)

John

Do you have some example input and output? Can you provide the code? — Wayne Werner, Jan 21 '16 at 20:27
Please note that dictionaries don't hold a sort: http://stackoverflow.com/questions/15479928/why-is-the-order-in-python-dictionaries-and-sets-arbitrary — n1c9, Jan 21 '16 at 20:28
I understand that you can't use a simple dictionary to do what the OP is trying to do, but I don't understand why all the downvotes if it's purely an exercise trying to get to something else. — Kelvin, Jan 21 '16 at 20:47

score 0 · Answer 1 · answered Jan 21 '16 at 20:46

As mentioned in the comments, dictionaries are not sorted data structures. If you are breaking up a sentence, indexing it into a dictionary, and then trying to put it back together, you can try an OrderedDict from the collections module.

That said, this is written without any further background or knowledge of how you are splitting your sentence (punctuation, etc.). I suggest looking into NLTK if you are doing any sort of natural language processing (NLP).

from collections import OrderedDict
In [182]: def index_sentence(s):
.....:       return {s.split(' ').index(i): i for i in s.split(' ')}
.....:

In [183]: def build_sentence_from_dict(d):
.....:       return ' '.join(OrderedDict(d).values())
.....:

In [184]: s
Out[184]: 'See spot jump over the brown fox.'

In [185]: id = index_sentence(s)

In [186]: id
Out[186]: {0: 'See', 1: 'spot', 2: 'jump', 3: 'over', 4: 'the', 5: 'brown', 6: 'fox.'}

In [187]: build_sentence_from_dict(id)
Out[187]: 'See spot jump over the brown fox.'


These routines have the glitch that if the same word is used twice in a sentence, one instance will be lost. E.g. try "See my dog spot jump over my brown fox." — cdlane, Jan 21 '16 at 21:53
I'll fix that as soon as I have the opportunity. It's encapsulated in the index_sentence function because how it's broken up isn't the focus of the answer. — Kelvin, Jan 21 '16 at 22:57
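
One possible way around the duplicate-word glitch noted in the comments above is to key each word by its position with enumerate() instead of looking it up with list.index(), which always returns the first match. This is only a sketch, not the fix the answerer promised; it reuses the function names from the answer and assumes that splitting on single spaces is acceptable.

```python
def index_sentence(s):
    # Key each word by its position, so repeated words get separate entries.
    return dict(enumerate(s.split(' ')))


def build_sentence_from_dict(d):
    # Walk the positions in sorted order rather than relying on dict ordering.
    return ' '.join(d[i] for i in sorted(d))


s = 'See my dog spot jump over my brown fox.'
indexed = index_sentence(s)
rebuilt = build_sentence_from_dict(indexed)
print(rebuilt)
```

Because the keys are positions rather than words, both occurrences of "my" survive the round trip.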

score 0 · Answer 2 · answered Jan 21 '16 at 21:37

To reconstruct from your list you have to reverse the location mapping:

# reconstruct
reversed_dictionary = {x:y for y, x in dictionary.items()}
print(' '.join(reversed_dictionary[x] for x in word_pos_list))

This can be done more nicely using a defaultdict (a dictionary with a predefined default value; in your case, a list of locations for each word):

#!/usr/bin/env python3.4

from collections import defaultdict

# preprocessing
sentence = ("This Sentence is a very, very good sentence. Did you like my very good sentence?")           
punctuation = '()?:;,.!/"\''  # a plain string, so the loop below visits one character at a time
for punct in punctuation:                    
    sentence = sentence.replace(punct," %s" % punct)

# using defaultdict this time
word_to_locations = defaultdict(list)
for location, word in enumerate(sentence.split()):
    word_to_locations[word].append(location)

# word -> list of locations
print(word_to_locations)

# location -> word
location_to_word = {y: x for x in word_to_locations for y in word_to_locations[x]}
print(location_to_word)

# reconstruct
print(' '.join(location_to_word[x] for x in range(len(location_to_word))))
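
If the goal is to get back the original sentence exactly, including its punctuation spacing, the space-before-punctuation step from the question also has to be undone after joining. The following is a rough sketch of the whole round trip using the question's dictionary and word_pos_list. The names spaced, id_to_word, tokens and rebuilt are made up for illustration, and the final step assumes every punctuation mark was attached to the word before it, so opening brackets or opening quotes would come out wrong.

```python
sentence = "This Sentence is a very, very good sentence. Did you like my very good sentence?"
punctuation = '()?:;,.!/"\''

# Same preprocessing as the question: put a space before each punctuation mark.
spaced = sentence
for punct in punctuation:
    spaced = spaced.replace(punct, " %s" % punct)

# Build the word -> number dictionary and the position list, as in the question.
dictionary = {}
word_pos_list = []
for word in spaced.split():
    if word not in dictionary:
        dictionary[word] = len(dictionary) + 1
    word_pos_list.append(dictionary[word])

# Reverse the mapping (number -> word) and replay the position list.
id_to_word = {number: word for word, number in dictionary.items()}
tokens = [id_to_word[number] for number in word_pos_list]

# Undo the preprocessing: drop the space that was inserted before each mark.
rebuilt = ' '.join(tokens)
for punct in punctuation:
    rebuilt = rebuilt.replace(" %s" % punct, punct)

print(rebuilt)
```

For this particular sentence the result matches the input, since all of its punctuation is trailing. Any runs of whitespace in the original are lost by split() and cannot be recovered this way.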

score 0 · Answer 3 · answered Jan 21 '16 at 21:47

It's not the randomness of dictionary keys that's the problem here, it's the failure to record every position at which a word was seen, duplicate or not. The following does that and then unwinds the dictionary to produce the original sentence, sans punctuation:

from collections import defaultdict

sentence = ("This Sentence is a very, very good sentence. Did you like my very good sentence?")           

print ('This is the sentence:', sentence)       

punctuation = set('()?:;\\,.!/"\'')  

sentence = ''.join(character for character in sentence if character not in punctuation)

print ('This is the sentence with no punctuation:', sentence)

words = sentence.split()

print('A list of the words in the sentence:', words)         

dictionary = defaultdict(list)            

last_word_position = 0   

for word in words:                     

    last_word_position += 1                        

    dictionary[word].append(last_word_position)         

print('A list of unique words in the sentence and their positions:', dictionary.items())         

# Now the tricky bit to unwind our random dictionary:

sentence = []

for position in range(1, last_word_position + 1):
    sentence.extend([word for word, positions in dictionary.items() if position in positions])

print(*sentence)

The output of the various print() statements:

This is the sentence: This Sentence is a very, very good sentence. Did you like my very good sentence?
This is the sentence with no punctuation: This Sentence is a very very good sentence Did you like my very good sentence
A list of the words in the sentence: ['This', 'Sentence', 'is', 'a', 'very', 'very', 'good', 'sentence', 'Did', 'you', 'like', 'my', 'very', 'good', 'sentence']
A list of unique words in the sentence and their positions: dict_items([('Sentence', [2]), ('is', [3]), ('a', [4]), ('very', [5, 6, 13]), ('This', [1]), ('my', [12]), ('Did', [9]), ('good', [7, 14]), ('you', [10]), ('sentence', [8, 15]), ('like', [11])])
This Sentence is a very very good sentence Did you like my very good sentence
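
The unwinding loop above rescans the whole dictionary once per position. A possible alternative, sketched here with a shortened sentence and hypothetical names (positions_by_word, slots), is to drop each word straight into its slot in a preallocated list. It assumes the positions are 1-based and contiguous, as they are in the code above.

```python
from collections import defaultdict

words = "This Sentence is a very very good sentence".split()

# word -> list of 1-based positions, built the same way as in the answer.
positions_by_word = defaultdict(list)
for position, word in enumerate(words, start=1):
    positions_by_word[word].append(position)

# One slot per recorded position; each word goes directly into its slots.
total = sum(len(positions) for positions in positions_by_word.values())
slots = [None] * total
for word, positions in positions_by_word.items():
    for position in positions:
        slots[position - 1] = word

rebuilt = ' '.join(slots)
print(rebuilt)
```

Each recorded position is visited exactly once, so the order in which the dictionary yields its items does not matter.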
