I've managed to get my program to store a sentence or two into a dictionary and at the same time create a word position list.

What I need to do now is recreate the original sentence using only the dictionary and the position list. I've done lots of searches, but the results I'm getting are either not what I need or are too confusing and beyond me.

Any help would be much appreciated, thanks.

Here is my code so far:

sentence = ("This Sentence is a very, very good sentence. Did you like my very good sentence?")

print('This is the sentence:', sentence)

punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

for punct in punctuation:
    sentence = sentence.replace(punct, " %s" % punct)

print('This is the sentence with spaces before the punctuations:', sentence)

words_list = sentence.split()

print('A list of the words in the sentence:', words_list)

dictionary = {}
word_pos_list = []
counter = 0

for word in words_list:
    if word not in dictionary:
        counter += 1
        dictionary[word] = counter
    word_pos_list.append(dictionary[word])

print('The positions of the words in the sentence are:', word_pos_list)

John

3 Answers

As mentioned in the comments, dictionaries are not sorted data structures. If you are breaking a sentence up, indexing it into a dictionary, and then trying to put it back together, you can use an OrderedDict from the collections module.

That said, this is written without any further background on how you are splitting your sentence (punctuation, etc.). If you are doing any sort of natural language processing (NLP), I suggest looking into NLTK.

from collections import OrderedDict
In [182]: def index_sentence(s):
.....:       return {s.split(' ').index(i): i for i in s.split(' ')}
.....:

In [183]: def build_sentence_from_dict(d):
.....:       return ' '.join(OrderedDict(sorted(d.items())).values())
.....:

In [184]: s
Out[184]: 'See spot jump over the brown fox.'

In [185]: id = index_sentence(s)

In [186]: id
Out[186]: {0: 'See', 1: 'spot', 2: 'jump', 3: 'over', 4: 'the', 5: 'brown', 6: 'fox.'}

In [187]: build_sentence_from_dict(id)
Out[187]: 'See spot jump over the brown fox.'

Kelvin
  • These routines have the glitch that if the same word is used twice in a sentence, one instance will be lost. E.g. try "See my dog spot jump over my brown fox." – cdlane Jan 21 '16 at 21:53
  • I'll fix that as soon as I have the opportunity. It's encapsulated in the index_sentence function because how it's broken up isn't the focus of the answer. – Kelvin Jan 21 '16 at 22:57
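
The duplicate-word glitch raised in the comments comes from calling `.index()`, which always returns the first position of a word. A minimal sketch of one possible fix, assuming the same split-on-spaces behaviour as the answer, is to number the words with `enumerate()` instead. The function names are reused from the answer for continuity; the bodies are an illustration, not the answerer's own fix.

```python
def index_sentence(sentence):
    # enumerate() yields every position exactly once, so repeated words are kept
    return dict(enumerate(sentence.split(' ')))


def build_sentence_from_dict(indexed):
    # Walk the positions in ascending order instead of relying on dict ordering
    return ' '.join(indexed[position] for position in sorted(indexed))


s = "See my dog spot jump over my brown fox."
indexed = index_sentence(s)
print(indexed)
print(build_sentence_from_dict(indexed))
```

Both occurrences of "my" now get their own key, so the rebuilt string matches the input.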

To reconstruct from your list you have to reverse the location mapping:

# reconstruct
reversed_dictionary = {x:y for y, x in dictionary.items()}
print(' '.join(reversed_dictionary[x] for x in word_pos_list))
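
For a self-contained view of the whole round trip, here is a rough sketch that wraps the question's encoding step and the reversed mapping into two functions. The names `encode`, `decode`, and `PUNCTUATION` are made up for this illustration. It also removes the space that was inserted before each punctuation mark, which assumes the original text had no whitespace before punctuation and used single spaces between words.

```python
PUNCTUATION = '()?:;,.!/"\''


def encode(sentence):
    """Return (word -> number dictionary, list of numbers), as in the question."""
    for punct in PUNCTUATION:
        sentence = sentence.replace(punct, " " + punct)
    dictionary, positions = {}, []
    for word in sentence.split():
        if word not in dictionary:
            dictionary[word] = len(dictionary) + 1
        positions.append(dictionary[word])
    return dictionary, positions


def decode(dictionary, positions):
    """Rebuild the sentence from the dictionary and the position list."""
    number_to_word = {number: word for word, number in dictionary.items()}
    text = " ".join(number_to_word[number] for number in positions)
    # Undo the preprocessing: drop the space inserted before each punctuation mark
    for punct in PUNCTUATION:
        text = text.replace(" " + punct, punct)
    return text


original = "This Sentence is a very, very good sentence. Did you like my very good sentence?"
dictionary, positions = encode(original)
rebuilt = decode(dictionary, positions)
print(positions)
print(rebuilt)
```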

This can be done more nicely using a defaultdict (a dictionary with a predefined default value; in your case, a list of locations for each word):

#!/usr/bin/env python3.4

from collections import defaultdict

# preprocessing: put a space before each punctuation character
sentence = "This Sentence is a very, very good sentence. Did you like my very good sentence?"
punctuation = '()?:;,.!/"\''  # a plain string, so the loop visits one character at a time
for punct in punctuation:
    sentence = sentence.replace(punct, " %s" % punct)

# using defaultdict this time
word_to_locations = defaultdict(list)
for location, word in enumerate(sentence.split()):
    word_to_locations[word].append(location)

# word -> list of locations
print(word_to_locations)

# location -> word
location_to_word = {location: word
                    for word, locations in word_to_locations.items()
                    for location in locations}
print(location_to_word)

# reconstruct
print(' '.join(location_to_word[x] for x in range(len(location_to_word))))
Reut Sharabani

It's not the randomness of dictionary keys that's the problem here, it's the failure to record every position at which a word was seen, duplicate or not. The following does that and then unwinds the dictionary to produce the original sentence, sans punctuation:

from collections import defaultdict

sentence = ("This Sentence is a very, very good sentence. Did you like my very good sentence?")           

print ('This is the sentence:', sentence)       

punctuation = set('()?:;\\,.!/"\'')  

sentence = ''.join(character for character in sentence if character not in punctuation)

print ('This is the sentence with no punctuation:', sentence)

words = sentence.split()

print('A list of the words in the sentence:', words)         

dictionary = defaultdict(list)            

last_word_position = 0   

for word in words:                     

    last_word_position += 1                        

    dictionary[word].append(last_word_position)         

print('A list of unique words in the sentence and their positions:', dictionary.items())         

# Now the tricky bit to unwind our random dictionary:

sentence = []

for position in range(1, last_word_position + 1):
    sentence.extend([word for word, positions in dictionary.items() if position in positions])

print(*sentence)

The output of the various print() statements:

This is the sentence: This Sentence is a very, very good sentence. Did you like my very good sentence?
This is the sentence with no punctuation: This Sentence is a very very good sentence Did you like my very good sentence
A list of the words in the sentence: ['This', 'Sentence', 'is', 'a', 'very', 'very', 'good', 'sentence', 'Did', 'you', 'like', 'my', 'very', 'good', 'sentence']
A list of unique words in the sentence and their positions: dict_items([('Sentence', [2]), ('is', [3]), ('a', [4]), ('very', [5, 6, 13]), ('This', [1]), ('my', [12]), ('Did', [9]), ('good', [7, 14]), ('you', [10]), ('sentence', [8, 15]), ('like', [11])])
This Sentence is a very very good sentence Did you like my very good sentence
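
The unwinding loop above scans the whole dictionary once per position. As a hedged alternative, the same word -> positions mapping can be unwound in a single pass by writing each word directly into a preallocated list. This is only a sketch: `word_positions` and `rebuilt` are names chosen for the example, and the input is the already punctuation-free sentence from the output above.

```python
from collections import defaultdict

words = ("This Sentence is a very very good sentence "
         "Did you like my very good sentence").split()

# word -> list of 1-based positions, as in the answer
word_positions = defaultdict(list)
for position, word in enumerate(words, start=1):
    word_positions[word].append(position)

# One slot per position; each word is written into every slot it occupied
rebuilt = [None] * len(words)
for word, positions in word_positions.items():
    for position in positions:
        rebuilt[position - 1] = word

print(*rebuilt)
```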
cdlane