Python 3 compressing text by stripping punctuation from input string and counting position of words in string

Question

I am creating a program that takes an input string from the user and strips the punctuation from it using the ord function. It should then calculate the position of each word (starting from 1), disregarding any repeated words. Both the compressed sentence and the positions should be written to a text file.

The problem with my code is that the input string is split into individual letters, so the position count counts individual letters. I am sure there is a simple fix, but 84 versions later I have run out of ideas.

import string

sentence=input("Please enter a sentence: ")
sentence=sentence.upper()
sentencelist = open("sentence_List.txt","w")
sentencelist.write(str(sentence))
sentencelist.close()


words=list(str.split(sentence))
wordlist=len(words)
position=[]
text=()
uniquewords=[]
texts=""
nsentence=(sentence)

for c in list(sentence):
        if not ord(c.lower()) in range(97,122):
                nsentence=nsentence.replace(c, "")#Ascii a-z
print(nsentence)


nsentencelist=len(nsentence)
print(nsentencelist)
nsentencelist2 = open("nsentence_List.txt","w")
nsentencelist2.write(str(nsentence))
nsentencelist2.close()

If you want the words, why do you replace the non-alphabetic characters with "" (an empty string)? - EvanL00, Mar 06 '17 at 12:58

score 0 · Answer 1 · edited May 23 '17 at 11:46


The problem is that you replace every non-letter character, spaces included, with "" (an empty string). So when you try to split the sentence "we are good. OK" into words, you are actually splitting "wearegoodOK". Try replacing the punctuation with a space " " instead.
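A minimal sketch of that suggestion, assuming only ASCII letters count as word characters. The function name `strip_punctuation` is hypothetical and not part of the question's code:

```python
import string

def strip_punctuation(sentence):
    # Keep ASCII letters; turn every other character
    # (punctuation, digits, existing spaces) into a single space.
    return "".join(c if c in string.ascii_letters else " " for c in sentence)

# split() with no arguments collapses runs of spaces, so the
# extra space left behind by the "." does not create empty words.
words = strip_punctuation("we are good. OK").split()
print(words)
```

Because the separators survive as spaces, `split()` now yields whole words rather than one long run of letters.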

Or you could use a regex to split the words, as suggested in "Strip Punctuation From String in Python".
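One possible regex approach (not necessarily the one from the linked question) is to extract the words directly instead of removing the punctuation first. This sketch assumes a word is any run of ASCII letters; `split_words` is a hypothetical name:

```python
import re

def split_words(sentence):
    # Assumption: a "word" is one or more consecutive ASCII letters.
    # Anything else (punctuation, digits, whitespace) acts as a separator.
    return re.findall(r"[A-Za-z]+", sentence)

print(split_words("we are good. OK"))
```

Note that under this assumption an apostrophe also splits a word, so "don't" would come back as two pieces.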

answered Mar 06 '17 at 13:06 by EvanL00 · edited May 23 '17 at 11:46 by Community

score 0 · Answer 2 · answered Mar 06 '17 at 13:45

Here's a function that returns the sentence with punctuation and capitalization stripped away, along with an ordered dictionary of word:index_of_first_occurrence pairs. You can output this data to a file, which I haven't done here because I do not know your specific output requirements:

import re
from collections import OrderedDict

# matches any character that is not a letter or whitespace
PUNCTUATION_REGEX = re.compile(r'[^a-zA-Z\s]')

def compress(sentence):
    # an OrderedDict keeps the words in the order they were first seen
    words = OrderedDict()
    # remove punctuation (whitespace is kept, so words stay separated) and lowercase the result
    sentence = PUNCTUATION_REGEX.sub('', sentence).lower()
    # loop through the words in the sentence
    for idx, word in enumerate(sentence.split()):
        # only record the first occurrence of each word
        if word not in words:
            # store the position of the word, counting from 1
            words[word] = idx + 1
    return sentence, words
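Since the output format is left open, here is one way the results could be written out, as a sketch only. The helper `write_results`, its parameters, and the "word position" line format are all assumptions, not requirements from the question:

```python
def write_results(sentence, positions, sentence_path, positions_path):
    # Write the cleaned sentence to its own file.
    with open(sentence_path, "w") as f:
        f.write(sentence)
    # Write one "word position" pair per line (assumed format),
    # in the order the mapping yields them.
    with open(positions_path, "w") as f:
        for word, position in positions.items():
            f.write(f"{word} {position}\n")
```

It takes the two values returned by a function like compress above, plus two file paths of your choosing, for example write_results(sentence, words, "sentence.txt", "positions.txt").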
