
I have to create a program that reads in lines of code until a single "." is entered. For each line I have to remove punctuation, convert everything to lower case, and remove stop words and suffixes. I've managed all of this except removing suffixes. I've tried .strip, as you can see, but it only accepts one argument and doesn't actually remove the suffixes from the list elements. Any advice/pointers/help? Thanks

stopWords = [ "a", "i", "it", "am", "at", "on", "in", "to", "too", "very", \
          "of", "from", "here", "even", "the", "but", "and", "is", "my", \
          "them", "then", "this", "that", "than", "though", "so", "are" ]

noStemWords = [ "feed", "sages", "yearling", "mass", "make", "sly", "ring" ]


# -------- Replace with your code - e.g. delete line, add your code here ------------

Text = raw_input("Indexer: Type in lines, that finish with a . at start of line only: ").lower()
while Text != ".":
    LineNo = 0 
    x=0
    y=0
    i= 0

    # creates a new string, cycles through the string Text and removes punctuation
    PuncRemover = ""
    for c in Text:
        if c in ".,:;!?&'":
            c=""
        PuncRemover += c

    SplitWords = PuncRemover.split()

    # loops through the SplitWords list, removes the value at x if found in the stopWords list
    while x < len(SplitWords)-1:
        if SplitWords[x] in stopWords:
            del SplitWords[x]
        else:
            x=x+1

    while y < len(SplitWords)-1:
        if SplitWords[y] in noStemWords:
            y=y+1
        else:
            SplitWords[y].strip("ed")
            y=y+1

    Text = raw_input().lower()

print "lines with stopwords removed:" + str(SplitWords)
print Text
print LineNo
print x
print y
print PuncRemover
Karan Nagpal
Rydooo
  • You are reading just once here, look at `raw_input` – martianwars Dec 01 '16 at 18:49
  • A couple things about code style first. You should take a look at [Python naming conventions](https://www.python.org/dev/peps/pep-0008/#naming-conventions). Capitalized words are generally reserved for classes or type variables. Also, your `while` loops should really be `for` loops since you know how many iterations you are going to perform. As far as your problem, you need to actually assign the list elements that are being changed. For stripping a sequence of characters, see [this question](http://stackoverflow.com/questions/3900054/python-strip-multiple-characters) – Daniel Underwood Dec 01 '16 at 18:53
  • The read in lines are meant to get added to a dictionary which is why for now it only reads once. – Rydooo Dec 01 '16 at 19:12
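
To illustrate the point about `strip` raised in the comments, here is a minimal sketch. `str.strip` treats its argument as a set of characters to trim from both ends of the string, and it returns a new string rather than changing the original. The helper name `strip_suffix` and the sample words are hypothetical, chosen only for this illustration.

```python
def strip_suffix(word, suffix):
    """Return word without a trailing suffix (hypothetical helper)."""
    if suffix and word.endswith(suffix):
        return word[:-len(suffix)]
    return word


# strip("ed") trims any 'e' or 'd' characters from BOTH ends of the string
print("edited".strip("ed"))          # it
print(strip_suffix("edited", "ed"))  # edit

# Strings are immutable, so the result has to be assigned back into the list
words = ["walked", "jumped", "ring"]
for i, word in enumerate(words):
    words[i] = strip_suffix(word, "ed")
print(words)  # ['walk', 'jump', 'ring']
```

Calling `SplitWords[y].strip("ed")` on its own line computes a value and throws it away, which is why the list in the question never changes.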

1 Answer


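The following function strips the listed suffixes from the end of each word in a given string. It only looks at word endings, so a sequence such as "ing" in the middle of a word is left alone.

def removeSuffixs(sentence):
    """Strip a known suffix from the end of each word in the sentence."""
    suffixList = ["ing", "ation"]  # add more as necessary

    stemmed = []
    for word in sentence.split():
        for suffix in suffixList:
            # Only strip a suffix at the end of a word, and keep a non-empty stem
            if word.endswith(suffix) and len(word) > len(suffix):
                word = word[:-len(suffix)]
                # Collapse a doubled final letter, e.g. "runn" -> "run"
                # (a crude rule: it also turns "falling" into "fal")
                if len(word) >= 2 and word[-1] == word[-2]:
                    word = word[:-1]
                break
        stemmed.append(word)

    return " ".join(stemmed)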

Examples:

print(removeSuffixs("climbing running")) # 'climb run'
print(removeSuffixs("summation")) # 'sum'

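Strings are immutable, so .strip returns a new string instead of changing the list element, and the result has to be assigned back. In your code, replace SplitWords[y].strip("ed") with: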

SplitWords[y] = removeSuffixs(SplitWords[y])
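
For context, here is a rough sketch of how the per-line steps from the question could fit together. The `while x < len(SplitWords)-1` loops in the question stop before the last word, whereas a list comprehension visits every element. The names `stem_word` and `index_line`, the shortened stop-word set, and the suffix tuple are assumptions made for this example, not part of the original assignment.

```python
# Shortened copy of the question's stopWords list (assumption for brevity)
stop_words = {"a", "i", "it", "the", "and", "is", "my", "to"}
no_stem_words = {"feed", "sages", "yearling", "mass", "make", "sly", "ring"}
punctuation = ".,:;!?&'"           # same characters the question removes
suffixes = ("ation", "ing", "ed")  # illustrative only, not a complete list


def stem_word(word):
    """Strip the first matching suffix unless the word is protected."""
    if word in no_stem_words:
        return word
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[:-len(suffix)]
    return word


def index_line(line):
    """Lower-case a line, drop punctuation and stop words, then stem."""
    cleaned = "".join(c for c in line.lower() if c not in punctuation)
    return [stem_word(w) for w in cleaned.split() if w not in stop_words]


print(index_line("I walked to the ring, and my friend is jumping!"))
# ['walk', 'ring', 'friend', 'jump']
```

Building a new list avoids deleting from a list while indexing into it, and keeping the suffix logic in one small function makes it easy to swap in a different stemming rule later.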

J Darbyshire