
I am trying to get this Python code to strip punctuation marks from words and count the unique words. For some reason it still counts "hello." and "hello" as separate words. Any help would be most appreciated.

def word_distribution(words):
            word_dict = {}
            words = words.lower()
            words = words.split()
            for word in words:
                if ord('a') <= ord(word[-1]) <= ord('z'):
                    pass
                elif ord('A') <= ord(word[-1]) <= ord('Z'):
                    pass
                else: 
                    word[:-1]
            word_dict = {word:words.count(word)+1 for word in set(words)}
            return(word_dict)
BRose
  • The problem is deeper than that, but first of all, `word[:-1]` alone does not do anything, since it is not assigned to any variable. – Efferalgan Oct 11 '16 at 17:23
  • I am using x='Hello My Name Is hello.' I am looking to get this from the function-> {'hello': 2, 'is': 1, 'my': 1, 'name': 1} – BRose Oct 11 '16 at 17:24
  • Honestly, I'd delete everything you've written for dropping punctuation and use the solution from this post: http://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string-in-python For counting the words, use the Counter class: https://docs.python.org/2/library/collections.html#collections.Counter – Sohier Dane Oct 11 '16 at 17:24
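
The suggestion in the last comment can be sketched roughly as follows. This is a minimal sketch, assuming Python 3. It trims punctuation only from the ends of each token with `str.strip(string.punctuation)` rather than reproducing the approach from the linked post, and the filter that skips empty tokens is an added assumption.

```python
import string
from collections import Counter


def word_distribution(text):
    """Count words case-insensitively, ignoring leading/trailing punctuation."""
    # Lowercase, split on whitespace, then trim punctuation from both ends of each token.
    tokens = (token.strip(string.punctuation) for token in text.lower().split())
    # Skip tokens that consisted only of punctuation (assumption: they are not words).
    return Counter(token for token in tokens if token)


print(dict(word_distribution('Hello My Name Is hello.')))
```

`Counter` is a `dict` subclass, so the result can be used like the dictionary the question asks for.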

3 Answers

1

I don't know why you're adding 1 to count.

def word_distribution(words):
        word_dict = {}
        words = words.lower().split()
        for word in words:
            if ord('a') <= ord(word[-1]) <= ord('z'):
                pass
            elif ord('A') <= ord(word[-1]) <= ord('Z'):
                pass
        word_dict = {word:words.count(word) for word in set(words)}
        return(word_dict)

{'hello': 2, 'my': 1, 'name': 1, 'is': 1}

Edit:

As brianpck points out:

def word_distribution(words):
        word_dict = {}
        words = words.lower().split()
        word_dict = {word:words.count(word) for word in set(words)}
        return(word_dict)

This also gives the same result, since the loop in the first version does nothing.

tgikal
  • Why are you doing a `for` loop and passing for each branch? Also, this will not deal with punctuation correctly `this != this.` – brianpck Oct 11 '16 at 17:40
  • I was just posting their code modified for the result they were looking for. – tgikal Oct 11 '16 at 17:43
1

You are making it too complicated. As Sohier Dane mentioned in the comments, you can use the approach from the other post to remove punctuation and simplify the script to:

import string

def word_distribution(words):
    # Python 2 only: str.translate(None, chars) deletes every character in chars
    words = words.translate(None, string.punctuation).lower()
    d = {}
    for w in words.split():
        if w not in d:
            d[w] = 1
        else:
            d[w] += 1
    return d

Results:

>>> x='Hello My Name Is hello.'
>>> print word_distribution(x)  
{'is': 1, 'my': 1, 'hello': 2, 'name': 1}
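
The two-argument `translate(None, ...)` call above exists only on Python 2 `str`. On Python 3 the same idea could be sketched with a translation table built by `str.maketrans`. This is a minimal sketch, assuming Python 3; the use of `dict.get` in place of the `if`/`else` is a stylistic substitution.

```python
import string


def word_distribution(text):
    """Python 3 variant: delete every ASCII punctuation character, then count."""
    # str.maketrans('', '', chars) builds a table that maps each char in chars to None.
    table = str.maketrans('', '', string.punctuation)
    counts = {}
    for word in text.translate(table).lower().split():
        # dict.get returns 0 for a word that has not been seen yet.
        counts[word] = counts.get(word, 0) + 1
    return counts


print(word_distribution('Hello My Name Is hello.'))
```

Note that this deletes punctuation everywhere in the string, not only at word boundaries, so a word such as "don't" is counted as "dont".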
coder
1

There are certainly better ways of achieving what you are trying to do, but this answer fixes your code.

Strings are immutable and lists are mutable. Nowhere in your code were you modifying the list, and `word[:-1]` on its own has no effect: it builds a new string that is never assigned to anything, and the original string cannot be changed in place.
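
The point about immutability can be illustrated with a small sketch, assuming Python 3. The variable names `after_bare_slice` and `after_rebinding` are made up for the illustration.

```python
words = ['hello', 'hello.']

for word in words:
    word[:-1]             # builds a new string and discards it
after_bare_slice = list(words)

for word in words:
    word = word[:-1]      # rebinds the loop variable only; the list is untouched
after_rebinding = list(words)

words[1] = words[1][:-1]  # assigning into the list is what actually changes it

print(after_bare_slice)
print(after_rebinding)
print(words)
```

Only the last assignment changes the contents of `words`; the two loops leave the list exactly as it was.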

def word_distribution(words):
        word_dict = {}
        words = words.lower()
        words = words.split()
        for word in words:
            index = words.index(word)
            if ord('a') <= ord(word[-1]) <= ord('z'):
                pass
            elif ord('A') <= ord(word[-1]) <= ord('Z'):
                pass
            else: 
                word = word[:-1]
                words[index] = word 

        word_dict = {word:words.count(word) for word in set(words)}
        return(word_dict)
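
The loop above removes at most one trailing character per word and looks up each position with `words.index`. A more compact variant is sketched here, assuming Python 3. It uses `str.rstrip(string.punctuation)` in place of the `ord` range checks, which is a substitution and not this answer's original method, and it builds a cleaned list before counting.

```python
import string


def word_distribution(text):
    """Count words after trimming any run of trailing punctuation."""
    # rstrip removes every trailing punctuation character, not just one.
    cleaned = [word.rstrip(string.punctuation) for word in text.lower().split()]
    # Drop tokens that consisted only of punctuation (assumption: not wanted as keys).
    cleaned = [word for word in cleaned if word]
    return {word: cleaned.count(word) for word in set(cleaned)}


print(word_distribution('Hello My Name Is hello.'))
```

Building a new list avoids modifying `words` while iterating over it, and `rstrip` also handles endings such as "wait...".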
saurabh baid