I have a paragraph such as this:

Shank spare ribs ball tip, frankfurter alcatra rump pancetta picanha beef ribs biltong pig venison chicken ham hock. T-bone beef ribs chicken pork jerky tenderloin andouille turducken kevin short ribs. Drumstick bacon short loin, chicken turducken leberkas chuck swine pork belly doner biltong ham hock. Swine hamburger tenderloin meatloaf prosciutto pancetta meatball tongue drumstick ham hock. Meatball bresaola landjaeger doner brisket pork belly pancetta spare ribs corned beef tenderloin bacon fatback pork loin boudin rump kevin andouille beef ham capicola biltong. Pork chop corned beef swine turkey, prosciutto biltong kielbasa short loin ground round cupim shoulder.

I need to write code that counts how many times a given word appears. I can't figure out how to remove the commas and periods attached to those words! Here is what I have so far:

for line in input_file:
    lines = line.split(' ')
    lines = line.replace('\n', '')
    new_List.append(lines)
enigma
cheese
  • Possible duplicate of [Check if string ends with one of the strings from a list](http://stackoverflow.com/questions/18351951/check-if-string-ends-with-one-of-the-strings-from-a-list) – MattDMo Nov 12 '15 at 21:00
  • Or even better, [use Counter](https://docs.python.org/2/library/collections.html#collections.Counter) – R Nar Nov 12 '15 at 21:00

4 Answers

There are several things that may help. First, Counter:

from collections import Counter

s = '''Shank spare ribs ball tip, frankfurter alcatra rump ...'''
c = Counter(s.split())

Note that str.split is called without an argument. From the documentation:

If the optional second argument sep is absent or None, the words are separated by arbitrary strings of whitespace characters (space, tab, newline, return, formfeed).
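
As a small illustration of that behaviour, the sketch below compares the two forms on a made-up sample string (the string and variable names are assumptions, not taken from the question). With an explicit ' ' separator, a double space yields an empty string and the newline stays attached to a word; without an argument, any run of whitespace acts as one boundary.

```python
# Hypothetical sample text with a double space and a newline.
sample = "spare  ribs\nball tip"

# Explicit separator: every single space is a boundary.
by_space = sample.split(' ')

# No separator: any run of whitespace is a boundary.
by_whitespace = sample.split()

print(by_space)
print(by_whitespace)
```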

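Next, instead of Counter, you could use str.count to count each word's occurrences. Be aware that str.count counts substring matches, so "ham" is also counted inside "hamburger", and this loop prints a word once for every time it appears:

for word in s.split():
    print(word, s.count(word))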
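
To make the difference between substring counting and whole-word counting concrete, here is a minimal sketch on a made-up three-word sample (the sample string and variable names are assumptions for illustration only):

```python
from collections import Counter

# Hypothetical sample: "ham" appears once as a word and once inside "hamburger".
sample = "ham hock hamburger"

# str.count looks for the substring anywhere in the text.
substring_hits = sample.count("ham")

# Counting the split tokens only matches whole words.
word_hits = Counter(sample.split())["ham"]

print(substring_hits, word_hits)
```

If whole-word counts are what you need, counting the tokens is probably the safer choice.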

An alternative to str.split is a simple regex (the hyphen is placed last in the character class so it is read as a literal):

import re

for word in re.findall(r'[a-zA-Z_-]+', s):
    print(word, s.count(word))

Finally, to address your question directly, you can use str.rstrip to strip characters from the end of a string:

s = 'foobar.,'
t = s.rstrip('.,')  # t is now 'foobar'
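
Putting the pieces together, one possible sketch is to strip punctuation from each token and then count the results. It uses str.strip with string.punctuation, a broader variant of the rstrip call above, and a shortened made-up sample rather than the full paragraph; the variable names are assumptions.

```python
import string
from collections import Counter

# Hypothetical shortened sample in the style of the question's paragraph.
text = "Shank spare ribs ball tip, frankfurter ham hock. T-bone beef ribs, ham hock."

# strip() removes punctuation from both ends of each token but leaves
# interior characters alone, so "T-bone" keeps its hyphen.
words = [token.strip(string.punctuation) for token in text.split()]

# Skip tokens that consisted of punctuation only.
counts = Counter(word for word in words if word)

print(counts["ribs"], counts["hock"], counts["T-bone"])
```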
Celeo

You can split the string on non-word characters, except those inside a word such as the hyphen in "T-bone", using re.split:

import re
from collections import Counter

s = '''Shank spare ribs ball tip, frankfurter alcatra rump pancetta 
       picanha beef ribs biltong pig venison chicken ham hock. 
       T-bone beef ribs chicken pork jerky...
    '''

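# \W(?!\w) matches punctuation that is not followed by a word character,
# so the hyphen in "T-bone" is kept. The filter drops the empty strings
# that re.split leaves between adjacent separators.
Counter(w for w in re.split(r'\W(?!\w)|\s+', s) if w)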

But if you only want to remove commas and periods, the following works fine:

# split on runs of commas, periods and whitespace; skip empty strings at the ends
Counter(w for w in re.split(r'[,\s.]+', s) if w)
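
A variant worth considering is to match the words instead of splitting on the separators, which avoids empty strings altogether. This is a sketch using re.findall rather than re.split, with a made-up sample string and an assumed pattern that treats hyphen-joined parts as one word:

```python
import re
from collections import Counter

# Hypothetical sample in the style of the question's paragraph.
sample = "ball tip, ham hock. T-bone beef ribs, ham hock."

# One or more word characters, optionally followed by hyphen-joined parts.
words = re.findall(r"\w+(?:-\w+)*", sample)
counts = Counter(words)

print(counts["ham"], counts["T-bone"])
```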
styvane
word_dict = {}
for line in input_file:
    # remove the punctuation attached to words
    line = line.replace(',', '')
    line = line.replace('.', '')
    # now line just has words and whitespace; split() also drops the newline
    for word in line.split():
        if word in word_dict:
            word_dict[word] += 1
        else:
            # the first occurrence counts as 1, not 0
            word_dict[word] = 1
Einstein

You can do this with Counter pretty easily:

import string
from collections import Counter

# s = s.lower() # if you don't care about case
# keep only letters and whitespace; this also drops hyphens, so "T-bone" becomes "Tbone"
chars = set(string.ascii_letters + string.whitespace)
c = Counter(''.join(ch for ch in s if ch in chars).split())
acushner