
I feel a bit ashamed asking about the solution to my "homework" here, but I have already spent 4 hours on it and cannot continue like this.

Assignment: count occurrences of a specific string inside a Lorem Ipsum text (already given); a helper function tokenize that splits a given text and returns a list of tokens has been provided.

def tokenize(text):
    return text.split()

for token in tokenize(text):
    print(token)

Task: Write a function search_text() which takes two parameters in this order: filename and query.

The function should return the number of occurrences of query inside the file filename.

query = 'ipsum'
search_text('lorem-ipsum.txt', query) # returns 24

My code:

def tokenize(text):
    return text.split()

def search_text(filename, query):
    with open("lorem-ipsum.txt", "r") as filename:
      wordlist = filename.split()
      count = 0
   for query in wordlist:
      count = count + 1
   return count

query = "lorem"
search_text('lorem-ipsum.txt', query)

It doesn't work and looks a little bit messy. To be honest, I don't understand how the function tokenize() works here.

Could someone give me a hint?

Suwan Wang
  • The function takes an input string `text`, runs `text.split()`, and returns the resulting list (it is a list of words in that string, in this example). In your code it does nothing, because you do not ever call `tokenize()`. – CJR Oct 15 '18 at 19:36
  • Maybe take a step back from the code for a second. How would you go about solving this by hand, if you were given a string of characters and told to find a specific string? – AnilRedshift Oct 15 '18 at 19:36
  • Hey @CJ59 Thank you very much for the explanation. I also tried: "for query in tokenize(filename)". Still wrong. – Suwan Wang Oct 15 '18 at 19:49
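To make the comment above concrete, here is a quick sketch of what `tokenize()` does on its own (it is just a thin wrapper around `str.split()`, which splits on any run of whitespace):

```python
# tokenize() as given in the assignment: str.split() with no
# arguments splits on runs of whitespace and drops empty strings.
def tokenize(text):
    return text.split()

print(tokenize("Lorem ipsum dolor sit"))  # ['Lorem', 'ipsum', 'dolor', 'sit']
```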

1 Answer


You actually have to call the function tokenize() if you want to use it; your code never does. Also, your loop `for query in wordlist` reassigns `query` to each word and counts every word in the file, instead of comparing each word against the query.

This version could work:

def tokenize(text):
    return text.split()

def search_text(filename, query):
    word_list = []
    with open(filename, 'r') as f:
        for line in f:
            line = line.strip()
            if len(line) > 0:
                # add tokens to the list, only if line is not empty
                word_list.extend(tokenize(line))

    count = 0
    for word in word_list:
        if word == query:
            count += 1

    return count

query = "lorem"
search_text('lorem-ipsum.txt', query)

You could also use other counting methods, like this question shows. Here is a solution using the .count() method of sequences:

return word_list.count(query)
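Another option (a sketch, not part of the original assignment) is to read the whole file at once and let `collections.Counter` build a frequency table of the tokens:

```python
from collections import Counter

def tokenize(text):
    return text.split()

def search_text(filename, query):
    # Read the entire file, tokenize it, and tally each token.
    # Counter returns 0 for tokens that never appear, so a missing
    # query does not raise a KeyError.
    with open(filename, 'r') as f:
        counts = Counter(tokenize(f.read()))
    return counts[query]
```

This reads the whole file into memory, which is fine for a Lorem Ipsum text but worth keeping in mind for very large files; the line-by-line version above scales better in that case.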
Ralf