r = ","
x = ""
output = list()
import string

def find_word(filepath,keyword):
    doc = open(filepath, 'r')

    for line in doc:
        #Remove all the unnecessary characters
        line = line.replace("'", r)
        line = line.replace('`', r)
        line = line.replace('[', r)
        line = line.replace(']', r)
        line = line.replace('{', r)
        line = line.replace('}', r)
        line = line.replace('(', r)
        line = line.replace(')', r)
        line = line.replace(':', r)
        line = line.replace('.', r)
        line = line.replace('!', r)
        line = line.replace('?', r)
        line = line.replace('"', r)
        line = line.replace(';', r)
        line = line.replace(' ', r)
        line = line.replace(',,', r)
        line = line.replace(',,,', r)
        line = line.replace(',,,,', r)
        line = line.replace(',,,,,', r)
        line = line.replace(',,,,,,', r)
        line = line.replace(',,,,,,,', r)
        line = line.replace('#', r)
        line = line.replace('*', r)
        line = line.replace('**', r)
        line = line.replace('***', r)

        #Make the line lowercase
        line = line.lower()

        #Split the line after every r (comma) and name the result "word"
        words = line.split(r)

        #if the keyword (also in lowercase form) appears in the before created words list
        #then append the list output by the whole line in which the keyword appears

        if keyword.lower() in words:
            output.append(line)

    return output

print find_word("pg844.txt","and")

The goal of this code is to search a text file for a certain keyword, say "and", and put every line in which the keyword is found into a list of (int, string) tuples. The int should be the line number and the string the whole line.

I'm still working on the line numbering, so I have no question about that yet. The problem is that the output is empty. Even if I append a random string instead of the line, I don't get any results.

If I use

if keyword.lower() in words:
    print line

I get all the desired lines in which the keyword occurs. But I just can't get them into the output list.

The text file I'm trying to search through: http://www.gutenberg.org/cache/epub/844/pg844.txt

neacal
  • how are you calling the function? – Anand S Kumar Oct 20 '15 at 18:01
  • Sorry, I missed that last piece of code. I edited the original post. – neacal Oct 20 '15 at 18:03
  • where are you checking output after calling the function? – Anand S Kumar Oct 20 '15 at 18:03
  • can you include a sample of `pg844.txt` – Cody Bouche Oct 20 '15 at 18:03
  • The code you posted works fine for me using the following input. I'm voting to close since I can't reproduce the problem. https://www.gutenberg.org/cache/epub/844/pg844.txt – DaoWen Oct 20 '15 at 18:06
  • *search through a text file for a certain keyword* - That's a whole lot of code to do `for line_num, line in enumerate(open('filename')): if keyword.lower in line: output.append((line_num, line))` – TessellatingHeckler Oct 20 '15 at 18:06
  • @TessellatingHeckler: `if keyword.lower() in line.lower()` – Steven Rumbalski Oct 20 '15 at 18:08
  • I second @TessellatingHeckler comment: perhaps what you actually want/need is a better and simpler algorithm. This one is definitely weird considering your quite simple need. – heltonbiker Oct 20 '15 at 18:09
  • @TessellatingHeckler: Also, `'and' in 'band,hand'` gives a different result than `'and' in 'band,hand'.split(',')`. The split allows the match to be on whole words only. – Steven Rumbalski Oct 20 '15 at 18:12
  • @StevenRumbalski Exactly. I just want the "whole" words, and not word parts. And the whole replace thing is to avoid any punctuation and so on. (isalnum does not work for me here). – neacal Oct 20 '15 at 18:13
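The distinction discussed in these comments, substring match versus whole-word match, can be sketched as follows. This is only a minimal illustration assuming Python 3; `find_lines` and the sample text are hypothetical names made up for the example, not the asker's code.

```python
import io

def find_lines(lines, keyword):
    """Return (line_number, line) pairs where keyword occurs as a whole word."""
    keyword = keyword.lower()
    matches = []
    for line_num, line in enumerate(lines, start=1):
        # Turn every non-alphanumeric character into a space, then split,
        # so "band" or "hand," never match the keyword "and".
        cleaned = "".join(ch if ch.isalnum() else " " for ch in line.lower())
        if keyword in cleaned.split():
            matches.append((line_num, line.rstrip("\n")))
    return matches

# Substring test vs. whole-word test, using the example from the comments
print("and" in "band,hand")             # True: substring match
print("and" in "band,hand".split(","))  # False: whole words only

# Hypothetical sample text standing in for the real file
sample = io.StringIO("The band played.\nSalt and pepper!\nHand in hand\n")
print(find_lines(sample, "and"))        # [(2, 'Salt and pepper!')]
```

An open file object can be passed in place of `sample`, since both yield one line per iteration.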

3 Answers


Please use a regular expression; see the documentation for Python's re module. Replacing every character or character set one at a time is confusing. The use of lists and .append() looks correct, but consider debugging your line variable inside the for loop, printing it occasionally to ensure its value is what you expect.
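One way the regular-expression suggestion might look is sketched below, assuming Python 3. The function takes any iterable of lines, and the sample list `text` is invented for the example; this is not necessarily what the asker ended up using.

```python
import re

def find_word(lines, keyword):
    """Return (line_number, line) pairs where keyword appears as a whole word."""
    # \b matches at a word boundary, re.escape guards keywords that contain
    # regex metacharacters, and re.IGNORECASE replaces the manual lower() calls.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    output = []  # local list, so repeated calls do not share state
    for line_num, line in enumerate(lines, start=1):
        if pattern.search(line):
            output.append((line_num, line.rstrip("\n")))
    return output

# Hypothetical sample lines standing in for the real file
text = ["A band of brothers.", "Ham AND eggs;", "(and) so on"]
print(find_word(text, "and"))  # [(2, 'Ham AND eggs;'), (3, '(and) so on')]
```

Because a file object iterates line by line, `with open(filepath) as f: find_word(f, "and")` should work the same way. Note that `\b` treats underscores and digits as word characters, so `and_` would not count as a match.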

An answer by pyInProgress makes a good point about global variables, though without having tested it, I'm not convinced the global keyword is required if the value returned by the function is used instead of the global output variable. See this StackOverflow post if you need more information about global variables.

Kody

Loop through string.punctuation to strip the punctuation before iterating through the lines:

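import string, re

r = ','

def find_word(filepath, keyword):

    output = []
    with open(filepath, 'rb') as f:
        data = f.read()
        # Strip every punctuation character except the separator r
        for x in list(string.punctuation):
            if x != r:
                data = data.replace(x, '')
        # Collapse each run of spaces, tabs and commas into one separator so
        # that words split cleanly; newlines stay put for splitlines().
        # (The fourth positional argument of re.sub is count, not flags,
        # so no flag is passed here.)
        data = re.sub(r'[ \t,]+', r, data).splitlines()

    for i, line in enumerate(data):
        if keyword.lower() in line.lower().split(r):
            output.append((i, line))
    return output

print find_word('pg844.txt', 'and')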
Cody Bouche

Since output = list() is at the top-level of your code and isn't inside a function, it is considered a global variable. To edit a global variable within a function, you must use the global keyword first.

Example:

gVar = 10

def editVar():
    global gVar
    gVar += 5

So to edit the variable output within your function find_word() you must type global output before assigning it values.

It should look like this:

r = ","
x = ""
output = list()
import string

def find_word(filepath,keyword):
    doc = open(filepath, 'r')

    for line in doc:
        #Remove all the unnecessary characters
        line = line.replace("'", r)
        line = line.replace('`', r)
        line = line.replace('[', r)
        line = line.replace(']', r)
        line = line.replace('{', r)
        line = line.replace('}', r)
        line = line.replace('(', r)
        line = line.replace(')', r)
        line = line.replace(':', r)
        line = line.replace('.', r)
        line = line.replace('!', r)
        line = line.replace('?', r)
        line = line.replace('"', r)
        line = line.replace(';', r)
        line = line.replace(' ', r)
        line = line.replace(',,', r)
        line = line.replace(',,,', r)
        line = line.replace(',,,,', r)
        line = line.replace(',,,,,', r)
        line = line.replace(',,,,,,', r)
        line = line.replace(',,,,,,,', r)
        line = line.replace('#', r)
        line = line.replace('*', r)
        line = line.replace('**', r)
        line = line.replace('***', r)

        #Make the line lowercase
        line = line.lower()

        #Split the line after every r (comma) and name the result "word"
        words = line.split(r)

        #if the keyword (also in lowercase form) appears in the before created words list
        #then append the list output by the whole line in which the keyword appears

        global output
        if keyword.lower() in words:
            output.append(line)

    return output

In the future, try to stay away from global variables unless you absolutely need them. They can get messy!
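To illustrate how a module-level list can get messy, here is a small sketch assuming Python 3. The names `shared_output`, `find_global` and `find_local` are hypothetical and exist only for this comparison; the matching logic is deliberately simplified.

```python
shared_output = []  # module-level list, like output in the question

def find_global(lines, keyword):
    # Appends to the shared list, so results pile up across calls
    for line in lines:
        if keyword in line.split():
            shared_output.append(line)
    return shared_output

def find_local(lines, keyword):
    output = []  # fresh list on every call
    for line in lines:
        if keyword in line.split():
            output.append(line)
    return output

lines = ["salt and pepper", "no match here"]
first = find_global(lines, "and")
second = find_global(lines, "and")

print(len(second))                    # 2: the second call added a duplicate
print(first is second)                # True: both names refer to the same list
print(len(find_local(lines, "and")))  # 1, no matter how often it is called
```

Creating the list inside the function keeps each call independent, which is usually what a search helper like this wants.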

  • Incorrect. You can call a mutating method on a global variable without using the `global`. You only need `global` when you want to *assign* to a global, otherwise assignment creates a local variable with the same name. – Steven Rumbalski Oct 20 '15 at 18:28
  • Interesting point. I can't explain why this solution worked for neacal then. Any ideas? – pyInProgress Oct 20 '15 at 18:38
  • Depends on his use of the method. He returns the list and has a global defined. If he adds `global` within the method, then he could see those changes on the global object. However, if he is using the returned value, he wouldn't need to add `global`. – Kody Oct 20 '15 at 20:56
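The point made in these comments, that `global` matters only for assignment and not for mutation, can be checked with a short sketch. It assumes Python 3, and all function names are made up for the demonstration.

```python
output = []
counter = 0

def mutate():
    # Calling a method on the global object needs no declaration
    output.append("line")

def rebind_without_global():
    # Assignment creates a new local name; the global list is untouched
    output = ["local only"]
    return output

def rebind_with_global():
    # global is required because counter is being assigned to
    global counter
    counter = counter + 1

def broken_increment():
    # Augmented assignment makes counter local, so reading it fails
    counter += 1

mutate()
rebind_without_global()
rebind_with_global()
print(output)   # ['line']
print(counter)  # 1

try:
    broken_increment()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```

This matches the `gVar += 5` example in the answer, which does need `global`, while `output.append(line)` in the question does not.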