
I am new to Python and have a little experience programming in C++. I saw this question, but it doesn't address my problem.

Python 2.7.9, 64-bit AMD, Windows 7 Ultimate, NTFS, administrator privileges, and no "read only" attribute on the file being read.

I want to create a list of strings that fulfil a certain criterion; the strings are lines of the file (see notepad.cc/diniko93). So I wrote the following function:

def makeLineList( filePtr, ptr ):
    lines = []
    while True:
        s = filePtr.readline()
        if not s=="":
            s = s[3:]
            s = s.split()
            if s[0].isdigit():
                print("O")
                lines.append(s)
            elif s[0] in {"+", "-"}:
                print("U")
                lines.append(s)
        else:
            print("none")
            break
    filePtr.seek(ptr, 0);    #I did this to restore file pointer, so other functions accessing this file later don't misbehave
    return lines

The two possible main()-like bodies (pardon my ignorance of Python) that I am using are:

with open("./testStage1.txt", 'r') as osrc:
    osrc.seek(291, 0)
    L = makeLineList( osrc, osrc.tell())
    print "".join(L)

and the other one:

osrc = open("./testStage1.txt", 'r')
osrc.seek(291, 0)
L = makeLineList( osrc, osrc.tell())
print "".join(L)
osrc.close()

Both times, the output on the terminal is a disappointing `none`.

Please note that the code above is the minimum required to reproduce the problem, not the entire code.

EDIT: Based on @avenet's suggestion, I googled and tried to use iterators in my code (`obj.__next__()` in Python 3, `obj.next()` in Python 2.7, or the built-in `next(obj)` in either), but the problem persists: I am unable to read the next line even if I call `next(osrc)` from inside the function. Check out these two snippets:

  • version2: `next` is used only in the main()-like part, and the `transform_line` function is not called. Calling `next()` three times produces the expected output, but in
  • version3, I get a `list index out of range` error, even for `lists[0]`, which definitely has a digit.

EDIT 2: I tried a scope check inside my functions: `if not osrc in locals():` followed, on the next properly indented line, by `print("osrc not reachable")`. The output is `osrc not reachable`. I also tried `from tLib import transform_line` with a temporary tLib.py, with identical results. Why is `osrc` not available in either case?

EDIT 3: Since the problem appears to be one of scope, I avoided passing the file variable. Instead, I made a function whose sole purpose is to read lines. Whether it reads the next line depends on the return value of a function like isLineUseful():

def isLineUseful( text, lookFor ):
    if text.find(lookFor)!=-1:
        return 1
    else:
        return 0
def makeList( pos, lookFor ):
    lines = []
    with open("./testStage1.txt", 'r') as src:
        src.seek(pos)
        print(src.read(1))
        while True:
            line = next(src, "")   # "" at end of file instead of raising StopIteration
            again = isLineUseful(line, lookFor)
            if again==0:
                src.seek(pos)
                break
            else:
                lines.append(line)
    return lines

t = makeList(84, "+")
print "\n".join(t)

I tried it, and it works perfectly on this sample testStage1.txt (notepad.cc/diniko93).

So my programming issue is solved (thanks to the responders :D). I am marking this as answered, but I am posting a new question about the anomalous behavior of readline() and __next__.

P.S. I am still learning the ways of Python, so I would be very happy if you could suggest a more Pythonic and idiomatic version of my code above.

newPython
  • Files are one thing Python makes easy to work with... – nbro Dec 30 '14 at 13:44
  • This doesn't directly address your problem, but you should probably replace your `i+=1; lines[i] = s` lines with `lines.append(s)`. – Kevin Dec 30 '14 at 13:44
  • Also, you don't need semicolons to end a statement in Python, unless you want to put two statements on a single line, and you almost never want to do that. – Tim Pietzcker Dec 30 '14 at 13:46
  • If you want every line of your file to be an element of your list, then you are making your life harder than it should be... – nbro Dec 30 '14 at 13:48
  • Thanks @Kevin, I was treating lists like strings. – newPython Dec 30 '14 at 13:50
  • `lines[i] = s` doesn't work if `lines` is a string, either ;-) – Kevin Dec 30 '14 at 13:52
  • @BhargavRao I am pretty sure I haven't reached the end of the file. There are at least 50 lines after that, and my loop doesn't even run once; it just prints `none` on the terminal. – newPython Dec 30 '14 at 13:53
  • How about the obvious scenario: No line in the file starts with a digit or `+` or `-`, and the `none` is caused by `readline()` returning the empty string on EOF? – Aran-Fey Dec 30 '14 at 13:54
  • Is the `else` part mismatched? – Bhargav Rao Dec 30 '14 at 13:55
  • It would be very helpful to see a sample input file that reproduces your problem. I tried running your code on a file filled with lines of digits, and I got a whole lot of output as expected. – Kevin Dec 30 '14 at 13:57
  • @TimPietzcker I'll keep that in mind. Anyway, I did that to reduce the number of lines in the question. – newPython Dec 30 '14 at 13:59
  • Are you using the [Markdown package](https://pythonhosted.org/Markdown/) to read in that `testStage1.md`? Can you confirm that it is getting read in to begin with? – WAF Dec 30 '14 at 14:02
  • That notepad.cc page appears to be blank. – Kevin Dec 30 '14 at 14:05
  • All of your lines start with `<P>`. "<" is not a digit, and it's not "-" or "+", so it makes sense that none of those conditions in your code would succeed. You're using `strip`, right? Are you sure you're using it right? Remember, just calling strip does nothing if you don't assign the result. `s.strip("<P>")` has no effect; you have to do `s = s.strip("<P>")`. – Kevin Dec 30 '14 at 14:30
  • @TimPietzcker please take a look at my edits. And +1 if you think I show research effort and clarity; this question needs to be answered. – newPython Jan 01 '15 at 00:57

3 Answers


First of all, you are not using Python as it should be used. The purpose of using a language like Python is to write fewer lines of code to achieve the same result as equivalent code in other programming languages, such as C++ or Java.

It's not necessary to pass a file pointer as a function parameter to read the file; you can open the file directly inside a function to which you pass the filename.

Then you can call this function with the filename and store the returned list in a variable that you can manipulate later. If you are not familiar with exception handling, you could, for example, use a function from the os module to check whether the file exists: os.path.exists(filename).
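Here is a minimal sketch of that `os.path.exists` approach (Python 3). `read_lines_if_present` is a hypothetical name used only for illustration. One caveat: the file could still disappear between the check and the `open` call, which is why `try`/`except` is usually preferred.

```python
import os


def read_lines_if_present(filename):
    """Return the file's lines, or an empty list if the path does not exist."""
    if not os.path.exists(filename):
        print("file not found: " + filename)
        return []
    with open(filename) as f:
        return f.readlines()
```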

If you want to search for a pattern in the line you are currently reading, you can simply use an if statement (there are many ways of doing that; this is just an example):

if line not in list_of_strings_you_want_not_to_include: 
    lines.append(line)

If you want to check whether the pattern is at the beginning, you can use the string method startswith on the line:

if not line.startswith("+"):
    lines.append(line)
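`str.startswith` also accepts a tuple of prefixes. That lets the OP's two checks (a leading digit, or a leading `+`/`-`) collapse into one predicate. This sketch assumes the `<P>` tags have already been removed, and `is_list_item` is a hypothetical helper name:

```python
def is_list_item(line):
    """True if the line, ignoring leading whitespace, starts with a digit, '+' or '-'."""
    text = line.lstrip()
    # startswith() with a tuple is True if any of the prefixes matches;
    # on an empty string it simply returns False (no IndexError)
    return text.startswith(tuple("0123456789+-"))
```

It can then be used as a filter, e.g. `[line for line in lines if is_list_item(line)]`.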

If you want to skip a certain number of characters, you can use the seek method (as you are already doing). The following uses more lines of code, but it's still very simple:

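def read_file(filename, _from):
    lines = []
    try:
        with open(filename) as f:
            f.seek(_from)  # skip the first _from bytes
            for line in f:
                lines.append(line)
    except IOError:  # works on 2.7; in Python 3, FileNotFoundError is a subclass of IOError/OSError
        print('could not open ' + filename)
    return lines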

filename = "file.txt"
lines = read_file(filename, 10)

Even simpler, instead of iterating explicitly over the lines, you can do this:

def read_file(filename, _from):
    with open(filename) as f:
        f.seek(_from)
        return list(f)

Or, using your favourite method, readlines:

def read_file(filename, _from):
    with open(filename) as f:
        f.seek(_from)
        return f.readlines()

The advantage of iterating explicitly over the lines is that you can check and process each line at the moment you read it, so I would certainly choose the first option above.
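As an illustration of that advantage, the test can run as each line is read. Only the wanted lines are ever stored, which matches the OP's goal of keeping the list short. This is a minimal sketch. `read_matching` and its `keep` argument are hypothetical names; `keep` is any function that takes a line and returns a boolean.

```python
def read_matching(filename, offset, keep):
    """Collect the lines after `offset` for which keep(line) is true."""
    matches = []
    with open(filename) as f:
        f.seek(offset)  # absolute offset; for plain ASCII files bytes == characters
        for line in f:
            if keep(line):  # filter while reading; rejected lines are never stored
                matches.append(line)
    return matches
```

For example: `read_matching("./testStage1.txt", 291, lambda line: "+" in line)`.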

nbro
  • Ok, but from the OP's `osrc.seek(291, 0);` line, it seems like he wants to skip the first 291 characters. How should he do that if he iterates over the lines of the file this way? – Kevin Dec 30 '14 at 14:07
  • I passed the pointer because I am using a different functions.py file in my program. I `import pyFunctions` – newPython Dec 30 '14 at 14:10
  • IMHO `readlines()` would be much better than reading all lines into a list as your first code snippet suggests. – newPython Dec 30 '14 at 14:12
  • @newPython It's not necessary, you just have to pass the filename – nbro Dec 30 '14 at 14:12
  • @newPython Why is it much better? – nbro Dec 30 '14 at 14:13
  • @nbro because someone told me 1. inbuilt functions are always better. 2. readlines() manages memory & buffer stuff better & 3. I am simply scared of using n>1 lines of code if a single statement could do it. – newPython Dec 30 '14 at 14:25
  • @newPython So use your `readlines` and don't ask questions again, right? What's your problem then? I simply cannot understand why you are asking help if you cannot even try to understand new things... – nbro Dec 30 '14 at 14:27
  • whoa nbro, calm down. – wwii Dec 30 '14 at 14:31
  • `if line not in ("+", "-", "Barack Obama")` will only work if `line` is exactly one of those things. OP wants to match on the first item of a sequence - `s[0]`. – wwii Dec 30 '14 at 14:34
  • @wwii It's just an example. – nbro Dec 30 '14 at 14:34
  • @wwii My question does not aim to be a tutorial. I have already been very patient with the OP, whose question is very unclear. If it was my question, it would have already been downvoted to -20 – nbro Dec 30 '14 at 14:42
  • @nbro thanks for the alternatives, but could you explain why `readline()` isn't working as expected? Note that everything in the code blocks above is all the code I have in one .py file. – newPython Dec 30 '14 at 14:51
  • @newPython Give me some moment and I will try to check your problem ;) – nbro Dec 30 '14 at 14:53
  • @newPython I will try to analyse your code line by line. `if not s == "":`, you probably want to say, if `s` is different from `""`, which you can achieve easily (like in C++) with `if s != "":` – nbro Dec 30 '14 at 15:03
  • @newPython The problem with your code is certainly this: you're reading from the 291st character onward, but at that point there are no characters left in the file, so `readline` always returns `""`, an empty string (of course). That's why it jumps directly to the `else` branch and prints `none`. I don't know what your file looks like (how many lines...), but try using a smaller number, like 5 or 10, and see if something changes... – nbro Dec 30 '14 at 15:06
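The EOF explanation in that last comment is easy to check (Python 3). Seeking past the end of a file raises no error, and `readline()` then returns an empty string, which is exactly what sends the OP's loop to the `else` branch. The file here is a throwaway temporary file with made-up contents:

```python
import os
import tempfile

# A throwaway file far shorter than 291 bytes (contents are made up).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("<P> 1. abc </P>\n<P>     + cba </P>\n")
size = os.path.getsize(path)

with open(path) as f:
    f.seek(291)                # seeking past the end raises no error...
    after_seek = f.readline()  # ...but there is nothing left to read

os.remove(path)
print(size, repr(after_seek))  # after_seek is '' because we are beyond EOF
```

Whether this, or the `<P>` prefix discussed elsewhere on this page, explains the OP's `none` depends on the real file's length. For a file longer than 291 bytes, the loop would run and fail on the conditions instead.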

If you want to modify the lines your way:

def transform_line(line):
    if line != "":
        if line[0].isdigit():
            print("O")
        elif line[0] in {"+", "-"}:
            print("U")
    else:
        print("None")
    return line

with open("./testStage1.txt", 'r') as osrc:
    osrc.seek(291)
    lines = [transform_line(line) for line in osrc]
    #Do whatever you need with your line list

If you don't want to transform the lines, just do this:

with open("./testStage1.txt", 'r') as osrc:
    osrc.seek(291)
    lines = list(osrc)
    #Do whatever you need with your line list

Or just implement a line iterator if you need to stop on a certain condition:

def line_iterator(file):
    for line in file:
        if not line[0].isdigit() and line[0] not in ("+", "-"):
            yield line
        else:
            break

with open("./testStage1.txt", 'r') as osrc:
    osrc.seek(291)
    lines = list(line_iterator(osrc))
    # Drop lines that contain 'blah' from the list
    lines = [line for line in lines if 'blah' not in line]
    #Do whatever you need with your line list
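The standard library already provides a stop-on-condition iterator, `itertools.takewhile`. Below is a sketch of the OP's later `makeList` idea built on it: starting at an offset, collect lines for as long as they contain a marker. `lines_while_containing` is a hypothetical name. Unlike `makeList`, it does not consume an extra character after seeking, and it stops quietly at end of file instead of raising `StopIteration`.

```python
from itertools import takewhile


def lines_while_containing(filename, offset, marker):
    """Return the consecutive lines starting at `offset` that contain `marker`."""
    with open(filename) as f:
        f.seek(offset)
        # takewhile stops at the first line without `marker`, or at EOF;
        # list() consumes it before the file is closed
        return list(takewhile(lambda line: marker in line, f))
```

For example: `lines_while_containing("./testStage1.txt", 84, "+")`.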
avenet
  • Ok, but from the OP's `osrc.seek(291, 0);` line, it seems like he wants to skip the first 291 characters. How should he do that if he iterates over the lines of the file this way? – Kevin Dec 30 '14 at 14:08
  • Let's implement an iterator, my friend!! – avenet Dec 30 '14 at 14:43
  • @avenet I followed your advice- please check out the **Edit** in question & let me know whats wrong . . . – newPython Dec 30 '14 at 15:57
  • @newPython Does my solution give you a list? As I see in your edit, you are trying to use next; next only works on an iterator, so on a list it won't work. – avenet Dec 30 '14 at 16:03
  • @newPython The list created on the lines variable already gives you the filtered data. What else do you need? – avenet Dec 30 '14 at 16:04
  • @avenet Yes your code gives me list of all lines beyond 291. Google says both list & files are iterable – newPython Dec 30 '14 at 16:10
  • I don't see how the list is filtered. I still have to process each line. Here is what I am trying to get- add only those lines to list which contain a special starting substring, thereby minimizing memory usage & keeping list short. – newPython Dec 30 '14 at 16:13
  • Ohh, OK, if you want to filter the strings that contain a special substring I will add it on the answer – avenet Dec 30 '14 at 16:16
  • So if the line contains a special character, we will ignore it and go to the next one, is that what you want? – avenet Dec 30 '14 at 16:19
  • @avenet thanks a lot man :) your ideas helped me solve my programming issue & shed some light on iters. If I don't get any answers about behavior of readline() soon I'll mark your answer as "accept" – newPython Dec 31 '14 at 05:13
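This short Python 3 sketch illustrates the point about `next`. A list is iterable but is not itself an iterator, so `next()` needs `iter()` first, whereas a file object is its own iterator. The sample strings are made up, and `io.StringIO` stands in for an open text file:

```python
import io

items = ["1. abc\n", "+ cba\n"]

try:
    next(items)                       # a list is iterable, but not an iterator
    list_is_iterator = True
except TypeError:
    list_is_iterator = False

it = iter(items)                      # iter() wraps the list in an iterator
first = next(it)                      # "1. abc\n"
second = next(it)                     # "+ cba\n"
exhausted = next(it, None)            # the default is returned once exhausted

f = io.StringIO(u"first\nsecond\n")   # in-memory stand-in for an open text file
same_object = iter(f) is f            # a file object is its own iterator
first_line = next(f)                  # "first\n"

print(list_is_iterator, repr(first), repr(second), exhausted, same_object, repr(first_line))
```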

You try to process this input:

<P> unnecessart line </P>
<P> Following is an example of list </P>
<P> 1. abc </P>
<P>     + cba </P>
<P>     + cba </P>
<P>             + xyz </P>

Now, in your brain, you just see the important bits, but Python sees everything. For Python (and any other programming language), each line starts with `<`. That's why the `if`s never match.

If you strip the `<P>`, be sure to strip the spaces as well, because in

1. abc
    + cba

the second line starts with spaces, so `s[0]` isn't `+`. To strip spaces, use `s.strip()`; Python strings have no `trim()` method.
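Here is a minimal sketch of that advice applied to the sample input. It assumes every line has the `<P> ... </P>` shape shown above; `strip_paragraph_tags` and `classify` are hypothetical helper names. It prints "O"/"U"-style labels like the OP's function, but returns them instead of appending tokens.

```python
def strip_paragraph_tags(raw):
    """Remove a leading '<P>', a trailing '</P>' and the surrounding whitespace."""
    text = raw.strip()
    if text.startswith("<P>"):
        text = text[len("<P>"):]
    if text.endswith("</P>"):
        text = text[:-len("</P>")]
    return text.strip()


def classify(raw):
    text = strip_paragraph_tags(raw)
    if not text:
        return None      # empty content: skip it instead of hitting IndexError
    if text[0].isdigit():
        return "O"       # numbered item such as '1. abc'
    if text[0] in "+-":
        return "U"       # bullet item such as '+ cba'
    return None


sample = [
    "<P> unnecessart line </P>\n",
    "<P> Following is an example of list </P>\n",
    "<P> 1. abc </P>\n",
    "<P>     + cba </P>\n",
    "<P>     + cba </P>\n",
    "<P>             + xyz </P>\n",
]
print([classify(line) for line in sample])  # [None, None, 'O', 'U', 'U', 'U']
```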

Aaron Digulla