EDIT: Adding in

upperline = []
lowerline = []

above the for loop seems to allow the function to be called once as expected, but not more than once. If called a second time the following error will be thrown:

transitenergy = (float(upperline[1]) - float(lowerline[1]))
IndexError: list index out of range

If instead

upperline = [1,2]
lowerline = [4,5]

is added above the for loop, the function returns the expected value the first time, and then -3 on every subsequent call.


I am having a problem with a for loop seemingly being unable to retain variables when I try to return them, even though I can print them. If I define the function as follows, then when it is called, transitenergy is printed to the console and the following error is thrown:

transitenergy = (float(upperline[1]) - float(lowerline[1]))

UnboundLocalError: local variable 'upperline' referenced before assignment

def crossreference(datafile, lookuppointers):
    pointers = [(int(lookuppointers[0]) - 1), (int(lookuppointers[1]) - 1)]
    lowerpointer = min(pointers)
    upperpointer = max(pointers)
    for i, line in enumerate(datafile):
        if i == lowerpointer:
            lowerline = filter(lambda a: a!= '\t',filterstring(line))
        elif i == upperpointer:
            upperline = filter(lambda a: a!= '\t',filterstring(line))
            break
    transitenergy = (float(upperline[1]) - float(lowerline[1]))
    print transitenergy
    return transitenergy

I have also tried moving the return statement inside the loop i.e.

...
elif i == upperpointer:
    upperline = filter(lambda a: a!= '\t',filterstring(line))
    transitenergy = (float(upperline[1]) - float(lowerline[1]))
    return transitenergy

or adding the return to a further elif branch i.e.

...
elif i == upperpointer:
    upperline = filter(lambda a: a!= '\t',filterstring(line))
elif i > upperpointer:
    transitenergy = (float(upperline[1]) - float(lowerline[1]))
    return transitenergy

but both of these just return None when the function is called, which then throws TypeError: bad operand type for abs(): NoneType when I try to call abs() on the result (as expected for None).

The interesting part here is that if a print statement is added after defining the local transitenergy variable, in any of the trials I have described, calling the function prints transitenergy without a problem and then throws the errors.

I should mention that the files passed as the datafile argument are very large (on the order of 100+ MB), and each line has the structure:

"           [line number+1]     [float]      ...." 

(there are more numbers after this in the string but they are not relevant to the task)

The lookuppointers argument is a list with the following structure:

[int, int, ...]

The integers are not ordered (hence the min and max) and each refers to a [line number+1] of the datafile.

The line:

filter(lambda a: a!= '\t',filterstring(line))

is there because I am iterating over a list of many of these files; they are usually in the correct format, but sometimes they have a \t at the beginning.

The filterstring function is defined as:

def filterstring(string):
    return filter(lambda a:a!='',string.split(" "))

to turn the line in the datafile into a list of strings.
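For reference, `str.split()` called with no arguments splits on any run of whitespace (spaces, tabs, and the trailing newline) and never yields empty strings, so it may be able to stand in for both filters. This is only a sketch: `fields` is a hypothetical helper name, the sample lines are made up, and it assumes no field contains internal whitespace. It also returns a real list, which matters on Python 3, where `filter` returns a lazy iterator that cannot be indexed.

```python
def fields(line):
    """Split a data line on any run of whitespace (spaces, tabs, newline)."""
    return line.split()


# Made-up sample lines in the "[line number+1]  [float]  ..." layout,
# one of them with a stray leading tab.
plain = "           14592     0.12345      7.0\n"
tabbed = "\t           14592     0.12345      7.0\n"

print(fields(plain))
print(fields(tabbed))
```

Both calls give the same three fields, so `float(fields(line)[1])` picks out the value of interest in either case.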

The question is: how can I return the transitenergy value that is printed?

If there is another way that I can perform this type of cross referencing without having the whole datafile in memory then that would work also.

Positive
  • `upperline` is only defined in the `elif` branch, its definition is not a given – Moses Koledoye Sep 13 '16 at 12:14
  • Then why can it be printed outside of the branch? – Positive Sep 13 '16 at 12:15
  • And furthermore, to solve the problem then, should both upperline and lowerline be declared as global variables to fix this? Or rather just defined before the for loop with something like upperline = [] and lowerline = []? @MosesKoledoye – Positive Sep 13 '16 at 12:22
  • Most likely the condition `i == upperpointer` never occurs, so the assignment `upperline = ...` never gets executed. – acw1668 Sep 13 '16 at 13:03
  • Tested this, the condition is met. If it wasn't met; how could the line `print transitenergy` print anything at all? @acw1668 – Positive Sep 13 '16 at 13:07
  • Where are you calling the function? I'm suspicious the `print` works on one call, then a second call is what triggers your error. – ShadowRanger Sep 13 '16 at 13:10
  • Yes, this does in fact happen. I'm calling the function within a for loop that iterates over every line of a different file that has `lookuppointers` as each line @ShadowRanger – Positive Sep 13 '16 at 13:13
  • Could you try something then: could you add upperline=[1,2] above the for loop? If it runs this way, then you probably have some line in the file that does not enter the elif branch. – jfish003 Sep 13 '16 at 13:26
  • What I bet is happening is you have some line or lines where the min and the max are equal, that is every number is the same and thus lowerpointer = upperpointer. In this case the elif gets skipped. Try changing the elif to if and it shouldn't get skipped. – jfish003 Sep 13 '16 at 13:36
  • @jfish003 Tried both of the two things you suggested; adding `upperline = [1,2]` above the for loop just moved the problem over to the `lowerline` variable and replacing the `elif` statement with an `if` statement didn't help at all unfortunately. – Positive Sep 13 '16 at 13:47
  • Ok so given that upperline =[1,2] shifted the problem to lowerline it seems to me that you have a line in your file that does not contain anything that you are looking for, so that line contains no value which attains upperpointer or lowerpointer. – jfish003 Sep 13 '16 at 13:56
  • @jfish003 Having gone through both files by hand, I know that this is not the case. I should also note that this function is being called many thousands of times, and the _only_ time that it works as planned is the first time; all other times it returns -3 when `upperline = [1,2]` and `lowerline = [4,5]` is added above the for loop (see my edit at the top of the question) – Positive Sep 13 '16 at 13:58
  • Ok try this then, right after the loop use print i, lowerpointer, upperpointer this way you can see what the value of i is compared to the lowerpoint and upperpointer. I would be willing to bet there is a line where no i is equal to lowerpointer and no i is equal to upper pointer – jfish003 Sep 13 '16 at 14:01
  • @jfish003 Oh this seems to be getting somewhere; the first call of the function gives the printout as `14592 14591 14592`, but the second printout is completely off: `29 14557 14558`. Does enumerate wrap around? – Positive Sep 13 '16 at 14:05
  • @jfish003, The `enumerate` only ticks up to 29 on the second call, but ticks up the whole way the first time that it is called. Is there something that I need to refresh? Does `enumerate` save its progress or something? I should also add that calling `print max([i for i,j in enumerate(datafile)])` above the `for` loop gives `14622`, and then the `print i, lowerpointer, upperpointer` after the loop prints out `14622 14591 14592` – Positive Sep 13 '16 at 14:17
  • I am not sure; enumerate is not a function I typically use. I am looking into it though – jfish003 Sep 13 '16 at 14:18
  • You do know that an opened file is a generator that does not automatically reset itself? I suspect you keep using the same one over and over again and therefore always start where you left off before. – swenzel Sep 13 '16 at 14:20
  • @swenzel This could be the solution; that is exactly what I am doing. The files are actually temporary files; so implementing a fix that closes them could be somewhat difficult. But I will try this next. – Positive Sep 13 '16 at 14:22
  • So let me get this straight to be absolutely sure, you want to read specific lines in the file correct? – jfish003 Sep 13 '16 at 14:22
  • Yup this is correct – Positive Sep 13 '16 at 14:24
  • I think that this answer will be useful then: http://stackoverflow.com/questions/24312123/memory-efficent-way-to-iterate-over-part-of-a-large-file – jfish003 Sep 13 '16 at 14:25
  • Would adding datafile.seek(0) to the beginning of the function stop this from happening? @swenzel BINGO; this is our fix – Positive Sep 13 '16 at 14:27
  • Thanks guys, seems to be working as planned now. – Positive Sep 13 '16 at 14:29
  • You're welcome ;). There is plenty of room for optimization though... you could for example sort your pointers beforehand so you need to go through the file only once. Or, since your lines have line numbers, you could do some binary search in combination with seek. – swenzel Sep 13 '16 at 14:32
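
The single-pass idea from the last comment could look roughly like the sketch below. It is not the code that was actually used: it assumes Python 3, that all pointer pairs are known before the file is read, and that the float of interest is the second whitespace-separated field. `transit_energies` and the sample data are hypothetical.

```python
import io


def transit_energies(handle, pointer_pairs):
    """Return upper-line minus lower-line energy for each pair, in one pass.

    `pointer_pairs` holds pairs of 1-based line numbers, as in the question.
    """
    # Convert to 0-based (lower, upper) index pairs.
    pairs = [tuple(sorted((int(a) - 1, int(b) - 1))) for a, b in pointer_pairs]
    if not pairs:
        return []
    wanted = {index for pair in pairs for index in pair}
    last = max(wanted)

    energies = {}
    handle.seek(0)  # always start from the top of the file
    for i, line in enumerate(handle):
        if i in wanted:
            # The second whitespace-separated field is the float of interest.
            energies[i] = float(line.split()[1])
        if i >= last:
            break  # nothing further down the file is needed

    return [energies[upper] - energies[lower] for lower, upper in pairs]


# Tiny stand-in for a data file: "[number] [float] ..." per line.
sample = io.StringIO(
    "   1   10.0   x\n"
    "   2   12.5   x\n"
    "\t   3   20.0   x\n"
    "   4   21.0   x\n"
)
result = transit_energies(sample, [("2", "1"), ("4", "3"), ("1", "4")])
print(result)
```

Only the requested values are kept in memory, not the file itself. A pointer beyond the end of the file would surface as a `KeyError` in the final list comprehension, which a real version would probably want to handle explicitly.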

1 Answer

The solution lies in the fact that the datafile was kept open between calls. Adding the line datafile.seek(0) to the function, i.e.

def crossreference(datafile, lookuppointers):
    pointers = [(int(lookuppointers[0]) - 1), (int(lookuppointers[1]) - 1)]
    lowerpointer = min(pointers)
    upperpointer = max(pointers)
    datafile.seek(0)
    for i, line in enumerate(datafile):
        if i == lowerpointer:
            lowerline = filter(lambda a: a!= '\t',filterstring(line))
        elif i == upperpointer:
            upperline = filter(lambda a: a!= '\t',filterstring(line))
            transitenergy = (float(upperline[1]) - float(lowerline[1]))
            return transitenergy

causes the file to be read from the beginning each time the function is called. Previously, each call resumed reading from wherever the previous call had stopped.
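
The behaviour can be reproduced in isolation. The following is a minimal sketch, assuming Python 3 and using `io.StringIO` as a stand-in for the open data file; `read_line_at` is a hypothetical helper, not part of the original code.

```python
import io


def read_line_at(handle, index, rewind):
    """Return the line at 0-based `index`, or None if the handle runs out first."""
    if rewind:
        handle.seek(0)  # start from the top of the file on every call
    for i, line in enumerate(handle):
        if i == index:
            return line.rstrip("\n")
    return None  # loop ended without reaching `index`


# Stand-in for the open data file: five lines, "line0" .. "line4".
datafile = io.StringIO("".join("line%d\n" % n for n in range(5)))

# Without rewinding, the second call resumes after the line where the first
# call stopped, and enumerate() restarts its count from 0 at that position.
first = read_line_at(datafile, 3, rewind=False)
second = read_line_at(datafile, 3, rewind=False)

# With seek(0), every call sees the file from the beginning.
third = read_line_at(datafile, 3, rewind=True)
fourth = read_line_at(datafile, 3, rewind=True)

print(first, second, third, fourth)
```

The second call never reaches index 3 and falls off the end of the loop. In the original function, that is the point where `upperline` was still unassigned (the UnboundLocalError), and where the variants with `return` inside the loop returned None instead.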

Positive