
I've checked this, this and this.

The third link seemed to have the answer, yet it didn't do the job.

I can't use a solution that brings the whole file into main memory, as the files I'll be working with will be very large. So I decided to use islice as shown in the third link. The first two links were irrelevant, as they read only 2 lines or 1000 characters, whereas I need 1000 lines (for now, N is 1000).

My file contains 1 million lines:

Sample:

1 1 1
1 2 1
1 3 1
1 4 1
1 5 1
1 6 1
1 7 1
1 8 1
1 9 1
1 10 1

So if I read 1000 lines at a time, I should go through the while loop 1000 times, yet when I print p to check how many iterations I've done, it doesn't stop at 1000. It reached 19038838 after my program had been running for 1400 seconds!

CODE:

from itertools import islice

def _parse(pathToFile, N, alg):
    p = 1
    with open(pathToFile) as f:
        while True:
            myList = []
            next_N_lines = islice(f, N)
            if not next_N_lines:
                break
            for line in next_N_lines:
                s = line.split()
                x, y, w = [int(v) for v in s]
                obj = CoresetPoint(x, y)
                Wobj = CoresetWeightedPoint(obj, w)
                myList.append(Wobj)
            a = CoresetPoints(myList)
            client.compressPoints(a)  # This line is not the problem
            print(p)
            p = p+1
    c = client.getTotalCoreset()
    return c

What am I doing wrong?

Tony Tannous
  • The `f` is probably not consumed, so you end up reading the same 1000 lines every time; this will never terminate. You have to use the alternative formulation for `islice` (`itertools.islice(iterable, start, stop[, step])`, not `itertools.islice(iterable, stop)`) – Ma0 Jan 30 '17 at 13:37
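A minimal, self-contained sketch (names here are illustrative, not from the question) shows why the loop never terminates: an `islice` object is an iterator, and iterators are always truthy even when the underlying file is exhausted, so `if not next_N_lines` can never fire. Materializing the slice with `list()` restores a meaningful emptiness check:

```python
from itertools import islice
import io

# An islice object is truthy even over an empty iterator,
# so `if not islice(...)` never breaks the loop.
print(bool(islice(iter([]), 5)))   # → True

def read_in_chunks(f, n):
    # Hypothetical helper: read up to n lines at a time from an open file.
    while True:
        chunk = list(islice(f, n))  # list() consumes up to n lines from f
        if not chunk:               # an empty list is falsy: stops at EOF
            break
        yield chunk

sample = io.StringIO("1 1 1\n1 2 1\n1 3 1\n")
print([len(c) for c in read_in_chunks(sample, 2)])  # → [2, 1]
```

Once `f` reaches EOF, `islice(f, N)` simply yields nothing, so the inner `for` loop does no work and the `while True` spins forever, which is why `p` climbed into the millions.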

1 Answer


As @Ev.kounis said, your while loop doesn't work properly.

I would recommend using a generator function with yield to hand out the data one chunk at a time, like this:

def get_line():
    # Generator: the file is opened once, and lines are yielded one at a time.
    with open('your file') as file:
        for i in file:
            yield i

lines_required = 1000
gen = get_line()
# Note: next() raises StopIteration once the file is exhausted.
chunk = [next(gen) for i in range(lines_required)]
Shivkumar kondi
  • But won't it try to open the same file `1M` times, once for each line? Won't that slow down the program? – Tony Tannous Jan 30 '17 at 13:48
  • No, it will only repeat the steps in the for loop. Yield can be interpreted as "return this input and come back exactly here when asked to". Have a look at the doc for generators: https://docs.python.org/3/howto/functional.html#generators – MKesper Jan 30 '17 at 13:56
  • @MKesper And how do I tell when the file is over, so I can stop iterating and reading? `if not chunk: break` didn't work. Any ideas? – Tony Tannous Jan 30 '17 at 14:56
  • Managed to fix this by wrapping it in `try`, and if an exception is thrown I break. Thanks! – Tony Tannous Jan 30 '17 at 15:07
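The try/except works, but the generator from the answer can also be combined with `itertools.islice` (the question's original idea) to stop cleanly at EOF without catching an exception. A hedged sketch, with illustrative function names not taken from the original post:

```python
from itertools import islice

def get_line(path):
    # Same generator as in the answer: the file is opened once,
    # and lines are handed out one at a time.
    with open(path) as file:
        for line in file:
            yield line

def get_chunks(path, lines_required=1000):
    # Hypothetical wrapper: yields lists of up to `lines_required` lines.
    gen = get_line(path)
    while True:
        # At EOF, islice yields nothing, so the materialized list is empty;
        # no StopIteration escapes and no try/except is needed.
        chunk = list(islice(gen, lines_required))
        if not chunk:   # empty list is falsy: the file is exhausted
            return
        yield chunk
```

Because `list(islice(gen, n))` returns `[]` at end of file, the `if not chunk` test works here, unlike testing a raw `islice` object, which is always truthy.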