2

I have a file as follows:

Col1 Col2
A 1
A 1
A 1
A 2
A 1
A 1
A 3
A 1
A 1
A 1

I want to read the file line by line in Python. When I reach a line with Col2 > 1, I want to skip a number of following lines equal to Col2. For example, on reaching A 2 I want to skip the next 2 lines, on reaching A 3 I want to skip the next 3 lines, and so on.

If I read the whole file in a list, I can do the following:

k = [1, 1, 1, 2, 1, 1, 3, 1, 1, 1]
i = []
for element in k:
    if element > 1:
        h = k.index(element)
        i.append(h)
        for j in range(1,element+1):
            i.append(h+j)

new_list = []       
for d in range(1, len(k)+1):
    if not d in i:
        new_list.append(k[d-1])

But my actual file is 7.2 GB, so to be more memory efficient I would like to read it line by line. How can I implement this in Python?

jpp
Homap

3 Answers

2

Just keep track of how many lines you still need to skip as you read line by line. While that counter is greater than 0, decrement it and skip the current line.

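import csv

with open('test.csv') as f:
  rd = csv.reader(f, delimiter=' ')
  next(rd)  # skip the header row
  skip = 0  # number of upcoming rows still to be skipped
  for i, j in rd:
    if not skip:
      # a value of n > 1 means: skip the next n rows
      skip = 0 if int(j) < 2 else int(j)
      print(i, j)
    else:
      skip -= 1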

Output for your sample input:

A 1
A 1
A 1
A 2
A 3

The 2 lines after A 2 and the 3 lines after A 3 are all skipped.
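
The same counter logic can also be wrapped in a generator that accepts any iterable of lines, including an open file object, so only one line is held in memory at a time. This is a minimal sketch rather than part of the original answer: the name `kept_rows` is hypothetical, and it assumes whitespace-separated columns with an integer in the second column.

```python
import io

def kept_rows(lines):
    """Yield (col1, col2) rows, dropping the n rows that follow a row with col2 = n > 1."""
    rows = iter(lines)
    next(rows, None)              # skip the header line
    skip = 0
    for line in rows:
        if not line.strip():      # blank lines are ignored and do not count as skipped
            continue
        if skip:
            skip -= 1             # still inside a skipped stretch; line is not parsed
            continue
        col1, col2 = line.split()
        n = int(col2)
        skip = n if n > 1 else 0
        yield col1, n

# The sample data from the question, held in memory for demonstration only.
sample = io.StringIO("Col1 Col2\nA 1\nA 1\nA 1\nA 2\nA 1\nA 1\nA 3\nA 1\nA 1\nA 1\n")
rows = list(kept_rows(sample))
print(rows)
```

For a real file, the file object itself can be passed in place of `sample`, for example inside `with open('test.csv') as f:`.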

user3483203
  • Thanks! Could you please explain the code from if not skip? – Homap May 31 '18 at 14:26
  • `if not skip` just checks if `skip` is equal to 0 or not, and if it is, prints the line otherwise skips the line – user3483203 May 31 '18 at 14:32
  • Cool! I have never written if and else in the same line. So, when you have written `skip = 0 if int(j) <2 else int(j)`, here, if Col2 is smaller than 2, skip stays 0, is that so? – Homap May 31 '18 at 14:38
  • That is correct, this is [python's version of the ternary operator](https://stackoverflow.com/a/394814/3483203) – user3483203 May 31 '18 at 14:46
  • Sorry, just one more thing. Here it seems that when `else int(j)` is true, skip becomes equal to int(j), is that right? If so, how come it's not `else skip = int(j)`? – Homap May 31 '18 at 15:06
  • The `else skip = int(j)` is implied, it's just how Python's ternary operator works. – user3483203 May 31 '18 at 15:07
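
To make the comment thread concrete, the sketch below compares the conditional expression from the answer with an equivalent `if`/`else` statement. The two helper names are hypothetical and exist only for this comparison.

```python
def skip_with_expression(j):
    # A conditional expression produces a value; the single assignment on the
    # left receives whichever branch is chosen.
    skip = 0 if int(j) < 2 else int(j)
    return skip

def skip_with_statement(j):
    # The same logic written as an if/else statement with two assignments.
    if int(j) < 2:
        skip = 0
    else:
        skip = int(j)
    return skip

# Both forms agree for every value in the sample file.
for value in ("1", "2", "3"):
    assert skip_with_expression(value) == skip_with_statement(value)
```

This is why `else skip = int(j)` is not valid syntax: the part after `else` is an expression supplying a value, not a statement.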
2

You can use the csv module with a generator function, then just iterate over the items the generator yields.

import csv

def gen_rows(file):
    with open(file, 'r') as fin:
        reader = csv.reader(fin, delimiter=' ')
        headers = next(reader)  # consume the header row

        for col1, col2 in reader:
            num = int(col2)
            if num > 1:
                # skip the next num rows; the default stops quietly at end of file
                for i in range(num):
                    next(reader, None)
            yield col1, col2

for i in gen_rows('file.csv'):
    print(i)

('A', '1')
('A', '1')
('A', '1')
('A', '2')
('A', '3')
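
As an alternative to calling `next` in a loop, the `consume` recipe from the `itertools` documentation advances an iterator by a given number of items in one call and stops quietly if the input ends early. The sketch below assumes a space-delimited input with one header row; `gen_rows_islice` is a hypothetical name, and it takes an open file (or any iterable of lines) rather than a path.

```python
import csv
import io
from itertools import islice

def gen_rows_islice(lines):
    reader = csv.reader(lines, delimiter=' ')
    next(reader, None)                      # skip the header row
    for col1, col2 in reader:
        yield col1, col2
        num = int(col2)
        if num > 1:
            # islice(reader, num, num) yields nothing but consumes up to num rows.
            next(islice(reader, num, num), None)

# The sample data from the question, held in memory for demonstration only.
sample = io.StringIO("Col1 Col2\nA 1\nA 1\nA 1\nA 2\nA 1\nA 1\nA 3\nA 1\nA 1\nA 1\n")
result = list(gen_rows_islice(sample))
print(result)
```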
jpp
0

Well, this should save you some computation time; it should be about twice as fast:

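# k is the list from the question
skipper = 0  # elements still to be skipped

result = []

for i in k:
    if skipper:
        skipper -= 1      # inside a skipped stretch, whatever the value
    elif i > 1:
        skipper = i       # start skipping; this element is not kept either
    else:
        result.append(i)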
zipa