0

I am having a bit of trouble figuring the following out:

I have a file with 100 lines for example, let's call it file A

I also have another file with 100 lines for example, let's call it file B

Now I need the first loop to read 10 lines from file A and do its thing, then go to the second loop, which reads 10 lines from file B and does its thing. After that it should go back to the first loop to process lines 11-20 from file A, and then back to the second loop to process lines 11-20 from file B.

I need both loops to remember from which line to read.

How should I approach this?

Thanks!

EDIT:

Could something like this work?

a=0
b=10
x=0
y=10

for 1000 times:
    read a-b rows:
        do its thing
    a += 10
    b += 10

    read x-y rows:
        do its thing
    x += 10
    y += 10
Karuvägistaja
  • 2
    generators are your friend. – eatmeimadanish Mar 07 '22 at 19:01
  • Hi and welcome to SO. It is important for the community that you *also* demonstrate that you are working to solve your issue. The best way to do that in my opinion is to include the **text** based version of the source code you have so far, even if it is not working quite right. If you want a nudge getting started I might look at a parent loop that has two child loops – JonSG Mar 07 '22 at 19:01
  • edited the first post – Karuvägistaja Mar 07 '22 at 19:10

7 Answers

1

You can iterate over 10 lines at a time using this approach.

class File:
    def __init__(self, filename):
        self.f = open(filename, 'r')

    def line(self):
        # Generator that yields a single line.
        yield self.f.readline()

    def next(self, limit):
        # Generator that yields the next `limit` lines.
        # readline() returns '' once the end of the file is reached.
        for _ in range(limit):
            yield self.f.readline()

    def lines(self, limit=10):
        # Collect the next `limit` lines into a list.
        return list(self.next(limit=limit))


file1 = File('C:\\Temp\\test.csv')
file2 = File('C:\\Temp\\test2.csv')
print(file1.lines(10))
print(file2.lines(10))
print(file1.lines(10))
print(file2.lines(10))

Now you can jump back and forth between files iterating over the next 10 lines.

eatmeimadanish
1

Here is another solution using a generator and a context manager:

class SwitchFileReader():
    
    def __init__(self, file_paths, lines = 10):
        self.file_paths = file_paths
        self.file_objects = []
        self.lines = 1 if lines < 1 else lines

    def __enter__(self):
        for file in self.file_paths:
            file_object = open(file, "r")
            self.file_objects.append(file_object)
        return self
    
    def __exit__(self, type, value, traceback):
        for file in self.file_objects:
            file.close()
    
    def __iter__(self):

        while True:

            next_lines = [
                [file.readline() for _ in range(self.lines)] 
                for file in self.file_objects
            ]
            
            if any(not all(lines) for lines in next_lines):
                break

            for lines in next_lines:
                yield lines

file_a = r"D:\projects\playground\python\stackgis\data\TestA.txt"
file_b = r"D:\projects\playground\python\stackgis\data\TestB.txt"

with SwitchFileReader([file_a, file_b], 10) as file_changer:
    for next_lines in file_changer:
        print(next_lines)  # do your thing

The iteration stops as soon as any of the files has fewer lines remaining than the requested chunk size.

For example, assume file_a has 12 lines and file_b has 13 lines. Lines 11 and 12 from file_a and lines 11 to 13 from file_b would then be ignored.
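
If the leftover lines should be processed rather than dropped, a minimal sketch along these lines might help. It is not part of the class above: the function name `alternate_chunks` is hypothetical, it uses `itertools.islice`, and the `io.StringIO` objects are only stand-ins for real open files.

```python
import io
from itertools import islice


def alternate_chunks(files, size=10):
    """Yield (file_index, lines) pairs, cycling through the files
    until every one of them is exhausted. Short final chunks are kept."""
    iterators = [iter(f) for f in files]
    while True:
        produced = False
        for index, line_iter in enumerate(iterators):
            chunk = list(islice(line_iter, size))
            if chunk:
                produced = True
                yield index, chunk
        if not produced:
            break


# Stand-ins for real files: 12 lines in A, 13 lines in B.
file_a = io.StringIO("".join(f"a{i}\n" for i in range(1, 13)))
file_b = io.StringIO("".join(f"b{i}\n" for i in range(1, 14)))

for index, chunk in alternate_chunks([file_a, file_b], size=10):
    print(index, len(chunk))  # do your thing with the chunk here
```

With these stand-ins the trailing chunks of 2 and 3 lines are yielded as well, so nothing is silently skipped.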

Thomas
0

For simplicity I'm going to work with lists. You can read each file into a list.

Let's split the problem. We need to:

  1. Split each list into groups of a given size (10 in your case).
  2. Loop over the groups of both lists at the same time.

Grouping

Here is an answer that covers it: https://stackoverflow.com/a/4998460/2681662

def group_by_each(lst, N):
    return [lst[n:n+N] for n in range(0, len(lst), N)]

Looping over two lists at the same time

You can use zip for this.

lst1 = list(range(100)) # <- Your data
lst2 = list(range(100, 200)) # <-- Your second data

def group_by_each(lst, N):
    return [lst[n:n+N] for n in range(0, len(lst), N)]

for ten1, ten2 in zip(group_by_each(lst1, 10), group_by_each(lst2, 10)):
    print(ten1)
    print(ten2)
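
The step of reading each file into a list is not shown above. A minimal sketch, assuming small text files, could look like the following. The helper name `read_lines` and the throwaway demo files are hypothetical; `group_by_each` is repeated so the block runs on its own.

```python
import tempfile
from pathlib import Path


def read_lines(path):
    """Read a whole text file into a list of lines without trailing newlines."""
    return Path(path).read_text().splitlines()


def group_by_each(lst, N):
    return [lst[n:n+N] for n in range(0, len(lst), N)]


# Demo with two throwaway 25-line files standing in for file A and file B.
with tempfile.TemporaryDirectory() as tmp:
    path_a = Path(tmp) / "A.txt"
    path_b = Path(tmp) / "B.txt"
    path_a.write_text("".join(f"A{i}\n" for i in range(25)))
    path_b.write_text("".join(f"B{i}\n" for i in range(25)))
    lst1 = read_lines(path_a)
    lst2 = read_lines(path_b)

for ten1, ten2 in zip(group_by_each(lst1, 10), group_by_each(lst2, 10)):
    print(len(ten1), len(ten2))
```

Note that `zip` stops at the shorter of the two group lists, so if one file has more groups than the other, the extra groups are not visited.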
MSH
  • Could the solution in my edited first post also work? – Karuvägistaja Mar 07 '22 at 19:12
  • 1
    Yep. As @eatmeimadanish mentioned, you can use generators. A generator advances to the next item on demand: you just call the `next` function. So imagine you call `next` 10 times for the first file, then 10 times for the second file. You do not need to keep track of the line number, since the generator moves on to the next line anyway. – MSH Mar 07 '22 at 19:16
  • 2
    The only problem with this method is that the entire file is read into memory and then parsed. This could be a performance bottleneck or not work at all in large files. Instead check out my example using a generator. – eatmeimadanish Mar 07 '22 at 19:17
  • 1
    @eatmeimadanish you are absolutely right. But the question says the files are 100 lines. So I don't think it would be some huge data. – MSH Mar 07 '22 at 19:19
  • Yeah, the 100 lines was an example but the real file will be like 300 lines so.. also.. it's not a huge file. – Karuvägistaja Mar 07 '22 at 19:20
  • 3
    Totally agree. But you should always build it once and forget it. The right way to do this is a generator: it is faster, more efficient, and will always work, no matter what you throw at it. The biggest pitfall in writing code is shortsightedness. – eatmeimadanish Mar 07 '22 at 19:22
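
The idea from the comments above, calling `next` a fixed number of times on each file in turn, can be sketched roughly as follows. The helper name `take` is hypothetical, and `io.StringIO` stands in for real files opened with `open()`; both are iterators over lines, so `next` works on them directly.

```python
import io


def take(line_iter, count):
    """Pull up to `count` lines from an iterator by calling next() on it.
    The iterator remembers its own position, so no line counter is needed."""
    lines = []
    for _ in range(count):
        line = next(line_iter, None)  # None signals that the iterator is exhausted
        if line is None:
            break
        lines.append(line)
    return lines


# Stand-ins for two open files with 25 lines each.
file_a = io.StringIO("".join(f"a{i}\n" for i in range(25)))
file_b = io.StringIO("".join(f"b{i}\n" for i in range(25)))

while True:
    block_a = take(file_a, 10)
    block_b = take(file_b, 10)
    if not block_a and not block_b:
        break
    print(len(block_a), len(block_b))  # do your thing with each block here
```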
0

When you iterate over a file object, it yields lines in the associated file. You just need a single loop that grabs the next ten lines from both files each iteration. In this example, the loop will end as soon as either file is exhausted:

from itertools import islice

lines_per_iter = 10

# The with statement closes both files automatically, even if an error occurs.
with open("file_a.txt", "r") as file_a, open("file_b.txt", "r") as file_b:
    while (a := list(islice(file_a, lines_per_iter))) and (b := list(islice(file_b, lines_per_iter))):
        print(f"Next {lines_per_iter} lines from A: {a}")
        print(f"Next {lines_per_iter} lines from B: {b}")
Paul M.
0

Ok, thank you for all the answers, I found a working solution to my project like this:

a = 0
b = 10
x = 0
y = 10

while True:

    for list1 in range(a, b):
        pass  # read line number list1 from file A
    a += 10
    b += 10

    for list2 in range(x, y):
        pass  # read line number list2 from file B
    if y == 100:
        break
    x += 10
    y += 10
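
Since both pairs of counters advance in lockstep, the same idea can be sketched with a single start index and list slicing. This is only an illustration under the assumption that both files have already been read into lists; the function name `process_in_turns` is hypothetical.

```python
def process_in_turns(lines_a, lines_b, step=10):
    """Yield (block_from_a, block_from_b) pairs, `step` lines at a time.
    A single start index replaces the separate a/b and x/y counters."""
    start = 0
    longest = max(len(lines_a), len(lines_b))
    while start < longest:
        # Slicing past the end of a list returns a shorter (or empty) list.
        yield lines_a[start:start + step], lines_b[start:start + step]
        start += step


# Stand-in data: 100 "lines" per file, as in the question.
lines_a = [f"A line {i}" for i in range(100)]
lines_b = [f"B line {i}" for i in range(100)]

for block_a, block_b in process_in_turns(lines_a, lines_b):
    print(block_a[0], block_b[0])  # do your thing with each block here
```

Because the loop bound comes from the list lengths, the hard-coded check against 100 is no longer needed.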
Karuvägistaja
0

I know it's been a long time since this question was asked, but I still feel like answering it my own way for future viewers and future reference. I'm not exactly sure if this is the best way to do it, but it can read multiple files simultaneously which is pretty cool.

from itertools import islice, chain
from pprint import pprint


def simread(files, nlines_segments, nlines_contents):
    lines = [[] for i in range(len(files))]
    total_lines = sum(nlines_contents)
    current_index = 0

    while len(tuple(chain(*lines))) < total_lines:
        if len(lines[current_index]) < nlines_contents[current_index]:
            lines[current_index].extend(islice(
                files[current_index],
                nlines_segments[current_index],
            ))

        current_index += 1

        if current_index == len(files):
            current_index = 0

    return lines


with open('A.txt') as A, open('B.txt') as B:
    lines = simread(
        [A, B], # files
        [10, 10], # lines to read at a time from each file
        [100, 100], # number of lines in each file
    ) # returns two lists containing the lines in files A and B
    pprint(lines)

You can even add another file C (with any number of lines, even a thousand) like so:

with open('A.txt') as A, open('B.txt') as B, open('C.txt') as C:
    lines = simread(
        [A, B, C], # files
        [10, 10, 100], # lines to read at a time from each file
        [100, 100, 1000], # number of lines in each file
    ) # returns three lists containing the lines in files A, B and C
    pprint(lines)

The values in nlines_segments can also be changed, like so:

with open('A.txt') as A, open('B.txt') as B, open('C.txt') as C:
    lines = simread(
        [A, B, C], # files
        [5, 20, 125], # lines to read at a time from each file
        [100, 100, 1000], # number of lines in each file
    ) # returns three lists containing the lines in files A, B and C
    pprint(lines)

This would read file A five lines at a time, file B twenty lines at a time, and file C 125 lines at a time.

NOTE: The values provided in nlines_segments all have to be factors of their corresponding values in nlines_contents, which should all be the exact number of lines in the files they correspond to.
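
Because nlines_contents has to hold the exact line counts, it may be easier to compute them than to type them in. A minimal sketch, with a hypothetical helper name `count_lines` and a throwaway demo file standing in for A.txt:

```python
import tempfile
from pathlib import Path


def count_lines(path):
    """Count the lines in a text file without loading it all into memory."""
    with open(path) as f:
        return sum(1 for _ in f)


# Demo on a throwaway 100-line file standing in for A.txt.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "A.txt"
    demo_path.write_text("".join(f"line {i}\n" for i in range(100)))
    n_lines = count_lines(demo_path)

segment = 10
# Check the factor requirement from the note before calling simread.
print(n_lines, n_lines % segment == 0)
```

The counts would be gathered before the files are opened for `simread`, since counting consumes a separate pass over each file.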

I hope this helps!

Pyzard
0

There are already a billion answers, but I just felt like answering this in a simple way.

with open('fileA.txt', 'r') as a:
    a_lines = a.readlines()
    a_prog = 0

with open('fileB.txt', 'r') as b:
    b_lines = b.readlines()
    b_prog = 0

for i in range(10):
    temp = []
    for line in range(a_prog, a_prog + 10):
        temp.append(a_lines[line].strip())
    a_prog += 10
    #Temp is the full 10-line block.
    #Do something...

    temp = []
    for line in range(b_prog, b_prog + 10):
        temp.append(b_lines[line].strip())
    b_prog += 10
    #Temp is the full 10-line block.
    #Do something...
    
Hchap