Processing csv iteratively 3 rows at a time in Python

Question

I have a CSV file like the following:

A, B, C, D
2,3,4,5
4,3,5,2
5,8,3,9
7,4,2,6
8,6,3,7

I want to fetch the B values from 3 rows at a time (for the first iteration the values would be 3, 3, 8), save them in variables (value1=3, value2=3, value3=8), and pass them to a function. Once those values are processed, I want to fetch the values from the next 3 rows, shifted down by one row (value1=3, value2=8, value3=4), and so on.

The CSV file is large. I am a Java developer, so if possible please suggest the simplest possible code.

score 2 · Accepted Answer · answered Aug 07 '20 at 09:27

An easy solution would be the following:

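import pandas as pd

# skipinitialspace=True strips the space after each comma, so the header is "B", not " B"
data = pd.read_csv("path.csv", skipinitialspace=True)

def function(value1, value2, value3):
    # placeholder: replace with your own processing
    print(value1, value2, value3)

# Slide a 3-row window over the rows: (0, 1, 2), (1, 2, 3), ...
for i in range(len(data) - 2):
    value1 = data.loc[i, "B"]
    value2 = data.loc[i + 1, "B"]
    value3 = data.loc[i + 2, "B"]
    function(value1, value2, value3)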

Edoardo Pericoli

What if the number of rows is not a multiple of 3? – Riccardo Bucco Aug 07 '20 at 09:31
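
Because the loop above slides the window down one row at a time rather than cutting the file into blocks, the row count does not need to be a multiple of 3: `range(len(data) - 2)` yields one start index per complete window and is simply empty when there are fewer than 3 rows. A plain-Python sketch of the same index arithmetic on a list of B values (the helper name `windows_by_index` is hypothetical):

```python
def windows_by_index(values):
    """Return every run of 3 consecutive values, mirroring range(len(data) - 2)."""
    windows = []
    # One start index per complete window; range() of 0 or a negative number is empty
    for i in range(len(values) - 2):
        windows.append((values[i], values[i + 1], values[i + 2]))
    return windows


print(windows_by_index([3, 3, 8, 4, 6]))  # 5 rows -> 3 windows
print(windows_by_index([3, 3, 8, 4]))     # 4 rows -> 2 windows
print(windows_by_index([3, 3]))           # 2 rows -> no windows
```

The first call returns `[(3, 3, 8), (3, 8, 4), (8, 4, 6)]`, matching the windows described in the question.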

score 1 · Answer 2 · answered Aug 07 '20 at 09:28

This is a possible solution (I have used the function proposed in this answer):

import csv
import itertools

# Yield successive non-overlapping chunks of n items (the last chunk may be shorter)
def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk

# Open the csv file (newline='' is what the csv module documentation recommends)
with open('myfile.csv', newline='') as f:
    # skipinitialspace=True drops the space after each comma in "A, B, C, D"
    csvreader = csv.reader(f, skipinitialspace=True)
    # Read the headers: ['A', 'B', 'C', 'D']
    headers = next(csvreader, None)
    # Read the rest of the file by chunks of 3 rows
    for chunk in grouper(3, csvreader):
        # do something with your chunk of rows
        print(chunk)

Printed result:

(['2', '3', '4', '5'], ['4', '3', '5', '2'], ['5', '8', '3', '9'])
(['7', '4', '2', '6'], ['8', '6', '3', '7'])
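
`grouper` yields non-overlapping blocks (rows 1 to 3, then rows 4 and 5), whereas the question asks for overlapping windows of the B column (3, 3, 8 then 3, 8, 4). A sketch of an overlapping counterpart, adapted from the `sliding_window` recipe in the `itertools` documentation, keeps only the last 3 values in a `collections.deque`, so memory use stays constant however large the file is. Assumptions: `io.StringIO` with the question's sample data stands in for `open('myfile.csv', newline='')`, and `process` is a hypothetical placeholder for your function.

```python
import collections
import csv
import io
import itertools

# Stand-in for open('myfile.csv', newline=''); same sample data as the question
SAMPLE = "A, B, C, D\n2,3,4,5\n4,3,5,2\n5,8,3,9\n7,4,2,6\n8,6,3,7\n"


def sliding_window(n, iterable):
    """Yield overlapping tuples of n consecutive items (adapted from the itertools recipes)."""
    it = iter(iterable)
    # Pre-fill with n - 1 items; maxlen=n makes the deque drop the oldest item on append
    window = collections.deque(itertools.islice(it, n - 1), maxlen=n)
    for item in it:
        window.append(item)
        yield tuple(window)


def process(value1, value2, value3):
    # Hypothetical placeholder for the function the question wants to call
    print(value1, value2, value3)


with io.StringIO(SAMPLE) as f:
    reader = csv.reader(f, skipinitialspace=True)
    headers = next(reader)               # ['A', 'B', 'C', 'D']
    b_index = headers.index("B")
    # Generator expression: rows are read lazily, one at a time
    b_values = (int(row[b_index]) for row in reader)
    for value1, value2, value3 in sliding_window(3, b_values):
        process(value1, value2, value3)
```

With the sample data this prints `3 3 8`, `3 8 4` and `8 4 6`. Fewer than 3 data rows simply produce no windows.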

deadshot · Answer 3 · 2020-08-07T09:47:43.623

You can use the csv module:

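import csv

with open('data.txt', newline='') as fp:
    reader = csv.reader(fp)
    next(reader)  # skip the header
    # Column B (index 1) of every row, read into a list
    res = [int(row[1]) for row in reader]
    # Start indices 0 .. len(res) - 3 give every run of 3 consecutive values
    groups = (res[idx: idx + 3] for idx in range(0, len(res) - 2))
for a, b, c in groups:
    print(a, b, c)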

Output:

3 3 8
3 8 4
8 4 6

edited Aug 07 '20 at 09:47

answered Aug 07 '20 at 09:28

Depending on how large the file is, your approach might use too much memory. Wouldn't it be better to use iterators instead of lists? – Riccardo Bucco Aug 07 '20 at 09:30
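
Following that suggestion, a sketch that keeps deadshot's idea of offset slices but never builds the full `res` list: `itertools.tee` splits the generator of B values into three iterators, the second and third are advanced by one and two items, and `zip` stops at the shortest one. `tee` only buffers the items one copy has consumed and the others have not, here at most two. The helper name `triplewise` is illustrative, and `io.StringIO` stands in for `open('data.txt', newline='')`.

```python
import csv
import io
import itertools

# Stand-in for open('data.txt', newline=''); same sample data as the question
SAMPLE = "A, B, C, D\n2,3,4,5\n4,3,5,2\n5,8,3,9\n7,4,2,6\n8,6,3,7\n"


def triplewise(iterable):
    """Lazily yield (x[i], x[i+1], x[i+2]) without materialising the whole input."""
    first, second, third = itertools.tee(iterable, 3)
    next(second, None)          # shift the second copy by one item
    next(third, None)           # shift the third copy by two items
    next(third, None)
    return zip(first, second, third)


with io.StringIO(SAMPLE) as fp:
    reader = csv.reader(fp)
    next(reader)  # skip the header
    values = (int(row[1]) for row in reader)   # generator, not a list
    # Consume inside the with block: the generator still reads from the open file
    for a, b, c in triplewise(values):
        print(a, b, c)
```

This prints the same three lines as the list-based version (`3 3 8`, `3 8 4`, `8 4 6`).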

score 0 · Answer 4 · answered Aug 07 '20 at 09:40

You can use pandas to read your CSV with the chunksize argument, as described in How can I partially read a huge CSV file?:

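import pandas as pd

# Function that you want to apply to your arguments
# (parameter names must match the column names exactly)
def fn(A, B, C, D):
    print(sum(A), sum(B), sum(C), sum(D))

# Iterate through the chunks; skipinitialspace=True turns " B" into "B" in the header
for chunk in pd.read_csv('test.csv', chunksize=3, skipinitialspace=True):
    # Convert the dataframe to a dict of column name -> list of values
    chunk_dict = chunk.to_dict(orient='list')
    # Pass the columns as keyword arguments to your function
    fn(**chunk_dict)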
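
With `chunksize=3`, `fn` receives non-overlapping blocks, so a window such as (3, 8, 4) that straddles the first and second chunk is never seen. One way to keep chunked reading but still get the overlapping windows the question asks for is to carry the last two values of each chunk into the next. A stdlib-only sketch, where `chunks` is any iterable of lists of B values; with pandas, each item could be `chunk["B"].tolist()` from `pd.read_csv(..., chunksize=...)`, assuming `skipinitialspace=True` so the column is named `B`. The function name `windows_across_chunks` is hypothetical.

```python
def windows_across_chunks(chunks, n=3):
    """Yield overlapping n-tuples from an iterable of chunks (lists of values).

    The last n - 1 values of each chunk are carried over so that windows
    spanning a chunk boundary are not lost.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    tail = []
    for chunk in chunks:
        values = tail + list(chunk)
        for i in range(len(values) - n + 1):
            yield tuple(values[i:i + n])
        # Keep only what the next chunk needs (values[-0:] would keep everything, hence the guard)
        tail = values[-(n - 1):] if n > 1 else []


# B column of the sample file, split the way chunksize=3 would split it
chunks = [[3, 3, 8], [4, 6]]
for window in windows_across_chunks(chunks):
    print(window)
```

This prints `(3, 3, 8)`, `(3, 8, 4)` and `(8, 4, 6)`; the chunk size only affects how many rows are held in memory at once, not which windows are produced.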
