
I have a CSV file like the following:

A, B, C, D
2,3,4,5
4,3,5,2
5,8,3,9
7,4,2,6
8,6,3,7

I want to fetch the B values from 3 rows at a time (for the first iteration the values would be 3, 3, 8), save them in variables (value1=3, value2=3, value3=8) and pass them to a function. Once those values are processed, I want to fetch the values from the next 3 rows, shifted down by one row (value1=3, value2=8, value3=4), and so on.

The CSV file is large. I am a Java developer, so if possible, please suggest the simplest possible code.

Ankit Chauhan

4 Answers


An easy solution would be the following:

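import pandas as pd

# skipinitialspace=True strips the spaces after the commas in the header "A, B, C, D",
# so the column is named "B" rather than " B"
data = pd.read_csv("path.csv", skipinitialspace=True)

# Stop two rows before the end so that row i+2 always exists;
# `function` is your own processing function
for i in range(len(data) - 2):
    value1 = data.loc[i, "B"]
    value2 = data.loc[i + 1, "B"]
    value3 = data.loc[i + 2, "B"]
    function(value1, value2, value3)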

This is a possible solution (I have used the function proposed in this answer):

import csv
import itertools

# Function to iterate over an iterable in chunks (of any size)
def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk

# Open the csv file
with open('myfile.csv') as f:
    csvreader = csv.reader(f)
    # Read the headers: ['A', 'B', 'C', 'D']
    headers = next(csvreader, None)
    # Read the rest of the file by chunks of 3 rows
    for chunk in grouper(3, csvreader):
        # do something with your chunk of rows
        print(chunk)

Printed result:

(['2', '3', '4', '5'], ['4', '3', '5', '2'], ['5', '8', '3', '9'])
(['7', '4', '2', '6'], ['8', '6', '3', '7'])
Riccardo Bucco
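The grouper approach yields non-overlapping chunks (B values 3, 3, 8 and then 4, 6), whereas the question asks for overlapping windows that advance one row at a time. A minimal sketch of a sliding-window variant, similar to the `sliding_window` recipe in the Python itertools documentation, keeps only the last three values in a `collections.deque` with `maxlen=3`, so memory use stays constant however large the file is. It assumes B is always the second field; `sliding_window` and `b_values` are illustrative names, and the `io.StringIO` sample stands in for the real file.

```python
import csv
import io
from collections import deque
from itertools import islice

def sliding_window(n, iterable):
    """Yield overlapping tuples of n consecutive items: (x0..x[n-1]), (x1..x[n]), ..."""
    it = iter(iterable)
    # Prime the window with the first n items
    window = deque(islice(it, n), maxlen=n)
    if len(window) == n:
        yield tuple(window)
    # Appending to a full deque with maxlen=n drops the oldest item
    for item in it:
        window.append(item)
        yield tuple(window)

# Stand-in for open('myfile.csv', newline='') (assumed file name)
sample = io.StringIO("A, B, C, D\n2,3,4,5\n4,3,5,2\n5,8,3,9\n7,4,2,6\n8,6,3,7\n")
csvreader = csv.reader(sample)
headers = next(csvreader, None)  # skip the header row
# Lazily take column B (index 1) as int; no list of all rows is built
b_values = (int(row[1]) for row in csvreader)
for value1, value2, value3 in sliding_window(3, b_values):
    print(value1, value2, value3)  # replace with a call to your own function
```

With the sample data this prints the windows (3, 3, 8), (3, 8, 4) and (8, 4, 6), matching the sequence described in the question.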

You can use the csv module:

import csv
with open('data.txt') as fp:
    reader = csv.reader(fp)
    next(reader) #skips the header
    res = [int(row[1]) for row in reader]
    groups = (res[idx: idx + 3] for idx in range(0, len(res) - 2))
for a, b, c in groups:
    print(a, b, c)

Output:

3 3 8
3 8 4
8 4 6
deadshot
Depending on how large the file is, your approach might end up using too much memory. Wouldn't it be better to use iterators instead of lists? – Riccardo Bucco Aug 07 '20 at 09:30
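Following the suggestion in the comment, a minimal sketch of an iterator-based version of the csv-module answer, assuming B is always the second field. `itertools.tee` makes three copies of the value stream, which are offset by 0, 1 and 2 items and zipped together, so only a few values are buffered at any time instead of the whole column. `process` and `b_triples` are hypothetical names, and `io.StringIO` stands in for `open('data.txt', newline='')`.

```python
import csv
import io
from itertools import islice, tee

def process(value1, value2, value3):
    # Hypothetical placeholder for your own function
    print(value1, value2, value3)

def b_triples(fp):
    """Lazily yield overlapping triples of column-B values from an open CSV file."""
    reader = csv.reader(fp)
    next(reader, None)  # skip the header
    values = (int(row[1]) for row in reader)  # generator: no full list in memory
    a, b, c = tee(values, 3)
    # Offset the copies by 0, 1 and 2 items; zip stops at the shortest one
    return zip(a, islice(b, 1, None), islice(c, 2, None))

with io.StringIO("A, B, C, D\n2,3,4,5\n4,3,5,2\n5,8,3,9\n7,4,2,6\n8,6,3,7\n") as fp:
    # The loop must run inside the with block: values are read lazily from fp
    for v1, v2, v3 in b_triples(fp):
        process(v1, v2, v3)
```

Because the values are now read lazily, the loop has to stay inside the `with` block; the list-based version can loop after the file is closed only because `res` already holds every value.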

You can use pandas to read your CSV with the chunksize argument, as described in How can I partially read a huge CSV file?

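import pandas as pd

# Function that you want to apply to your arguments
def fn(A, B, C, D):
    print(sum(A), sum(B), sum(C), sum(D))

# Iterate through the chunks; skipinitialspace=True turns the header
# names " B", " C", " D" into "B", "C", "D" so they match fn's parameters
for chunk in pd.read_csv('test.csv', chunksize=3, skipinitialspace=True):
    # Convert the DataFrame to a dict of column name -> list of values
    chunk_dict = chunk.to_dict(orient='list')
    # Pass the columns as keyword arguments to your function
    fn(**chunk_dict)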
perlusha
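Because `chunksize` splits the rows into independent blocks, a window such as (3, 8, 4), which spans rows from two different chunks when `chunksize=3`, would be missed. A minimal sketch, assuming pandas is installed and the values of interest are in column B, carries the last `window - 1` values of each chunk over into the next one. `b_windows` and its parameters are illustrative names, not part of pandas; `chunksize=2` is used only to force windows across chunk boundaries in the small sample.

```python
import io
import pandas as pd

def b_windows(source, window=3, chunksize=1000):
    """Yield overlapping windows of column B while reading the CSV in chunks."""
    carry = []  # last (window - 1) values of the previous chunk
    for chunk in pd.read_csv(source, chunksize=chunksize, skipinitialspace=True):
        values = carry + chunk["B"].tolist()
        for i in range(len(values) - window + 1):
            yield tuple(values[i:i + window])
        # Keep the tail so windows spanning two chunks are not lost
        carry = values[-(window - 1):] if window > 1 else []

csv_text = "A, B, C, D\n2,3,4,5\n4,3,5,2\n5,8,3,9\n7,4,2,6\n8,6,3,7\n"
for value1, value2, value3 in b_windows(io.StringIO(csv_text), chunksize=2):
    print(value1, value2, value3)
```

For a real file, pass its path instead of the `io.StringIO` object and pick a `chunksize` large enough that the per-chunk overhead is small.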