3

I am struggling to find the correct way to read and parse a CSV file in order to output the number of rows it contains.

I have tried a few different methods, but I am a little stumped.

import boto3, botocore, csv

s3 = boto3.resource('s3')
s3obj = s3.Object('mybucket','myfile')

with s3obj.get() as f:
    reader = csv.reader(f, delimiter=",")
    data = list(reader)
    row_count = len(data)

This obviously is not working, because either (1) my syntax is wrong or (2) I have no idea what I am doing. I was referencing this article and tried to adapt it to S3:

Row count in a csv file

Instead of explicitly opening the file, is it possible to call csv.reader on the S3 object using s3obj.get()?

Excuse my ignorance; I am still learning programming, so any explanation would be very helpful.

Huzaifa M Aamir

3 Answers

3

I was able to get the desired result by using Python's regular string count method:

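import boto3

s3 = boto3.resource('s3')
s3obj = s3.Object('mybucket', 'myfile')

# get() returns a dict; the file contents are in the "Body" stream
filedata = s3obj.get()["Body"].read()

# count the newlines and subtract 1 for the header row
print(filedata.decode('utf8').count('\n') - 1)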
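If the file is large, reading it all at once may be undesirable. Here is a rough sketch of the same newline count done in chunks, assuming the body object accepts read(size) the way ordinary file-like objects do. The function name count_newlines and the chunk size are illustrative only. Like the one-liner above, it assumes a single header line, a newline after the last row, and no newlines inside quoted fields.

```python
import io


def count_newlines(body, chunk_size=1024 * 1024):
    """Count newline bytes in a file-like object, one chunk at a time."""
    total = 0
    while True:
        chunk = body.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        total += chunk.count(b"\n")
    return total


# io.BytesIO stands in for the S3 body here so the sketch runs offline.
demo = io.BytesIO(b"header\nrow1\nrow2\n")
print(count_newlines(demo) - 1)  # subtract 1 for the header row
```

With S3, the object passed in would be s3obj.get()["Body"] instead of the BytesIO stand-in.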
Huzaifa M Aamir
1

s3obj.get() returns a dict response. You have to get the Body from the response, which holds the object data as a StreamingBody.

s3obj = s3.Object('mybucket','myfile')
content = s3obj.get()['Body']

But this StreamingBody only supports read(); it does not support the iterator protocol that csv.reader() requires.
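One way around this, as a minimal sketch: read the whole body into memory, decode it, and wrap the text in io.StringIO so that csv.reader has something it can iterate over. The helper name count_csv_rows and its parameters are made up for illustration, and the sketch assumes the file fits comfortably in memory.

```python
import csv
import io


def count_csv_rows(raw, encoding="utf-8", has_header=True):
    """Count data rows in CSV content given as bytes (hypothetical helper)."""
    # newline="" leaves line endings untouched so the csv module can
    # handle them itself, including newlines inside quoted fields.
    text = io.StringIO(raw.decode(encoding), newline="")
    rows = sum(1 for _ in csv.reader(text, delimiter=","))
    # Drop the header row from the count, but never go below zero.
    return max(rows - 1, 0) if has_header else rows


# With S3 the bytes would come from the Body of the get() response, e.g.:
#   raw = s3.Object('mybucket', 'myfile').get()['Body'].read()
sample = b'id,name\n1,"Smith, J"\n2,"multi\nline"\n'
print(count_csv_rows(sample))
```

Unlike counting '\n' characters, this treats a quoted field that spans two lines as a single row.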

franklinsijo
  • Thank you for your explanation. I think I was able to get it working; I had to subtract 1 from the total count to account for the header row. Not sure if this is best practice. – Huzaifa M Aamir Mar 04 '17 at 18:08
1

The previous answer works pretty well, but sometimes the following error can appear:

'utf-8' codec can't decode byte 0xf3 in position 127: invalid continuation byte

If so, try decoding with ISO-8859-1 instead:

filedata.decode('ISO-8859-1').count('\n')-1
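The two decodings can also be combined, as a sketch: try UTF-8 first and fall back to ISO-8859-1 only when decoding fails. The function name decode_with_fallback and the sample bytes are hypothetical. ISO-8859-1 maps every byte to some character, so the fallback never raises, though it may display the wrong characters if the file actually uses a different encoding. The newline count is unaffected either way.

```python
def decode_with_fallback(raw, primary="utf-8", fallback="ISO-8859-1"):
    """Decode bytes with the primary codec, else the fallback (hypothetical helper)."""
    try:
        return raw.decode(primary)
    except UnicodeDecodeError:
        # Every byte is valid in ISO-8859-1, so this branch cannot fail.
        return raw.decode(fallback)


# Made-up sample: the byte 0xf3 is not valid UTF-8 in this position.
filedata = b"id,city\n1,Bogot\xf3\n2,Lima\n"
print(decode_with_fallback(filedata).count("\n") - 1)
```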
Stephen Rauch
herbertgoto