
I am using boto to read a CSV file from S3 and parse its contents. This is the code I wrote:

    import boto
    from boto.s3.key import Key
    import pandas as pd
    import io

    conn = boto.connect_s3(keyId, sKeyId)
    bucket = conn.get_bucket(bucketName)

    # Get the Key object of the given key, in the bucket
    k = Key(bucket, srcFileName)

    content = k.get_contents_as_string()
    reader = pd.read_csv(io.StringIO(content))

    for row in reader:
        print(row)

But I get an error on the read_csv line:

    TypeError: initial_value must be str or None, not bytes

How can I resolve this error and parse the contents of the CSV file stored on S3?
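As the error shows, `get_contents_as_string()` returns `bytes` here, while `io.StringIO` only accepts `str`. A minimal sketch of two ways to hand those bytes to `pd.read_csv`, assuming the file is UTF-8 encoded (an assumption; change the codec if it is not). The literal `content` below is made-up sample data standing in for the S3 download.

```python
import io

import pandas as pd

# Made-up stand-in for k.get_contents_as_string(), which yields bytes.
content = b"id,name\n1,alice\n2,bob\n"

# io.StringIO(content) would raise:
#   TypeError: initial_value must be str or None, not bytes

# Option 1: decode the bytes to str first (assumes UTF-8), then use StringIO.
df_text = pd.read_csv(io.StringIO(content.decode("utf-8")))

# Option 2: keep the bytes and wrap them in BytesIO; pandas decodes them.
df_bytes = pd.read_csv(io.BytesIO(content))

print(df_text.equals(df_bytes))  # True
```

`BytesIO` avoids choosing an encoding yourself; decoding explicitly is useful when you need the text for anything besides `read_csv`.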

UPDATE: If I use BytesIO instead of StringIO, the print(row) line prints only the first row of the CSV. How do I loop over all the rows?
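A likely explanation: `pd.read_csv` returns a DataFrame, and iterating a DataFrame directly (`for row in reader`) yields its column labels, not its rows. With the default header handling, those labels come from the first line of the CSV, so the output looks like "only the first row". A small sketch with made-up sample data showing row-wise iteration with `DataFrame.iterrows()` and the generally faster `DataFrame.itertuples()`:

```python
import io

import pandas as pd

# Made-up sample data standing in for the downloaded CSV bytes.
content = b"id,fruit\n1,apple\n2,pear\n"
reader = pd.read_csv(io.BytesIO(content))

# Iterating a DataFrame directly yields its column labels, not its rows.
labels = [col for col in reader]
print(labels)  # ['id', 'fruit']

# iterrows() yields one (index, Series) pair per row.
for index, row in reader.iterrows():
    print(index, row["id"], row["fruit"])

# itertuples() yields one namedtuple per row and is generally faster.
fruits = [r.fruit for r in reader.itertuples(index=False)]
print(fruits)  # ['apple', 'pear']
```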

This is my current code:

    import io

    import boto3
    import pandas as pd

    s3 = boto3.resource('s3', aws_access_key_id=keyId, aws_secret_access_key=sKeyId)

    obj = s3.Object(bucketName, srcFileName)

    # This call raises the AttributeError shown below
    content = obj.get_contents_as_string()
    reader = pd.read_csv(io.BytesIO(content), header=None)

    for index, row in reader.iterrows():
        print(row[1])

When I execute this, I get the error AttributeError: 's3.Object' object has no attribute 'get_contents_as_string'.
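`get_contents_as_string()` is a method of the old boto `Key` class; a boto3 `s3.Object` resource does not have it. Its `get()` action instead returns a response dict whose `'Body'` entry is a stream with a `read()` method that returns the raw bytes. A sketch under that assumption; `read_s3_csv` is a hypothetical helper name, and `keyId`, `sKeyId`, `bucketName`, `srcFileName` are the placeholders from the question.

```python
import io

import pandas as pd


def read_s3_csv(s3_object, **read_csv_kwargs):
    """Download an S3 object and parse it as CSV (hypothetical helper).

    `s3_object` is expected to be a boto3 resource object, e.g.
    boto3.resource('s3').Object(bucketName, srcFileName).
    """
    # Object.get() returns a dict; 'Body' is a stream of bytes.
    content = s3_object.get()["Body"].read()
    return pd.read_csv(io.BytesIO(content), **read_csv_kwargs)


# Usage against real S3 (needs credentials and network access):
#   import boto3
#   s3 = boto3.resource("s3", aws_access_key_id=keyId,
#                       aws_secret_access_key=sKeyId)
#   reader = read_s3_csv(s3.Object(bucketName, srcFileName), header=None)
#   for index, row in reader.iterrows():
#       print(row[1])
```

Taking the object as a parameter keeps the parsing logic testable without S3: any object whose `get()` returns `{"Body": <readable>}` works.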

user2966197
  • can you try `BytesIO` instead of `StringIO`? – salient Jun 01 '17 at 22:06
  • @salient how do I loop over all the rows in the csv file? When I use `BytesIO` and do `for row in reader: print(row)` it only prints 1st row of the csv file – user2966197 Jun 01 '17 at 22:10
  • have you removed the `[0]`? – salient Jun 01 '17 at 22:11
  • @salient yes I removed [0] – user2966197 Jun 01 '17 at 22:13
  • oh sorry, you get a pandas dataframe back from that operation. can't you just do something along these lines to print: https://stackoverflow.com/questions/19124601/is-there-a-way-to-pretty-print-the-entire-pandas-series-dataframe – salient Jun 01 '17 at 22:15
  • If you must iterate over rows: http://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.DataFrame.iterrows.html – salient Jun 01 '17 at 22:19
  • @salient thanks, I was able to execute that, but now I am running into different, related issues. When I execute it on an AWS server, it throws an error on `import boto`. I checked the installed packages and there is `boto3` but no `boto`. Also, I do not have permission to install new packages. Does this work only with `boto` and not `boto3`? – user2966197 Jun 01 '17 at 22:47
  • The syntax for `boto3` is a bit different http://boto3.readthedocs.io/en/latest/guide/migrations3.html but the principles are the same – salient Jun 01 '17 at 22:48
  • @salient I am getting `AttributeError: 's3.Object' object has no attribute 'get_contents_as_string'` when trying boto3. I have updated my code in my post above in update section – user2966197 Jun 01 '17 at 23:28
  • try `s3.get_object()['Body'].read()` – salient Jun 01 '17 at 23:34
  • http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.get_object – salient Jun 01 '17 at 23:34
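The `get_object` suggestion uses the low-level boto3 client rather than the resource; `Client.get_object` requires `Bucket` and `Key` arguments, and the response's `'Body'` stream yields the bytes. A sketch assuming a client created with `boto3.client('s3')`; `read_csv_via_client` is a hypothetical helper name and the credential and bucket names are the question's placeholders.

```python
import io

import pandas as pd


def read_csv_via_client(s3_client, bucket, key, **read_csv_kwargs):
    """Fetch s3://bucket/key with a boto3 client and parse it as CSV.

    Hypothetical helper; `s3_client` is expected to be boto3.client('s3').
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # response['Body'] is a stream; read() returns the raw bytes.
    content = response["Body"].read()
    return pd.read_csv(io.BytesIO(content), **read_csv_kwargs)


# Usage (needs credentials and network access):
#   import boto3
#   s3 = boto3.client("s3", aws_access_key_id=keyId,
#                     aws_secret_access_key=sKeyId)
#   reader = read_csv_via_client(s3, bucketName, srcFileName, header=None)
```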
