Put io.BytesIO object in a dataframe

Question

I have this bytes object, which I read from a CSV file in an S3 bucket (only part of the object is shown):

b'00032501;2020-03-03;-;00:00:00;00:59:03;60;3543;1;0;0\r\n00032502;2020-03-03;-;00:00:00;00:59:03;60;3543;1;0;0\r\n00067601;2020-03-03;RTL;00:00:00;00:20:19;5;1219;1;1;0\r\n00067601;2020-03-03;VOX;00:20:19;00:59:27;8;2348;1;1;0\r\n00102204;2020-03-03;-;00:00:00;00:21:56;24;1316;2;1;1\r\n00170201;2020-03-03;-;00:00:00;00:59:50;62;3590;1;0;0\r\n00170202;2020-03-03;-;00:00:00;00:59:50;62;3590;1;0;0\r\n00170801;2020-03-03;ZDF;00:00:00;00:10:13;3;613;1;1;0\r\n00187202;2020-03-03;-;00:00:00;00:24:08;26;1448;1;0;0\r\n00339802;2020-03-

How can I put it in a dataframe? I am currently using io.BytesIO to read the data, but when I load it with df = pd.read_csv(<io.BytesIO object>, dtype=str, sep=';'), it is not split properly: all the columns end up inside the first one. Passing sep=';' makes no difference.

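import io

import boto3
import pandas as pd

s3 = boto3.client("s3")

if event:
    # First record of the S3 event notification
    file_obj = event["Records"][0]
    filename = str(file_obj['s3']['object']['key'])

    # Note: `bucket` is not defined in this snippet
    f_obj = s3.get_object(Bucket=bucket, Key=filename)
    print(f_obj)

    # Read the whole object body as bytes
    file_content = f_obj["Body"].read()

    # Wrap the bytes in a file-like object for pandas
    data = io.BytesIO(file_content)

    df = pd.read_csv(data)

    print(df)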
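For comparison, here is a minimal sketch of how the sample bytes behave when parsed with and without an explicit separator. It assumes the file has no header row (the sample starts directly with data) and uses only the first three records quoted above; the variable names are illustrative and not taken from the original code.

```python
import io

import pandas as pd

# First three records copied from the sample in the question.
raw = (
    b"00032501;2020-03-03;-;00:00:00;00:59:03;60;3543;1;0;0\r\n"
    b"00032502;2020-03-03;-;00:00:00;00:59:03;60;3543;1;0;0\r\n"
    b"00067601;2020-03-03;RTL;00:00:00;00:20:19;5;1219;1;1;0\r\n"
)

# Default separator is a comma, so every line lands in one column.
df_default = pd.read_csv(io.BytesIO(raw), header=None, dtype=str)

# With sep=";" each field gets its own column. header=None (an assumption
# about the file) keeps the first record as data instead of column labels,
# and dtype=str preserves leading zeros such as "00032501".
df_split = pd.read_csv(io.BytesIO(raw), sep=";", header=None, dtype=str)

print(df_default.shape)
print(df_split.shape)
```

If this holds for the real file, the same call should work on the bytes returned by f_obj["Body"].read(), since io.BytesIO only wraps them in a file-like object.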

Do you read the CSV directly into the dataframe, or do you first read it with `io.BytesIO`? — MGP, Mar 23 '20 at 10:48
@MG92 I read the file with io.BytesIO first. The post is edited with my code. — curiosLlama, Mar 23 '20 at 10:56
This post might be relevant to your question: https://stackoverflow.com/questions/35803601/reading-a-file-from-a-private-s3-bucket-to-a-pandas-dataframe — Scratch'N'Purr, Mar 23 '20 at 11:17
Not sure if this is related to your issue, but I had a silly problem where pandas expected 7 columns instead of 8, so I just passed in the column names (through the `names` parameter) and all was well. — janeon, Dec 29 '21 at 22:01
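The suggestion in the last comment could look roughly like the sketch below. The column labels are hypothetical placeholders, since the question never says what the ten fields mean; only the `names` parameter of `pd.read_csv` itself is real.

```python
import io

import pandas as pd

# Two records copied from the sample in the question.
raw = (
    b"00067601;2020-03-03;RTL;00:00:00;00:20:19;5;1219;1;1;0\r\n"
    b"00067601;2020-03-03;VOX;00:20:19;00:59:27;8;2348;1;1;0\r\n"
)

# Hypothetical labels: col_0 ... col_9, one per field in a record.
column_names = [f"col_{i}" for i in range(10)]

# Passing names= tells pandas how many columns to expect and, with the
# default header setting, treats the first line as data rather than a header.
df = pd.read_csv(io.BytesIO(raw), sep=";", names=column_names, dtype=str)

print(df.columns.tolist())
print(df)
```

Supplying the names explicitly also makes a mismatch between the expected and actual number of fields easier to spot.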
