I need to transfer data from an S3 bucket to a GCP bucket. I read the S3 file into a pandas DataFrame, write it back out as a Parquet file, and upload that to the GCP bucket, but it does not work. As far as I can tell the last line of the code below is the one that fails, with this error:

pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 5, saw 3
import io

import boto3
import pandas as pd
from google.cloud import storage

# Download the Parquet object from S3 into an in-memory buffer
s3 = boto3.resource(
    's3',
    aws_access_key_id='MyKey',
    aws_secret_access_key='MySecretKey',
)
obj = s3.Object('my_bucket_s3', '2022/test.parquet')
buffer = io.BytesIO()
obj.download_fileobj(buffer)
buffer.seek(0)  # rewind the buffer, or pandas reads from the end of the stream
df = pd.read_parquet(buffer)

# Re-serialize the DataFrame and upload it to the GCP bucket
client = storage.Client()
bucket = client.get_bucket('my_bucket_gcp')
bucket.blob('TEST/test.parquet').upload_from_string(
    df.to_parquet(), content_type='application/octet-stream')