
I'm going to connect to an S3 bucket, read the CSV files, and copy the rows to an RDS database. This script uses arcpy, which I'm not very familiar with; I'm just trying to use the CSV file in the S3 bucket directly as the source, without downloading it to the server. The code is as follows:

import arcpy
from boto.s3.key import Key
import StringIO
import pandas as pd
import boto
import boto.s3.connection

access_key = ''
secret_key = ''
conn = boto.connect_s3(aws_access_key_id = access_key,aws_secret_access_key = secret_key,host = 's3.amazonaws.com')

b = conn.get_bucket('mybucket')
#for key in b.list:
b_key = b.get_key('file1.csv')

arcpy.env.overwriteOutput = True

b_url = b_key.generate_url(0, query_auth=False, force_http=True)
print b_url

##Read file
k = Key(b, 'file1.csv')
content = k.get_contents_as_string()
sourcefile_csv = pd.read_csv(StringIO.StringIO(content))

##CopyRows_management (in_rows, out_table, {config_keyword})
#http://pro.arcgis.com/en/pro-app/tool-reference/data-management/copy-rows.htm

arcpy.CopyRows_management(sourcefile_csv, "RDSTablePath", "")

print("copy rows done")

Error in CopyRows: arcgisscripting.ExecuteError: Failed to execute. Parameters are not valid.

If we use a path on the server as the source, like below, it works fine:

sourcefile_csv = "D:\\DEV\\file1.csv"
arcpy.CopyRows_management(sourcefile_csv, "RDSTablePath", "")

Any help would be appreciated.

Matrix
  • Assuming you already have access to an AWS account, and if the goal is just taking a CSV file in a S3 bucket and then loading the contents into an RDS (with maybe some transformations in the content), you might save yourself some trouble and use [Amazon Glue](https://aws.amazon.com/glue/?nc1=h_ls). You can find an example [here](https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-template-copys3tords.html). – Rrr Apr 25 '18 at 14:44

2 Answers


It looks like you are trying to use the pandas DataFrame as the input table for CopyRows_management. I don't think that is a valid input for the function, hence the "Parameters are not valid" error. The documentation says that in_rows should be "The rows from a feature class, layer, table, or table view to be copied." I think the use of pandas is unnecessary here anyway.

So either save the CSV somewhere the script can access it (as you did when you used the path on the server) or, if you don't want to save the file anywhere, read the contents of the CSV and iterate through it with an Insert Cursor to write it to your table/feature class.
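The first option (save the file, then use CopyRows) can be sketched like this. Here `content` stands in for the CSV text already fetched from S3 with `get_contents_as_string()`, and the CopyRows call is left as a comment since it needs an ArcGIS environment; the output table path is an assumption:

```python
import os
import tempfile

# 'content' stands in for what get_contents_as_string() returned from S3
content = "id,name\n1,foo\n2,bar\n"

# Write the string to a temporary .csv file that arcpy can read from disk
fd, tmp_csv = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as f:
    f.write(content)

# Verify the round trip, then hand the path to arcpy
with open(tmp_csv) as f:
    written = f.read()

# arcpy.CopyRows_management(tmp_csv, "RDSTablePath")  # requires ArcGIS
os.remove(tmp_csv)
```

Since CopyRows accepts a path to a CSV on disk (as you confirmed with the `D:\\DEV\\file1.csv` test), a temp file the script cleans up afterwards avoids keeping a copy on the server.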

See this post on how to read a csv from a string using the csv module. Then just loop through the rows of the csv and use the Insert Cursor to write to your table.
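A minimal sketch of the second option, assuming a target table with `id` and `name` fields; the cursor lines are commented out because `arcpy.da.InsertCursor` needs an ArcGIS environment, and the table path is a placeholder:

```python
import csv
import io

# CSV text as fetched from S3 with get_contents_as_string()
content = "id,name\n1,foo\n2,bar\n"

# Parse the string in memory with the csv module
reader = csv.reader(io.StringIO(content))
header = next(reader)               # first row holds the field names
rows = [tuple(r) for r in reader]   # remaining rows are the data

# with arcpy.da.InsertCursor("RDSTablePath", header) as cursor:  # requires ArcGIS
#     for row in rows:
#         cursor.insertRow(row)

print(rows)  # [('1', 'foo'), ('2', 'bar')]
```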

BigGerman

If your RDS happens to be an Aurora MySQL instance, you should take a look at the Loading Data from S3 feature, which lets you skip the code entirely and load the file into your DB line by line.
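For reference, that feature is driven by a single SQL statement run against the Aurora cluster. The sketch below just composes the statement; the bucket, table, and column names are assumptions, and the cluster also needs an IAM role granting it read access to the bucket:

```python
# Sketch: build the Aurora MySQL "LOAD DATA FROM S3" statement.
# Bucket/table/column names are assumptions for illustration.
statement = (
    "LOAD DATA FROM S3 's3://mybucket/file1.csv' "
    "INTO TABLE mytable "
    "FIELDS TERMINATED BY ',' "
    "LINES TERMINATED BY '\\n' "
    "IGNORE 1 LINES "   # skip the CSV header row
    "(id, name);"
)

# Execute it with any MySQL client or driver connected to the cluster,
# e.g. cursor.execute(statement)
print(statement)
```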

Joaquín Bucca