
I am trying to load a CSV file into a DynamoDB table with the Python program below, but I get an "index out of range" error.

The input CSV file is laid out like this:

- 1st line: attribute names
- 2nd line: data type of each attribute
- 3rd line onwards: the actual data

CSV file content:

```
customer_id,key_id,dashboard_name,tsm,security_block,core_block,type,subscription,account_id,region,sed,jumpbox,dc,av,gl,backup,cpm,zb
int,int,string,string,string,string,string,string,string,string,string,string,string,string,string,string,string,string
1,1,Act,yes,no,no,az,xxxxx-xxx-xxxx-xxxx-xxxx,null,eu-west-1,yes,yes,yes,no,yes,no,notapplicable,yes
1,2,Act,no,no,yes,az,xxxxx-xxx-xxxx-xxxx-xxxx,null,eu-west-1,no,yes,no,yes,no,yes,notapplicable,no
2,1,Cap,no,no,yes,aws,notapplicable,xxxxxxxx,us-west-2,yes,no,no,no,yes,no,yes,yes
2,2,Cap,yes,no,no,aws,notapplicable,xxxxxxxx,us-west-2,yes,no,no,no,yes,no,no,yes
2,3,Cap,no,yes,no,aws,notapplicable,xxxxxxxx,us-west-2,no,yes,no,yes,no,yes,yes,no
2,4,Cap,yes,no,no,aws,notapplicable,xxxxxxxx,us-west-1,yes,no,no,no,yes,no,no,yes
2,5,Cap,no,no,yes,aws,notapplicable,xxxxxxxx,us-east-1,no,yes,no,yes,no,yes,yes,yes
```
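Because `csv.reader` returns every field as a string, the second (type) row is what decides which values become numbers. A minimal sketch of that conversion, assuming only the type names `int` and `string` used in this file plus the `stringset` convention (values separated by `|`) from the script below; the helper name `rows_to_items` is hypothetical:

```python
import csv
import io


def rows_to_items(lines):
    """Turn a header row, a type row and data rows into typed dicts (hypothetical helper)."""
    tokens = csv.reader(lines)
    header = next(tokens)  # 1st line: attribute names
    types = next(tokens)   # 2nd line: data type of each attribute
    items = []
    for row in tokens:
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        item = {}
        # zip pairs each value with its column name and declared type
        for name, kind, val in zip(header, types, row):
            if not val:  # skip empty fields, as the original script does
                continue
            if kind == 'int':
                item[name] = int(val)
            elif kind == 'stringset':
                item[name] = set(val.split('|'))
            else:
                item[name] = val
        items.append(item)
    return items


sample = io.StringIO(
    "customer_id,key_id,dashboard_name,region\n"
    "int,int,string,string\n"
    "1,1,Act,eu-west-1\n"
    "2,5,Cap,us-east-1\n"
)
print(rows_to_items(sample))
```

Note that `zip` silently drops any extra fields in a row that is wider than the header, so it hides the `IndexError` rather than reporting it; checking row widths first is still worthwhile.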

What I tried:

```python
# Python script to insert CSV records into a DynamoDB table.
from __future__ import print_function  # Python 2/3 compatibility
from __future__ import division  # Python 2/3 compatibility for integer division
import argparse
import boto3
from csv import reader
import time
# command line arguments
parser = argparse.ArgumentParser(
    description='Write CSV records to dynamo db table. CSV Header must map to dynamo table field names.')
parser.add_argument('csvFile', help='Path to csv file location')
parser.add_argument('table', help='Dynamo db table name')
parser.add_argument('writeRate', default=5, type=int, nargs='?',
                    help='Number of records to write in table per second (default:5)')
parser.add_argument('delimiter', default=',', nargs='?', help='Delimiter for csv records (default=,)')
parser.add_argument('region', default='us-west-2', nargs='?', help='Dynamo db region name (default=us-west-2)')
args = parser.parse_args()
print(args)

# dynamodb and table initialization
endpointUrl = "https://dynamodb.us-west-2.amazonaws.com"
dynamodb = boto3.resource('dynamodb', region_name=args.region, endpoint_url=endpointUrl)
table = dynamodb.Table(args.table)

# write records to dynamo db
with open(args.csvFile) as csv_file:
    tokens = reader(csv_file, delimiter=args.delimiter)
    # read first line in file which contains dynamo db field names
    header = next(tokens)
    # read second line in file which contains dynamo db field data types
    headerFormat = next(tokens)
    # rest of file contain new records
    for token in tokens:
        print(token)
        item = {}
        for i, val in enumerate(token):
            print(val)
            if val:
                key = header[i]
                if headerFormat[i] == 'int':
                    val = int(val)
                if headerFormat[i] == 'stringset':
                    tempVal = val.split('|')
                    val = set()
                    for tok in enumerate(tempVal):
                        print(tok)
                        val.add(str(tok[1]))
                print(val)
                item[key] = val
        print(item)
        table.put_item(Item=item)

        time.sleep(1 / args.writeRate)  # to accommodate max write provisioned capacity for table
```

Error I am getting:

```
Traceback (most recent call last):
  File "C:\csv\dbinsert.py", line 39, in <module>
    key = header[i]
IndexError: list index out of range
```
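`header[i]` raises `IndexError` as soon as a data row has more fields than the header row, for example because of a trailing comma or an unquoted comma inside a value. A small sketch for locating such rows before writing anything, using the `line_num` attribute of `csv.reader`; the function name `find_bad_rows` is hypothetical:

```python
import csv
import io


def find_bad_rows(lines, delimiter=','):
    """Return (line_number, field_count) for each data row whose width differs from the header."""
    tokens = csv.reader(lines, delimiter=delimiter)
    header = next(tokens)
    next(tokens)  # skip the data-type row
    bad = []
    for row in tokens:
        if row and len(row) != len(header):
            # line_num is the number of source lines read so far
            bad.append((tokens.line_num, len(row)))
    return bad


sample = io.StringIO(
    "customer_id,key_id,dashboard_name\n"
    "int,int,string\n"
    "1,1,Act\n"
    "1,2,Act,\n"  # trailing comma gives this row 4 fields
)
print(find_bad_rows(sample))  # [(4, 4)]
```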

I am passing the file name and table name as parameters. The first two columns are numbers in the DynamoDB table; does that mean `1,1` in the CSV are being treated as strings? I am not sure where I am going wrong.

Can anyone suggest a fix, please?

asp
  • That error means one of your rows has more columns than your header row does. – jordanm Jul 16 '20 at 15:43
  • Yeah, I just noticed and corrected it, but now I get a different issue: `{'customer_id': 1, 'key_id': 1, 'dashboard_name': 'Act', 'tsm': Traceback (most recent call last): botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the PutItem operation: One or more parameter values were invalid: Missing the key customer_id in the item`. Also, some unreadable characters are printed before `customer_id`; I don't know why, or whether they are causing this error. – asp Jul 16 '20 at 15:55
  • It looks like your CSV file may have been encoded in Unicode. Those initial characters are the [Byte Order Mark](https://en.wikipedia.org/wiki/Byte_order_mark). Try `with open(args.csvFile, encoding='utf-8') as csv_file:` – jarmod Jul 16 '20 at 16:53
  • OK, at least some progress. I changed it as you suggested, but I still get the same error; that character now shows up like this: `{'\ufeffcustomer_id': '1', 'key_id': '1', ` – asp Jul 16 '20 at 16:56
  • Fixed the issue now with https://stackoverflow.com/questions/17912307/u-ufeff-in-python-string: `with open(args.csvFile, mode='r', encoding='utf-8-sig') as csv_file:` – asp Jul 16 '20 at 17:15
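The `'\ufeffcustomer_id'` key above is the UTF-8 byte order mark (bytes `EF BB BF`) that some editors and spreadsheet exports write at the start of a file. Decoding with `utf-8` keeps it as the character U+FEFF at the front of the first header name, so the item's attribute no longer matches the table's `customer_id` key; decoding with `utf-8-sig` strips it. A self-contained sketch that writes a BOM-prefixed file to a temporary directory and reads the first header both ways (the helper `first_header` and the file name `sample.csv` are hypothetical):

```python
import csv
import os
import tempfile


def first_header(path, encoding):
    """Return the first column name as decoded with the given encoding."""
    with open(path, mode='r', encoding=encoding, newline='') as csv_file:
        return next(csv.reader(csv_file))[0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.csv')
    # Write the raw bytes a BOM-adding editor would produce: BOM, then UTF-8 text.
    with open(path, 'wb') as f:
        f.write(b'\xef\xbb\xbfcustomer_id,key_id\nint,int\n1,1\n')
    with_utf8 = first_header(path, 'utf-8')
    with_sig = first_header(path, 'utf-8-sig')

print(repr(with_utf8))  # '\ufeffcustomer_id'
print(repr(with_sig))   # 'customer_id'
```

`utf-8-sig` also reads files without a BOM unchanged, so it is a safe default for CSV files whose origin is unknown.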

1 Answer


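I fixed the issue with the suggestion from @jarmod, together with the answers to "u'\ufeff' in Python string": the file starts with a UTF-8 byte order mark, and opening it with the `utf-8-sig` codec strips that mark, so the first header is read as `customer_id`.

This worked: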

```python
with open(args.csvFile, mode='r', encoding='utf-8-sig') as csv_file:
```
asp