-1

I have a .csv file with comma separated fields. I am receiving this file from a 3rd party and the content cannot change. I need to import the file to a database, but there are commas in some of the "comma" separated fields. The comma separated fields are also fixed length - when I stright up print the fields as per the below lines in function insert_line_csv they are spaced in a fixed length.

I need essentially need an efficient method of collecting fields that could have comma's included in the field. I was hoping to combine the two methods. Not sure if that would be efficient.

I am using python 3 - willing to use any libraries to make the job efficient and easy.

Currently I am have the following:

with open(FileName, 'r') as f:
    for count, line in enumerate(f):
        insert_line_csv(count, line)

with the insert_line_csv function looking like:

def insert_line_csv(line_no, line):
    line = line.split(",")
    field0 = line[0]
    field1 = line[1]
    ...... 

I am importing the line_no, as that is also being entered into the db.

Any insight would be appreciated.

A sample dataset:

text    ,2000.00   ,2018-07-07,textwithoutcomma      ,text     ,1 
text    ,3000.00   ,2018-07-08,textwith,comma        ,text     ,7 
text    ,1000.00   ,2018-07-07,textwithoutcomma      ,text     ,4 
Kevin Smith
  • 581
  • 2
  • 6
  • 13

1 Answers1

0

If the comma seperated fields are all fixed length, you should be able to just splice them off by count instead of splicing by commas, see Split string by count of characters

as a mockup-code you have

toParse = line

while (toParse != "")
    chunk = first X chars of toParse
    restOfLine = toParse without the chars just cut off
    write chunk to db
    toParse = restOfLine

That should work imho

Edit: upon seeing your sample dataset. Can there only be one field with commas inside of it? If so, you could split via comma, read out the first 3 fields, then the last two. Whatever is left, you concatenate again, because it is the value fo the 4th field. (If it had commas, ou'll need to actually concatenate there, if not, its already the value)