
I am trying to set up a process in Python that writes data to a .csv file which can then be BCP'd into an MSSQL database.

The basic to_csv command I am using is:

df.to_csv(csv_path, sep="«", header=False, index=False, line_terminator="[~~]")

The one issue I have been seeing is how to_csv handles double quotes within text. If a field contains double quotes, pandas wraps the whole field in quotes and doubles the embedded ones.

So a row which looks like:

    1, 7, Executed Job: "abcdf" Complete

Gets turned into this in the .csv output:

    1, 7, "Executed Job: ""abcdf"" Complete"

This is frustrating because it means that if I want to use the data post-BCP, I need to process it further on the SQL side. Since my ETL runs over hundreds of tables, implementing that would be a huge headache.

I am wondering if there is a way to handle the quotes in the data BEFORE running the to_csv command, via the pandas DataFrame.replace() function. Is there anything I can feed replace() that would turn the quotes in the DataFrame into something to_csv writes out as literal double quotes without causing escape problems?

For example something along the lines of:

df.replace('"', '\"')
df.to_csv(csv_path, sep="«", header=False, index=False, line_terminator="[~~]")

Edit: For reference, I have tried using doublequote=False and setting quoting=csv.QUOTE_NONE, but both lead to "need to escape, but no escapechar set" errors.
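
A minimal sketch of those two attempts (the one-row frame below is illustrative):

    import csv
    import pandas as pd

    # One-row frame matching the example row above
    df = pd.DataFrame([[1, 7, 'Executed Job: "abcdf" Complete']])

    # Attempt 1: QUOTE_NONE with no escapechar -- the writer hits the embedded
    # quote and raises csv.Error: need to escape, but no escapechar set
    try:
        df.to_csv("out.csv", sep="«", header=False, index=False,
                  line_terminator="[~~]", quoting=csv.QUOTE_NONE)
    except csv.Error as err:
        print(err)

    # Attempt 2: supplying an escapechar avoids the error but writes \" into
    # the file, which just moves the clean-up to the SQL side
    df.to_csv("out.csv", sep="«", header=False, index=False,
              line_terminator="[~~]", quoting=csv.QUOTE_NONE, escapechar="\\")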

– Josh Kraushaar
    Make your life easier and go directly from pandas to SQL Server: https://stackoverflow.com/questions/25661754/get-data-from-pandas-into-a-sql-server-with-pyodbc – AidanGawronski Oct 30 '19 at 01:29
    pandas to_sql is dramatically slower than just bcp-ing it. My current ETL setup is using to_sql but it isn't cutting it for the very large tables. – Josh Kraushaar Oct 30 '19 at 01:31
  • makes sense.... – AidanGawronski Oct 30 '19 at 01:32
  • Sadly I don't believe it does: see the edit I added. If I set quoting = csv.QUOTE_NONE it creates an escape error. If I set QUOTE_NONE and add a new escape character then it introduces the problem of needing to remove the characters on the database side, which is messy. It would be FAR easier to nip this problem in the bud with .replace() so I can make 1 code change rather than dozens. – Josh Kraushaar Oct 30 '19 at 01:35

1 Answer


Use:

import csv
...
df.to_csv(..., quoting=csv.QUOTE_NONE)
– Lambda
  • I should have specified (will edit the main ask): using QUOTE_NONE or doublequote=False creates an escape char error. Using any escape char in place of quotes creates the same 'need to process the characters after the fact in the database' issue. – Josh Kraushaar Oct 30 '19 at 01:32
  • I think maybe there are some other characters that should be escaped in your case. I used the row you showed to test my code, and got the correct result. You can try `df['z'].replace(re.compile(r'"'), '\\\\"')` to replace `"`. – Lambda Oct 30 '19 at 01:59
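
For completeness, one route not tried in the thread which I believe sidesteps the escape error entirely: with quoting=csv.QUOTE_NONE, the csv writer only insists on escaping the separator, the escapechar, the quotechar, and the line-terminator characters, so pointing quotechar at a character that never occurs in the data lets the embedded double quotes pass through verbatim. A sketch (the \x07 quotechar and the one-row frame are assumptions, not from the thread):

    import csv
    import pandas as pd

    df = pd.DataFrame([[1, 7, 'Executed Job: "abcdf" Complete']])

    # quotechar='\x07' (BEL) is assumed to never appear in the data; with
    # QUOTE_NONE the default quotechar '"' would otherwise still demand
    # escaping. Fields containing the separator or any line-terminator
    # character would still raise the escape error.
    df.to_csv("out.csv", sep="«", header=False, index=False,
              line_terminator="[~~]", quoting=csv.QUOTE_NONE, quotechar="\x07")

    print(open("out.csv", encoding="utf-8").read())
    # 1«7«Executed Job: "abcdf" Complete[~~]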