0

I have a python script here:

import pyodbc
import pandas as pd
from sqlalchemy import create_engine 
import csv

df = pd.read_sql("""script_generator""", conn)

for count, row in df.iterrows():
    row.to_csv('generatedfile{}.sql'.format(count), index=False, index_label=False, quoting=csv.QUOTE_NONE, escapechar=' ')

and when I run it, it creates separate csv files that are formatted in sql. The output looks like this in generatedfile2:

2
IF    EXISTS  (SELECT  *  FROM  sys.objects  WHERE  object_id  =  OBJECT_ID(N'table1')  AND  type  in  (N'U')) 
 
BEGIN 
 
        PRINT  'DROPPING  TABLE  [dbo].[table1]....' 
 
        DROP  TABLE  [dbo].[table1];   
 
END;   

The rest of the files have this same format. Is there any way I can change my code to get rid of the "2" at the beginning of the code? It won't run properly in SQL because of it. For some reason index_label=False won't get rid of it. Thanks a bunch!

moni
  • 3
  • 4
  • You know what, this might be a bug! I do reproduce issue. Strange the defaults `index=True` and `index_label=None` leads with comma in .csv. Consider alerting authors on the [GitHub pandas dev page](https://github.com/pandas-dev/pandas/issues). Be sure to include a [reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) with runnable code and small data sample. – Parfait Nov 13 '20 at 23:08

1 Answers1

0

When running DataFrame.iterrows, row renders as a Pandas Series and not a Data Frame. So actually, you are running Series.to_csv.

for count, row in df.iterrows():
    print(type(row))

# <class 'pandas.core.series.Series'>
# <class 'pandas.core.series.Series'>
# <class 'pandas.core.series.Series'>
...

However, this method is not useful to iterate off a single column. Instead, consider simply looping directly on the values of the column (assumed to be entirely your SQL statement). With each iteration, create the corresponding csv file with typical open + write methods. Use enumerate for sequential numbering for file suffixes.

for count, val in enumerate(df['sql_string_column']):
    with open('generatedfile{}.sql'.format(count), 'w') as f:
        f.write(val)
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • Thank you for your input! I was able to get rid of the index number with your method. My column doesn't have a column name so I wrote: for i, val in enumerate(df['']): with open('generatedfile{}.sql', 'w') as f: f.write(val) but it only creates one .sql file now, which is just named generatedfile{}.sql. Is having a blank column name messing it up? – moni Nov 14 '20 at 00:12