1

I need to generate postgres schema from a dataframe. I found csvkit library to come closet to matching datatypes. I can run csvkit and generate postgres schema over a csv on my desktop via terminal through this command found in docs:

csvsql -i postgresql myFile.csv 

csvkit docs - https://csvkit.readthedocs.io/en/stable/scripts/csvsql.html

And I can run the terminal command in my script via this code:

import os
a=os.popen("csvsql -i postgresql Desktop/myFile.csv").read()

However I have a dataframe, that I have converted to a csv string and need to generate schema from the string like so:

csvstr = df.to_csv()

In the docs it says that under positional arguments:

The CSV file(s) to operate on. If omitted, will accept
                        input on STDIN

How do I pass my variable csvstr into the line of code a=os.popen("csvsql -i postgresql csvstr").read() as a variable?

I tried to do the below line of code but got an error OSError: [Errno 7] Argument list too long: '/bin/sh':

a=os.popen("csvsql -i postgresql {}".format(csvstr)).read()

Thank you in advance

RustyShackleford
  • 3,462
  • 9
  • 40
  • 81

1 Answers1

2

You can't pass such a big string via commandline! You have to save the data to a file and pass its path to csvsql.

import csv

csvstr = df.to_csv()
with open('my_cool_df.csv', 'w', newline='') as csvfile:
    csvwriter= csv.writer(csvfile)
    csvwriter.writerows(csvstr)

And later:

a=os.popen("csvsql -i postgresql my_cool_df.csv")
Nadav Ruskin
  • 154
  • 1
  • 9
  • I can already do this, and write dataframe directly to csv using `to_csv('file/path'). I would really like to pass a string so I can take in other data sources and make process more efficient. – RustyShackleford Jul 15 '18 at 14:08
  • do you think I could pass a string in this way? https://stackoverflow.com/questions/163542/python-how-do-i-pass-a-string-into-subprocess-popen-using-the-stdin-argument – RustyShackleford Jul 15 '18 at 14:13
  • To be honest- I'm not sure. Refer to https://csvkit.readthedocs.io/en/1.0.2/tricks.html#specifying-stdin-as-a-file on syntax to passing STDIN to csvkit. An stdin channel is different from an argument list, I don't think you should be hitting a limit. I'd test for performance though, I'm not sure Python can handle this faster than simply saving and loading a file will be. – Nadav Ruskin Jul 15 '18 at 15:11