0

I'm using the :r command to running multiple sql scripts in the same session. Each script outputs multiple rows and columns. So by time the script is complete, I am left with a text file with multiple dataframes. Is there a way to easily split this text file into multiple text files, or some way to add a delimiter between the datasets? I have a delimiter added, but it's between columns. The command I am running is sqlcmd -S DATABASE_CONNECTION -m 1 -s "\t" -i sql_testing.sql > C:\Users\USER\file.txt. Within sql_testing I am using the :r command to run a few different scripts. I'm stuck on how to extract these multiple datasets from the text file. I'm not sure if it is something that can be done with pandas or from the command line.

Edit:

The output from the SQL scripts into a single text file looks like this:

ActDate                               \PLACE_NUMBER                                                                                                                                                                                      
--------------------------------------\--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
           2011-08-22 00:00:00.0000000\234                                                                                                                                                                                              
           2011-08-22 00:00:00.0000000\235                                                                                                                                                                                            
           2011-08-22 00:00:00.0000000\236                                                                                                                                                                                              
           2011-08-22 00:00:00.0000000\237                                                                                                                                                                                            
           2011-08-22 00:00:00.0000000\238                                                                                                                                                                                               
           2011-08-22 00:00:00.0000000\239                                                                                                                                                                                            
           2011-08-22 00:00:00.0000000\240                                                                                                                                                                                               
           2011-08-22 00:00:00.0000000\241                                                                                                                                                                                            
           2011-08-22 00:00:00.0000000\242                                                                                                                                                                                               
           2011-08-22 00:00:00.0000000\243                                                                                                                                                                                               
ActDate                               \PLACE_NUMBER                                                                                                                                                                                      
--------------------------------------\--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
           2013-08-22 00:00:00.0000000\456                                                                                                                                                                                               
           2013-08-22 00:00:00.0000000\457                                                                                                                                                                                            
           2013-08-22 00:00:00.0000000\458                                                                                                                                                                                               
           2013-08-22 00:00:00.0000000\459                                                                                                                                                                                            
           2013-08-22 00:00:00.0000000\460                                                                                                                                                                                               
d84_n1nj4
  • 1,712
  • 6
  • 23
  • 40
  • pandas DataFrames are in-memory objects and do not exist in text format. Please show example of text output to illustrate what you mean. – Parfait Jun 02 '21 at 00:47
  • @Parfait I updated my question to include the text file contents. For Pandas, I was referring to using `pandas.read_csv`, not sure if that's possible though. – d84_n1nj4 Jun 02 '21 at 01:40
  • In `.sql` script, can you not build a temp table to compile all these query results and then export that single table to text? – Parfait Jun 02 '21 at 14:08

1 Answers1

1

This might work for you (GNU csplit):

csplit -z file '/ActDate/' '{*}'

Will split file into files named xxnn where nn is from 00.

potong
  • 55,640
  • 6
  • 51
  • 83
  • That definitely helps, thanks. I also came across this, https://stackoverflow.com/a/59338732/3737798. Which helps clean up the output. Using a combination of that, your answer and adding `--digits=2 --prefix=outfile` to move the data to output files. When using results with different column names, I came across this https://stackoverflow.com/a/38315308/3737798, FYI. – d84_n1nj4 Jun 02 '21 at 15:12
  • Any idea why this wouldn't work on original output file. I can open that file with notepad++ and copy the contents and paste to a new file and this code works for that new file, no problem. – d84_n1nj4 Jun 02 '21 at 21:53
  • I think it was because it was not encoded properly (UTF-8). I'll leave these comments in case someone comes across this. – d84_n1nj4 Jun 02 '21 at 22:44