
My task is to dump a large MSSQL table into a CSV file to be uploaded elsewhere. The task needs to run nightly (or on any scheduled basis), and it should also be executable on demand (from a web page). The CSV conversion must be done in Java.

This is something I've never implemented, but it seems like a very common need. From my Google research, I came up with two options. But I have no idea if these have obvious pitfalls, or if there are obvious (and superior) implementation options that I am unaware of. So I could really use some informed feedback/advice on how to approach this.

  1. Execute my Java application in a servlet container, something like Apache Tomcat. The application would just run and do its business logic on whatever schedule, and Tomcat would make it easily accessible from a web page. But it seems like overkill to set up and maintain a Tomcat installation just for this.

  2. Package my Java program as a jar, and set up a Jenkins job (I have access to a Jenkins server installation) to execute the main class on whatever schedule, creating Jenkins jobs (in PHP or something) as needed for on-demand executions. This seems like a very clean and easy option; are there any obvious pitfalls that I'm missing?

There are so many tools for this job that I have no idea how to pick the right one.

Also, any general advice on how to convert the table to CSV in a memory-safe and time-efficient manner would be greatly appreciated.
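For concreteness, the kind of streaming approach I have in mind looks roughly like the sketch below. The connection details are placeholders, and the fetch size is only a hint -- actual streaming behaviour depends on the JDBC driver's settings:

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class TableToCsv {
    public static void main(String[] args) throws SQLException, IOException {
        // Placeholder connection details -- substitute your own.
        String url = "jdbc:sqlserver://dbhost;databaseName=mydb;user=me;password=secret";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                                   ResultSet.CONCUR_READ_ONLY);
             BufferedWriter out = Files.newBufferedWriter(Paths.get("/tmp/mytable.csv"))) {
            stmt.setFetchSize(1000); // hint only; streaming depends on driver settings
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM mytable")) {
                int cols = rs.getMetaData().getColumnCount();
                while (rs.next()) {
                    StringBuilder row = new StringBuilder();
                    for (int i = 1; i <= cols; i++) {
                        if (i > 1) row.append(',');
                        String v = rs.getString(i);
                        // Naive quoting: wrap everything, double embedded quotes.
                        row.append(v == null ? "" : '"' + v.replace("\"", "\"\"") + '"');
                    }
                    out.write(row.toString());
                    out.newLine();
                }
            }
        }
    }
}

The idea is that rows are written as they are read, so memory use stays flat regardless of table size.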

ab11

2 Answers


The most common way to run a job on a schedule is cron.

As for accessing it via a web page, it could be exposed as a web service with some kind of REST or SOAP API. The web service would run your Java logic and either deliver the CSV as the response or generate it on the server's file system.
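For illustration, that service doesn't strictly need a full container; a bare-bones sketch on the JDK's built-in com.sun.net.httpserver could expose a single trigger endpoint. Here runDump is a hypothetical stand-in for the actual export logic:

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class DumpEndpoint {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/dump", exchange -> {
            // runDump() is a hypothetical method wrapping the CSV export logic.
            new Thread(DumpEndpoint::runDump).start(); // don't block the request
            byte[] body = "dump started\n".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(202, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }

    static void runDump() {
        // CSV export logic goes here (see the JDBC sketch in the question).
    }
}

A scheduled run (cron or Jenkins) could then simply hit the same endpoint with curl, so the nightly and on-demand paths share one code path.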

As for dumping the data, it should be possible to export to CSV from MySQL directly; see: Dump a mysql database to a plaintext (CSV) backup from the command line

If there is no requirement to do this in Java, then I would suggest a simple web page with a submit button that calls the mysqldump command. This could be a PHP script wrapping a system call that runs mysqldump to CSV and drops the output at a location on the HTTP server.

niallhaslam
  • I read the question as being for MSSQLSERVER rather than MySQL. I've posted a separate answer -- and like you indicate, I implemented an ESB with a trivial Apache REST service. – irwinj Jul 20 '15 at 09:09

I've done something like this before but the constraints I had were:

  1. The invocation was by a third party (using Control-M -- other scheduling tools are available including cron).

  2. The target was an ETL into MySQL.

  3. The platform was Linux at my end and (obviously) Windows at the MSSQLSERVER end.

To present the service, we used Apache and a trivial trio of PHP scripts that took a noun (database schema name) and a verb (start/stop/status) to queue, kill, or peek at the job status for the relevant extract.

The start was invoked by pushing a task onto a queue that was monitored by a simple (probably cronnable) bash loop, which invoked isql to dump each of the schema's tables.

We needed to discover the schema and the table structure dynamically so we used something like...

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME LIKE '<Schema>.%';

We needed to filter out temporary tables that matched simple patterns -- I think I used sed for that. If you are not worried about reimporting into a different RDBMS, you can skip the step of reconstructing a CREATE TABLE statement (we didn't find a MSSQLSERVER equivalent of SHOW CREATE TABLE and did something else).
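If the discovery step moves into Java, it might look roughly like the following sketch, filtering on INFORMATION_SCHEMA's TABLE_SCHEMA column. The 'tmp' naming test is an assumption -- substitute whatever pattern your temporary tables actually follow:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class SchemaDiscovery {
    // List the schema's tables, skipping names that match a hypothetical
    // temp-table convention -- roughly what our sed filter did.
    static List<String> discoverTables(Connection conn, String schema) throws SQLException {
        List<String> tables = new ArrayList<>();
        String sql = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, schema);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    String name = rs.getString("TABLE_NAME");
                    if (!name.toLowerCase().contains("tmp")) { // assumed naming pattern
                        tables.add(name);
                    }
                }
            }
        }
        return tables;
    }
}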

To dump each table, we simply invoked the isql client. If you are on Windows, you can use the native MSSQLSERVER client (e.g. sqlcmd) instead:

echo "select * from $table" | isql -b -q -d\|  $schema $username $p/w > /tmp/$table.csv

This dumps using pipes (|) rather than commas as the delimiter, since we had a lot of commas in free-text data. It also quote-encapsulates string fields (the -q flag). I think I also edited the source for isql (it is open source and was brilliant for the job) to escape embedded quotation marks within string types (I had to do this for our Oracle sources too) to make the load into MySQL easier.
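If the dump ends up being done in Java instead (as the question requires), the same concerns can be handled per field without patching the client. A sketch, using the pipe delimiter to match the isql output above and RFC 4180-style doubled quotes:

public class CsvEscape {
    // Wrap a field in quotes when it contains the delimiter, a quote,
    // or a line break; double any embedded quotes (RFC 4180 style).
    static String escape(String field, char delimiter) {
        if (field == null) return "";
        boolean needsQuotes = field.indexOf(delimiter) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) return field;
        return '"' + field.replace("\"", "\"\"") + '"';
    }

    public static void main(String[] args) {
        System.out.println(escape("free text, with | pipes and \"quotes\"", '|'));
        // prints: "free text, with | pipes and ""quotes"""
    }
}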

The stop was similar: a job would invoke a number of process kills (found via a process tree; more elegant methods are available, I'm sure). It needed to be brutal and immediate, since it was invoked when the extract collided with the online day; missing one day of extract was deemed less important than affecting the start of the business day. The script also tidied up the status and marked the extract as bad for the downstream services, so they ignored it and continued with the previous unload.

The status was a convenience for the client of this whole service. What we learned is that you should prepare for them to issue a start followed by a busy loop on status (between 20 and 200 requests per second!) until, fifteen minutes to three hours later, the status returns idle, followed by the client issuing a stop (make sure you no-op a stop on an idle extract). We simply added a sleep to the status service, as it was sadly too difficult to convince the client to change their logic.
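In Java terms that defensive throttle is trivial. A sketch, where the three-second floor is an assumption (tune it to your client's patience):

public class StatusThrottle {
    // Hypothetical state flag; in the real service this was a persisted job status.
    static volatile String extractState = "idle";

    // Deliberately throttled status check, mirroring the sleep we added
    // to dampen clients that busy-loop on status.
    static String status() throws InterruptedException {
        Thread.sleep(3000); // ~3s floor per poll; the exact delay is an assumption
        return extractState;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(status());
    }
}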

This was for a central government enforcement agency; the client above was one of our outsourced IT service providers. This ecosystem was (is!) running against about half a dozen sources including Oracle, MSSQLSERVER and SESAM.

irwinj