
Question:

In a general sense, into what exactly do I need to abstract a .CSV file to “run SQL statements” on it?

I think I can write the iteration loop for step 1 of "Problem" below, but I don’t have the slightest idea how to do step 2.

Problem:

I work in a university office where an overnight script (that I didn't write) combs the university's database and generates .csv files for letters that I need to send via "mail merge" the following day. It saves these .csv files to a server folder to which I have access.

The technology that does this is limited, and it produces a different .csv file for each admission term. (This seems to be connected to how the automation software knows which admission terms to pay attention to when it combs the university's database.)

However, my MS Word Mail Merge documents don't need this separation of .csv files, as "admission term" is also a field in the .csv file itself. Any variation I need can be done with conditional logic and field insertion.

The separation of .csv files was making me maintain multiple redundant Word documents for the same general idea, one per term, which was a headache every time we transitioned to focusing on a new set of semesters. I fixed that with an Access solution (see below), but now the Access solution is itself a headache to maintain.

Unfortunately, I don't have access to change the output of the overnight script, so I'm stuck with the .csv files I've got.

What I would like to do is write a program that lives on my desktop PC or on our server and:

  1. loops through every file in the applicable server folder, determining if it meets certain filename criteria and has non-null contents
  2. abstracts the .csv files that match step 1 into some abstraction on which SQL can be performed
  3. performs a SQL “UNION ALL” statement on “SELECT * FROM [the abstraction of the .csv file]” for every matching .csv file
  4. writes the results of step 3 as a new “merged” .csv file to our server (to which I’ll point Mail Merge files).
    • (Plus I’d like to automate my program to execute on a certain schedule, so that will impact my tool choice.)

So far, I've used MS Access to do steps 3 & 4 (and then I have Word read straight from the .mdb file when a Mail Merge file is opened).

However, it’s a real headache when we want to focus on a different set of terms. I have to manually change which .csv files the .mdb looks at because the .csv files I get have admission term built into their filenames.

Every 6 months, I take several hours to import specific new .csv files into new “linked tables,” repoint the unions, clean up old “linked tables” I no longer need, etc.

I’d much rather put a computer to work on the task, even if it has to do it every day. :-)
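
Update: nullability's comment below made me realize that, for this particular project, the "UNION ALL" amounts to plain concatenation. To make the shape of the program concrete, here is a rough Python sketch of steps 1, 3, and 4; the folder path and filename pattern are made-up placeholders, not my real ones:

    # merge_terms.py - a rough sketch, not production code.
    # SOURCE_DIR and PATTERN below are made-up placeholders.
    import csv
    import glob
    import os

    SOURCE_DIR = r"\\server\share\letters"   # hypothetical server folder
    PATTERN = "letters_*.csv"                # hypothetical term-stamped names
    OUTPUT = os.path.join(SOURCE_DIR, "merged.csv")

    with open(OUTPUT, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        header_written = False
        for path in sorted(glob.glob(os.path.join(SOURCE_DIR, PATTERN))):
            if os.path.getsize(path) == 0:   # step 1: skip empty files
                continue
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:           # file had no rows at all
                    continue
                if not header_written:
                    writer.writerow(header)  # keep a single header row
                    header_written = True
                writer.writerows(reader)     # steps 3 & 4: the "UNION ALL"

Windows Task Scheduler (or cron, if the server runs Unix) could then run the script nightly, which would cover the scheduling bullet above.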

k..
  • Couldn't you just concatenate the CSV files together? I don't understand what advantage you would get by converting the CSVs into a database if you're just going to run a `UNION`. If your CSV files have differences, you could use shell commands like AWK to clean them up: http://www.tsd.net.au/blog/cleaning-csv-data-awk – nullability Apr 25 '13 at 20:11
  • Hi nullability, The project above has identical CSV files and could use plain concatenation. Project #2, after succeeding w/ this project, was going to be one that processes CSV files with differences in need of cleaning before concatenation. I'm familiar with SQL and your response is my first exposure to an alternative for such processing. Thanks for being so quick. – k.. Apr 25 '13 at 20:59
  • possible duplicate of [Generate Insert SQL statements from a csv file](http://stackoverflow.com/questions/8213/generate-insert-sql-statements-from-a-csv-file) – Paul Sweatte Aug 16 '13 at 16:54
  • I think you need an Access macro that asks you which directory of csv files to create linked tables for and then deletes all the existing linked tables and creates new ones for those CSV files. If you think that is a good idea I can probably give you an example. – Jerry Jeremiah Sep 04 '13 at 08:25

2 Answers


You can use SSIS for this purpose:

  1. Create a package that scans for files and imports them to a SQL database
  2. Perform the data transformation you need
  3. Export the transformed data to another CSV file

You can run this package as a manually triggered job, or on a schedule.

Akash

Here's an update, now that I've learned more about computers, that I hope can help people who find themselves where I was:

1) My attempt to answer my own question in extremely simple language: SQL is a "vanilla" standard for what a family of query languages ought to look like; real products implement some dialect of it. The most common type of software that can understand SQL commands is called a "relational database." A "relational database" is meant to be its own miniature filesystem for your data. Therefore, you would want to copy the data out of the ".CSV" file that lives on your own computer's filesystem and paste it into the filesystem of a "relational database." (And since the two filesystems are so different, the most typical way to do that "pasting" is through SQL commands.)
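
To make that concrete, here is roughly what the "copy it in, then query it" dance looks like using Python's built-in sqlite3 module (a small relational database). The filename and table name are placeholders:

    import csv
    import sqlite3

    conn = sqlite3.connect("letters.db")  # the database's own mini "filesystem"

    # "Paste" the .CSV data into the database.
    with open("fall_2013.csv", newline="") as f:   # placeholder filename
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]

    # Assumes every data row has the same number of fields as the header.
    col_defs = ", ".join('"%s" TEXT' % name for name in header)
    conn.execute("CREATE TABLE IF NOT EXISTS letters (%s)" % col_defs)
    marks = ", ".join("?" for _ in header)
    conn.executemany("INSERT INTO letters VALUES (%s)" % marks, data)
    conn.commit()

    # Now ordinary SQL works against the copied data.
    for row in conn.execute("SELECT * FROM letters"):
        print(row)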

2) That said, there are pieces of software in the world that use ".CSV" files sitting on your own computer's file system as "their" file system (they don't have one of their own) and can "understand" SQL-like commands. These are nice because ".CSV" files are typically a similar data structure to the abstract notion of "tables" within the "relational databases" that SQL commands are meant to operate against. One such piece of software for Windows is Microsoft Log Parser. (Note, though, that it doesn't understand all "vanilla" SQL commands.)

3) To solve the problem I had in mind when I wrote this question, I put Microsoft Log Parser on our Windows server and wrote Windows shell scripts to execute the SQL-like commands I wanted it to perform on various ".CSV" files that also lived on our Windows server.
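
If memory serves, the heart of each shell script was a single Log Parser command line, roughly like the one below. (I've wrapped it in Python's subprocess module so the whole snippet is runnable as shown; the paths and the query are placeholders, and the exact Log Parser flags may differ from what I used.)

    import subprocess

    # Roughly the SQL-like command my scripts handed to Log Parser.
    # Paths are placeholders; LogParser.exe must be installed and on PATH.
    query = (r"SELECT * INTO \\server\share\merged.csv "
             r"FROM \\server\share\letters_*.csv")
    subprocess.check_call(["LogParser.exe", query, "-i:CSV", "-o:CSV"])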

Part of my colleague's daily work depends on the data transformation from my original post. I just wrote "double-click '____.cmd' now..." into his instructions. It was a kludge, but it worked!

4) SQL is not the only set of programming commands that is good for working on table-like data. For example, Python has a "csv" package. It's less like SQL than Log Parser's commands are, but it can do some tasks Log Parser can't, such as right-padding field contents.
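
A toy example of that kind of per-field cleanup; the input filename, field position, and width are made up:

    import csv

    # Right-pad the third field of every row to 10 characters - the sort
    # of fix-up Log Parser couldn't do for me.
    with open("in.csv", newline="") as src, \
         open("out.csv", "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            row[2] = row[2].ljust(10)  # ljust pads on the right with spaces
            writer.writerow(row)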

There is also a "Portable Python" distribution that you can put on a Windows filesystem (be it your PC or some Windows server space) and use to write and run Python programs against ".CSV" files on that filesystem. This installation-free environment (like Log Parser) can help you "kludge" data-processing tasks against ".CSV" files. ("Kludges" are fun to write and often useful short-term, but they're not recommended for projects where security, maintainability, scale, etc. are issues - you've been warned!)

k..