
In a project I'm working on, I need to read from a CSV file, update a field on each row, and then save the results back to the CSV file. I'm looking for a library that will help me with this.

My first attempt was to use ADO. This worked like a charm for reading, but when I attempted to update the file I received the error "Updating data in a linked table is not supported by this ISAM."

So now I'm looking for a replacement (or workaround). These are my requirements:

  1. I would rather not define every column in the file. I only need two columns, and am concerned that additional columns may be added at a future date.

  2. I need to be able to preserve (or, at the very least, replicate) the column heading information.

  3. I would prefer to have as little knowledge of the underlying format/file as possible (i.e. I don't want to write a CSV writer from scratch).

I've run across a number of alternative readers, and a couple of writers... But the writers all involve reading the CSV file into a pre-defined set of fields, and then writing only those fields back to the new file. I want to minimize the amount of information about the column structure hard coded into my program.

AaronSieb
  • I'm not sure of what you're looking for. Is it perhaps a CSV writer that only "locks" the rows it is currently updating instead of the whole file? – LazyOfT May 04 '11 at 22:11
  • @Brizio I need a method of writing CSV files that requires as little of the CSV's structure to be specified at the source code level as possible. – AaronSieb May 04 '11 at 22:30

4 Answers


The way modern file systems work, you can only update a file in place if the new data is exactly the same size as the original. Otherwise you must rewrite the entire file from scratch. If you can meet that constraint, you can do it with low-level file streams. I don't know of a CSV package that supports this off the top of my head, but CSV is simple enough that you can do it on your own.

That said, if you are updating every row anyway, then rewriting the file probably isn't that big of a deal. Writing a CSV record is dead simple. Observe the following C# code:

public void WriteRecord(IEnumerable items, TextWriter outputStream)
{
    string delimiter = "";
    foreach (var item in items)
    {
        outputStream.Write(delimiter);
        outputStream.Write("\"");
        // Double up embedded quotes so the field round-trips correctly.
        outputStream.Write(item.ToString().Replace("\"", "\"\""));
        outputStream.Write("\"");
        delimiter = ",";
    }
    outputStream.Write(Environment.NewLine);
}

Of course, if you have complex types that you want to be more picky about, that's fine, but since you don't want to constrain yourself to specific future column arrangements this code should be just fine. Additionally, it will complement my own CSV parser listed here on Stack Overflow, which does not require advance knowledge of the columns in the file. You could do something like this:

var tempPath = @"Some-temp-file-path.csv";
var srcPath = @"input-file-path.csv";
using (var outFile = new StreamWriter(tempPath))
{
    foreach (var items in CSV.FromFile(srcPath))
    {
        items[someInt] = "new value";
        items[otherInt] = "other value";
        WriteRecord(items, outFile);
    }
}
File.Copy(tempPath, srcPath, true); // overwrite the original
Joel Coehoorn
  • Sorry about the ambiguous wording. This is essentially what I want. My main concern is dealing with corner cases in the output. The CSV file has things like this: "4-1/2"" Decal, cool",P,MT555,23... So items (in your first example) would probably be populated as `4-1/2" Decal, cool`, `P`, `MT555`, `23`. I don't want it written to the file as: 4-1/2"" Decal, cool,P,MT555,23 (that's broken). Is quoting every field valid? Or does that lose information on the integers? – AaronSieb May 04 '11 at 22:26
  • @Aaron quoting every field is perfectly valid and will not break integers unless you use a broken parser elsewhere. Doubling up quotes is the correct way to escape them, but you could easily alter my code to backslash-escape them instead. You also only need to worry about escaping quotes if there's an internal comma in the field or if the field starts with a quote. You could certainly build that logic into the function. – Joel Coehoorn May 04 '11 at 22:31
  • Also, I had neglected the enclosing quotes in my original answer. This is now corrected. – Joel Coehoorn May 04 '11 at 22:34
  • Adding the wrapping quotes in makes this much clearer to me. Thanks. – AaronSieb May 04 '11 at 22:53
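Joel's comment about only quoting when necessary can be sketched as a small helper (a sketch; the `EncodeField` name is mine, not from the answer):

```csharp
static class CsvFieldWriter
{
    // Quote a field only when it contains a comma, a quote, or a line break;
    // embedded quotes are doubled, per the usual CSV convention.
    public static string EncodeField(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
            return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }
}
```

With this, `MT555` and `23` pass through untouched, while `4-1/2" Decal, cool` comes out as `"4-1/2"" Decal, cool"`.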

A .csv file is a flat file, and as far as I know you cannot update a file in place unless you have something way out of my past, like an indexed file system.

Suggest reading your .csv file into your program.

Store it in a database like SQLite, or in enough heap memory to hold the contents of the file plus some additional space for changes.

Make changes.

Write file out.

I've left out lots of details as to how you would do the update. Hopefully someone with lots of file experience can correct my notion of what you should do. It may be that some Microsoft database object libraries can do this, but I am not familiar with them.
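As a minimal sketch of the read/modify/write skeleton above, keeping everything in memory rather than a database (the file name and column index are placeholders, and the naive `Split` does not handle quoted commas; this would go inside whatever method does the update):

```csharp
using System.IO;
using System.Linq;

// Read every line, split naively on commas, update one column, write back.
var rows = File.ReadAllLines("data.csv")
               .Select(line => line.Split(','))
               .ToList();
foreach (var row in rows.Skip(1))   // leave the header row intact
    row[1] = "updated";
File.WriteAllLines("data.csv",
    rows.Select(row => string.Join(",", row)));
```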

octopusgrabbus
  • Hopefully I've straightened up the wording a bit. I'm okay with rewriting the entire file "under the covers" but I don't want my program to need to define the structure for any more of the CSV than absolutely necessary. So your skeleton is exactly what I want... But I need more detail on the read and write steps to cover how to do it without losing information. – AaronSieb May 04 '11 at 22:29

If you are using C# 4.0, there are some nice methods in the File class that can help you rewrite your CSV file even if you can't update a single row in place. Take a look at File.ReadLines and File.WriteAllLines: ReadLines returns an IEnumerable&lt;string&gt; and WriteAllLines accepts one, so you can perform your conversion on a per-line basis. This doesn't mean you aren't locking your file, but it is certainly less memory-intensive than holding the whole file in memory.

var lines = File.ReadLines(oldFile);
File.WriteAllLines(newFile, ChangeCSV(lines));

public IEnumerable<string> ChangeCSV(IEnumerable<string> csvLines)
{
  foreach(var line in csvLines)
  {    
    var convertedLine = ... // Do your conversion here for a single line
    yield return convertedLine;
  }
}

EDIT: if you are looking for a quick way of parsing your CSV lines, there is a regular expression that can do the trick for you.
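The link to that expression isn't preserved here, but as a sketch, one .NET pattern that copes with quoted fields and doubled quotes (my own approximation, not necessarily the one originally referenced):

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class CsvLine
{
    // Match either a quoted field (with "" as an escaped quote) or a bare
    // field, anchored just after the start of the line or a comma.
    static readonly Regex Field =
        new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");

    public static string[] Parse(string line)
    {
        return Field.Matches(line)
                    .Cast<Match>()
                    .Select(m => m.Value.StartsWith("\"")
                        ? m.Value.Substring(1, m.Value.Length - 2)
                                 .Replace("\"\"", "\"")
                        : m.Value)
                    .ToArray();
    }
}
```

Note that .NET's variable-length lookbehind makes the `(?<=^|,)` anchor work; many other regex engines would reject it.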

LazyOfT
  • It's not so much the per-line writing that I'm looking for, as avoiding the risk of breaking the fields I don't explicitly touch through corner-cases and the like. – AaronSieb May 04 '11 at 22:55
  • Have you given a look at the regex I pointed you at? With the example that you provided it worked pretty well and didn't break anything. – LazyOfT May 04 '11 at 22:57
  • I'm having more problems WRITING the file than reading it. Reading concerns are mostly just a function of enabling writing. – AaronSieb May 05 '11 at 01:14

CSV files are untyped streams of characters so it is possible to replace individual characters, but you cannot add or remove characters without rewriting the entire file.

I strongly suggest, from personal experience, that you instead create a simple CSV parser as @Joel suggests and recreate the entire file for each update. Trying to update the data in place can easily corrupt your entire file irrecoverably if an error occurs.

Follow this procedure:

  1. Read the file field-by-field.
  2. If it's a field you don't care about, write it unchanged to a temp file.
  3. If it's a field you do care about, write the new value to the temp file.
  4. If you hit an error, delete the temp file.
  5. If there were no errors, swap the temp file in for the original and delete the old file.
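The steps above can be sketched like this, where `Transform` stands in for whatever per-record update you need (a hypothetical helper, not a real API), and `File.Replace` handles the final swap while keeping a backup:

```csharp
using System;
using System.IO;

var srcPath = "input-file-path.csv";
var tempPath = Path.GetTempFileName();
try
{
    using (var reader = new StreamReader(srcPath))
    using (var writer = new StreamWriter(tempPath))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            writer.WriteLine(Transform(line)); // hypothetical per-record update
    }
    // Step 5: swap the temp file in, keeping a backup of the original.
    File.Replace(tempPath, srcPath, srcPath + ".bak");
}
catch
{
    File.Delete(tempPath); // step 4: discard the temp file on error
    throw;
}
```

One caveat: File.Replace requires both files to be on the same volume, so if your temp directory lives elsewhere, fall back to File.Copy with overwrite and delete the temp file afterward.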
Dour High Arch