
I'm parsing a CSV file and placing the data in a struct. I'm using the TextFieldParser from this question and it's working like a charm except that it returns a String[]. Currently I have the ugly process of:

String[] row = parser.ReadFields();
DispatchCall call = new DispatchCall();
if (!int.TryParse(row[0], out call.AccountID)) {
    Console.WriteLine("Invalid Row: " + parser.LineNumber);
    continue;
}
call.WorkOrder = row[1];
call.Description = row[2];
call.Date = row[3];
call.RequestedDate = row[4];
call.EstStartDate = row[5];
call.CustomerID = row[6];
call.CustomerName = row[7];
call.Caller = row[8];
call.EquipmentID = row[9];
call.Item = row[10];
call.TerritoryDesc = row[11];
call.Technician = row[12];
call.BillCode = row[13];
call.CallType = row[14];
call.Priority = row[15];
call.Status = row[16];
call.Comment = row[17];
call.Street = row[18];
call.City = row[19];
call.State = row[20];
call.Zip = row[21];
call.EquipRemarks = row[22];
call.Contact = row[23];
call.ContactPhone = row[24];
call.Lat = row[25];
call.Lon = row[26];
call.FlagColor = row[27];
call.TextColor = row[28];
call.MarkerName = row[29];

The struct consists of all those fields being Strings except for AccountID being an int. It annoys me that they're not strongly typed, but let's overlook that for now. Given that parser.ReadFields() returns a String[], is there a more efficient way to fill a struct (possibly converting some values, such as row[0] needing to become an int) with the values in the array?

**EDIT:** One restriction I forgot to mention that may impact what kind of solutions will work is that this struct is [Serializable] and will be sent over TCP somewhere else.

Corey Ogburn
  • Reflection would definitely be less efficient, I would just live with it as is – RobJohnson Aug 17 '12 at 16:01
  • CsvHelper might be very helpful to you https://github.com/JoshClose/CsvHelper/wiki/Basics – KeesDijk Aug 17 '12 at 16:01
  • If by "more efficient" you are referring to **speed**, then there's little to be gained here. It's already almost as fast as it can possibly be. If you mean **"lines of code"** then any gain is small, since this is already only 30 LOC (unless you have dozens more such classes). If you mean **maintainability** then reflection, as mentioned by others, might offer an improvement. – Roman Starkov Aug 17 '12 at 16:10
  • Is there no header row in the file and shouldn't that variable be called `columns` not `row`? – Jodrell Aug 17 '12 at 16:11

6 Answers


Your mileage may vary on whether it is a better solution, but you could use reflection and define an Attribute class that you use to mark your struct members with. The attribute would take the array index as an argument. Assigning the value from the right array element would then happen by using reflection.

You could define your attribute like this:

[AttributeUsage(AttributeTargets.Property)]
public sealed class ArrayStructFieldAttribute : Attribute
{
    public ArrayStructFieldAttribute(int index)
    {
        this.index = index;
    }

    private readonly int index;

    public int Index {
        get {
            return index;
        }
    }
}

This means the attribute can simply be used to associate an int value named Index with a property.

Then, you could mark your properties in the struct with that attribute (just a few example lines):

[ArrayStructField(1)]
public string WorkOrder { // ...

[ArrayStructField(19)]
public string City { // ...

The values could then be set with the Type object for your struct type (you can obtain it with the typeof operator):

foreach (PropertyInfo prop in structType.GetProperties()) {
    ArrayStructFieldAttribute attr = prop.GetCustomAttributes(typeof(ArrayStructFieldAttribute), false).Cast<ArrayStructFieldAttribute>().FirstOrDefault();
    if (attr != null) {
        // we have found a property that you want to load from an array element!
        if (prop.PropertyType == typeof(string)) {
            // the property is a string property, no conversion required
            prop.SetValue(boxedStruct, row[attr.Index]);
        } else if (prop.PropertyType == typeof(int)) {
            // the property is an int property, conversion required
            int value;
            if (!int.TryParse(row[attr.Index], out value)) {
                Console.WriteLine("Invalid Row: " + parser.LineNumber);
            } else {
                prop.SetValue(boxedStruct, value);
            }
        }
    }
}

This code iterates over all properties of your struct type. For each property, it checks for our custom attribute type defined above. If such an attribute is present, and if the property type is string or int, the value is copied from the respective array index.

I am checking for string and int properties as those are the two data types you mentioned in your question. Even though you have only one particular index that contains an int value now, it's good for maintainability if this code is prepared to handle any index as a string or an int property.

Note that for a greater number of types to handle, I'd suggest not using a chain of if and else if, but rather a Dictionary<Type, Func<string, object>> that maps property types to conversion functions.
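As a rough sketch of that dictionary idea (the class name FieldConverters, the method TryConvert and the choice of supported types are assumptions made for illustration, not part of the code above), the conversion table could look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class FieldConverters
{
    // Maps a target property type to a function that converts the raw CSV text.
    // The set of types here is only an example; extend it as needed.
    private static readonly Dictionary<Type, Func<string, object>> Converters =
        new Dictionary<Type, Func<string, object>>
        {
            { typeof(string), s => s },
            { typeof(int),    s => int.Parse(s, CultureInfo.InvariantCulture) },
            { typeof(double), s => double.Parse(s, CultureInfo.InvariantCulture) },
        };

    // Returns false (instead of throwing) when the type has no converter
    // or the text is malformed. A null text for a numeric type still throws.
    public static bool TryConvert(string text, Type targetType, out object result)
    {
        result = null;
        Func<string, object> convert;
        if (!Converters.TryGetValue(targetType, out convert))
            return false;
        try
        {
            result = convert(text);
            return true;
        }
        catch (FormatException) { return false; }
        catch (OverflowException) { return false; }
    }
}
```

Inside the reflection loop, the if/else chain would then shrink to a single call such as FieldConverters.TryConvert(row[attr.Index], prop.PropertyType, out value), followed by prop.SetValue when it succeeds.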

O. R. Mapper

If you want to create something very flexible you can mark each property on DispatchCall using a custom attribute. Something like this:

class DispatchCall {

  [CsvColumn(0)]
  public Int32 AccountId { get; set; }

  [CsvColumn(1)]
  public String WorkOrder { get; set; }

  [CsvColumn(3, Format = "yyyy-MM-dd")]
  public DateTime Date { get; set; }

}

This allows you to associate each property with a column. For each row you can then iterate over all properties and by using the attribute you can assign the right value to the right property. You will have to do some type conversion from string to numbers, dates and perhaps enums. You can add extra properties to the attribute to assist you in that process. In the example I invented Format which should be used when a DateTime is parsed:

Object ParseValue(String value, Type targetType, String format) {
  if (targetType == typeof(String))
    return value;
  if (targetType == typeof(Int32))
    return Int32.Parse(value);
  if (targetType == typeof(DateTime))
    return DateTime.ParseExact(value, format, CultureInfo.InvariantCulture);
  ...
}

Using TryParse methods in the above code can improve the error handling by allowing you to provide more context when an unparsable value is encountered.

Unfortunately, this approach is not very efficient because the reflection code will be executed for each row in your input file. If you want to make this more efficient you need to dynamically create a compiled method by reflecting once over DispatchCall that you then can apply on each row. It is possible but not particularly easy.
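One possible way to reflect only once is to build the setters with expression trees and cache them. The sketch below is an assumption-laden illustration rather than a finished mapper: RowMapper<T> is a made-up name, the CsvColumn attribute is reduced to just an index (no Format), and only string and int properties are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public sealed class CsvColumnAttribute : Attribute
{
    public CsvColumnAttribute(int index) { Index = index; }
    public int Index { get; private set; }
}

public class DispatchCall
{
    [CsvColumn(0)] public int AccountId { get; set; }
    [CsvColumn(1)] public string WorkOrder { get; set; }
}

public static class RowMapper<T> where T : class, new()
{
    // Built once per T; each delegate copies one column into one property.
    private static readonly List<Action<T, string[]>> Setters = Build();

    private static List<Action<T, string[]>> Build()
    {
        var setters = new List<Action<T, string[]>>();
        foreach (PropertyInfo prop in typeof(T).GetProperties())
        {
            var attr = (CsvColumnAttribute)Attribute.GetCustomAttribute(
                prop, typeof(CsvColumnAttribute));
            if (attr == null) continue;

            ParameterExpression target = Expression.Parameter(typeof(T), "target");
            ParameterExpression row = Expression.Parameter(typeof(string[]), "row");
            Expression cell = Expression.ArrayIndex(row, Expression.Constant(attr.Index));

            Expression value;
            if (prop.PropertyType == typeof(string))
                value = cell;
            else if (prop.PropertyType == typeof(int))
                value = Expression.Call(
                    typeof(int).GetMethod("Parse", new[] { typeof(string) }), cell);
            else
                continue; // other types are left out of this sketch

            // Compiles to the equivalent of: target.Prop = <value from row[index]>;
            Expression assign = Expression.Assign(Expression.Property(target, prop), value);
            setters.Add(Expression.Lambda<Action<T, string[]>>(assign, target, row).Compile());
        }
        return setters;
    }

    public static T Map(string[] row)
    {
        var item = new T();
        foreach (Action<T, string[]> set in Setters) set(item, row);
        return item;
    }
}
```

Each row then costs only a handful of delegate calls, for example RowMapper<DispatchCall>.Map(parser.ReadFields()). The constraint to classes is deliberate: with a struct, the delegates would receive a copy, so a struct version would need a ref parameter and a custom delegate type.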

Martin Liversage

How dependent are you on the library that you're using? I've found File Helpers to be quite useful for this sort of thing. Your code would look something like:

using FileHelpers;

// ...

[DelimitedRecord(",")]
class DispatchCall {
    // Just make sure these are in order
    public int AccountID { get; set; }
    public string WorkOrder { get; set; }
    public string Description { get; set; }
    // ...
}

// And then to call the code
var engine = new FileHelperEngine(typeof(DispatchCall));
engine.Options.IgnoreFirstLines = 1; // If you have a header row
DispatchCall[] data = engine.ReadFile(FileName) as DispatchCall[];

You now have a DispatchCall array, and the engine did all the heavy lifting for you.

merthsoft

Use reflection, as @Grozz suggested in the comments. Mark each property of the struct with an attribute (e.g. [ColumnOrdinal]) and use it to map each property to the proper column. If you have double, decimal and so on as targets, you should also consider using Convert.ChangeType to convert properly to the target type. If you are not happy with the performance, you can create a DynamicMethod on the fly; it is more challenging, but really performant and beautiful. The challenge is to write the IL instructions in memory to do the "plumbing" you did by hand (I usually write some example code and then look inside it with ILSpy as a starting point). Of course, you would cache such dynamic methods somewhere so that each one is created just once.
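A minimal sketch of the Convert.ChangeType part, assuming invariant-culture number formats in the file (the CellConverter wrapper is a hypothetical helper, not an existing API):

```csharp
using System;
using System.Globalization;

public static class CellConverter
{
    // Converts raw CSV text to a simple target type such as int, double,
    // decimal or DateTime. Throws FormatException on malformed input.
    public static object To(string text, Type targetType)
    {
        return Convert.ChangeType(text, targetType, CultureInfo.InvariantCulture);
    }
}
```

With reflection this becomes something like prop.SetValue(boxed, CellConverter.To(row[index], prop.PropertyType), null). As far as I know, Convert.ChangeType does not handle enums or Nullable<T> targets from a string, so those would need their own branch.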

Felice Pollano

The first thing that comes to mind is to use reflection to iterate over the properties and match them up to the elements in the string[] based on an attribute value.

public struct DispatchCall
{
  [MyAttribute(CsvIndex = 1)]
  public string WorkOrder { get; set; }
}

MyAttribute would just be a custom attribute with an index that would match up to the field position in the CSV.

var row = parser.ReadFields(); 

    for each property that has MyAttribute...
      var indexAttrib = MyAttribute attached to property
      property.Value = row[indexAttrib.CsvIndex]
    next

(Pseudocode, obviously)

or

[StructLayout(LayoutKind.Sequential)] // keep fields in order
public struct DispatchCall
{
  public string WorkOrder;
  public string Description;  
}

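StructLayout keeps the struct fields in declaration order in memory; the idea is to iterate over the fields in that order without having to explicitly specify a column number for each one. That can save some maintenance if you have a lot of fields. Be aware, though, that Type.GetFields() is documented as not returning fields in any particular order, so check the order you actually get before relying on it.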

Or, you could skip the process entirely, and store the field names in a dictionary:

var index = new Dictionary<int, string>();

// populate index with row index : field name values, preferably from some sort of config file or database
index[0] = "WorkOrder";
index[1] = "Description";
...

var values = new Dictionary<string,object>();

for(var i=0;i<row.Length;i++) 
{
  values.Add(index[i],row[i]);
}

That's easier to load, but doesn't really take advantage of strong typing, which makes this less than ideal.

You could also generate a dynamic method or a T4 template. You could generate code from a mapping file in the format

0,WorkOrder
1,Description
...

load that, and generate a method that looks like this:

  /// emit this
  call.WorkOrder = row[0];
  call.Description = row[1];

etc.

That approach is used in a few micro-ORMs floating around and seems to work pretty well.

Ideally, your CSV would include a row with field names that would make this a lot easier.
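If the file did have such a header row, a sketch like the following could turn it into a name-to-index lookup (HeaderMap is a hypothetical helper; the case-insensitive comparison and trimming are assumptions about how tidy the header is):

```csharp
using System;
using System.Collections.Generic;

public static class HeaderMap
{
    // Builds a case-insensitive lookup from column name to column position.
    // If a name appears twice, the last occurrence wins.
    public static Dictionary<string, int> Build(string[] headerRow)
    {
        var map = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < headerRow.Length; i++)
            map[headerRow[i].Trim()] = i;
        return map;
    }
}
```

You would read the first line once with var header = HeaderMap.Build(parser.ReadFields()); and then write call.WorkOrder = row[header["WorkOrder"]]; so that reordering columns in the file no longer breaks the code.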

OR, yet another approach, use StructLayout along with a dynamic method to avoid having to keep a field:column_index mapping aside from the struct itself.

OR, create an enum

public enum FieldIndex
{
    WorkOrder = 0,
    Description, // only have to specify explicit value for the first item in the enum
    // ....
    MAX // useful for getting the maximum enum integer value
}

for (var i = 0; i < (int)FieldIndex.MAX; i++)
{
  var fieldName = ((FieldIndex)i).ToString(); // get string enum name
  var value = row[i];

  // use reflection to find the property/field FIELDNAME, and set its value to VALUE.
}

3Dave

If you are going for speed, you could use a brittle switch statement.

var columns = parser.ReadFields();

for (var i = 0; i < columns.Length; i++)
{
    SetValue(ref call, i, columns[i]);
}

// DispatchCall is a struct, so it is passed by ref to modify the caller's copy.
// "ref call.X" only works because the members are fields, not properties.
private static void SetValue(ref DispatchCall call, int column, string value)
{
    switch (column)
    {
        case 0:
            SetValue(ref call.AccountID, v => int.Parse(v), value);
            return;

        case 1:
            SetValue(ref call.WorkOrder, v => v, value);
            return;

        ...

        default:
            throw new UnexpectedColumnException();
    }
}

private static void SetValue<T>(
    ref T property,
    Func<string, T> setter,
    string value)
{
    property = setter(value);
}

It's a shame that TextFieldParser does not allow you to read one field at a time; if it did, you could avoid building and indexing the columns array.

Jodrell