I have a system that reads input data from a Serial/UDP/TCP source. The input data is simply a CSV of different data types (e.g. DateTime, double, int, string). An example string might be:

2012/03/23 12:00:00,1.000,23,information,1.234

I currently have (untested) code that allows the user to specify which value in the CSV list goes to which property of a POCO.

So in the above example, I would have an object like so:

public class InputData
{
    public DateTime Timestamp { get; set; }
    public double Distance { get; set; }
    public int Metres { get; set; }
    public string Description { get; set; }
    public double Height { get; set; }
}

Now in this class, I have a method to parse a CSV string and populate the properties. This method also requires "Mapping" information, as there is no guarantee which order the CSV data will arrive in - it is up to the user to define the correct order.

This is my Mapping class:

//This general class handles mapping CSV to objects
public class CSVMapping
{
    //A dictionary holding Property Names (Key) and CSV indexes (Value)
    //0 Based index
    public IDictionary<string, int> Mapping { get; set; }
}
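For the sample line above, the user-supplied mapping might be built like this (the property names come from InputData; the indexes are just one possible ordering):

```csharp
// Map each InputData property name to its 0-based position in the CSV line.
var mapping = new CSVMapping
{
    Mapping = new Dictionary<string, int>
    {
        { "Timestamp",   0 },  // 2012/03/23 12:00:00
        { "Distance",    1 },  // 1.000
        { "Metres",      2 },  // 23
        { "Description", 3 },  // information
        { "Height",      4 }   // 1.234
    }
};
```

A different client's ordering would simply change the index values, not the code.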

Now my method ParseCSV():

//use reflection to parse the CSV survey input
public bool ParseCSV(string pCSV, CSVMapping pMapping)
{
    if (pMapping == null) return false;
    else
    {
        Type t = this.GetType();
        IList<PropertyInfo> properties = t.GetProperties();
        //Split the CSV values
        string[] values = pCSV.Split(new char[1] { ',' });
        //for each property set its value from the CSV
        foreach (PropertyInfo prop in properties)
        {
            if (pMapping.Mapping.Keys.Contains(prop.Name))
            {
                if (prop.PropertyType == typeof(DateTime))
                {
                    if (pMapping.Mapping[prop.Name] >= 0 && pMapping.Mapping[prop.Name] < values.Length)
                    {
                        DateTime tmp;
                        DateTime.TryParse(values[pMapping.Mapping[prop.Name]], out tmp);
                        prop.SetValue(this, tmp, null);
                    }
                }
                else if (prop.PropertyType == typeof(int))
                {
                    if (pMapping.Mapping[prop.Name] >= 0 && pMapping.Mapping[prop.Name] < values.Length)
                    {
                        double tmp;
                        double.TryParse(values[pMapping.Mapping[prop.Name]], out tmp);
                        prop.SetValue(this, Convert.ToInt32(tmp), null);
                    }
                }
                else if (prop.PropertyType == typeof(double))
                {
                    if (pMapping.Mapping[prop.Name] >= 0 && pMapping.Mapping[prop.Name] < values.Length)
                    {
                        double tmp;
                        double.TryParse(values[pMapping.Mapping[prop.Name]], out tmp);
                        prop.SetValue(this, tmp, null);
                    }
                }
                else if (prop.PropertyType == typeof(string))
                {
                    if (pMapping.Mapping[prop.Name] >= 0 && pMapping.Mapping[prop.Name] < values.Length)
                    {
                        prop.SetValue(this, values[pMapping.Mapping[prop.Name]], null);
                    }
                }
            }
        }
        return true;
    }
}

Now for my question:

I potentially have several classes that will require this functionality. Would it be beneficial to implement a generic class or an extension class to do the parsing for me? Is my method a sound way to parse CSV data and populate my object - or is there a better way to do this?

I have read other questions on dynamically parsing CSV, but all deal with the order being known before runtime, whereas I require the user to define the order.

Simon
  • There is a discussion about this in the following post http://stackoverflow.com/questions/1941392/are-there-any-csv-readers-writer-libraries-in-c – TGH Mar 23 '12 at 04:52
  • The TextFieldParser class would make my code a little more robust, thanks. However I am more interested in the mapping side presented, which is not really addressed in that question. – Simon Mar 23 '12 at 05:01
  • Just a quick question. How is it you don't know the order? – Dominic Zukiewicz Mar 26 '12 at 18:50
  • I know what data will be coming down, but the order depends on which client is sending it and how they send it. ClientA might send the date first, ClientB might send Distance. This is why the user of my app needs to specify the mapping through a dialog. – Simon Mar 27 '12 at 07:04

2 Answers

OleDB is great at parsing CSV data and you don't have to use reflection for it. Here's the main idea for mapping with OleDB classes:

  1. User defines a mapping (using delegate, fluent interface or something) and it gets into the Dictionary in your Mapper class.
  2. Parser creates a DataTable and inserts columns from mapper
  3. Parser creates OleDbConnection, Adapter, Command and fills dataTable from CSV file in correct types.
  4. Parser extracts IDataRecords from DataTable and your Mapper needs to map from IDataRecord to objects. For guidance on record-to-object mapping I'd recommend reading sources of ORM mappers like Dapper.NET, Massive, PetaPoco.

OleDb CSV parsing: Load csv into oleDB and force all inferred datatypes to string
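A minimal sketch of steps 2-3, assuming the CSV line has been saved to a file such as C:\csv\data.csv (the folder, file name and Jet provider are illustrative - use the ACE provider on newer systems):

```csharp
using System.Data;
using System.Data.OleDb;

// Fill a DataTable from data.csv; HDR=Yes treats the first line as
// column headings, FMT=Delimited selects comma-separated parsing.
var connectionString =
    @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\csv;" +
    @"Extended Properties=""text;HDR=Yes;FMT=Delimited""";

var table = new DataTable();
using (var connection = new OleDbConnection(connectionString))
using (var adapter = new OleDbDataAdapter("SELECT * FROM data.csv", connection))
{
    adapter.Fill(table);  // opens the connection and infers column types
}
```

Note that the Data Source points at the *directory*; the file name goes into the SELECT statement.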

UPDATE

Since there's only one string, it goes without saying that the easiest and simplest approach is best. So, for the questions:

Implement a generic class - if there's no need to extend the parsing later (no more strings, no more constraints/features in the future), I'd go for a static class that takes an object, a string and the mapping information. It'd look almost the same as yours does right now. Here's a somewhat modified version (may not compile, but should reflect the general idea):

public static class CSVParser
{
    public static void FillPOCO(object poco, string csvData, CSVMapping mapping)
    {
        PropertyInfo[] relevantProperties = poco.GetType().GetProperties().Where(x => mapping.Mapping.Keys.Contains(x.Name)).ToArray();
        string[] dataStrings = csvData.Split(',');
        foreach (PropertyInfo property in relevantProperties)
            SetPropertyValue(poco, property, dataStrings[mapping.Mapping[property.Name]]);
    }

    private static void SetPropertyValue(object poco, PropertyInfo property, string value)
    {
        // .. here goes code to change type to the necessary one ..
        property.SetValue(poco, value, null);
    }
}

Regarding the string to typed value conversion - there's the Convert.ChangeType method, which handles most of the cases. There's a particular problem with boolean values (when given "0" instead of "false"), though.
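For example, the SetPropertyValue placeholder above could be filled in with Convert.ChangeType like this (a sketch; the invariant culture is an assumption - use whatever culture your feed requires):

```csharp
using System;
using System.Globalization;
using System.Reflection;

private static void SetPropertyValue(object poco, PropertyInfo property, string value)
{
    // Convert the raw CSV field to the property's declared type,
    // then assign it via reflection.
    object converted = Convert.ChangeType(
        value, property.PropertyType, CultureInfo.InvariantCulture);
    property.SetValue(poco, converted, null);
}
```

Convert.ChangeType throws on unparsable input, so you may want to wrap it and fall back to a default value, as the TryParse calls in the question do.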

As for data population - though reflection is said to be slow, for single objects and occasional use it should suffice, since it's easy and simple. The usual techniques for the POCO-population problem are:

  1. Run-time creation of a conversion method - one that uses reflection once to initialize itself and is then compiled and called like any other method. Usually implemented using DynamicMethod, Expression Trees and similar; there are plenty of topics on this here on SO.
  2. Dynamic objects (available since C# 4.0), where you don't need to declare a variable to assign or read it.
  3. Available libraries on the market, usually from ORM systems, since they rely heavily on data-to-object conversion.
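As a sketch of the compiled-setter idea (names are illustrative), an Expression Tree can turn a PropertyInfo into a delegate once, which is then reused without any further reflection cost per call:

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Build (target, value) => ((T)target).Property = (TProp)value
// and compile it to a reusable delegate.
static Action<object, object> BuildSetter(PropertyInfo property)
{
    var target = Expression.Parameter(typeof(object), "target");
    var value  = Expression.Parameter(typeof(object), "value");

    var body = Expression.Assign(
        Expression.Property(
            Expression.Convert(target, property.DeclaringType), property),
        Expression.Convert(value, property.PropertyType));

    return Expression.Lambda<Action<object, object>>(body, target, value).Compile();
}
```

Cache one delegate per property (e.g. in a Dictionary keyed by property name) and the per-line parsing loop no longer touches reflection at all.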

Personally, I'd measure whether reflection is suitable for my performance needs and move on past the problem.

Dmitry Reznik
  • Thanks for the information. Unfortunately I think this is a bit overkill. I only have 1 line of CSV data to parse (it's not a text file). Going from CSV - DataTable - POCO is 1 extra step in my opinion, unless someone can convince me otherwise – Simon Mar 23 '12 at 07:35
  • Sorry, I've somehow missed the point of "one line" thing. Updated the answer. – Dmitry Reznik Mar 23 '12 at 18:42
I would 100% agree with @Dmitry on this one, as I've written 5-10 CSV parsers over the past few weeks.

Edit: (Note this requires saving the text to a temporary file using something like Path.GetTempFileName(), but it will allow the flexibility you desire)

The argument for using a DataTable is strongest when the connection string is set up properly - using Extended Properties='text;FMT=Delimited;HDR=Yes' - because the data will go into a DataTable and the column headings (which would help you in this case) would be preserved.

So you could write a CSV like

Name,Age,City
Dominic,29,London
Bill,20,Seattle

This generates a DataTable with the column headings you have specified. Otherwise, stick to using the ordinals as you had before.

To integrate this, add a constructor (or extension method which I will get to shortly) that when passed a DataRow will strip out the data:

public UserData(DataRow row)
{
     // At this point, the row may be reliable enough for you to
     // attempt to reference by column names. If not, fall back to indexes

    this.Name = Convert.ToString(row.Table.Columns.Contains("Name") ? row["Name"] : row[0]);
    this.Age = Convert.ToInt32(row.Table.Columns.Contains("Age") ? row["Age"] : row[1]);
    this.City = Convert.ToString(row.Table.Columns.Contains("City") ? row["City"] : row[2] );
}

Some would argue that the conversion process is not really the responsibility of the UserData class, since it is a POCO. Instead, implement an extension method in a ConverterExtensions.cs class:

public static class ConverterExtensions
{
     public static void LoadFromDataRow(this UserData data, DataRow row)
     {
         data.Name = Convert.ToString(row.Table.Columns.Contains("Name") ? row["Name"] : row[0]);
         data.Age = Convert.ToInt32(row.Table.Columns.Contains("Age") ? row["Age"] : row[1]);
         data.City = Convert.ToString(row.Table.Columns.Contains("City") ? row["City"] : row[2] );
     }
}

A more architectually sound method is to implement an interface that defines the conversion. Implement that interface with the conversion process and then store that interface reference internally. This will do the conversion for you, keeping the mapping completely seperate and keep your POCO nice and tidy. It will also allow you to "plug-in" mappers.

public interface ILoadFromDataRow<T>
{
     void LoadFromDataRow(T target, DataRow dr);
}

public class UserLoadFromDataRow : ILoadFromDataRow<UserData>
{
     public void LoadFromDataRow(UserData data, DataRow dr)
     {
        data.Name = Convert.ToString(dr.Table.Columns.Contains("Name") ? dr["Name"] : dr[0]);
        data.Age = Convert.ToInt32(dr.Table.Columns.Contains("Age") ? dr["Age"] : dr[1]);
        data.City = Convert.ToString(dr.Table.Columns.Contains("City") ? dr["City"] : dr[2]);
     }
}

public class UserData
{
    private ILoadFromDataRow<UserData> converter;

    public UserData(DataRow dr = null, ILoadFromDataRow<UserData> converter = null)
    {
         this.converter = converter ?? new UserLoadFromDataRow();

         if(dr!=null)
             this.converter.LoadFromDataRow(this,dr);
    }

    // POCO as before
}

For your scenario, go for the extension methods. This interface approach (a form of interface segregation) was the way to implement it before extension methods came about.

Dominic Zukiewicz