
I have a project with around 100k XML configuration files for managing test equipment. To make the files easier to update, I want to be able to move them between formats such as Excel and Access. I've written VBA code to do this, but it is painfully slow; C# is orders of magnitude faster and offers more utility. I created an XML Schema to read the XML, but now I want to create a set of reusable data objects that manage reading and writing the various formats, plus derived objects that specify the data content.

I decided to build a two-tier data structure, but the static constructors don't seem to perform as one would expect, and I need a workaround.

using System.Collections.Generic;
using System.Reflection;

public abstract class Data_Structure {
    private static readonly Dictionary<string, int> _DatIndex = new Dictionary<string, int>();
    private static string _Name = "";
    private string[] _Data = null;
    protected static void Init(string datName, List<string> listProps) {
        int iIndex = 1;
        _Name = datName;
        foreach (string iProp in listProps) {
            _DatIndex.Add(iProp, iIndex);
            iIndex++;
        }
    }
    public Data_Structure() {
        _Data = new string[_DatIndex.Count + 1];
    }
    public static Dictionary<string, int> Get_PropList => _DatIndex;
    public static string Get_Name => _Name; 
    public string Prop_Get(string Prop_Name) {
        return this._Data[_DatIndex[Prop_Name]];
    }
    public void Prop_Set(string Prop_Name, string Prop_Value) {
        this._Data[_DatIndex[Prop_Name]] = Prop_Value;
    }
    // Code to manage input/output
}
public class Data_Item : Data_Structure {
    static Data_Item() {
        List<string> listProps = new List<string>();
        PropertyInfo[] arryProps = typeof(Data_Item).GetProperties();
        foreach (PropertyInfo iProp in arryProps) {
            listProps.Add(iProp.Name);
        }
        Init("Data_Item_Name", listProps);
    }
    public string Property1 {
        get => this.Prop_Get("Property1");
        set => this.Prop_Set("Property1", value);
    }
    // More Properties...
}

The idea is that Data_Structure can be the interface between all the different I/O formats. For each new XML file type, I will create a derived class (e.g. Data_Item) that defines its properties. I want to do it this way because I can then use things like BindingList<T>.

When I go to use the code, I try to pull the property list like so:

public void testFunction() {
    Console.WriteLine(Data_Item.Get_PropList.Count); //Outputs 0
    Data_Item tempItem = new Data_Item();
    Console.WriteLine(Data_Item.Get_PropList.Count); //Outputs correct number of properties
}

I'm not too familiar with reflection, but I understand it's really slow during execution. In the above code, the intent is to front-load the reflection so that at runtime (when iterating across 100k files) the code executes faster. Also, any suggestions to make the derived classes simpler to define would be much appreciated. Thank you!


Edit for Solution:

I found that if you place a static property in a parent class and access it (even through the derived class's name), only the parent's static constructor is called. I found a workaround here and added a bit of code from the accepted answer:

using System.Collections.Generic;

public abstract class Data_Item<TChild> where TChild : Data_Item<TChild> {
    private static Data_Schema _CurSchema;
    static Data_Item() {
        _CurSchema = Data_Schema_Library.Get<TChild>();
    }
}
public class Data_Item_Instance : Data_Item<Data_Item_Instance> {
    [Custom_Property(Attributes1)]
    public string Property1 {get; set;}
    [Custom_Property(Attributes2)]
    public string Property2 {get; set;}
}
public static class Data_Schema_Library {
    private static readonly Dictionary<string, Data_Schema> _SchemaLibrary = new Dictionary<string, Data_Schema>();
    public static Data_Schema Get<T>() {
        string key = typeof(T).FullName;
        if (!_SchemaLibrary.TryGetValue(key, out Data_Schema schema)) {
            //Read T with Reflection and fill in a new Data_Schema here
            schema = new Data_Schema();
            _SchemaLibrary[key] = schema;
        }
        return schema;
    }
}
public class Data_Schema {
    public Custom_Property Attribute {get; set;}
}

I think it is still slow, but it has helped. Hopefully it helps someone else too.
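A minimal runnable sketch of the same workaround, assuming the goal is one property-name-to-index map per derived type. Because Data_Item<Data_Item_Instance> and Data_Item<Other_Instance> are different closed types, each gets its own static field and its own static constructor run, and reading Schema through a derived class name does trigger that constructor. The Schema field, the Get/Set helpers and Other_Instance are illustrative names, not part of the original code; a plain static field stands in for Data_Schema_Library.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// CRTP base: every closed type Data_Item<TChild> has its own copy of the static members.
public abstract class Data_Item<TChild> where TChild : Data_Item<TChild>
{
    // Hypothetical: property name -> slot in _data, built once per TChild.
    public static readonly IReadOnlyDictionary<string, int> Schema;

    static Data_Item()
    {
        var map = new Dictionary<string, int>();
        // Reflection runs once per closed generic type, not once per object or per file.
        foreach (PropertyInfo prop in typeof(TChild).GetProperties(BindingFlags.Public | BindingFlags.Instance))
            map[prop.Name] = map.Count;
        Schema = map;
    }

    // The static constructor has already run by the time any instance is created.
    private readonly string[] _data = new string[Schema.Count];

    protected string Get(string name) => _data[Schema[name]];
    protected void Set(string name, string value) => _data[Schema[name]] = value;
}

public class Data_Item_Instance : Data_Item<Data_Item_Instance>
{
    public string Property1 { get => Get(nameof(Property1)); set => Set(nameof(Property1), value); }
    public string Property2 { get => Get(nameof(Property2)); set => Set(nameof(Property2), value); }
}

// Hypothetical second file type, to show the two schemas stay separate.
public class Other_Instance : Data_Item<Other_Instance>
{
    public string Serial { get => Get(nameof(Serial)); set => Set(nameof(Serial), value); }
}
```

With this shape, Data_Item_Instance.Schema.Count is already 2 before any instance exists, and Other_Instance.Schema holds its own single entry instead of sharing one dictionary.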

  • "but the static constructors don't seem to perform as one would expect" - can you clarify what you expect and what exactly you need a workaround for? Ideally with a minimal reproducible example. – Alexei Levenkov Aug 18 '23 at 16:20
  • I edited the post to fix a typo in the constructor name. I expected that when calling a static member of the derived class, the static constructor would be called, and initialize the dictionary. I played with some other test code, and it only calls the static constructor of the abstract parent class. – Manfred ad. Rickenbocker Aug 18 '23 at 16:27

1 Answer


"but the static constructors don't seem to perform as one would expect, and I need a workaround."

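The way you're using static variables is an anti-pattern. A static field belongs to the type that declares it, not to each class that inherits from it: private only limits who can see it, and there is still exactly one copy for the whole program. Because _DatIndex and _Name are declared on Data_Structure, every derived class shares the same dictionary and the same name, so they behave like global variables and should be treated as such. That does not fit your use case, where each file type needs its own property list.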

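public abstract class Data_Structure {
    private static string _Name = "";
    public static string Name => _Name;  // read-only accessor so Program.cs can see the value
    protected static void Init(string datName) {
        _Name = datName;
    }
}
public class Data_Item : Data_Structure {
    static Data_Item() {
        Init("ChildClass");
    }
}
//  Program.cs
using System.Diagnostics;

Debug.Assert(Data_Structure.Name == "");
new Data_Item();  // the first instantiation runs Data_Item's static constructor
Debug.Assert(Data_Structure.Name == "ChildClass");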

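Your testFunction shows the same thing, and it is why the first WriteLine prints 0. Data_Item.Get_PropList is declared on Data_Structure, so the compiler treats it as Data_Structure.Get_PropList: you aren't accessing the child class's value, you're accessing the parent class's value, and reading it does not run Data_Item's static constructor. Only creating a Data_Item (or using a static member declared on Data_Item itself) does that.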
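A small sketch of that rule, using the illustrative class names Parent and Child (not from the original code). Reading Child.Name binds to Parent.Name and leaves Child's static constructor untouched; calling Touch, a static member declared on Child itself, forces the static constructor to run first.

```csharp
using System;

public abstract class Parent
{
    // Declared once on Parent, so every derived class shares this single value.
    public static string Name { get; protected set; } = "";
}

public class Child : Parent
{
    static Child() { Name = "Child"; }

    // Declared on Child itself, so calling it runs Child's static constructor first.
    public static void Touch() { }
}

public static class StaticInitDemo
{
    public static (string Before, string After) Run()
    {
        string before = Child.Name; // compiled as Parent.Name: Child's static constructor has not run
        Child.Touch();              // runs Child's static constructor, which sets Name
        return (before, Child.Name);
    }
}
```

On a first call, Run returns an empty Before and "Child" for After, which mirrors the 0 and then the correct count printed by testFunction.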

The idea is that Data_Structure can be the interface between all the different I/O formats

You can do this, but you do not want to do it with static members. Ideally, state in C# lives in instances of classes rather than in static variables, as this helps draw boundaries of responsibility.

What I recommend is to create a schema object that handles the string-to-index mappings for each of your data types. You can then create exactly one schema instance per type, and use that as the source of truth for all mappings for that type. Here's a minimal example:

using System.Collections.Generic;

public interface IDataStructureSchema
{
    Dictionary<string, int> Items { get; }

    int Index(string index);
}

This interface covers everything you want your schema to do: it holds the mapping from property names to indexes and looks up the int index for a given name. All you have to do is create a class that implements this interface, and that class can build the mapping however it likes on the backend.
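One possible implementation, assuming the property names are known up front and slots are numbered from 0 (the question's code started at 1). NameListSchema is a hypothetical name, and the interface is repeated so the snippet compiles on its own.

```csharp
using System.Collections.Generic;

public interface IDataStructureSchema
{
    Dictionary<string, int> Items { get; }
    int Index(string index);
}

// Hypothetical implementation: the i-th name in the list maps to slot i of a data array.
public class NameListSchema : IDataStructureSchema
{
    public Dictionary<string, int> Items { get; } = new Dictionary<string, int>();

    public NameListSchema(IEnumerable<string> names)
    {
        foreach (string name in names)
            Items.Add(name, Items.Count); // Add throws if the same name appears twice
    }

    // Throws KeyNotFoundException for a name that is not part of the schema.
    public int Index(string index) => Items[index];
}
```

Keeping the lookup behind Index means a later implementation could, for example, build the dictionary from reflection or from the XML Schema without changing any item classes.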

Now, to create a data item type that uses this schema, we just store the schema in a field and use its mappings for all lookups:

public class BaseItem
{
    public BaseItem(IDataStructureSchema schema, string[] data)
    {
        _schema = schema;
        _data = data;
    }

    protected IDataStructureSchema _schema;
    protected string[] _data;

    protected string Get(string index)
    {
        return _data[_schema.Index(index)];
    }

    protected void Set(string index, string value)
    {
        _data[_schema.Index(index)] = value;
    }
}

public class DataItem : BaseItem
{
    public DataItem(IDataStructureSchema schema, string[] data)
        : base(schema, data)
    {
    }

    public string Property1 {
        get => this.Get("Property1");
        set => this.Set("Property1", value);
    }
}
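A usage sketch, assuming the schema is built once per item type and every item loaded from a file gets its own string[] sized to that schema. It repeats the answer's types in compact form (with the constructor name and the missing semicolon fixed) plus a hypothetical SimpleSchema and DataItemDemo, so it compiles on its own.

```csharp
using System.Collections.Generic;

public interface IDataStructureSchema
{
    Dictionary<string, int> Items { get; }
    int Index(string index);
}

// Hypothetical schema: maps each name to its position in the argument list.
public class SimpleSchema : IDataStructureSchema
{
    public Dictionary<string, int> Items { get; } = new Dictionary<string, int>();
    public SimpleSchema(params string[] names)
    {
        foreach (string name in names) Items.Add(name, Items.Count);
    }
    public int Index(string index) => Items[index];
}

public class BaseItem
{
    public BaseItem(IDataStructureSchema schema, string[] data)
    {
        _schema = schema;
        _data = data;
    }

    protected IDataStructureSchema _schema;
    protected string[] _data;

    protected string Get(string index) => _data[_schema.Index(index)];
    protected void Set(string index, string value) => _data[_schema.Index(index)] = value;
}

public class DataItem : BaseItem
{
    public DataItem(IDataStructureSchema schema, string[] data) : base(schema, data) { }

    public string Property1
    {
        get => Get("Property1");
        set => Set("Property1", value);
    }
}

public static class DataItemDemo
{
    public static (string A, string B) Run()
    {
        var schema = new SimpleSchema("Property1");   // built once per item type
        var a = new DataItem(schema, new string[schema.Items.Count]);
        var b = new DataItem(schema, new string[schema.Items.Count]);
        a.Property1 = "first";
        b.Property1 = "second";
        return (a.Property1, b.Property1);            // shared schema, separate data arrays
    }
}
```

Because only the small string[] is allocated per file, the per-type cost of building the schema is paid once no matter how many files are loaded.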

Now all you need is a little bit of code that generates your schema for each item type. You've already got a fairly good base, so here's a snippet to get you started:

using System;
using System.Collections.Generic;

public static class SchemaCompiler
{
    // Must be static: a static class cannot declare instance fields.
    private static readonly Dictionary<Type, IDataStructureSchema> _cache = new Dictionary<Type, IDataStructureSchema>();
    public static IDataStructureSchema Create(Type dataItem)
    {
        if (_cache.TryGetValue(dataItem, out var cachedSchema))
            return cachedSchema;

        //  There's no cached schema!
        //  Make one here using reflection
        //  and then add it to _cache when you're done.
        throw new NotImplementedException();
    }
}
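One way to fill in the reflection step, assuming the schema should list the public instance properties of the item type that have both a getter and a setter (indexers excluded). ReflectedSchema and SampleItem are hypothetical names; properties are sorted by name because Type.GetProperties does not guarantee any particular order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public interface IDataStructureSchema
{
    Dictionary<string, int> Items { get; }
    int Index(string index);
}

// Hypothetical schema type produced by the compiler below.
public class ReflectedSchema : IDataStructureSchema
{
    public Dictionary<string, int> Items { get; } = new Dictionary<string, int>();
    public int Index(string index) => Items[index];
}

public static class SchemaCompiler
{
    private static readonly Dictionary<Type, IDataStructureSchema> _cache =
        new Dictionary<Type, IDataStructureSchema>();

    public static IDataStructureSchema Create(Type dataItem)
    {
        if (_cache.TryGetValue(dataItem, out var cachedSchema))
            return cachedSchema;

        // First request for this type: reflect once, then reuse the result from _cache.
        var schema = new ReflectedSchema();
        IEnumerable<PropertyInfo> props = dataItem
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.CanWrite && p.GetIndexParameters().Length == 0)
            .OrderBy(p => p.Name, StringComparer.Ordinal); // stable slot order
        foreach (PropertyInfo prop in props)
            schema.Items.Add(prop.Name, schema.Items.Count);

        _cache[dataItem] = schema;
        return schema;
    }
}

// Illustrative item type: Summary has no setter, so it is left out of the schema.
public class SampleItem
{
    public string Property1 { get; set; }
    public string Property2 { get; set; }
    public string Summary => Property1 + Property2;
}
```

This cache is not thread-safe; if files are loaded in parallel, a ConcurrentDictionary<Type, IDataStructureSchema> with GetOrAdd would be the usual substitute.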

Why do it that way?

Each class in the example above has an explicit boundary on its responsibility. In your original code, one class was responsible for everything. Now there are three that can each operate independently: one class to access the data, one to define how the data maps from a string to an int, and one that creates that mapping.

I'm not too familiar with reflection, but I understand it's really slow during execution.

Until you have profiled your code and found reflection to be the culprit, you should not worry about optimizing it. Your XML parser will probably carry most of the load, as it has to do far more work than the already heavily tuned built-in reflection methods, especially if the reflection runs only once per type and its results are cached.

Even if reflection is 100x slower than compiled C#, it will probably still cost less than your XML/config file parser. Look there first, then look at reflection. (Warning: that is a general statement and not always true. Profile your code first!)
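A crude timing sketch to check which side dominates, assuming a made-up XML snippet stands in for one of the real configuration files. It uses System.Diagnostics.Stopwatch to time repeated reflection calls and repeated XML parses separately; a real profiler gives a fuller picture, and the reflection loop here is the worst case where nothing is cached.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Reflection;
using System.Xml.Linq;

public static class RoughTiming
{
    // Placeholder XML: substitute the contents of a real configuration file.
    private const string SampleXml =
        "<Config><Item Name=\"A\" Value=\"1\"/><Item Name=\"B\" Value=\"2\"/></Config>";

    public static (TimeSpan Reflection, TimeSpan Parsing, int Elements) Measure(Type itemType, int iterations)
    {
        var sw = Stopwatch.StartNew();
        int propCount = 0;
        for (int i = 0; i < iterations; i++)
            propCount += itemType.GetProperties(BindingFlags.Public | BindingFlags.Instance).Length;
        TimeSpan reflection = sw.Elapsed;

        sw.Restart();
        int elements = 0;
        for (int i = 0; i < iterations; i++)
            elements = XDocument.Parse(SampleXml).Root.Elements().Count();
        TimeSpan parsing = sw.Elapsed;

        return (reflection, parsing, elements);
    }
}
```

Running Measure with the real item type and a real file's contents, then comparing the two TimeSpans, is a quick first check before reaching for anything more advanced.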

If you're absolutely certain you need something faster than reflection, look into expression trees, which let you build a delegate (for example, a property getter or setter) and compile it at runtime, so every later call runs as compiled code. This should be a last resort in your case.
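A sketch of that technique, assuming string-typed properties like the ones in the question. Expression.Lambda(...).Compile() turns a PropertyInfo into a Func/Action once, and the resulting delegates can then be reused for every object without going through PropertyInfo.GetValue/SetValue. AccessorFactory and SampleConfig are hypothetical names.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class AccessorFactory
{
    // Builds and compiles: (object target) => ((DeclaringType)target).Prop
    // Assumes prop.PropertyType is string.
    public static Func<object, string> CompileGetter(PropertyInfo prop)
    {
        ParameterExpression target = Expression.Parameter(typeof(object), "target");
        Expression body = Expression.Property(Expression.Convert(target, prop.DeclaringType), prop);
        return Expression.Lambda<Func<object, string>>(body, target).Compile();
    }

    // Builds and compiles: (object target, string value) => ((DeclaringType)target).Prop = value
    public static Action<object, string> CompileSetter(PropertyInfo prop)
    {
        ParameterExpression target = Expression.Parameter(typeof(object), "target");
        ParameterExpression value = Expression.Parameter(typeof(string), "value");
        Expression body = Expression.Assign(
            Expression.Property(Expression.Convert(target, prop.DeclaringType), prop), value);
        return Expression.Lambda<Action<object, string>>(body, target, value).Compile();
    }
}

// Illustrative type used to exercise the factory.
public class SampleConfig
{
    public string Property1 { get; set; }
}
```

Compiling is itself fairly expensive, so the delegates would be built once per property (for example, alongside the cached schema) rather than once per file.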

Mooshua
  • Thank you for this answer! It has given me a lot to think about in how I structure my code and how C# behaves. I also found another bit of help with this code: https://stackoverflow.com/a/5012880/22405894 I will mock up a solution, and if it works this will be my accepted answer. – Manfred ad. Rickenbocker Aug 21 '23 at 15:36