We have a few XML files that are being read by our applications. The XML format is fixed, and thus we can read them very easily with XmlSerializer.

I use this code to read the XML files and convert them to classes:

public static T FromXml<T>(this string xml) where T : class
{
    if (string.IsNullOrEmpty(xml))
    {
        return default(T);
    }

    XmlSerializer xmlserializer = new XmlSerializer(typeof(T));

    XmlTextReader textReader = new XmlTextReader(new StringReader(xml));
    textReader.Normalization = false;

    XmlReaderSettings settings = new XmlReaderSettings();

    T value;

    using (XmlReader reader = XmlReader.Create(textReader, settings))
    {
        value = (T)xmlserializer.Deserialize(reader);
    }

    return value;
}

However, there are some performance issues. The first time this code is called for a specific type T, the XmlSerializer generates a Project.XmlSerializer.dll file.

This is fine, but it costs some precious milliseconds (about 900ms in my case). It can be avoided by generating that assembly beforehand with the XML Serializer Generator (sgen), which brings the time down to about half. The remaining time is primarily spent reading and reflecting over the assembly.

I want to optimize this further by moving the generated XmlSerializer classes into the same assembly as the classes they serialize, but I can't find a way to tell XmlSerializer to use the serializer from the current assembly instead of reading an external one.

Any thoughts on how to do this, or on an alternative way to make this work? (I can't pre-load them, since most of the serialized classes are used at start-up.)


The analysis using ANTS Profiler (metrics from other machine, but same pattern):

Plain. Most of the time (300ms + 400ms = 700ms) is lost in generating and loading the XmlSerializer assembly.

With sgen generated assembly. Most of the time (336ms) is lost in loading the XmlSerializer assembly.

When including the actual source of the assembly inside the project, and calling the serializer directly, the action goes down to 456ms (was 1s in first, 556ms in second).

Patrick Hofman
  • Not related to the question: You don't need `reader.Close();` as using statement does that for you via `Dispose`. – Sriram Sakthivel Aug 11 '14 at 08:12
  • I can't understand what you mean in last paragraph starting *I want to optimize this further, by bringing the XmlSerializer classes...* Can you make it bit clear? I'll give my try then :) – Sriram Sakthivel Aug 11 '14 at 15:16
  • If loading the DLL is the only problem, you can just open the file and close it immediately (in another thread, as soon as the app starts), provided you know the path. The OS will cache the file, so the next attempt to read it (by the CLR) will be fast. Forgive me if I said something dumb without understanding you :( – Sriram Sakthivel Aug 11 '14 at 15:21
  • Is a single execution the normal way your app will operate? If generating/loading assemblies takes a long time and your app uses FromXml many times, you should do a more realistic performance test where you deserialize various data to see the *actual* deserialization performance. – Sten Petrov Aug 13 '14 at 13:15
  • UserSettings are read on start-up and saved when closing. There are some other xml files that usually have the same pattern. (Btw, the second time it is fast indeed, but I want to minimize start-up time) – Patrick Hofman Aug 13 '14 at 13:17
  • Just for giggles: can you attach an event listener to AppDomain's `FirstChanceException` event and count the times it's called while deserializing? I had an issue like that a while back – Sten Petrov Aug 13 '14 at 13:18
  • @StenPetrov: Just tried. Not getting called. – Patrick Hofman Aug 13 '14 at 13:19
  • can you share a sample xml that's reasonably real? – Sten Petrov Aug 13 '14 at 13:20
  • @StenPetrov: Check http://pastebin.com/d67nch3R. – Patrick Hofman Aug 13 '14 at 13:26
  • Try removing the namespace declarations – Sten Petrov Aug 13 '14 at 13:36
  • @StenPetrov: Won't help. It's auto-generated and it isn't the root cause of the problem (analyzed that already). – Patrick Hofman Aug 13 '14 at 13:39
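
The `FirstChanceException` check suggested in the comments could be done roughly as follows. This is only a sketch: `FirstChanceCounter` and `Count` are hypothetical names, and the handler counts first-chance exceptions raised on any thread of the AppDomain while the action runs, not just those from the deserializer.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading;

public static class FirstChanceCounter
{
    // Runs the given action and returns how many first-chance exceptions
    // were raised in the current AppDomain while it was running.
    public static int Count(Action action)
    {
        int count = 0;

        // The handler is invoked for every exception as soon as it is thrown,
        // before any catch block gets a chance to handle it.
        EventHandler<FirstChanceExceptionEventArgs> handler =
            (sender, e) => Interlocked.Increment(ref count);

        AppDomain.CurrentDomain.FirstChanceException += handler;
        try
        {
            action();
        }
        finally
        {
            // Always unsubscribe, even when the action itself throws.
            AppDomain.CurrentDomain.FirstChanceException -= handler;
        }

        return count;
    }
}
```

Usage would look like `int n = FirstChanceCounter.Count(() => xml.FromXml<UserSettings>());`. A non-zero result would point at exceptions being thrown and swallowed internally.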

2 Answers

Unless you are doing the serialization right at app startup, one way would be to force the CLR to load and even compile whatever classes you're using ahead of time, possibly on a background thread started as soon as your app starts.

Something like, for example:

foreach (Assembly a in assembliesThatShouldBeCompiled)
    foreach (Type type in a.GetTypes())
        if (!type.IsAbstract && type.IsClass)
        {
            foreach (MethodInfo method in type.GetMethods(
                                BindingFlags.DeclaredOnly |
                                BindingFlags.NonPublic |
                                BindingFlags.Public |
                                BindingFlags.Instance |
                                BindingFlags.Static))
            {
                if (method.ContainsGenericParameters || 
                    method.IsGenericMethod || 
                    method.IsGenericMethodDefinition)
                    continue;

                if ((method.Attributes & MethodAttributes.PinvokeImpl) > 0)
                    continue;

                System.Runtime.CompilerServices
                   .RuntimeHelpers.PrepareMethod(method.MethodHandle);
            }
        }

It's strange, however, that your profiling seems to indicate that there is not much difference if the SGEN'd code is in a separate assembly, while loading seems to be the bottleneck. I wonder what the graph looks like for the case where they are in the same assembly?
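
The same background idea can also be applied to the serializers themselves. The following is a minimal sketch, not a drop-in fix: `SerializerWarmup`, `Start` and `Get` are hypothetical names, and it only helps when the application has other startup work that can overlap with the warm-up.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Xml.Serialization;

public static class SerializerWarmup
{
    // One lazily created serializer per type. Lazy<T> makes sure the
    // expensive constructor runs only once, even under concurrent access.
    private static readonly ConcurrentDictionary<Type, Lazy<XmlSerializer>> Cache =
        new ConcurrentDictionary<Type, Lazy<XmlSerializer>>();

    // Starts constructing the serializers on a thread-pool thread so the
    // cost overlaps with whatever else the application does at startup.
    public static Task Start(params Type[] types)
    {
        return Task.Run(() =>
        {
            foreach (Type type in types)
            {
                Get(type);
            }
        });
    }

    // Returns the cached serializer. If the warm-up for this type is still
    // running, the call blocks until that serializer is ready.
    public static XmlSerializer Get(Type type)
    {
        return Cache.GetOrAdd(
            type,
            t => new Lazy<XmlSerializer>(() => new XmlSerializer(t))).Value;
    }
}
```

A caller would invoke something like `SerializerWarmup.Start(typeof(UserSettings))` as early as possible, and later use `SerializerWarmup.Get(typeof(T))` inside `FromXml` instead of `new XmlSerializer(typeof(T))`.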

vgru
  • Thanks for your answer. This would indeed work if I wouldn't need them at start-up, but I do... :( – Patrick Hofman Aug 13 '14 at 11:54
  • @Patrick: one detail, how does the profiler chart look for the "1.7s" case? – vgru Aug 13 '14 at 11:56
  • I updated the question. The strange thing is that most of the time then is in reading the config file, not creating the serializer itself. – Patrick Hofman Aug 13 '14 at 12:03
  • Actually, `XmlSerializer1.CreateReader()` shouldn't load the file at all (it only creates an instance of `XmlSerializationReader1` in the serialization assembly), so the overhead in that case is due to something else. What happens in these two branches below that call? – vgru Aug 13 '14 at 12:29
  • It reads the app.config file. Still digging in the reference source to see why. – Patrick Hofman Aug 13 '14 at 12:30
  • Found the actual place: it tries to read the [``](http://msdn.microsoft.com/en-us/library/ms229756(v=vs.110).aspx) section. – Patrick Hofman Aug 13 '14 at 12:45
  • Redid the profiling. Added longer trace on last one. – Patrick Hofman Aug 13 '14 at 13:12
  • @Patrick: so, the bottleneck boils down to `ConfigurationManager.GetSection`, which initializes the config system and then reads app.config. I am not sure if that can be efficiently hacked. By looking at ILSpy it seems to be accessing registry and doing various config stuff depending on your machine (I am not sure which exact conditions happen in your case). On the other hand, is it possible to simply switch to a different settings file format (e.g. protobuf)? – vgru Aug 13 '14 at 13:53
  • Yes. I didn't notice before, but the two slower options read config too, but using a different route (not sure why). That is 270ms to gain. Using different serializers is possible, but for this problem I'd like to focus on XML (also for future reference). – Patrick Hofman Aug 13 '14 at 14:01
  • Is your `app.config` large? Does it potentially use some sections which would cause cfg manager to have to load other assemblies? I.e. the `` part? – vgru Aug 13 '14 at 14:20
  • Groo, I totally messed up. Please see my last comment under the other answer. You have been very helpful, just like the other guy. I don't see the need to spend more time on this (from both sides). That leaves the bounty. Since you have quite some rep, I want to suggest accepting your answer but awarding the bounty to the other guy. Is that fair to you? – Patrick Hofman Aug 13 '14 at 14:40
  • @Patrick: sure, seems fair, I am glad to hear that the thing works after all. – vgru Aug 13 '14 at 16:38
  • Thanks. Will do so. Have to wait a few hours. Thanks for your help. – Patrick Hofman Aug 13 '14 at 16:39

Note: OP posted a sample config: http://pastebin.com/d67nch3R

Based on the sample config and the type of issue you're experiencing, there are a couple of brute-force ways, pretty much guaranteed to do the trick, both boiling down to abandoning the XML serializer altogether.

Route #1

Abandon XML serialization and use XDocument to get data out of the XML.
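
A minimal sketch of what this could look like is shown below. The `UserSettings` class and `UserSettingsReader` are hypothetical; the element names are taken from the sample config, only three of its fields are read, and the elements are assumed not to be in a default XML namespace.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical settings class; only a few of the sample's fields are shown.
public sealed class UserSettings
{
    public string LastLanguage { get; set; }
    public string UserName { get; set; }
    public bool AutoConnect { get; set; }
}

public static class UserSettingsReader
{
    // Reads the values by hand with LINQ to XML, so no XmlSerializer
    // (and no generated serializer assembly) is involved.
    public static UserSettings Parse(string xml)
    {
        XElement root = XDocument.Parse(xml).Root;

        return new UserSettings
        {
            // The explicit string conversion returns null for a missing element.
            LastLanguage = (string)root.Element("LastLanguage"),
            UserName = (string)root.Element("UserName"),
            // The bool? conversion returns null for a missing element;
            // fall back to false in that case.
            AutoConnect = (bool?)root.Element("AutoConnect") ?? false
        };
    }
}
```

The trade-off is that every field has to be mapped manually, which is more code to maintain than an attribute-driven serializer.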

Route #2

Use JSON and Newtonsoft Json.NET to store and load the configs. It should perform a lot better than XmlSerializer.

The sample JSON counterpart would look like this:

{
  "Connections": {
    "-default": "Local\\SqlServer",
    "-forcedefault": "false",
    "group": {
      "-name": "Local",
      "connection": {
        "-name": "SqlServer",
        "database": {
          "-provider": "SqlServer",
          "-connectionString": "blah"
        }
      }
    }
  },
  "LastLanguage": "en",
  "UserName": "un",
  "SavePassword": "true",
  "AutoConnect": "false",
  "Password": "someObfuscatedHashedPassword==",
  "ConnectionName": "Somewhere\\Database",
  "LastAvailableBandwidth": "0",
  "LastAvailableLatency": "0",
  "DateLastConnectionSuccesful": "2014-08-13T15:21:35.9663654+02:00"
}

And load it:

UserSettings settings = JsonConvert.DeserializeObject<UserSettings>(File.ReadAllText("settings.json"));
Sten Petrov