
I have this function in my C# app:

public static string SafeTrim(object str)
{
    if (str == null || str == DBNull.Value)
        return string.Empty;
    else
        return str.ToString().Trim();
}

It works fine, but my import utility calls it millions of times while processing hundreds of thousands of records. The ANTS profiler reports that this function consumes a lot of CPU time simply because it is called so often.

Edit 1

I neglected to mention that a very common use of SafeTrim() in my app is on DataRow/DataColumn values, for example SafeTrim(dt.Rows[0]["id"]). It's common for that value to be DBNull.Value, and it's also common for it to have leading or trailing spaces that need to be trimmed.

Can it be optimized in any way?

Edit 2

I'll be trying these different approaches, under load, and reporting back tomorrow.

  • This might be useful: https://stackoverflow.com/questions/6442421/c-sharp-fastest-way-to-remove-extra-white-spaces – Hunter McMillen Dec 03 '18 at 22:03
  • Why does it take an `object` parameter if you want to trim a string value? How is this called and what is being passed to it? – Igor Dec 03 '18 at 22:06
  • I would start by asking yourself whether you really need to call this method so often. Perhaps (and just perhaps) you could eliminate a substantial number of calls to it by handling record fields in a type-aware manner (for example, for record fields containing numbers you would not need to call this method at all). – Dec 03 '18 at 22:06
  • The DBNull.Value seems like something fetched from an IDataReader; the funny thing is you can get IsDBNull/GetFieldType from the first record of the IDataReader and then move on with blinding speed. If speed is an issue, don't be lazy there. – Ondrej Svejdar Dec 03 '18 at 22:09
  • Trim your strings when you save them, not when you retrieve them. And make your field `NOT NULL`. – Dour High Arch Dec 03 '18 at 22:20
  • You could apply the [MethodImpl(MethodImplOptions.AggressiveInlining)] attribute (see the sketch just below these comments). It works in x64 but not x86, and it will stop the profiler from complaining about the method since it can no longer see it. Whether your program is *actually* faster is, however, highly doubtful. Not just because inlining can make it slower: the execution time of this program should be completely dominated by I/O, and the cost of retrieving the data is something a profiler generally doesn't show. Making the code twice as fast does not make the I/O twice as fast; you'd be lucky to see a few percent improvement. – Hans Passant Dec 03 '18 at 22:44
  • I'm not 100% sure, and I'd like to see the IL produced by some of these answers; I have a gut feeling there is a lot of boxing and unboxing going on. Perhaps have overloaded `Trim` methods for different types, so that at least you won't need to call `ToString()` on an object whose value you already know. – Jeremy Thompson Dec 03 '18 at 22:46
  • Can `str` be anything other than a string and DBNull? Can it be a number? – Antonín Lejsek Dec 04 '18 at 02:08
  • @AntonínLejsek: it cannot be a true number. In all cases it's a string, a column value from a DataTable, a null string, or DBNull.Value. – HerrimanCoder Dec 04 '18 at 23:00
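
To make Hans Passant's inlining suggestion from the comments concrete, here is a minimal sketch; the StringHelpers class name is purely illustrative, and whether the JIT actually honors the hint depends on the runtime and platform:

using System;
using System.Runtime.CompilerServices;

public static class StringHelpers
{
    // Hints the JIT to inline this method at call sites, removing the
    // call overhead that the profiler attributes to SafeTrim itself.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static string SafeTrim(object str)
    {
        if (str == null || str == DBNull.Value)
            return string.Empty;

        return str.ToString().Trim();
    }
}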

3 Answers


Seems to me a couple of simple overloads might be useful:

public static string SafeTrim(object str)
{
    // Only strings get trimmed; null, DBNull.Value, and anything
    // else falls through to string.Empty.
    return str is string x ? SafeTrim(x) : string.Empty;
}

public static string SafeTrim(string str)
{
    return str?.Trim() ?? string.Empty;
}

I don't know what the return value should be if the object str is some other type, so I haven't handled that case.
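
For the DataRow scenario from the question, overload resolution happens at compile time; a hypothetical call site (the dt DataTable is assumed here, not part of the answer) would look like this:

object cell = dt.Rows[0]["id"];       // DataRow indexer returns object, possibly DBNull.Value
string id = SafeTrim(cell);           // binds to the object overload
string name = SafeTrim("  Alice  ");  // binds directly to the string overload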


Here's my test code:

// LINQPad script: Dump() below is LINQPad's built-in output method.
void Main()
{
    var rnd = new Random();
    // One million numeric strings, roughly half with a trailing space to trim.
    var strings = Enumerable.Range(0, 1000000).Select(x => x.ToString() + (rnd.Next(2) == 0 ? " " : "")).ToArray();

    // Trick to get an empty List<T> of an anonymous type.
    var results = new [] { new { source = 0, elapsed = TimeSpan.Zero } }.Take(0).ToList();

    for (int i = 0; i < 100; i++)
    {
        var sw = Stopwatch.StartNew();
        var trimmed0 = strings.Select(x => SafeTrim0(x)).ToArray();
        sw.Stop();
        results.Add(new { source = 0, elapsed = sw.Elapsed });

        sw = Stopwatch.StartNew();
        var trimmed1 = strings.Select(x => SafeTrim1(x)).ToArray();
        sw.Stop();
        results.Add(new { source = 1, elapsed = sw.Elapsed });
    }

    results.GroupBy(x => x.source, x => x.elapsed.TotalMilliseconds).Select(x => new { x.Key, Average = x.Average() }).Dump();
}

// The string overload.
public static string SafeTrim1(string str)
{
    return str?.Trim() ?? string.Empty;
}

// The original method from the question.
public static string SafeTrim0(object str)
{
    if (str == null || str == DBNull.Value)
        return string.Empty;
    else
        return str.ToString().Trim();
}

That gave average run times of 155.8 ms for SafeTrim0 versus 147.7 ms for SafeTrim1.

– Enigmativity
Possible solution:

public static string SafeTrim(object str)
{
    // "as" yields null for DBNull.Value (or any other non-string),
    // so a single null check covers every case.
    string result = (str as string)?.Trim();
    return result ?? string.Empty;
}
– Nick Reshetinsky
  • Or a non-string case. This behaves *radically* differently. – Servy Dec 03 '18 at 22:11
  • Look at the line `string result = (str as string)?.Trim();`. If DBNull.Value is passed as the argument, `as string` yields null because the value is not a string. – Nick Reshetinsky Dec 03 '18 at 22:12
  • To increase performance even more, you can avoid recreating string values that are already trimmed by first checking that the first and last characters are not whitespace. – Vadim Alekseevsky Dec 03 '18 at 22:12
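
A quick throwaway console snippet (not part of the answer) to confirm the `as` behavior described above:

object fromDb = DBNull.Value;
Console.WriteLine((fromDb as string) == null);  // True: DBNull.Value is not a string
Console.WriteLine($"'{SafeTrim(fromDb)}'");     // '' because the cast yields null
Console.WriteLine($"'{SafeTrim("  x  ")}'");    // 'x'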
0

Using ReferenceEquals you can cut out the nested calls that overloaded equality operators would otherwise introduce.

public static string SafeTrim(object str)
{
    if (ReferenceEquals(str, null) ||
        ReferenceEquals(str, DBNull.Value)) // DBNull.Value is a singleton, so a reference comparison is safe
    {
        return string.Empty;
    }

    return str.ToString().Trim();
}
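
A minimal smoke test of this version, assuming a console app (expected output in comments):

Console.WriteLine($"'{SafeTrim(null)}'");          // ''
Console.WriteLine($"'{SafeTrim(DBNull.Value)}'");  // ''
Console.WriteLine($"'{SafeTrim("  abc  ")}'");     // 'abc'
Console.WriteLine($"'{SafeTrim(42)}'");            // '42' via ToString()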
– bubble