
I have a String like:

<1>something here just not relevant</1>

I need the fastest way to get the number 1.

My try:

int signature = Convert.ToInt32(data.Split('>')[0].Remove(0, 1));

user3571412
  • Are you guaranteed your string is formatted as XML? – BradleyDotNET Apr 11 '17 at 17:29
  • [That's not XML](http://www.xml.com/axml/target.html#NT-Name). – Dour High Arch Apr 11 '17 at 17:31
  • Possible duplicate of [How can I extract a string between tags usings C#?](http://stackoverflow.com/questions/17298353/how-can-i-extract-a-string-between-strong-tags-usings-c) – austin wernli Apr 11 '17 at 17:31
  • @austinwernli Not quite; this OP wants the tag name, not what's between the tags. – Heretic Monkey Apr 11 '17 at 17:31
  • ahh wow, read it wrong heh – austin wernli Apr 11 '17 at 17:32
  • Is the string always in that format, i.e. you want the number starting at the second character in the string and ending at the first occurrence of ">" in the string? – Andrew Morton Apr 11 '17 at 17:33
  • Does what you have work? If so, how does it perform now? In other words, are you seeing performance issues with it, and that's why you need the "fastest" or is this just Code Golf? – Heretic Monkey Apr 11 '17 at 17:33
  • Have you already found that this is definitely a performance bottleneck in your program? If not, write the *simplest* code first, and then make sure you have concrete performance goals. – Jon Skeet Apr 11 '17 at 17:36
  • What is your performance budget (including memory usage). How large are the strings you need to support? What's the expected behavior if the tags contain non-digits? – Conrad Frix Apr 11 '17 at 17:36
  • [Whats the main difference between int.Parse() and Convert.ToInt32](http://stackoverflow.com/a/199484/1115360). – Andrew Morton Apr 11 '17 at 17:37
  • int.Parse() converts only from strings. Convert.ToInt32() also converts from other data types. I don't know how they behave internally; I think someone would need to disassemble them. – andreim Apr 11 '17 at 17:56
  • Please explain a bit more your scenario. Else everyone will try guessing. Is this a job interview question? – andreim Apr 11 '17 at 18:04
  • There really is no context as to the purpose of the question. One could easily answer that signature = int.Parse(data.Substring(1,1)).... – Mad Myche Apr 11 '17 at 19:18
  • `int signature = 1;` also answers the question and is faster than anything else proposed so far. – Dour High Arch Apr 11 '17 at 19:46
  • @AndrewMorton: sorry for that, I think I was color blind not noticing the link – andreim Apr 12 '17 at 03:36

2 Answers

string str = "<1>something here just not relevant</1>";
int result = int.Parse(str.Substring(1, str.IndexOf('>') - 1));

This should be faster than a Regex and will work as long as you can be sure the first character of the string is the opening angle bracket (and not whitespace or a previous tag). If you can't be sure of that, use a second IndexOf:

string str = "<1>something here just not relevant</1>";
var start = str.IndexOf('<') + 1;
// Look for '>' only after the '<', so a stray earlier '>' is not picked up.
int result = int.Parse(str.Substring(start, str.IndexOf('>', start) - start));
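
Both snippets above throw if the tag is missing or holds non-digits, which one of the comments on the question asks about. A minimal sketch of a non-throwing variant follows; the class name `TagNumber` and method name `TryGetSignature` are made up for illustration, and it assumes that returning `false` is the desired behaviour for malformed input.

```csharp
using System;
using System.Globalization;

public static class TagNumber
{
    // Hypothetical helper: reports failure instead of throwing on malformed input.
    public static bool TryGetSignature(string data, out int signature)
    {
        signature = 0;
        if (string.IsNullOrEmpty(data)) return false;

        int open = data.IndexOf('<');
        if (open < 0) return false;

        int start = open + 1;
        // Search for '>' only after the '<' so an earlier stray '>' is ignored.
        int close = data.IndexOf('>', start);
        if (close < 0) return false;

        // NumberStyles.None accepts digits only: no sign, no whitespace.
        return int.TryParse(data.Substring(start, close - start),
            NumberStyles.None, CultureInfo.InvariantCulture, out signature);
    }
}
```

This keeps the same IndexOf/Substring approach and only swaps `int.Parse` for `int.TryParse`, so it should cost about the same on well-formed input.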

A Regex match would be more reliable and cleaner, but slightly slower:

string str = "<1>something here just not relevant</1>";
int result = int.Parse(Regex.Match(str, @"<(\d+)>").Groups[1].Value);
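
If the Regex route is taken, it may be worth caching the `Regex` instance and checking `Match.Success` before parsing. The sketch below is one way to do that; `TagRegex` and `GetSignature` are hypothetical names, and it uses `[0-9]` rather than `\d` on the assumption that only ASCII digits should match.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class TagRegex
{
    // Cached so the pattern is not re-parsed on every call.
    // [0-9] is used because \d also matches non-ASCII Unicode digits in .NET.
    private static readonly Regex NumericTag = new Regex("<([0-9]+)>");

    // Hypothetical helper: returns null when no numeric tag is found
    // or the digits do not fit in an Int32.
    public static int? GetSignature(string data)
    {
        Match match = NumericTag.Match(data);
        if (!match.Success) return null;

        int value;
        bool ok = int.TryParse(match.Groups[1].Value,
            NumberStyles.None, CultureInfo.InvariantCulture, out value);
        return ok ? value : (int?)null;
    }
}
```

`Groups[1]` is the parenthesised capture (the digits), whereas `Groups[0]` would be the whole match including the angle brackets.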
KeithS

Just for fun and surprisingly fast:

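public static int GetSignature(string value)
{
  // Assumes value starts with '<', followed by decimal digits and a closing '>'.
  int result = 0;
  for (int i = 1; i < 12; i++) // '<' plus at most 10 digits puts '>' at index 11 at the latest
  {
    if (value[i] == '>')
    {
      break;
    }
    // Shift the digits read so far one place left and append the next digit.
    result = (result * 10) + (value[i] - '0');
  }
  return result;
}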
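
The loop above does no validation: it will index past the end of a short string without a `'>'`, and it silently produces garbage for non-digit characters. As a rough sketch of what a checked version could look like (the names `SignatureLoop` and `TryGetSignature` are hypothetical, and this variant has not been benchmarked here), the same digit-accumulation idea can be kept with a few guards added:

```csharp
using System;

public static class SignatureLoop
{
    // Hypothetical checked variant of the hand-rolled loop.
    // Returns false for null/short input, a missing '<' or '>',
    // non-digit characters, an empty tag, or a value above int.MaxValue.
    public static bool TryGetSignature(string value, out int result)
    {
        result = 0;
        if (value == null || value.Length < 3 || value[0] != '<') return false;

        long acc = 0; // long, so the overflow check below cannot itself overflow
        for (int i = 1; i < value.Length; i++)
        {
            char c = value[i];
            if (c == '>')
            {
                if (i == 1) return false; // "<>" contains no digits
                result = (int)acc;
                return true;
            }
            if (c < '0' || c > '9') return false;

            acc = acc * 10 + (c - '0');
            if (acc > int.MaxValue) return false;
        }
        return false; // ran off the end without finding '>'
    }
}
```

The extra comparisons per character will cost something, so whether this still beats the Substring/`int.Parse` version would need to be measured.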

Edit: My quick'n'dirty test setup

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
  static void Main(string[] args)
  {
    Run(new Random());

    Console.ReadLine();
  }

  private static void Run(Random rnd)
  {
    var counts = new[] { 10, 100000, 10000000 };

    foreach (var count in counts)
    {
      Console.WriteLine(count);
      Run(count, rnd);
      Console.WriteLine();
    }
  }

  private static void Run(int count, Random rnd)
  {
    var values = GetValues(count, rnd);

    var funcs
      = new Dictionary<string, Func<string, int>>
      {
        {"OP", GetSignatureOP},
        {"Keith", GetSignatureKeith},
        {"Fun", GetSignatureFun},
      };

    foreach (var kvp in funcs)
    {
      TimeSpan elapsed;
      Test(values, kvp.Value, out elapsed);
      Console.WriteLine("{0,-5}: {1:G}", kvp.Key, elapsed);
    }
  }

  private static IList<string> GetValues(int count, Random rnd)
  {
    var result = new List<string>(count);

    for (int index = 0; index < count; index++)
    {
      result.Add(string.Format("<{0}>something here just not relevant</{0}>", rnd.Next(1, 10)));
    }

    return result;
  }

  private static int Test(IEnumerable<string> values, Func<string, int> func, out TimeSpan elapsed)
  {
    GC.Collect();
    GC.WaitForPendingFinalizers();

    var sw = Stopwatch.StartNew();
    var count = values.Aggregate(0, (current, value) => current ^ func(value));
    sw.Stop();

    elapsed = sw.Elapsed;

    return count;
  }


  private static int GetSignatureOP(string value)
  {
    return Convert.ToInt32(value.Split('>')[0].Remove(0, 1));
  }
  private static int GetSignatureKeith(string value)
  {
    return int.Parse(value.Substring(1, value.IndexOf('>') - 1));
  }
  private static int GetSignatureFun(string value)
  {
    int result = 0;
    for (int i = 1; i < 12; i++)
    {
      if (value[i] == '>')
      {
        break;
      }
      result = (result * 10) + (value[i] - '0');
    }
    return result;
  }
}

Results (on my machine):

10
OP   : 0:00:00:00,0007532
Keith: 0:00:00:00,0001523
Fun  : 0:00:00:00,0001307

100000
OP   : 0:00:00:00,0306495
Keith: 0:00:00:00,0116116
Fun  : 0:00:00:00,0018416

10000000
OP   : 0:00:00:02,7450986
Keith: 0:00:00:01,1598363
Fun  : 0:00:00:00,1855654

And with random values from rnd.Next(1, int.MaxValue) instead of rnd.Next(1, 10):

10
OP   : 0:00:00:00,0006975
Keith: 0:00:00:00,0001147
Fun  : 0:00:00:00,0001246

100000
OP   : 0:00:00:00,0409755
Keith: 0:00:00:00,0187789
Fun  : 0:00:00:00,0030894

10000000
OP   : 0:00:00:04,0060685
Keith: 0:00:00:01,9214684
Fun  : 0:00:00:00,3083399
Corak
  • It concerns me that this answer was accepted, even if it technically answers the question – Mark Peters Apr 11 '17 at 19:20
  • @MarkPeters - agreed. Although, going from what OP provided, the only information we have is that a) first character is irrelevant, b) the next n characters represent a valid `Int32` value that c) ends at the character `'>'`. Which this does, too. And according to my tests, even a little bit faster than the other solutions. But I'd never use something like that in production code. – Corak Apr 11 '17 at 19:29
  • No problem with your answer. It's the kind of answer I might write as an excuse to play (on a question without enough information). I'm concerned (by it being accepted) that it *will* end up in production code... – Mark Peters Apr 11 '17 at 19:47
  • Some quick benchmarks for 10M iterations show about a 50X improvement for this one over the Regex variant. Surprising? Not that I would ever use this in production code without a good reason (like I needed to do it 10M times and it had proven to be the bottleneck) – Mark Peters Apr 11 '17 at 20:48
  • @AndrewMorton - added test setup. -- Didn't bother with regex, because in my experience, that has always been slower for these very simple tasks. – Corak Apr 12 '17 at 04:25
  • I was impressed that I had to use a billion iterations to have it take any meaningful time on my machine (14 secs) – Mark Peters Apr 12 '17 at 13:57