3

I have a large string that contains a number. I am trying to use C# to return just that number.

Sample string:

{ "@odata.context": "https://api.applicationinsights.io/v1/apps/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/events/$metadata#customEvents", "@ai.messages": [ { "code": "AddedLimitToQuery",
"message": "The query was limited to 500 rows" } ], "@odata.count": 45, "value": [ { "id": "XXXXXXXXXXXXXXX",
"count": 1..."

My goal is just to return 45 as an int.

The first number after "@odata.count":

I know I can parse the whole string with a JSON-to-object tool such as Newtonsoft. However, in this case, I need this to be very performant and don't want to parse data I don't need.

In this case, I know the data I need is in the first 2 or 3 lines of a 100,000+ line JSON document.

My goal is not to parse the whole object but just to get a single value as fast as possible.

Question

How can I get the first number following a specific text string?

BDL
Luke Hammer
  • 4
    You could use a regex, but I'd probably opt to parse the JSON using something like Json.NET and then just grab the parsed value. That way, if you ever need another value, you can easily get it – GregH Apr 03 '19 at 22:08
  • 6
    Forget regex. It's JSON; use it like JSON. – madreflection Apr 03 '19 at 22:09
  • 7
    **NO, NOT REGEX!** – Joel Coehoorn Apr 03 '19 at 22:10
  • Before you discard options (such as JSON parsing) you need to define what **very performant** means to you. It may well be that the simplest and most maintainable solution (i.e. JSON parsing) has performance characteristics that satisfy your required levels of performance. Without this information, your question isn't answerable. – Rob Apr 03 '19 at 22:11
  • @rob I'm confident that parsing all the JSON will be less performant; the data I'm looking for is in the first 2 or 3 lines of a 100000+ line JSON blob – Luke Hammer Apr 03 '19 at 22:14
  • Why not use the string location to start at the character after the colon and stop at the comma, then trim it and convert it to a number? Alternatively, set up a compiled regex, so that your program builds it once and reuses it, and match the pattern you seek with something like `@odata\.count":\s*(\d+),` – Brandon Culley Apr 03 '19 at 22:14
  • 3
    Have you considered reader-based JSON parsing, which can treat it like a stream rather than loading the whole graph? – madreflection Apr 03 '19 at 22:15
  • @GregH In this case I know I will not be needing the rest of the data – Luke Hammer Apr 03 '19 at 22:15
  • @madreflection OK, this idea is new to me and sounds like the right one. Can you tell me more, or point me to a resource? – Luke Hammer Apr 03 '19 at 22:16
  • @JoelCoehoorn Why? Have you already tested it, and are you sure that regex is less performant than parsing the whole object? I agree that regex is an ugly solution when working with JSON, but if we are dealing with performance inside a bottleneck, any hack is fine – Aleks Andreev Apr 03 '19 at 22:17
  • JSON.NET has a `JsonReader` class in the `Newtonsoft.Json` namespace. I've never used it but I believe the deserializer uses it internally. Check out the [official documentation](https://www.newtonsoft.com/json/help/html/T_Newtonsoft_Json_JsonReader.htm). It's abstract so you'll probably want [`JsonTextReader`](https://www.newtonsoft.com/json/help/html/T_Newtonsoft_Json_JsonTextReader.htm) – madreflection Apr 03 '19 at 22:18
  • 2
    Before you start looking for an exotic solution, I'd just parse the JSON in the usual way and see if you really do have a performance problem. It will take you 10 minutes to find out for sure. – Robert Harvey Apr 03 '19 at 22:20
  • @WizardHammer, the fact that the data is in the first 2 or 3 lines of a 100,000+ JSON blob is useful and pertinent information which you should include in the question. But again, have you *tried* plain old JSON parsing and determined that it isn't performant enough (it's vaguely implied by your "I'm confident" statement) to meet your requirements? In fact - what are the performance requirements you're trying to meet? =) – Rob Apr 03 '19 at 22:20
  • See also [How to parse huge JSON file as stream in Json.NET?](https://stackoverflow.com/q/43747477) – Robert Harvey Apr 03 '19 at 22:23
  • @Rob point taken detail added to the question – Luke Hammer Apr 03 '19 at 22:25
  • 1
    Read [this post](https://stackoverflow.com/q/43747477). – Robert Harvey Apr 03 '19 at 22:26
  • Another problem is the property name from the source, "@odata.count": the "@" is really messing with the code – Luke Hammer Apr 03 '19 at 23:14
  • If you absolutely positively want a non-JSON solution, find the index of the first occurrence of "@odata.count". Then (starting at the index of the end of that substring) iterate through the string character by character using `char.IsDigit()`. Once you find the first digit, save the digits off (into a `StringBuilder`) and quit saving when you get to the first non-digit. Convert your StringBuilder to a string and then to a number. It's brute and ugly as all get out, but it should work. – Flydog57 Apr 04 '19 at 00:26
  • I wrote you an answer on how to extract just a piece of data from your JSON, so I am interested in how it works with large JSON files. Have you tried that? @WizardHammer – Milenko Jevremovic Apr 23 '19 at 09:36
  • @Milenko Jevremovic Yes I have. Actually, your code is what is running in my solution now. I have a task to do a performance test on the different options. and plan on reporting my findings here when done. So your code works THANK YOU .. I'm yet to test performance. – Luke Hammer Apr 24 '19 at 17:48
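
The compiled-regex idea raised in the comments could look roughly like the sketch below. The class and method names (`OdataCountRegex`, `TryGetCount`) are made up for illustration, and the pattern assumes the value is a plain non-negative integer written directly after the property name. It has not been benchmarked against the other approaches; when there is no match, the regex scans the entire string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OdataCountRegex
{
    // Built once and reused; RegexOptions.Compiled trades startup cost for faster matching.
    private static readonly Regex CountPattern = new Regex(
        "\"@odata\\.count\"\\s*:\\s*([0-9]+)",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    // Returns true and sets count when the property is found with an integer value.
    public static bool TryGetCount(string json, out int count)
    {
        count = 0;
        Match match = CountPattern.Match(json);
        return match.Success && int.TryParse(match.Groups[1].Value, out count);
    }
}
```

Usage would be something like `OdataCountRegex.TryGetCount(json, out int count)`, checking the returned bool before using `count`.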

2 Answers

2

You don't need to deserialize the whole JSON into a class. Get Newtonsoft (Json.NET); you need:

using Newtonsoft.Json.Linq;

code:

var parsedJson = JObject.Parse("your_whole_json");
var result = parsedJson["@odata.count"].ToObject<int>();

This way you extract the piece of JSON you need without deserializing into a class, and you don't need any new class if the value you need is a primitive type, like your property, which is an int. Try it to see whether it is as fast as you want.

If your property is nested in some parent object, you can do parsedJson["parent"]["@odata.count"]. If it is in a list, that is no problem either, but you will need to loop through the list first.

Check https://www.newtonsoft.com/json/help/html/SerializingJSONFragments.htm
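
Note that `JObject.Parse` still reads the entire document into memory. The reader-based approach mentioned in the comments (`JsonTextReader`) can stop as soon as the property is seen. A minimal sketch follows, assuming the Newtonsoft.Json package is referenced. `OdataCountReader` and `ReadFirstInt` are hypothetical names, and for simplicity the sketch matches the property name at any nesting depth rather than only at the top level.

```csharp
using System.IO;
using Newtonsoft.Json;

public static class OdataCountReader
{
    // Reads tokens one at a time and stops as soon as the named property is seen.
    // Simplification: the property name is matched at any nesting depth.
    public static int? ReadFirstInt(TextReader source, string propertyName)
    {
        using (var reader = new JsonTextReader(source))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.PropertyName &&
                    (string)reader.Value == propertyName)
                {
                    // Reads the value token that follows the property name.
                    return reader.ReadAsInt32();
                }
            }
        }
        return null; // Property not present.
    }
}
```

It could be called as `OdataCountReader.ReadFirstInt(new StringReader(json), "@odata.count")`. Passing a `StreamReader` over the response stream instead would avoid building the full string first. Whether this is actually faster for your data is something to measure.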

0

Assuming performance is absolutely critical (probably not), and that this bit of code is the bottleneck (again, probably not), you can use something like the code below, which is close to the fastest you can do it.

But note that this assumes you already have the full JSON string - which means you have already incurred a very large performance cost in downloading/reading from disk/whatever. The code below can be altered so that's not required, but it depends on how you are getting the JSON to begin with.

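// Requires: using System; (for Math.Min)
public static int GetNum(string s, string prefix, char terminator, int maxCharsToSearch = 10000)
{
    // Only look for the prefix within the first maxCharsToSearch characters.
    int numChars = Math.Min(s.Length, maxCharsToSearch);
    for (int i = 0; i < numChars; i++)
    {
        // Compare the prefix against s, starting at position i.
        for (int j = 0; j < prefix.Length && j + i < s.Length; j++)
        {
            char c1 = s[j + i];
            char c2 = prefix[j];
            if (c1 == c2)
            {
                if (j == prefix.Length - 1)
                {
                    // Full prefix matched: accumulate digits until the terminator.
                    // At most 9 characters after the prefix are examined.
                    int num = 0;
                    for (int k = j + i + 1; k < s.Length && k < j + i + 10; k++)
                    {
                        char c = s[k];
                        if (c == terminator)
                        {
                            return num;
                        }
                        // No validation: c is assumed to be an ASCII digit.
                        num *= 10;
                        num += c - '0';
                    }
                    return -1; // No terminator found close enough to the prefix.
                }
            }
            else
            {
                break; // Mismatch: try the next start position.
            }
        }
    }
    return -1; // Prefix not found in the searched range.
}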

Where you'd call it with:

  • s = The JSON
  • prefix = "@odata.count\": "
  • terminator = ','
  • maxCharsToSearch = However many characters in the JSON you want to search before giving up

This code finds the part of the input JSON string which has the prefix, and then reads characters which it assumes are ASCII characters for digits in a decimal number until it hits a terminator character, and returns the number.

This may break in several ways including:

  • The JSON is not in ASCII
  • The JSON's CaSe is not exactly as specified in the prefix
  • The number being read is not an integer or has commas/periods in it
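
Some of the failure modes above can be turned into an explicit "not found" result rather than a wrong number. The variant below is only a sketch: `OdataCountScanner` and `FindIntAfter` are hypothetical names. It uses `string.IndexOf` to locate the key, then checks each character before treating it as a digit. It has not been measured against `GetNum`, and `IndexOf` scans the whole string when the key is absent.

```csharp
using System;

public static class OdataCountScanner
{
    // Returns the first non-negative integer after key, or -1 if it cannot be read.
    public static int FindIntAfter(string json, string key)
    {
        int start = json.IndexOf(key, StringComparison.Ordinal);
        if (start < 0) return -1;

        int i = start + key.Length;
        // Skip the closing quote, colon, and whitespace between the key and its value.
        while (i < json.Length &&
               (json[i] == '"' || json[i] == ':' || char.IsWhiteSpace(json[i])))
        {
            i++;
        }

        int digitsStart = i;
        long value = 0;
        while (i < json.Length && json[i] >= '0' && json[i] <= '9')
        {
            value = value * 10 + (json[i] - '0');
            if (value > int.MaxValue) return -1; // Would overflow an int.
            i++;
        }
        return i > digitsStart ? (int)value : -1; // -1 when no digit was found.
    }
}
```

Here the key would be passed without the trailing quote and colon, for example `OdataCountScanner.FindIntAfter(json, "@odata.count")`.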

You could potentially make this faster with unsafe code, maybe doing the loops a bit differently, using a shorter prefix, etc. but again, see how it performs, and then make a decision about whether you want to spend more time making it faster.
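
On the earlier point about already holding the full JSON string: if the JSON arrives through a `TextReader` (for example a `StreamReader` over a file or response stream), one option is to read only the first few thousand characters and search those. A minimal sketch, with `JsonPrefixReader` and `ReadPrefix` as hypothetical names:

```csharp
using System.IO;

public static class JsonPrefixReader
{
    // Reads at most maxChars characters from the start of source and returns them as a string.
    public static string ReadPrefix(TextReader source, int maxChars)
    {
        var buffer = new char[maxChars];
        // ReadBlock keeps reading until maxChars characters are read or the input ends.
        int read = source.ReadBlock(buffer, 0, maxChars);
        return new string(buffer, 0, read);
    }
}
```

The returned string can then be passed to `GetNum` in place of the full JSON. This only helps if the value really is within the first `maxChars` characters, which is an assumption about the data.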

MineR