
I am trying to parse a string and build a dictionary that will contain the name of the field and its value as key / value.

Here's the code:

var dictPriceList = Regex.Matches(priceListToParse, @"""(.+?) - \$([\d.]+)").Cast<Match>()
                                         .ToDictionary(x => x.Groups[1].Value,
                                                       x => x.Groups[2].Value);

And here's a sample of the string I need to parse:

var results = 
[{\r\n\t\"bandanaColor\": \"rgb(0,0,255)\",\r\n    \"ninja\": \"Leonardo - $0.99\",\r\n    \"data\": [[1336485655241,0.99],[1336566333236,0.99],[1336679536073,0.99],[1336706394834,0.99],[1336774593068,0.99],[1366284992043,0.99]]},
\r\n{\r\n\t\"bandanaColor\": \"rgb(128,0,128)\",\r\n    \"ninja\": \"Donatello - $0.25\",\r\n    \"data\": [[1361061084420,0.23],[1366102471587,0.25],[1366226367262,0.25],[1366284992043,0.25]]},
\r\n{\r\n\t\"bandanaColor\": \"rgb(255,0,0)\",\r\n    \"ninja\": \"Raphael - $0.15\",\r\n    \"data\": [[1327305600000,0.15], [1365583220422,0.15],[1365669396241,0.15],[1365669396241,0.15],[1365753433493,0.15],[1366284992043,0.15]]},\r\n\
r\n{\r\n\t\"bandanaColor\": \"rgb(255,165,0)\",\r\n    \"ninja\": \"Michelangelo - $0.14\",\r\n    \"data\": [1366284992043,0.14]};

The only values I need are the ones under the "ninja" tag, such as "Leonardo - $0.99". So I would end up with these values:

Key \ Value

Leonardo \ 0.99

Donatello \ 0.25

Raphael \ 0.15

Michelangelo \ 0.14

I already tried my hand at this problem and got help from all of you, for which I am grateful, but I have not had occasion to work on it since. In the meantime, the \t\"bandanaColor\": \"rgb(0,0,255)\", line was added, and I need to ignore it because it is not relevant to what I want to do. Now the code crashes, I think because of this line.

Can anyone help me out figuring how to get what I'm looking for?

EDIT

When the code hits the regex line, here's the crash I obtain:

An item with the same key has already been added.

hsim
  • Json.NET by Newtonsoft would make this a lot easier (without trying to hack it with regex). – Brad Christie Jun 18 '13 at 13:01
  • Not the first time I hear about Json.Net! But I don't know how it works. I'll go check it out, but that does not solve my problem right now – hsim Jun 18 '13 at 13:03
  • Using regex is what's not solving your problem. If you're going to parse JSON data, you can't afford NOT to use a parser written to eat, sleep and breathe JSON objects. – Brad Christie Jun 18 '13 at 13:05
  • Oh, ok. So if I understand correctly, what I'm reading is a JSON object, and since I'm gathering it in a string and trying to use a regex on it, I'm only making it harder? – hsim Jun 18 '13 at 13:07
  • Also, you have a JSON error: the last `data:` should be `data: [[...]]` (it's a double array, but only begins with one opening `[`; maybe a copy-paste error?) – Brad Christie Jun 18 '13 at 13:08
  • Yes, a copy-paste error for the last bit. As for the JSON object, what puzzles me is that the string I obtain (the code pasted) is the result of a javascript line posted in the html object I'm already parsing. Are those two the same object? – hsim Jun 18 '13 at 13:10

1 Answer


Okay, so first off, the JSON you supplied had a few errors in it. Assuming these came from writing up the question and there's not a larger issue, the final result should look similar to this:

[{\"bandanaColor\":\"rgb(0,0,255)\",\"ninja\":\"Leonardo - $0.99\",\"data\":[[1336485655241,0.99],[1336566333236,0.99],[1336679536073,0.99],[1336706394834,0.99],[1336774593068,0.99],[1366284992043,0.99]]},{\"bandanaColor\":\"rgb(128,0,128)\",\"ninja\":\"Donatello - $0.25\",\"data\":[[1361061084420,0.23],[1366102471587,0.25],[1366226367262,0.25],[1366284992043,0.25]]},{\"bandanaColor\":\"rgb(255,0,0)\",\"ninja\":\"Raphael - $0.15\",\"data\":[[1327305600000,0.15],[1365583220422,0.15],[1365669396241,0.15],[1365669396241,0.15],[1365753433493,0.15],[1366284992043,0.15]]},{\"bandanaColor\":\"rgb(255,165,0)\",\"ninja\":\"Michelangelo - $0.14\",\"data\":[1366284992043,0.14]}]

(Which can be tested using a site like this one.)

Now, as far as parsing goes, Regex is going to be problematic. There are a lot of different obstacles in JSON, so a parser is going to be the best bet. And, to avoid re-inventing the wheel, the Json.NET library happens to work amazingly well. To give an example:

/* includes
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
*/

// the original JSON string
String jsonString = "[{\"bandanaColor\":\"rgb(0,0,255)\",\"ninja\":\"Leonardo - $0.99\",\"data\":[[1336485655241,0.99],[1336566333236,0.99],[1336679536073,0.99],[1336706394834,0.99],[1336774593068,0.99],[1366284992043,0.99]]},{\"bandanaColor\":\"rgb(128,0,128)\",\"ninja\":\"Donatello - $0.25\",\"data\":[[1361061084420,0.23],[1366102471587,0.25],[1366226367262,0.25],[1366284992043,0.25]]},{\"bandanaColor\":\"rgb(255,0,0)\",\"ninja\":\"Raphael - $0.15\",\"data\":[[1327305600000,0.15],[1365583220422,0.15],[1365669396241,0.15],[1365669396241,0.15],[1365753433493,0.15],[1366284992043,0.15]]},{\"bandanaColor\":\"rgb(255,165,0)\",\"ninja\":\"Michelangelo - $0.14\",\"data\":[1366284992043,0.14]}]";
// The parsed result (using Json.NET)
var json = JsonConvert.DeserializeObject(jsonString);
// Grab all the "ninjas"
String[] ninjas = (json as JArray).Select (x => x.Value<String>("ninja")).ToArray();
// Begin aggregating the results
IDictionary<String, Double> result = ninjas.ToDictionary(
    x => x.Substring(0, x.IndexOf(" - ")),
    y => {
        Double d;
        return Double.TryParse(y.Substring(y.IndexOf(" - ") + 4), out d) ? d : default(Double);
    }
);

Gives you:

Key           Value
Leonardo      0.99 
Donatello     0.25 
Raphael       0.15 
Michelangelo  0.14 

I used simple string parsing (splitting by the instance of -) but you can get more elaborate if necessary.
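
If the price text needs more careful handling, a minimal sketch of the splitting step might look like the following. `NinjaParser` and `TryParseNinja` are hypothetical names, not part of Json.NET. The sketch assumes the value always has the form `Name - $price`, and it parses the number with the invariant culture, so `0.99` is read the same way on machines whose regional settings use a comma as the decimal separator (where a plain `Double.TryParse` can fail and leave the value at 0).

```csharp
using System;
using System.Globalization;

public static class NinjaParser
{
    // Hypothetical helper: splits "Leonardo - $0.99" into a name and a price.
    // Returns false when the separator is missing or the price is not a number.
    public static bool TryParseNinja(string text, out string name, out double price)
    {
        name = null;
        price = 0;
        if (text == null) return false;

        const string separator = " - $";
        int i = text.IndexOf(separator, StringComparison.Ordinal);
        if (i < 0) return false;

        name = text.Substring(0, i).Trim();
        // InvariantCulture always treats '.' as the decimal separator.
        return double.TryParse(
            text.Substring(i + separator.Length),
            NumberStyles.Float,
            CultureInfo.InvariantCulture,
            out price);
    }
}
```

The value selector passed to `ToDictionary` could call a helper like this instead of calling `Double.TryParse` directly.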


Second Version

A little more time investment (since you're now creating a class), but the payoff is that you don't have to learn the Json.NET syntax for dealing with JObject, JArray, etc. Instead, it deserializes the data into classes you've created, making retrieving the information a bit more fluent, e.g.:

public class ParentObj
{
    public String bandanaColor;
    public String ninja;

    public String NinjaName
    {
        get
        {
            String ninja = this.ninja ?? String.Empty;
            Int32 i = ninja.IndexOf(" - ");
            return i != -1 ? ninja.Substring(0, i) : String.Empty;
        }
    }
    public Double NinjaPrice
    {
        get
        {
            String ninja = this.ninja ?? String.Empty;
            Double price;
            Int32 i = ninja.IndexOf(" - $");
            return i != -1 && Double.TryParse(ninja.Substring(i + 4), out price) ? price : default(Double);
        }
    }
}

void Main()
{
    // the original JSON string
    String jsonString = "[{\"bandanaColor\":\"rgb(0,0,255)\",\"ninja\":\"Leonardo - $0.99\",\"data\":[[1336485655241,0.99],[1336566333236,0.99],[1336679536073,0.99],[1336706394834,0.99],[1336774593068,0.99],[1366284992043,0.99]]},{\"bandanaColor\":\"rgb(128,0,128)\",\"ninja\":\"Donatello - $0.25\",\"data\":[[1361061084420,0.23],[1366102471587,0.25],[1366226367262,0.25],[1366284992043,0.25]]},{\"bandanaColor\":\"rgb(255,0,0)\",\"ninja\":\"Raphael - $0.15\",\"data\":[[1327305600000,0.15],[1365583220422,0.15],[1365669396241,0.15],[1365669396241,0.15],[1365753433493,0.15],[1366284992043,0.15]]},{\"bandanaColor\":\"rgb(255,165,0)\",\"ninja\":\"Michelangelo - $0.14\",\"data\":[1366284992043,0.14]}];";
    // The parsed result (using Json.NET to deserialize into our custom object)
    IEnumerable<ParentObj> json = JsonConvert.DeserializeObject<IEnumerable<ParentObj>>(jsonString.TrimEnd(';'));
    // Then use LINQ to create a dictionary from our custom getters
    IDictionary<String, Double> result = json.ToDictionary (x => x.NinjaName, y => y.NinjaPrice);
}

Same outcome as above, but now the parsing is done through the ParentObj class with custom getters.
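
Two issues from the question are separate from the choice of parser: the JSON is embedded in a JavaScript statement (`var results = [...];`), and `ToDictionary` throws "An item with the same key has already been added" when a name appears twice. A rough sketch of one way to handle both, using only the base class library, is below. `ScriptJson`, `ExtractArray`, and `ToPriceLookup` are hypothetical names. The sketch assumes the array is the only bracketed construct in the text (it takes everything from the first `[` to the last `]`) and that the last price seen for a name should win.

```csharp
using System;
using System.Collections.Generic;

public static class ScriptJson
{
    // Returns the text from the first '[' to the last ']', or null if absent.
    public static string ExtractArray(string script)
    {
        if (script == null) return null;
        int start = script.IndexOf('[');
        int end = script.LastIndexOf(']');
        return (start >= 0 && end > start)
            ? script.Substring(start, end - start + 1)
            : null;
    }

    // Builds the lookup with the indexer, which overwrites an existing key
    // instead of throwing the way ToDictionary does on a repeated key.
    public static Dictionary<string, double> ToPriceLookup(
        IEnumerable<KeyValuePair<string, double>> items)
    {
        var result = new Dictionary<string, double>();
        foreach (var item in items)
        {
            result[item.Key] = item.Value;
        }
        return result;
    }
}
```

The extracted text could then be passed to `JsonConvert.DeserializeObject`, and the name and price pairs fed to `ToPriceLookup` in place of `ToDictionary`. If the surrounding script contains other brackets, the extraction step would need to be narrowed, for example by searching from the `var results =` marker.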

Brad Christie
  • Interesting. I get a crash. Could it be that I need to remove the "var results = " line before deserializing the object? – hsim Jun 18 '13 at 13:30
  • This just in: I found out that the rest of the javascript used in the html view is also found in the string. So I have this error that pops: `Additional text encountered after finished reading JSON content: ;. Path '', line 54, position 2.` – hsim Jun 18 '13 at 13:34
  • @HerveS: Call a `.TrimEnd(';')` on the string before passing it to `JsonConvert.DeserializeObject`. – Brad Christie Jun 18 '13 at 13:35
  • Nearly there! I managed to get the dictionary using what you gave me, but there's a problem in the dictionary obtained. Each double is 0. But the doubles in the array are good. Is there a problem in the IDictionary line? – hsim Jun 18 '13 at 14:01
  • @HerveS: Can you provide an example of the JSON you're trying to parse? Also, which version are you using, the first or the second? – Brad Christie Jun 18 '13 at 14:16