
Say I have this string:

var results = 
[{\r\n    \"ninja\": \"Leonardo - $0.99\",\r\n    \"data\": [[1336485655241,0.99],[1336566333236,0.99],[1336679536073,0.99],[1336706394834,0.99],[1336774593068,0.99],[1366284992043,0.99]]},
\r\n{\r\n    \"ninja\": \"Donatello - $0.25\",\r\n    \"data\": [[1361061084420,0.23],[1366102471587,0.25],[1366226367262,0.25],[1366284992043,0.25]]},
\r\n{\r\n    \"ninja\": \"Raphael - $0.15\",\r\n    \"data\": [[1327305600000,0.15], [1365583220422,0.15],[1365669396241,0.15],[1365669396241,0.15],[1365753433493,0.15],[1366284992043,0.15]]},
\r\n\r\n{\r\n    \"ninja\": \"Michelangelo - $0.14\",\r\n    \"data\": [1366284992043,0.14]]};

I want to build a dictionary that stores the names of the ninjas and their prices, so that I would have:

Key            Value
Leonardo       0.99
Donatello      0.25
Raphael        0.15
Michelangelo   0.14

I have been reading a LOT about regex for the past few days, but I still don't understand how it works. So far I have this line of code:

var dictNinjas = Regex.Matches(priceListValue, @"\*(\w+)=(a-zA-Z)|\*(\$(0-9))").Cast<Match>()
                                        .ToDictionary(x => x.Groups[0].Value,
                                                      x => x.Groups[1].Value);

My understanding was that it would first find all words made of the letters a-zA-Z, then all values located right after the $ symbol. I thought the | symbol did the grouping, so the first parameter would be group 0 and the second parameter would be group 1. But this does not work.

Can anyone help me out? I'm trying hard to understand how to make this work. Thank you.

hsim
  • Where is this string coming from? – DGibbs Apr 19 '13 at 16:00
  • Your string looks like a JSON string. Shouldn't you use a JSON deserializer? – Steve B Apr 19 '13 at 16:00
  • @DGibbs I'm parsing an HTML document with the HTML Agility Pack; this is a string I got from one of its nodes. – hsim Apr 19 '13 at 16:01
  • @SteveB That's quite possible; I can't tell because I don't know JSON. – hsim Apr 19 '13 at 16:02
  • @HerveS It looks like JSON to me, check out [json.net](http://james.newtonking.com/projects/json-net.aspx) – DGibbs Apr 19 '13 at 16:03
  • Sounds cool, I'll check this out. If I use json.net, how could I do this? – hsim Apr 19 '13 at 16:05
  • Read [this previous SO answer](http://stackoverflow.com/a/1212115/588868) – Steve B Apr 19 '13 at 16:10
  • I see a couple of things in the regex that need changing: 1. In two places you have accidentally escaped the `*` character, so `\*` matches a literal `*` instead of meaning "zero or more `\` characters"; if that is what you wanted, you need `\\*` instead of `\*`. 2. `(0-9)` matches the literal string "0-9", but it looks like you want any digit; use `[0-9]` instead (or `[0-9]*` or `[0-9]+`). The same goes for `(a-zA-Z)`. There may be something else, but those are the first things I saw. – EvilBob22 Apr 19 '13 at 16:10
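EvilBob22's points about `(0-9)`, `[0-9]` and `\*` are easy to check in isolation. The following is a minimal standalone console sketch; the class name `RegexSyntaxDemo` and the sample inputs are made up for illustration. It contrasts a literal group with a character class and shows that an escaped asterisk matches a real `*` character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSyntaxDemo
{
    public static void Main()
    {
        // "(0-9)" is a capturing group around the literal text "0-9",
        // so it only matches that exact three-character sequence.
        Console.WriteLine(Regex.IsMatch("$0.99", @"(0-9)"));       // False
        Console.WriteLine(Regex.IsMatch("range 0-9", @"(0-9)"));   // True

        // "[0-9]" is a character class (any one digit); "+" repeats it one or more times.
        Console.WriteLine(Regex.Match("$0.99", @"[0-9]+").Value);  // 0
        // Adding "." to the class lets the match continue across the decimal point.
        Console.WriteLine(Regex.Match("$0.99", @"[0-9.]+").Value); // 0.99

        // "\*" is an escaped asterisk: it matches a literal '*', not "zero or more".
        Console.WriteLine(Regex.IsMatch("Leonardo", @"\*(\w+)"));  // False
        Console.WriteLine(Regex.IsMatch("*Leonardo", @"\*(\w+)")); // True

        // "\\*" means "zero or more backslashes"; here it matches the two leading backslashes.
        Console.WriteLine(Regex.Match(@"\\x", @"\\*").Value.Length); // 2
    }
}
```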

2 Answers


`Groups[0].Value` is the whole match, so you need groups 1 and 2:

var dictNinjas = Regex.Matches(str, @"""(\w+) - \$([\d.]+)").Cast<Match>()
                                    .ToDictionary(x => x.Groups[1].Value,
                                                  x => x.Groups[2].Value);

`Groups[1].Value` refers to the content captured by the first `()` in the regex, and `Groups[2].Value` to the second.

I am not sure why you have a `=` in your regex, but it looks like you have misunderstood something along the way.
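To make the group numbering concrete, here is a minimal sketch that runs the answer's pattern on a single entry and prints each group. The `entry` string is a shortened, hypothetical sample in the question's format, and `GroupNumberingDemo` is an illustrative name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumberingDemo
{
    public static void Main()
    {
        // Hypothetical single entry, as it looks once the C# escapes (\") are resolved.
        string entry = "\"ninja\": \"Leonardo - $0.99\",";

        // The answer's pattern: a quote, a word (group 1), " - $", then digits and dots (group 2).
        Match m = Regex.Match(entry, @"""(\w+) - \$([\d.]+)");

        Console.WriteLine(m.Groups[0].Value); // "Leonardo - $0.99  (whole match, including the quote)
        Console.WriteLine(m.Groups[1].Value); // Leonardo          (first pair of parentheses)
        Console.WriteLine(m.Groups[2].Value); // 0.99              (second pair of parentheses)
        Console.WriteLine(m.Groups.Count);    // 3                 (group 0 plus the two captures)
    }
}
```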

MikeM
  • `([\d.]+)` could match things like `1..231.21399..` but as long as the inputs are carefully typed it should be fine. – Izzy Apr 19 '13 at 16:37
  • Actually, this works fine! So let me sum up. First, @MikeM, you are right: it's my first "try" with regex, and my understanding is... close to nothing. So now I think this looks for a complete word (`\w` looks for words, though I don't know exactly what the `+` does), then for a `$` sign and the digits (`\d`) located right after it. Right? – hsim Apr 19 '13 at 16:42
  • @Izzy. Yes, I agree that something like `(\d+(?:\.\d+)?)` would be better. – MikeM Apr 19 '13 at 16:43
  • @MikeM `(\d+\.\d{2})` is the simplest safe option I've found. – Izzy Apr 19 '13 at 16:46
  • @HerveS. `\w` means any word character, i.e. a letter, digit or `_`, and `+` means _one or more_. `\d+` means one or more digits. – MikeM Apr 19 '13 at 16:46
  • Thanks! My understanding of regex leveled up because of you both, so thanks a lot. – hsim Apr 19 '13 at 16:50
  • @MikeM It would only match if there were two digits, but if there were more it would still only match the first two. I try to avoid `+?` and `*?` as much as possible, but if you need the extra accuracy with your currency values then `(\d+\.\d\d+)` should be fine. (`(\d+\.\d+?)` stops after the first digit after the `.`) – Izzy Apr 19 '13 at 16:53
  • @HerveS Check out http://www.zytrax.com/tech/web/regex.htm; it helps a lot (and has a neat little tool to test your regex too). – Izzy Apr 19 '13 at 16:56
  • @MikeM I have a question: say I add another value that looks like this: \r\n{\r\n \"label\": \"AAAA Team FourA - $19.98\",\r\n \"data\": [[1350156916637,24.98],[1350303087864,24.98],[1351125084705,24.98],[1351370833325,24.98], Your regex does not catch it since CCG is not a word. How could I include these? – hsim Apr 19 '13 at 17:38
  • @HerveS. Where is CCG? I am not sure what you mean, but if you change `(\w+)` to `(.+?)` it will match several words, not just one. – MikeM Apr 19 '13 at 18:15
  • My mistake, I meant AAAA Team. Because AAAA doesn't form a word, I think the regex skips it. – hsim Apr 19 '13 at 18:32
  • Worked perfectly, that was exactly what I needed. – hsim Apr 19 '13 at 18:33
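Putting this comment thread together, here is one possible sketch of the full dictionary build. It is not MikeM's exact code. The name group uses `([^"]+?)` instead of the suggested `(.+?)`, because `.` also matches `"`, so when matching starts at the quote before `ninja`, `"(.+?) - ` could capture `ninja": "Leonardo` rather than just the name. The price uses the `\d+(?:\.\d+)?` form suggested in the comments and is parsed to `decimal`. The class name `NinjaPrices` and the sample string are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class NinjaPrices
{
    // Group 1: one or more non-quote characters (lazy), so multi-word labels such as
    //          "AAAA Team FourA" work but the match cannot reach back across a quote.
    // Group 2: digits, optionally followed by a dot and more digits.
    private static readonly Regex Entry = new Regex(@"""([^""]+?) - \$(\d+(?:\.\d+)?)""");

    public static Dictionary<string, decimal> Parse(string input) =>
        Entry.Matches(input)
             .Cast<Match>()
             .ToDictionary(
                 m => m.Groups[1].Value,
                 // InvariantCulture so "0.99" parses the same way on any machine locale.
                 m => decimal.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture));

    public static void Main()
    {
        // Shortened sample in the question's shape (data arrays trimmed for brevity).
        string sample =
            "[{\r\n    \"ninja\": \"Leonardo - $0.99\",\r\n    \"data\": [[1336485655241,0.99]]},\r\n" +
            "{\r\n    \"label\": \"AAAA Team FourA - $19.98\",\r\n    \"data\": [[1350156916637,24.98]]}]";

        foreach (var pair in Parse(sample))
            Console.WriteLine(pair.Key + " \\ " + pair.Value.ToString(CultureInfo.InvariantCulture));
    }
}
```

Note that `ToDictionary` throws an `ArgumentException` if the same name appears twice in the input; if that can happen, a plain loop that assigns through the dictionary indexer would keep the last price instead.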

Firstly:

so the first parameters was group 0 and the second parameter would be group 1

  • Group 0 is the whole matched string.
  • Group 1 is the text captured by the first pair of parentheses (groups are numbered by the position of their opening bracket, from left to right).

Don't worry, it's a common mistake to make.

This site has a very handy regex tester tool as well as lots of RE info. Just remember that when you put your regular expression into a C# string you may need to escape extra characters: in a regular string literal every backslash must be doubled (`"\\$"`), while in a verbatim string (`@"\$"`) backslashes are taken literally but a double quote must be written as `""`.

For example, when I plug in `(\w+) - \$(\d+\.\d{2})` as my RE string, I get:

First match: Leonardo - $0.99 at position 24
Backreferences:
  $1 = Leonardo
  $2 = 0.99
Additional matches:
  Found: Donatello - $0.25 at position 217
  Found: Raphael - $0.15 at position 369
  Found: Michelangelo - $0.14 at position 566
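A minimal C# sketch of the escaping point, assuming a shortened, hypothetical input: it writes the tester's pattern once as a verbatim string and once as a regular string literal, confirms the two are identical, and prints each match with its backreferences. Positions will differ from the tester output because the input is not the full string; `EscapingDemo` is an illustrative name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapingDemo
{
    public static void Main()
    {
        // Verbatim string: backslashes are kept as-is.
        string verbatim = @"(\w+) - \$(\d+\.\d{2})";
        // Regular string literal: every regex backslash has to be doubled.
        string regular = "(\\w+) - \\$(\\d+\\.\\d{2})";
        Console.WriteLine(verbatim == regular); // True

        // A double quote is written as "" in a verbatim string and as \" in a regular one.
        Console.WriteLine(@"""(\w+)" == "\"(\\w+)"); // True

        // Hypothetical shortened input in the question's format.
        string input = "\"ninja\": \"Leonardo - $0.99\",\r\n\"ninja\": \"Donatello - $0.25\"";

        foreach (Match m in Regex.Matches(input, verbatim))
        {
            Console.WriteLine($"Found: {m.Value} at position {m.Index}");
            Console.WriteLine($"  $1 = {m.Groups[1].Value}, $2 = {m.Groups[2].Value}");
        }
    }
}
```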

Izzy