1

I have the following string (a CRLF may appear anywhere outside the {} and () pairs):

{item1}, {item2} (2), {item3}    (4),  {item4}
(1), {item5},{item6}(5)

I am trying to split each item into its components and build JSON from them using a regular expression.

The output should look like this:

{"name":"item1", "count":""}, {"name":"item2", "count":"2"}, {"name":"item3", "count":"4"}, {"name":"item4", "count":"1"}, {"name":"item5", "count":""},{"name":"item6", "count":"5"}

So far I have the following regex, but it does not capture the second group:

\{(.[^,\n\]]*)\}\s*[\((.\d)\)]*

I am replacing the matches with:

{\"name\":\"${1}\", \"count\":\"${2}\"}

Here is my test link

What am I doing wrong?

Second question

Is it possible to give items without a count a value of zero, so that my second capture group reads as 0?

For example, instead of changing {item1} to {"name":"item1", "count":""}, it should change to {"name":"item1", "count":"0"}

AaA

3 Answers

2

Your second capture group, [\((.\d)\)], is not a capture group at all, which is why nothing is caught. Inside a character class [...] the parentheses are literal characters, so the class matches any single (, ), . or digit and captures nothing. Also, when matching numbers it is safer to use [0-9], because in Unicode-aware regex engines \d can also match digits from other scripts.

The following regex captures only the two groups you need (unlike @revo's answer, which captures an unnecessary group in between):

\{(.[^,\n\]]*)\}(?:\s*\(([0-9]+)\))?
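As a rough illustration of how the two groups line up, here is a minimal sketch. It assumes the code runs on .NET (the thread does not say so outright, although the comments use Regex.Replace), and the class and method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Pattern from the answer: group 1 = name, group 2 = optional count.
    public const string Pattern = @"\{(.[^,\n\]]*)\}(?:\s*\(([0-9]+)\))?";

    // The question's sample input, including the line break before "(1)".
    public const string Input =
        "{item1}, {item2} (2), {item3}    (4),  {item4}\r\n(1), {item5},{item6}(5)";

    public static string ToJson(string text)
    {
        // ${2} expands to an empty string when the optional group did not match.
        return Regex.Replace(text, Pattern, "{\"name\":\"${1}\", \"count\":\"${2}\"}");
    }

    public static void Main()
    {
        // List each match with its two captured groups.
        foreach (Match m in Regex.Matches(Input, Pattern))
        {
            Console.WriteLine($"name={m.Groups[1].Value} count='{m.Groups[2].Value}'");
        }
        Console.WriteLine(ToJson(Input));
    }
}
```

Because the optional part is wrapped in a non-capturing group (?:...), the replacement from the question works unchanged with ${1} and ${2}.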

As for the second requirement: a regex captures information from existing data, and as far as I am aware a replacement pattern cannot inject information that isn't already present. The simplest approach would be to fix up the JSON after the regex has run.

Alternatively, you could put a 0 in front of the second group in your replacement. An empty capture then yields 0, and a captured count is still a valid number with a leading zero, e.g. 04 or 035.
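If the code happens to run on .NET (an assumption; the thread only hints at it), one way to do that fix-up in the same pass is to hand Regex.Replace a callback instead of a replacement pattern. The sketch below uses made-up names and does no JSON escaping of the item name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DefaultCountDemo
{
    private static readonly Regex ItemPattern =
        new Regex(@"\{(.[^,\n\]]*)\}(?:\s*\(([0-9]+)\))?");

    public static string ToJson(string text)
    {
        // The lambda is converted to a MatchEvaluator and called once per match.
        return ItemPattern.Replace(text, m =>
        {
            // Group 2 is optional; Success is false when no "(n)" followed the item.
            string count = m.Groups[2].Success ? m.Groups[2].Value : "0";
            return "{\"name\":\"" + m.Groups[1].Value + "\", \"count\":\"" + count + "\"}";
        });
    }

    public static void Main()
    {
        Console.WriteLine(ToJson("{item1}, {item2} (2)"));
    }
}
```

The callback sees each Match object, so it can tell a missing group apart from an empty one and substitute whatever default is needed.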

{\"name\":\"$1\", \"count\":\"0$2\"}
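To see what the leading zero does in practice, here is a small sketch, again assuming .NET and using illustrative names. It applies the replacement above and then parses the resulting counts the way a consumer might.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZeroPrefixDemo
{
    public static string ToJson(string text)
    {
        // "0$2" yields "0" for a missing count and "0" + digits otherwise.
        return Regex.Replace(
            text,
            @"\{(.[^,\n\]]*)\}(?:\s*\(([0-9]+)\))?",
            "{\"name\":\"$1\", \"count\":\"0$2\"}");
    }

    public static void Main()
    {
        Console.WriteLine(ToJson("{item1}, {item2} (2)"));
        // A consumer that parses the count as an integer ignores the leading zero.
        Console.WriteLine(int.Parse("0") + " " + int.Parse("02"));
    }
}
```

This only helps when the count is consumed as a number; anything that compares the raw strings will see 02 rather than 2.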
James
  • Nice solution for the zeroes. I wasn't aware of the Unicode situation; however, my data is ASCII, so I'm not worried about that. I am going with @revo's solution because he was first, hope you don't mind :-) – AaA Feb 14 '15 at 13:48
  • @BobSort I don't mind at all. Just bear in mind that his answer captures an extra, unnecessary group, which means your replacement needs to reference `$1` and `$3`. Mine uses a non-capturing group to pull out only the name and the value, so your replacement stays exactly as it is. – James Feb 14 '15 at 13:50
1

1- Your regular expression is malformed: it puts capturing groups inside a character class [].

2- You're not including the second captured group in your replacement pattern.

I updated your Regex to:

\{(.[^,\n\]]*)\}\s*(\((\d*)\))?
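Note that this pattern has three capturing groups, so the count ends up in group 3 rather than group 2. A minimal sketch of the numbering, assuming .NET and with names invented for the example:

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumberingDemo
{
    public const string Pattern = @"\{(.[^,\n\]]*)\}\s*(\((\d*)\))?";

    public static string ToJson(string text)
    {
        // Group 2 holds "(2)" with its parentheses; group 3 holds just the digits.
        return Regex.Replace(text, Pattern, "{\"name\":\"${1}\", \"count\":\"${3}\"}");
    }

    public static void Main()
    {
        Match m = Regex.Match("{item2} (2)", Pattern);
        Console.WriteLine("group 1: " + m.Groups[1].Value);
        Console.WriteLine("group 2: " + m.Groups[2].Value);
        Console.WriteLine("group 3: " + m.Groups[3].Value);
        Console.WriteLine(ToJson("{item2} (2), {item5}"));
    }
}
```

With the replacement from the question, ${2} would insert the parenthesized text, so the replacement has to reference ${3} instead.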

Live demo

I'm going to offer a better regex for this problem.

Update:

{(\w+)}\s*(\((\d+)[),])?

Live demo

revo
  • Oh! I forgot to update regex101 (I have `${2}` in the question here). I assume it is not possible to replace `${2}` with 0 if the second capture group did not match? – AaA Feb 14 '15 at 10:00
  • Yes @BobSort, you have to programmatically replace those empty `count` values with `0`. I've offered my own regex version too. – revo Feb 14 '15 at 10:29
0

A solution without regex: I extracted the data from the string using the Substring method, and it seems to work fine.

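using System;

class Program
{
    static void Main()
    {
        string a = "{item1}, {item2} (2), {item3}    (4),  {item4}(1), {item5},{item6}(5)";

        // Splitting on ',' assumes item names never contain a comma.
        string[] b = a.Split(',');

        foreach (string item in b)
        {
            Console.WriteLine(item);

            // The name is the text between '{' and '}'.
            int start = item.IndexOf('{') + 1;
            int end = item.IndexOf('}');
            Console.WriteLine(" \t Name : " + item.Substring(start, end - start));

            // The count, if present, is the text between '(' and ')',
            // so multi-digit counts are read in full.
            start = item.IndexOf('(');
            if (start != -1)
            {
                end = item.IndexOf(')', start);
                Console.WriteLine(" \t Count : " + item.Substring(start + 1, end - start - 1));
            }
        }
    }
}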
Kavindu Dodanduwa
  • Yes, your solution works. But the equivalent of your code would be `Regex.Replace(ItemTooltip, @"\[\[\:(.[^,\n\]]*)\]\]\s*(\((\d)\))?", "{\"name\":\"${1}\", \"count\":\"${3}\"}")`, which is just one line, and if you know how to use regex you don't need to worry about character indexes. By the way, your method will break if an item name contains a comma; that is why the names are enclosed in curly brackets. – AaA Feb 14 '15 at 13:39