
I have data in a non-standard format that I am trying to convert to intelligible JSON.

The format looks like this:

dataheads@first{
       first_data = <value1@123 value2_456 value3_789>;
       second_data = <<value4_abc_123 value5_ty>>;};

In this step, I need:

first_data = <value1@123,value2_456,value3_789>;
second_data = <<value4_abc_123,value5_ty>>;

I tried `Regex.Replace(contents, @"(<.*)\ (.*>)", "$1,$2");` but it only works when there is a single space between `<` and `>`. `\S*(\s)\S*` messes up data outside `<>`, and I am not sure why `<\S*(\s)\S*>` doesn't work. As can be seen, many more substitutions are needed to convert this to JSON, so I have to be careful not to mess up anything outside the brackets.

  • Do you already have a tool/api/etc that understands the current data format? I'm concerned that we're going to help you shoot yourself in the foot... In particular my suspicion is that `second_data` has a list for its value -- implying that you'll need a parser to completely accomplish the task – Gus Jun 20 '22 at 15:23
  • Is [this](http://www.regexstorm.net/tester?p=+%28%3f%3d%5b%5e%3e%3c%5d*%3e%28%3f!%3e%29%29&i=%3cvalue1%40123+value2_456+value3_789%3e%0d%0a%3c%3cvalue4_abc_123+value5_ty%3e%3e&r=%2c) the expected result? (click on "context") – bobble bubble Jun 20 '22 at 15:33
  • @bobblebubble or `(?<=(?<!<)<[^<>]*) (?=[^><]*>(?!>))` [demo](http://www.regexstorm.net/tester?p=%28%3f%3c%3d%28%3f%3c!%3c%29%3c%5b%5e%3c%3e%5d*%29+%28%3f%3d%5b%5e%3e%3c%5d*%3e%28%3f!%3e%29%29&i=%3cvalue1%40123+value2_456+value3_789%3e%0d%0a%3c%3cvalue4_abc_123+value5_ty%3e%3e&r=%2c) – The fourth bird Jun 20 '22 at 15:42
  • Was thinking of this too @4th bird! Depending on data might be already enough to just look ahead, but dunno. Certainly that is more accurate what you show!! Myself I'm even unclear about the `second_data`... – bobble bubble Jun 20 '22 at 15:54
  • @bobblebubble You could make a post out of it if you want. – The fourth bird Jun 20 '22 at 16:48
  • No worries @4th bird! Waiting myself if further detail gets provided, to me it's not clear. If more answers here that's good isn't it, but thank you! – bobble bubble Jun 20 '22 at 17:50
  • @bobblebubble yes that looks correct – Rachit Ajitsaria Jun 21 '22 at 06:11
  • @Gus no, unfortunately there is no such api right now. I did try to talk with the team but hit a wall. – Rachit Ajitsaria Jun 21 '22 at 06:12

1 Answer


You can use this regex:

@"((?<=<[^>]*)[ ](?=[^>]*>))+"

Explanation:

( - start group

(?<=<[^>]*) - look behind for < followed by zero or more characters not being >

[ ] - match a space

(?=[^>]*>) - look ahead for zero or more characters not being > followed by >

)+ - repeat this group

Simply replace each match with `,`.
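For reference, here is a minimal sketch of how the replacement could be wired up on the sample from the question. The class name `SpaceToCommaDemo` and the helper `CommaSeparate` are made up for illustration, and the sketch assumes the brackets are balanced as in the sample (a stray `<` or `>` elsewhere in the data would change what matches).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative, not from the question.
public static class SpaceToCommaDemo
{
    // Pattern from the answer: a run of spaces that has a '<' somewhere behind
    // it and a '>' somewhere ahead of it, with no '>' in between on either side.
    private static readonly Regex SpacesInsideBrackets =
        new Regex(@"((?<=<[^>]*)[ ](?=[^>]*>))+");

    // Replaces each matched run of spaces with a single comma.
    public static string CommaSeparate(string contents) =>
        SpacesInsideBrackets.Replace(contents, ",");

    public static void Main()
    {
        // Sample input copied from the question.
        string contents =
            "dataheads@first{\n" +
            "       first_data = <value1@123 value2_456 value3_789>;\n" +
            "       second_data = <<value4_abc_123 value5_ty>>;};";

        Console.WriteLine(CommaSeparate(contents));
    }
}
```

Because of the trailing `+`, several consecutive spaces inside the brackets collapse into one comma, while the indentation and the spaces around `=` stay as they are.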

Update:

You can secure the matches further, so that the space must be preceded by one of the hardcoded value endings, by including them in the look behind, like this:

@"((?<=<[^>]*(@123|_456|_789|_abc_123|_ty))[ ](?=[^>]*>))+"
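A small sketch of what the stricter pattern does, assuming the suffix list above really covers every value ending in your data. The token `note` in the input is invented here purely to show a space that is left alone; the class and method names are illustrative as well.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative, not from the question.
public static class GuardedSpaceToCommaDemo
{
    // Same idea as before, but the look behind must end in one of the known
    // value suffixes directly before the space.
    private static readonly Regex SpacesAfterKnownValues =
        new Regex(@"((?<=<[^>]*(@123|_456|_789|_abc_123|_ty))[ ](?=[^>]*>))+");

    public static string CommaSeparate(string contents) =>
        SpacesAfterKnownValues.Replace(contents, ",");

    public static void Main()
    {
        // "note" is a made-up token that does not end in a listed suffix,
        // so the space following it is not replaced.
        string contents = "first_data = <value1@123 note value2_456 value3_789>;";

        Console.WriteLine(CommaSeparate(contents));
        // prints: first_data = <value1@123,note value2_456,value3_789>;
    }
}
```

The trade-off is that any new value ending has to be added to the alternation by hand, otherwise the space after it is silently skipped.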
Poul Bak