
I am splitting a string with a Regex using its Split() method.

var splitRegex = new Regex(@"[\s|{]");

string input = "/Tests/ShowMessage { 'Text': 'foo' }";

//second version of the input: 
//string input = "/Tests/ShowMessage{ 'Text': 'foo' }";

string[] splittedText = splitRegex.Split(input, 2);

The string is just a sample of the input. The input comes in two forms: with a space before the { or without one. I want to split the input at the { bracket to get the following result:

  • /Tests/ShowMessage
  • { 'Text': 'foo' }

If there is a space, the string is split there (the space is removed) and I get my desired result. But if there isn't a space, the string is split on the {, so the { is removed, which I don't want. How can I use Regex.Split() without removing the character I split on?

L. Guthardt

4 Answers


The square brackets create a character set, which matches exactly one of the characters listed inside it. For what you want, start by removing them.

To match any number of whitespace characters, add *, which gives \s*.

  • \s is a whitespace
  • * means zero-or-more

To avoid removing the character you split on, you can use a lookahead assertion (?=...), which checks what follows without consuming it.

  • (?=...) is a positive lookahead assertion and (?!...) a negative one

The combined Regex looks like this: \s*(?={)
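
As a minimal sketch of how the combined pattern behaves with Regex.Split(), the following runs it against both input forms from the question. The class and method names (LookaheadSplitDemo, SplitBeforeBrace) are made up for illustration, and the brace is written as \{ here, which matches the same literal { as the unescaped form above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadSplitDemo
{
    // Zero or more whitespace characters, followed by (but not consuming) a '{'.
    private static readonly Regex SplitRegex = new Regex(@"\s*(?=\{)");

    // Hypothetical helper: splits into at most two parts.
    // The '{' is only looked at, so it stays at the start of the second part.
    public static string[] SplitBeforeBrace(string input)
    {
        return SplitRegex.Split(input, 2);
    }

    public static void Main()
    {
        var inputs = new[]
        {
            "/Tests/ShowMessage { 'Text': 'foo' }",
            "/Tests/ShowMessage{ 'Text': 'foo' }"
        };

        foreach (var input in inputs)
        {
            foreach (var part in SplitBeforeBrace(input))
            {
                // Brackets make leading or trailing spaces visible.
                Console.WriteLine("[" + part + "]");
            }
        }
    }
}
```

Both inputs should give the same two parts, because the optional whitespace is consumed by the match while the { is not.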

This is a really good and detailed documentation of the different Regex constructs; you might want to have a look at it. You can also test your Regex easily and for free here.


To keep the curly brace out of the match, you can put it into a lookahead:

\s*(?={)

That will match any number of whitespace characters up to the position just before an open curly brace.

juharr
  • Exactly what I was looking for. But how do the `[ ]` affect my regex? I didn't understand it in this specific case. – L. Guthardt Nov 15 '17 at 13:49
  • Will this not over-split the string because of that pipe character? – Caius Jard Nov 15 '17 at 13:50
  • @L.Guthardt The square brackets create a character set, so `[\s|{]` will actually match any whitespace, the pipe character, or the open curly brace. – juharr Nov 15 '17 at 13:51
  • @CaiusJard Yes, I've fixed that. – juharr Nov 15 '17 at 13:52
  • @juharr I was asking because I tried the code with the brackets `[\s*(?={)]` and it removes the `{` again. Why, if it matches the position before the curly brace? – L. Guthardt Nov 15 '17 at 13:56
  • @L.Guthardt Again, the brackets create a character set, so you are asking it to match one of the characters inside, and except for `\s` they lose their special meanings. – juharr Nov 15 '17 at 13:59
  • @juharr Thanks a lot. – L. Guthardt Nov 15 '17 at 14:01
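
To illustrate the point made in the comments above, here is a small sketch comparing the bracketed pattern with the unbracketed lookahead. The class and method names are hypothetical. Inside [...] the characters *, (, ?, =, { and ) are plain literals, so the { itself becomes the delimiter and is consumed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharacterClassDemo
{
    // Character set: matches ONE character that is whitespace, '*', '(', '?', '=', '{' or ')'.
    public static string[] SplitWithClass(string input)
    {
        return new Regex(@"[\s*(?={)]").Split(input, 2);
    }

    // No brackets: optional whitespace followed by a lookahead for '{' (not consumed).
    public static string[] SplitWithLookahead(string input)
    {
        return new Regex(@"\s*(?=\{)").Split(input, 2);
    }

    public static void Main()
    {
        const string input = "/Tests/ShowMessage{ 'Text': 'foo' }";

        // The '{' is the first character from the set, so it is removed by the split.
        Console.WriteLine("[" + SplitWithClass(input)[1] + "]");

        // The lookahead leaves the '{' in place.
        Console.WriteLine("[" + SplitWithLookahead(input)[1] + "]");
    }
}
```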

You can use regular string split, on "{" and trim the spaces off:

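// Split on the first '{' only, so any nested braces stay in the second part.
var bits = "/Tests/ShowMessage { 'Text': 'foo' }".Split(new[] { '{' }, 2);
bits[0] = bits[0].TrimEnd();   // drop the optional space before the brace
bits[1] = "{" + bits[1];       // put the brace back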

If you want to go the Regex route, you can add the { back if you change the regex a bit:

var splitRegex = new Regex(@"\s*{");

string input = "/Tests/ShowMessage { 'Text': 'foo' }";

//second version of the input: 
//string input = "/Tests/ShowMessage{ 'Text': 'foo' }";

string[] splittedText = splitRegex.Split(input, 2);
splittedText[1] = "{" + splittedText[1];

The pattern means "split at zero or more whitespace characters followed by {". The split therefore removes the spaces (which you want) and the { (which you don't), but since the { is always the character that was removed, you can safely add it back.

Caius Jard
  • Yes, adding it back or going the default `string.Split` way is possible and produces my result, but that wasn't what I was looking for. It's more complex than I showed; this is just a plain sample. – L. Guthardt Nov 15 '17 at 13:51
var splitedList = srt.Text.Replace(".", ".#").Replace("?", "?#").Replace("!", "!#").Split(new[] { "#"}, StringSplitOptions.RemoveEmptyEntries).ToList();

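This splits the text on ., ! and ? without removing those characters. For a more robust result, replace # with a character that cannot occur in the text, for example '®'. Note that each piece after the first keeps the space that followed the punctuation, so call Trim() on the pieces if you don't want it. That's all there is to it: no Regex.Split, which I find slow and harder to get right across different splitting criteria.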

Input: "Hello. I'am dev!"

Result (the split characters are kept):

  1. "Hello."
  2. "I'am dev!"
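
As a self-contained sketch of the same idea, the helper below wraps the replace-then-split steps and trims each piece. The names MarkerSplitDemo and SplitKeepingPunctuation are made up for illustration, and the '®' marker is assumed never to occur in the input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MarkerSplitDemo
{
    // Hypothetical helper: appends a marker after each '.', '?' and '!',
    // splits on the marker, and trims the pieces.
    // Assumption: the marker never occurs in the input text.
    public static List<string> SplitKeepingPunctuation(string text, string marker = "\u00AE")
    {
        return text
            .Replace(".", "." + marker)
            .Replace("?", "?" + marker)
            .Replace("!", "!" + marker)
            .Split(new[] { marker }, StringSplitOptions.RemoveEmptyEntries)
            .Select(part => part.Trim())        // remove the space left after the punctuation
            .Where(part => part.Length > 0)     // skip pieces that were only whitespace
            .ToList();
    }

    public static void Main()
    {
        foreach (var sentence in SplitKeepingPunctuation("Hello. I'am dev!"))
        {
            Console.WriteLine(sentence);
        }
    }
}
```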
Pit