1

I have this string (it's from EDI data):

ISA*ESA?ISA*ESA?

The * stands for any sequence of characters, of any length.

The ? stands for any single character.

Only the ISA and ESA are guaranteed not to change.

I need this split into two strings, which could look like this: "ISA~this is date~ESA|" and "ISA~this is more data~ESA|"

How do I do this in C#?

I can't use string.Split, because it doesn't really have a delimiter.

Rufus L
Greg Gum
  • EDI is regular and can be parsed with regex. I however must defer to a regex (and preferably EDI) guru to give you the answer for this one. I have taken the liberty of adding the regex tag to try to attract some gurus. – hoodaticus Jun 27 '17 at 19:41
  • Typically you can read in the delimiters used by an EDI file in the first record, then act accordingly. – juharr Jun 27 '17 at 19:47
  • Possible duplicate of [Using Regex to split a string in C#](https://stackoverflow.com/questions/21156414/using-regex-to-split-a-string-in-c-sharp) – unconnected Jun 27 '17 at 19:49
  • Typically yes. The issue is that multiple ISA segments are being included in the file, so before the processing is done, the entire file needs to be split into the individual files. The only delimiters are the ISA and ESA segments. – Greg Gum Jun 27 '17 at 19:49
  • You can use the [Regex.Split](https://msdn.microsoft.com/en-us/library/ze12yx1d(v=vs.110).aspx) call to split the string using a regex. I don't completely follow what you want it split on; otherwise I could have posted a clearer answer. – Vikhram Jun 27 '17 at 19:49
  • Read in the first ISA (should be fixed width) then you'll have the record delimiter, so you search for that delimiter before the next ISA, and you'll have your first break, then repeat if the delimiters can change for each file. Alternatively you might want to search for existing tools that deal with EDI instead of rolling your own code. – juharr Jun 27 '17 at 19:55
  • Maybe [`(?s)ISA(?:(?![IE]SA).)*ESA(?![IE]SA).?`](https://regex101.com/r/rI9ATo/2) is what you need? – Wiktor Stribiżew Jun 27 '17 at 19:56
  • @GregGum: If you need a regex expert help, explain what you need with a clear example (real input text, expected output, why = requirements). – Wiktor Stribiżew Jun 27 '17 at 20:16
  • @Vikhram, your comment is what I ended up using. If you want to post it as an answer, I will accept. – Greg Gum Jun 28 '17 at 14:23
  • Just curious: what happens if by chance the content contains 'ISA'? Or 'ESA'? – eppye Jun 29 '17 at 18:03
  • @juharr, that is exactly what I ended up doing. – Greg Gum Jun 30 '17 at 18:56

8 Answers

1

Simply use indexOf:

int x = whateverString.indexOf("?ISA"); // replace ? with the actual character here 

and then take the substring from 0 through x (so the terminator stays with the first part) and the substring from x + 1 to the end.
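
For C#, the index-based split described above might look like the following minimal sketch. It assumes the terminator is known to be '|' (that character and the sample string are only illustrative) and uses top-level statements, so it needs a recent C# compiler.

```csharp
using System;

string whateverString = "ISA~this is date~ESA|ISA~this is more data~ESA|";

// Assumption: '|' is the known terminator, so "|ISA" marks the boundary.
int x = whateverString.IndexOf("|ISA", StringComparison.Ordinal);

// Keep the terminator with the first part: characters 0 through x.
string first = x >= 0 ? whateverString.Substring(0, x + 1) : whateverString;
// Everything after the terminator belongs to the second part.
string second = x >= 0 ? whateverString.Substring(x + 1) : string.Empty;

Console.WriteLine(first);
Console.WriteLine(second);
```

If the boundary is not found, the sketch returns the whole input as the first part rather than throwing.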

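Edit: If ? is not known, we can use the regex Pattern and Matcher classes (Java):

    // Lazy .*? so that each match ends at the first ESA instead of the last one
    Matcher matcher = Pattern.compile("ISA.*?ESA").matcher(whateverString);
    if (matcher.find() && matcher.find()) {   // advance to the second match
         int x = matcher.start();
    }

Here x is the start index of the second match, i.e. the point at which to split.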

Edit: I mistakenly read this as a Java question. For C#:

  // Lazy .*? stops at the first ESA; the final . takes the one-character terminator
  string pattern = @"ISA.*?ESA.";
  Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);

  Match m = myRegex.Match(whateverString);   // m is the first match
  while (m.Success)
  {
       Console.WriteLine(m.Value);
       m = m.NextMatch();              // more matches
  }
abstractnature
1

RegEx will probably be the best for this. See this link

The pattern would be

ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.

This gives you two named groups containing the data you need

Match match = Regex.Match(input, @"ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.",RegexOptions.IgnoreCase);

if (match.Success)
{
    var data1 = match.Groups["data1"].Value;
    var data2 = match.Groups["data2"].Value;
}

Use Regex.Matches if you need to find multiple matches, and specify different RegexOptions if needed.
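
As a rough sketch of the Regex.Matches variant, the pattern can be reduced to a single ISA...ESA unit and applied repeatedly, so the number of interchanges does not need to be known in advance. The sample input and the group name data are illustrative. RegexOptions.Singleline is my assumption, in case the data or terminator contains a newline, and IgnoreCase is dropped on the assumption that the markers are always upper case.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "ISA~this is date~ESA|ISA~this is more data~ESA|ISA~third~ESA|";

// Lazy .*? stops at the first ESA; the trailing . takes the one-character terminator.
// Singleline lets . match a newline as well.
var pattern = new Regex(@"ISA(?<data>.*?)ESA.", RegexOptions.Singleline);

var interchanges = new List<string>();
foreach (Match m in pattern.Matches(input))
{
    interchanges.Add(m.Value);                   // the whole ISA...ESA? block
    Console.WriteLine(m.Value);
    Console.WriteLine(m.Groups["data"].Value);   // only the part between ISA and ESA
}
```

As noted in the comments on the question, this still breaks if the data itself contains ESA.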

Pablo notPicasso
  • To explain: .*? means match 0 or more (*) arbitrary characters (.) in a non-greedy manner (?), i.e. stop at the first position where the next pattern (here ESA) matches. If you know that between ISA and ESA there is always at least one character you can substitute * by +. The expression (?<x>y) labels the pattern y with the name x. – ckuri Jun 27 '17 at 22:16
  • If you follow the link I gave at the top of the answer, you will see a full explanation of the regex on the right. – Pablo notPicasso Jun 28 '17 at 05:05
1

You can use Regex.Split for accomplishing this

string splitStr = "|", inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";

var regex = new Regex($@"(?<=ESA){Regex.Escape(splitStr)}(?=ISA)", RegexOptions.Compiled);
var items = regex.Split(inputStr);

foreach (var item in items) {
    Console.WriteLine(item);
}

Output:

ISA~this is date~ESA
ISA~this is more data~ESA|

Note that if the data between ISA and ESA contains the same pattern that we are looking for, then you will have to find some smart way around it.

To explain the Regex a bit:

(?<=ESA)   Look-behind assertion. ESA must precede the match, but it is not consumed, so it stays in the output
(?=ISA)    Look-ahead assertion. ISA must follow the match, but it is not consumed, so it stays in the output

Using these look-around assertions you can find the correct | character for splitting
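
If the terminator character is not known up front, one possible variation (my own sketch, not part of the answer above) is to move the single unknown character into the look-behind and split on the resulting zero-width position. Nothing is consumed, so each piece keeps its terminator. RegexOptions.Singleline is an assumption here, so that . also matches a newline terminator.

```csharp
using System;
using System.Text.RegularExpressions;

string inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";

// (?<=ESA.) : the position is preceded by ESA plus exactly one terminator character
// (?=ISA)   : the position is followed by ISA
// Both assertions are zero-width, so the split removes nothing from the input.
var boundary = new Regex(@"(?<=ESA.)(?=ISA)", RegexOptions.Singleline);
string[] parts = boundary.Split(inputStr);

foreach (var part in parts)
{
    Console.WriteLine(part);
}
```

The same caveat applies: data that happens to contain ESA, one character, then ISA would produce a false boundary.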

Vikhram
0

Use a Regex like ISA(.+?)ESA and select the first group

string input = "ISA~mycontent+ESA";

Match match = Regex.Match(input, @"ISA(.+?)ESA",RegexOptions.IgnoreCase);

if (match.Success)
{
   string key = match.Groups[1].Value;              
}
Rene
0

It's kinda hacky but you could do...

string x = "ISA*ESA?ISA*ESA?";

x = x.Replace("*","~"); // OR SOME OTHER DELIMITER

string[] y = x.Split('~');

Not perfect in all situations, but it could solve your problem simply.

Ethan The Brave
0

You could split by "ISA" and "ESA" and then put the parts back together.

string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";

string start = "ISA",
    end = "ESA";
var splitedInput = input.Split(new[] { start, end }, StringSplitOptions.None);

var firstPart = $"{start}{splitedInput[1]}{end}{splitedInput[2]}";
var secondPart = $"{start}{splitedInput[3]}{end}{splitedInput[4]}";

firstPart = "ISA~this is date~ESA|"

secondPart = "ISA~this is more data~ESA|";

NtFreX
0

Instead of "splitting" by a string, I would instead describe your question as "grouping" by a string. This can easily be done using a regular expression:


Regular expression: ^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$

Explanation:

  • ^ - asserts position at start of the string
    • ( - start capturing group
    • ISA - match string ISA exactly
    • .*?(?=ESA) - match any character 0 or more times, positive lookahead on the string ESA (basically match any character until the string ESA is found)
    • ESA - match string ESA exactly
    • . - match any character
    • ) - end capturing group
    • repeat one more time...
  • $ - asserts position at end of the string

Try it on Regex101


Example:

string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
Regex regex = new Regex(@"^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$",
    RegexOptions.Compiled);

Match match = regex.Match(input);
if (match.Success)
{
    string firstValue  = match.Groups[1].Value; // "ISA~this is date~ESA|"
    string secondValue = match.Groups[2].Value; // "ISA~this is more data~ESA|"
}
budi
0

There are two answers to the question "How to split a string by another string".

var matches = input.Split(new [] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);

and

var matches = Regex.Split(input, "ISA").ToList();

However, the first removes empty entries, while the second does not.
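
To make the difference concrete, here is a small sketch that runs both calls on the sample string from the question. The variable names are illustrative. Note that both calls drop the "ISA" marker itself, so the sketch puts it back afterwards.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";

// string.Split with RemoveEmptyEntries: the empty piece before the first ISA is dropped.
string[] viaStringSplit = input.Split(new[] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);

// Regex.Split keeps that empty leading piece.
var viaRegexSplit = Regex.Split(input, "ISA").ToList();

// Both approaches consume the "ISA" marker, so it is prepended again here.
var restored = viaStringSplit.Select(s => "ISA" + s).ToList();

Console.WriteLine(viaStringSplit.Length);
Console.WriteLine(viaRegexSplit.Count);
restored.ForEach(Console.WriteLine);
```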

Greg Gum