
Good day,

Can someone point me in the right direction here?

I have a string:

Task 10001:Bring cooldrinks
Task 10005:Waffle Iron,
this should of course be cleaned    
Task 10006:Remember Wife
Task 10000:Leave children

How do I break it up so that I can put each task into its own list item, like this:

List(0) = Task 10001: Bring cooldrinks
List(1) = Task 10005:Waffle Iron,this should of course be cleaned    
List(2) = Task 10006: Remember Wife
List(3) = Task 10000: Leave children

I will always receive the string in the form Task [number]: [Message]

The parts inside the [] are the variables that will differ.

Joe
user1702369

3 Answers


This should do it:

Task (?<number>[0-9]+):(?<message>(?:[^\n]+|\n(?!Task [0-9]+:))+)

It allows the messages to wrap lines, as per your Waffle Iron example.

If the numbers are always five digits, you can use `[0-9]{5}` instead of `[0-9]+`.

It uses named capture groups (the `(?<name>...)` bit) for the number and message, but you can of course use normal capturing groups instead, or none at all if you're going to split the lines up separately, for example:

Task [0-9]+:(?:[^\n]+|\n(?!Task [0-9]+:))+
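A minimal C# sketch of reading the named groups from the first pattern. It assumes .NET's `System.Text.RegularExpressions` and C# 9+ top-level statements, and uses the question's sample text with `\n` line endings (trailing spaces dropped); the tuple list is only one illustrative way to hold the results.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample text from the question, built with \n so the result does not
// depend on the line endings of the source file.
string input = "Task 10001:Bring cooldrinks\n" +
               "Task 10005:Waffle Iron,\n" +
               "this should of course be cleaned\n" +
               "Task 10006:Remember Wife\n" +
               "Task 10000:Leave children";

// The named-group pattern from this answer.
var pattern = new Regex(@"Task (?<number>[0-9]+):(?<message>(?:[^\n]+|\n(?!Task [0-9]+:))+)");

var tasks = new List<(string Number, string Message)>();
foreach (Match m in pattern.Matches(input))
{
    // Groups can be looked up by the names given in (?<name>...).
    tasks.Add((m.Groups["number"].Value, m.Groups["message"].Value));
}

foreach (var t in tasks)
    Console.WriteLine($"{t.Number} => {t.Message}");
```

Because each `Match` exposes its groups by name, there is no need to count group indexes, and the wrapped Waffle Iron message comes back as a single value.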


Broken down, the key part of these expressions (matching the message without matching the next task) is:

(?:
    [^\n]+
|
    \n(?!Task [0-9]+:)
)+

The first alternative matches as many non-newline characters as it can; if that fails, the second alternative looks for a newline that is not followed by a new task. The group repeats this as many times as it can (at least once) until it has consumed the whole message. (If a message can be empty, change the final `+` to a `*`.)
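To check the behaviour described above, here is a small sketch using the unnamed pattern (same assumptions: .NET and C# 9+ top-level statements). The inputs are hypothetical: a wrapped message, a message containing the word "multitasking", and an empty message to compare the `+` and `*` variants.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Unnamed pattern: only a newline followed by "Task <digits>:" ends a message.
const string pattern = @"Task [0-9]+:(?:[^\n]+|\n(?!Task [0-9]+:))+";

// Hypothetical input: a wrapped message and one containing "task" inside a word.
string input = "Task 10005:Waffle Iron,\n" +
               "this should of course be cleaned\n" +
               "Task 10008:Find multitasking software\n" +
               "Task 10000:Leave children";

List<string> tasks = Regex.Matches(input, pattern)
                          .Cast<Match>()
                          .Select(m => m.Value)
                          .ToList();

foreach (string t in tasks)
    Console.WriteLine("[" + t + "]");

// Variant with the final + changed to *, so an empty message still matches.
const string emptyAllowed = @"Task [0-9]+:(?:[^\n]+|\n(?!Task [0-9]+:))*";
string withEmpty = "Task 10002:\nTask 10003:Buy ice";

int strictCount = Regex.Matches(withEmpty, pattern).Count;        // the empty task is skipped
int relaxedCount = Regex.Matches(withEmpty, emptyAllowed).Count;  // the empty task is kept
Console.WriteLine($"strict: {strictCount}, relaxed: {relaxedCount}");
```

With `+`, the group needs at least one iteration, and after `Task 10002:` neither alternative can match, so that task is dropped rather than causing an error.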

Peter Boughton
  • Jeez, these regular expressions! Thanks for the answer. I will need to study your explanation. – user1702369 Sep 26 '13 at 13:58
  • No problem, let me know if there's anything that doesn't make sense. btw, [here's some info on doing named groups in C#](http://stackoverflow.com/questions/906493/regex-named-capturing-groups-in-net) – Peter Boughton Sep 26 '13 at 14:00
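```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
// (?s): '.' also matches newlines; (?i): case-insensitive matching
List<string> output = Regex.Matches(input, @"(?s)(?i)\bTask\b\s*\d+:.*?(?=\bTask\b|$)")
                           .Cast<Match>()
                           .Select(x => x.Value)
                           .ToList();
```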
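A self-contained version of the snippet above, as a sketch assuming .NET, C# 9+ top-level statements, and the question's sample text with `\n` line endings. The `TrimEnd()` call is an addition not in the answer: without it, every item except the last keeps the newline that sits before the next `Task`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string input = "Task 10001:Bring cooldrinks\n" +
               "Task 10005:Waffle Iron,\n" +
               "this should of course be cleaned\n" +
               "Task 10006:Remember Wife\n" +
               "Task 10000:Leave children";

// (?s) lets '.' cross newlines, so a wrapped message stays in one item;
// the lazy .*? stops just before the next whole word "Task" or at the end.
List<string> output = Regex.Matches(input, @"(?s)(?i)\bTask\b\s*\d+:.*?(?=\bTask\b|$)")
                           .Cast<Match>()
                           .Select(x => x.Value.TrimEnd())  // drop the trailing newline
                           .ToList();

for (int i = 0; i < output.Count; i++)
    Console.WriteLine($"List({i}) = {output[i]}");
```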
Anirudha
  • This will fail for content like `Task 10008:Find multitasking software.` Also, doing a lazy `.*?` match is generally slower (hence why I went for the negative class / alternation route). – Peter Boughton Sep 27 '13 at 11:27
  • @PeterBoughton No, that won't fail; it seems to work well on my PC. – Anirudha Sep 28 '13 at 18:58
  • Eh? It's doing a lazy `.*?` with a case-insensitive lookahead for `Task`, so it will match `Task 10008:Find multi` instead of consuming the whole message. – Peter Boughton Sep 28 '13 at 19:44
  • This is doing exactly what I wanted. As an add-on question: can this regular expression be changed so that it works with another fixed text as well? For example, do exactly the same if the word is not only 'Task' but also 'Bug'. Can it be done, or will I have to compute them separately and then merge the 2 lists into 1 List? – user1702369 Oct 04 '13 at 11:49
  • @user1702369 Change `TASK` to `(TASK|BUG)`; `|` means OR. – Anirudha Oct 04 '13 at 11:50
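A sketch of the `Task`/`Bug` variant from the comment above, assuming the same .NET setup and a hypothetical mixed input. The alternation has to replace `\bTask\b` in both places, at the start of an item and in the lookahead; if only the first is changed, a `Bug` line that follows a `Task` is absorbed into that task's message. Both entry types end up in one list, so no merging is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input mixing Task and Bug entries.
string input = "Task 10001:Bring cooldrinks\n" +
               "Bug 20001:Waffle Iron is broken\n" +
               "Task 10006:Remember Wife";

// (?:Task|Bug) replaces \bTask\b both at the start and in the lookahead.
const string pattern = @"(?si)\b(?:Task|Bug)\b\s*\d+:.*?(?=\b(?:Task|Bug)\b|$)";

List<string> items = Regex.Matches(input, pattern)
                          .Cast<Match>()
                          .Select(m => m.Value.TrimEnd())
                          .ToList();

// For comparison: changing only the first occurrence lets the first Task
// run on until the next "Task", swallowing the Bug line in between.
List<string> onlyFirst = Regex.Matches(input, @"(?si)\b(?:Task|Bug)\b\s*\d+:.*?(?=\bTask\b|$)")
                              .Cast<Match>()
                              .Select(m => m.Value.TrimEnd())
                              .ToList();

items.ForEach(Console.WriteLine);
Console.WriteLine($"only first occurrence changed: {onlyFirst.Count} items");
```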

The advantage of a regex is that it filters out all bad lines, because they don't match the pattern.

Something like `^Task\s\d+:.+` will do.
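In .NET, `^` only matches at the start of the whole string unless `RegexOptions.Multiline` is set, so a sketch of this suggestion (assuming C# 9+ top-level statements and the question's sample text with `\n` line endings) needs that option. Note that it keeps only the first line of each task: the wrapped continuation line is filtered out along with any other non-matching line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string input = "Task 10001:Bring cooldrinks\n" +
               "Task 10005:Waffle Iron,\n" +
               "this should of course be cleaned\n" +
               "Task 10006:Remember Wife\n" +
               "Task 10000:Leave children";

// Multiline makes ^ match at the start of every line; '.' stops at '\n',
// so each match is exactly one line.
List<string> lines = Regex.Matches(input, @"^Task\s\d+:.+", RegexOptions.Multiline)
                          .Cast<Match>()
                          .Select(m => m.Value)
                          .ToList();

lines.ForEach(Console.WriteLine);

// Without Multiline, ^ anchors only at the start of the string: one match.
int withoutMultiline = Regex.Matches(input, @"^Task\s\d+:.+").Count;
Console.WriteLine($"without Multiline: {withoutMultiline} match");
```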

Jeroen van Langen