1

I have data in this format:

"NEW ITEM:1_BELT:3_JEANS:1_BELT:1_SUIT 3 PCS:1_SHOES:1"

The format is Item1:Item1Qty_Item2:Item2Qty...ItemN:ItemNQty

I need to separate the items and their corresponding quantities and form arrays. I did the item part like this:

var allItemsAry = Regex.Replace(myString, "[\\:]+\\d", "").Split('_');

Now allItemsAry is correct, like this: [NEW ITEM, BELT, JEANS, BELT, SUIT 3 PCS, SHOES]

But I can't figure out how to get the quantities. Whatever expression I try, the 3 from SUIT 3 PCS comes along with them, like this:

var allQtyAry = Regex.Replace(dataForPackageConsume, "[^(\\:+\\d)]", "").Split(':');

After the replace this comes out as :1:3:1:13:1:1, so I can't split on : to make the array. As can be seen, the fourth item is 13 when it should be 1; that 3 is coming from SUIT 3 PCS. I also tried some other variations, but the 3 from SUIT 3 PCS always pops in. How do I get just the quantities (possibly attached to :, so I can split on it and form the array)?

UPDATE: In case I didn't make it clear before, I want the numbers that are immediately preceded by :, along with the colon.

So, what I want is :1:3:1:1:1:1.

Erik Schierboom
Razort4x

5 Answers

3

Instead of removing everything except numerals, how about matching only numerals?

For instance:

Regex regex = new Regex(@":\d+");
string result = string.Empty;
foreach (Match match in regex.Matches(input))
    result += match.Value;
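
If the end goal is an array rather than the :1:3:1:1:1:1 string, the matches can also be collected directly. This is only a minimal sketch: the lookbehind `(?<=:)` (which keeps the colon out of each match) and the helper name `ExtractQuantities` are assumptions for illustration, not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuantityMatcher
{
    // Hypothetical helper: returns every run of digits that directly follows a ':'.
    public static string[] ExtractQuantities(string input)
    {
        return Regex.Matches(input, @"(?<=:)\d+")
                    .Cast<Match>()          // MatchCollection is non-generic on older frameworks
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "NEW ITEM:1_BELT:3_JEANS:1_BELT:1_SUIT 3 PCS:1_SHOES:1";
        Console.WriteLine(string.Join(",", ExtractQuantities(input))); // 1,3,1,1,1,1
    }
}
```

The 3 in SUIT 3 PCS is skipped because it is preceded by a space, not a colon.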
Nolonar
  • Thanks! It does get the work done, but still, can you think of any way I could do this with just a single replace? – Razort4x Jun 13 '13 at 10:50
  • @Razort4x Out of curiosity: why do you absolutely want to use a replace? – Nolonar Jun 13 '13 at 10:52
  • @Razort4x - Two good reasons are 1) 'cause it might be (_a lot_) faster; and 2) so I can learn more about regexes :-) – robinCTS Jun 13 '13 at 12:13
  • I don't think `Replace` is going to perform faster (or slower) than my solution, but I agree with trying to learn new things. Unfortunately, I belong to the lazy breed of programmers; I don't like making things more complicated than they need to be, unless I have a really good and practical reason to do so. – Nolonar Jun 13 '13 at 12:23
  • 1) Actually your solution _is_ slower and for a long enough string, could be crippling slow. See [here](http://blog.strictly-software.com/2012/09/using-string-builders-to-speed-up.html) for a nice description of the problem, [here](http://www.dotnetperls.com/stringbuilder) for a C# tutorial, or google `speed up string concatenation` or `stringbuilder` for more examples. Plus the slowdown for the loop. 2) So you're saying 5 statements in 5 lines of code (one of which is a loop) is _less_ complicated than 1 statement (ok,ok, 1 _compound_ statement comprising 2 statements) in 1 line? ;) – robinCTS Jun 17 '13 at 03:02
  • @robinCTS `So you're saying 5 statements in 5 lines of code is less complicated than 1 statement in 1 line?` No, I'm saying I don't know how `Regex.Replace` works internally :p – Nolonar Jun 17 '13 at 03:14
  • 1
    @robinCTS Jokes aside, I know that a string concatenation is slower than a StringBuilder, I just wanted to keep my answer short and simple. Also, I did say I don't like to make things too complicated unless there's a really good and practical reason to do so. The only time I replaced a simple Regex with a complicated StringBuilder construct, was exactly because the Regex was becoming too slow for comfort. You should know, that depending on the code, a simple SelectSort can be faster than a complicated QuickSort. Most IT students make the mistake of considering only complexity and nothing else. – Nolonar Jun 17 '13 at 03:23
  • I'm confused. Are we talking about speed or complexity? So you don't know how `Regex.Replace` works internally, but you do/don't know how `Regex.Matches` does work, and can justify 5 lines / 5 statements as less complex than 1 line / 1 statement? – robinCTS Jun 17 '13 at 03:24
  • 1
    @robinCTS I think there's been a misunderstanding. I never said complex, I said complicated. I was talking about human understanding, not code complexity. – Nolonar Jun 17 '13 at 03:27
  • Gotcha. So my regex is slightly more complicated than yours! Personally, since I know regexes very well, I would never consider using a loop where I could use a single statement. The complexity of the regex does not usually vary much for either case. And the potential speed improvement could be enormous. – robinCTS Jun 17 '13 at 03:32
3

[^\d:]+|:(?!\d)|(?<!:)\d+

[^\d:]+ will match all non-digit non-:s.

:(?!\d) will match all :s not followed by a digit (negative lookahead).

(?<!:)\d+ will match all digits not preceded by a : (negative lookbehind).


Source

NEW ITEM:1_BELT:3_JEANS:1_BELT:1_SUIT 3 PCS:1_SHOES:1

Regular Expression

[^\d:]+|:(?!\d)|(?<!:)\d+

Results Match

NEW ITEM
_BELT
_JEANS
_BELT
_SUIT 
3
 PCS
_SHOES
Patashu
  • The third part of the alternation, `(?<!:)\d+`, will fail if, as I suspect and you tried to allow for, any qty is 10 or greater. The correct regex should be `(?<![:\d])\d+`. The second part of the alternation is almost certainly redundant. I doubt there will be any extraneous `:` in the item descriptions. And, lastly, you still need to remove the leading `:` - "never give the answer the OP asks for, rather the answer he actually needs" ;-) – robinCTS Jun 13 '13 at 12:08
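
To see the difference the comment describes, here is a rough sketch that runs both patterns through `Regex.Replace`. The input with a two-digit quantity and the names `LookbehindDemo` and `Strip` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Pattern as posted in the answer.
    public const string Original = @"[^\d:]+|:(?!\d)|(?<!:)\d+";
    // Same pattern with the lookbehind suggested in the comment.
    public const string Corrected = @"[^\d:]+|:(?!\d)|(?<![:\d])\d+";

    // Removes everything the pattern matches, leaving the ":qty" pieces behind.
    public static string Strip(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "");
    }

    public static void Main()
    {
        string input = "HAT:12_SUIT 3 PCS:1"; // hypothetical input with a two-digit quantity
        Console.WriteLine(Strip(input, Original));  // :1:1   (the 2 of 12 is eaten)
        Console.WriteLine(Strip(input, Corrected)); // :12:1
    }
}
```

With the original lookbehind, the second digit of 12 is preceded by a digit rather than a colon, so it counts as a stray number and gets removed.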
2

You want only the numbers, like :1:3:1:1:3:1:1?

string s = "NEW ITEM:1_BELT:3_JEANS:1_BELT:1_SUIT 3 PCS:1_SHOES:1";
var output = Regex.Replace(s, @"[^0-9]+", "");
StringBuilder sb = new StringBuilder();
foreach (var i in output)
{
    sb.Append(":" + i);
}
Console.WriteLine(sb); // :1:3:1:1:3:1:1

Here is a DEMO.

OK, if the character right after each : is always a single digit, you can do it like this:

string s = "NEW ITEM:1_BELT:3_JEANS:1_BELT:1_SUIT 3 PCS:1_SHOES:1";
var array = s.Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
StringBuilder sb = new StringBuilder();
foreach (var item in array)
{
    if (Char.IsDigit(item[0]))
    {
        sb.Append(":" + item[0]);
    }
}

Console.WriteLine(sb); //:1:3:1:1:1:1

DEMO.
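
The snippet above only keeps the first character after each colon, so a quantity like 12 would come out as 1. A variation that avoids that, sketched here without regex: split on _ first, then cut each pair at its last colon. The helper name `SplitPairs` and the choice to silently skip malformed pairs are assumptions, not something from the question.

```csharp
using System;
using System.Collections.Generic;

public static class PairSplitter
{
    // Hypothetical helper: splits "NAME:QTY" pairs separated by '_' into two parallel lists.
    public static void SplitPairs(string input, out List<string> items, out List<int> quantities)
    {
        items = new List<string>();
        quantities = new List<int>();
        foreach (string pair in input.Split(new[] { '_' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int colon = pair.LastIndexOf(':');
            if (colon < 0) continue;  // no colon at all: skip (assumed policy)

            int qty;
            if (!int.TryParse(pair.Substring(colon + 1), out qty)) continue; // non-numeric quantity: skip

            items.Add(pair.Substring(0, colon));
            quantities.Add(qty);
        }
    }

    public static void Main()
    {
        List<string> items;
        List<int> quantities;
        SplitPairs("NEW ITEM:1_BELT:3_JEANS:1_BELT:1_SUIT 3 PCS:1_SHOES:1", out items, out quantities);
        Console.WriteLine(string.Join(",", items));      // NEW ITEM,BELT,JEANS,BELT,SUIT 3 PCS,SHOES
        Console.WriteLine(string.Join(",", quantities)); // 1,3,1,1,1,1
    }
}
```

This gives both arrays in one pass, at the cost of a loop instead of a single expression.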

Soner Gönül
1

This will work with one replace:

var allQtyAry = Regex.Replace(dataForPackageConsume, @"[^_:]+:", "").Split('_');

Explanation:

[^_:] means match anything that's not a _ or a :

[^_:]+: means match any sequence of at least one character not matching either _ or :, but ending with a :

Since regular expressions are greedy by default (ie they grab as much as possible), matching will start at the beginning of the string or after each _:

[NEW ITEM:]1_[BELT:]3_[JEANS:]1_[BELT:]1_[SUIT 3 PCS:]1_[SHOES:]1

Removing the matched parts (shown in brackets above) results in:

1_3_1_1_1_1

Splitting by _ results in:

[1, 3, 1, 1, 1, 1]
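
Put together, the replace-then-split steps above might look like the following sketch. The conversion to `int[]` and the name `ParseQuantities` are additions for illustration, and the code assumes well-formed, non-empty input (an empty string would make `int.Parse` throw).

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class SingleReplace
{
    // Hypothetical helper: strips every "NAME:" prefix, then splits what is left on '_'.
    public static int[] ParseQuantities(string input)
    {
        return Regex.Replace(input, @"[^_:]+:", "")   // "1_3_1_1_1_1" for the sample input
                    .Split('_')
                    .Select(s => int.Parse(s, CultureInfo.InvariantCulture))
                    .ToArray();
    }

    public static void Main()
    {
        string input = "NEW ITEM:1_BELT:3_JEANS:1_BELT:1_SUIT 3 PCS:1_SHOES:1";
        Console.WriteLine(string.Join(",", ParseQuantities(input))); // 1,3,1,1,1,1
    }
}
```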

robinCTS
  • Thanks! That worked. But could you please explain how it works? AFAIK The part after `+` says match anything before it, one or more time, the `:` matches it self, but what does `[^_:]` do? I know `[^]` says don't match anything in here, but how is it returning the numbers? You have no where specified `\d`? Can you please explain? – Razort4x Jun 13 '13 at 10:59
  • @Razort4x First, we match all non-_ non-: strings that are followed by a :, so we match `NEW ITEM:`, `BELT:` and so on. We replace all those matches with "", so now we have only everything else - we have numbers separated by underscores, like 1_3_3_ etc. Then we split by underscores and thus we have an array of quantities. – Patashu Jun 13 '13 at 11:05
  • @Patashu - Thanks for chipping in with a quick reply. I've updated the answer with a more verbose explanation. – robinCTS Jun 13 '13 at 12:30
0

Try this regex: [^:\d+?].*?(?=:). It should do the trick.

string[] list = Regex.Replace(test, @"[^:\d+?].*?(?=:)", string.Empty).Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);

The .*?(?=:) part matches everything up to, but not including, the next colon, and the match is replaced with an empty string. The leading [^:\d+?] keeps each colon and the digits right after it out of the match, so you end up with :1:3:1:1:1:1 before the split.

Jason
  • Sigh. Whilst it mostly works (try "+SIZE HAT" for the first item ;-)), and you did answer with _exactly_ what the OP wanted, since the OP is clearly still learning regexes: 1) `[` & `]` form a _character_ class not an expression group, thus using `+?` is wrong (see [here](http://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended)); 2) Avoid `.*` if at all possible, even if lazy (see comments 4 & 6 [here](http://stackoverflow.com/q/14541450/1961728)); 3) It's simpler to avoid look-arounds (+ sometimes they're not allowed); 4) Thus your regex should be `[^:\d][^:]+` – robinCTS Jun 15 '13 at 19:20
  • @robinCTS #1-4, pts taken :) but of course I provided exactly what the OP wanted, that's the format we have to go by to generate the regex no? Try this on your regex `NEW ITEM:1_BELT:3_JEANS:1_BELT:1 SUIT 3 PCS:1_SHOES:1` ;) – Jason Jun 15 '13 at 19:56
  • Try this on _your_ regex `NEW ITEM:1_BELT:3_JEANS:1_BELT 1_SUIT 3 PCS:1_SHOES:1` :P It breaks _every_ regex! Both these examples are invalid. As per OP specs, the `:` ***and*** `_` are the delimiters. Omitting them will obviously break most regexes. Nothing was said about the format of the item names, though. The only assumption we can make is that they don't contain any (unescaped) delimiters. As to what the OP wanted; he wanted the quantities in an array. Just because he thought that the string `:1:3:1:1:1:1` was the way to do it doesn't mean a) it's the best way; or b) even possible! – robinCTS Jun 17 '13 at 02:08