INPUT: A string containing numbers, the letter e, dots, and spaces. Note that e acts as the separator between the numbers.

e.27.3.90.. .e 3.50 2.30..e2.0.1.2. .50..

OUTPUT: I want to remove all the spaces and the extra dots, keeping only the one dot that serves as the decimal separator in each number, and add a , before each e.

,e273.90,e3502.30,e2012.50

PS: There are many posts about regexes of various kinds. I tried to build one myself, but with no success so far.

  1. Please propose an efficient one-shot regex, or any other ideas.
  2. I would also like to hear about the performance gain/loss of such a regex vs. multiple replace() calls.

Here is the code I have been struggling with ;)

      List<string> myList;     
      string s = "";     
      string s2 = "";          
      string str = "e.27.3.90..bl% .e 3.50 2.30. #rp.e2.0.1.2..50..y*x";
      s = Regex.Replace(str, @"\b[a-df-z',\s]+", "");                               
      myList = new List<string>(Regex.Split(s, @"[e]"));
bonCodigo
  • Do you have any code that you have actually tried on your own? – MethodMan Jan 04 '13 at 21:34
  • What is the precise rule for identifying the "." that should be kept in the string? – mbeckish Jan 04 '13 at 21:34
  • @mbeckish The way I read it, any decimal between two numbers stays. Any decimal next to another decimal or white space goes. – Forty-Two Jan 04 '13 at 21:38
  • @Forty-Two - That seems incorrect. For example, e.27.3.90.. . -> e273.90 – mbeckish Jan 04 '13 at 21:39
  • ...as well as decimals next to e, then – Forty-Two Jan 04 '13 at 21:40
  • @Forty-Two - Your explanation doesn't explain why the decimal between the 27 and 3 is removed. – mbeckish Jan 04 '13 at 21:41
  • you're right, I didn't even notice that. Just ignore me then :) – Forty-Two Jan 04 '13 at 21:42
  • s/\.(\d+)[. ]*(?:e|$)/.\1e/ The last decimal+numbers before 'e' stays. –  Jan 04 '13 at 21:43
  • @DJKRAZE I have updated the post with a code. +1 @mbeckish for correcting Forty-two. I can remove other special characters too in my current `regex`. – bonCodigo Jan 04 '13 at 21:48
  • Are you trying to extract all numbers from the string, or do you really need them comma separated? – VladL Jan 04 '13 at 21:50
  • "except for the one that makes up following" - what does that mean??? – mbeckish Jan 04 '13 at 21:50
  • @mbeckish that means, I want to remove all dots, spaces except for the dot (sort of the true decimal separator) that make up the final string (expected output) Also want to add a `,` before each `e` :) e.g. `..e..3.4.5.6.0. ` to `,e345.60` Let me know if it's not clear. – bonCodigo Jan 04 '13 at 21:51
  • @bonCodigo - "remove all dots...except for the ones that make up the final string ". Obviously, you want to remove all dots except the one that should remain. But how do WE know which dot you want to remain? What is the rule? – mbeckish Jan 04 '13 at 21:53
  • @mbeckish Between each set of numbers there's an `e`. The expected output requires a `.` before the last two digits, e.g. `..e..3.4.5.6.0...e 21.45.3.0.` The logic would be to check for a pattern like `..6...0. .e .` as the tail. I am well aware it's a *screwed up dirty string*.. :$ – bonCodigo Jan 04 '13 at 21:57
  • @bonCodigo - "Expected output requires a . before last two numbers" That's the missing piece. Thanks. – mbeckish Jan 04 '13 at 21:57

2 Answers

The final value of str is your result:

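     string str = "e.27.3.90..bl% .e 3.50 2.30. #rp.e2.0.1.2..50..y*x";
     // Step 1: strip everything except 'e' and digits.
     str = Regex.Replace(str, "[^e0-9]", "");
     // Step 2: put a dot before the last two digits of each group and a comma after it.
     str = Regex.Replace(str, "([0-9]{2})(e|$)", ".$1,$2");

     // Optional: move the trailing comma to the front.
     //str = "," + str.Substring(0, str.Length - 1);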
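
For illustration only, the two replacements and the commented-out last line could be folded into one helper. This is a sketch, not part of the original answer: the class name `RegexCleaner` and the method `Clean` are hypothetical, and it assumes each group between `e` separators ends in at least two digits. The character class is written as `[^e0-9]` so that a literal `^` in the input is stripped as well.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (name and signature are illustrative assumptions).
public static class RegexCleaner
{
    public static string Clean(string input)
    {
        // Pass 1: drop every character that is not 'e' or an ASCII digit.
        string s = Regex.Replace(input, "[^e0-9]", "");

        // Pass 2: insert a dot before the last two digits of each group
        // and a comma after the group (before the next 'e' or at the end).
        s = Regex.Replace(s, "([0-9]{2})(e|$)", ".$1,$2");

        // The second pass leaves a trailing comma; move it to the front.
        if (s.EndsWith(",", StringComparison.Ordinal))
        {
            s = "," + s.Substring(0, s.Length - 1);
        }
        return s;
    }
}
```

Calling `RegexCleaner.Clean` on the sample string from the question should give `,e273.90,e3502.30,e2012.50`.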
VladL
  • +1 Vlad You beat me to it; was going to do a Linq Lambda version but that is close enough to help him. – Greg Jan 04 '13 at 23:03
  • @bonCodigo as you said, the string is dirty, so 2 steps are needed, 1st - cleanup, 2nd - add separators – VladL Jan 04 '13 at 23:12
  • @Greg I love linq for one-line-solutions, but I think in this particular case it would make the code not-understandable :) – VladL Jan 04 '13 at 23:14
  • @Greg I am more than willing to see your `Linq Lambda` if you wish to post :) And @VladL I see that you are adding the leading `,` to the string. How can we achieve all of it (adding `e`, `.` and the leading `,`) with the second replace? – bonCodigo Jan 04 '13 at 23:19
  • @bonCodigo I don't see any chance to do it at once. But don't worry, the last commented string is an easy operation and will not cost you much CPU/memory. You can concatenate 2 last code strings into one, but I think you can handle it :) – VladL Jan 04 '13 at 23:50
  • @VladL I am not worried about the last commented line at all... It's not about operational exhaustion either. If I wanted to do string manipulations without `regex`, doubt I would post this answer. Just wanted to get it over with two replaces at max. Why not shoot the Linq Lambda too? – bonCodigo Jan 05 '13 at 00:00
  • @bonCodigo There was one answer here with linq, but the author removed it, probably because it's not as universal/efficient/readable as my regex solution – VladL Jan 05 '13 at 00:44
  • @VladL Since it's removed, I can't see it via any URL you may have pasted in the comment ;) I went around your comment letters like reading *Braille*, no URL... – bonCodigo Jan 05 '13 at 00:58
  • @bonCodigo can you tell me what do u need this string for? – VladL Jan 05 '13 at 01:05
  • @bonCodigo I mean how do u want to use it? – VladL Jan 05 '13 at 01:11
  • @vladl not sure what you meant by, "how I want to use this"?... I just want to clean up the numbers to arrive in decimals. That's all. Thanks for the answer ;) – bonCodigo Jan 05 '13 at 17:26
  • @bonCodigo I meant how do you want to use the data; maybe you want to have it split or something like that. You are welcome :) – VladL Jan 05 '13 at 17:32
  • @VladL A `comma delimited string` is more than enough. You know what, I actually got things working for this question. One issue I faced, though: what if there is more than one `e`? How can we keep just the originally expected results in that case? – bonCodigo Jan 05 '13 at 17:36
  • @bonCodigo we need more logic for that, maybe least and max number of digits before or after `e` in original or expected string. – VladL Jan 05 '13 at 17:41
  1. Remove all dots (and spaces) from the string.
  2. Split the string into separate items at each "e".
  3. For each item, add a dot before the last 2 digits.
  4. Recombine the items back into one string, placing a comma between items.

These steps are easily performed with the standard String methods, but you could use regexes if you want.
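
As a rough sketch of the four steps above using only standard string and LINQ methods: the class name `StepwiseCleaner` and the method `Clean` are hypothetical, not from the answer. The sketch makes two assumptions beyond the steps as written: step 1 also drops spaces and any other non-digit noise, and the leading `,` plus the `e` prefix are re-added so the result matches the output format in the question.

```csharp
using System;
using System.Linq;

// Hypothetical helper (name and signature are illustrative assumptions).
public static class StepwiseCleaner
{
    public static string Clean(string input)
    {
        // Step 1: keep only ASCII digits and the separator 'e'
        // (this drops dots, spaces, and any other noise characters).
        string stripped = new string(
            input.Where(c => c == 'e' || (c >= '0' && c <= '9')).ToArray());

        // Step 2: split at each 'e'; empty pieces are discarded.
        string[] items = stripped.Split(
            new[] { 'e' }, StringSplitOptions.RemoveEmptyEntries);

        // Step 3: add a dot before the last two digits of each item.
        // Assumption: items with fewer than two digits get no dot.
        var formatted = items.Select(item =>
            item.Length >= 2
                ? item.Substring(0, item.Length - 2) + "." + item.Substring(item.Length - 2)
                : item);

        // Step 4: recombine, prefixing every item with ",e".
        return string.Concat(formatted.Select(f => ",e" + f));
    }
}
```

For the sample input in the question, `StepwiseCleaner.Clean` should return `,e273.90,e3502.30,e2012.50`. Whether this is faster or slower than the regex version would need to be measured on real data.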

mbeckish