5

I have a string coming from a telnet client. This string contains backspace characters which I need to apply. Each backspace should remove one previously typed character.

I'm trying to do this in a single replace using regular expression:

string txt = "Hello7\b World123\b\b\b";
txt = Regex.Replace(txt, ".\\\b", "", RegexOptions.ECMAScript);

Which results in "Hello World12". Of course, I want "12" to be removed too, but it obviously doesn't match my expression.

In some way, it should repeat replacing until there are no more matches. Any ideas on how to achieve this with a single regular expression?

huysentruitw
  • 27,376
  • 9
  • 90
  • 133

2 Answers2

4

I wouldn't try to use a regular expression for this, since it's very impenetrable to read and I have the feeling that it's not even possible with plain regular expression without any perl-like regex magic-extensions. My suggestion would be something like (python like pseudocode):

stack = []
for char in str:
    if char == BACKSPACE and not stack.isEmpty():
        stack.pop()
    else:
        stack.push(char)

result = ''.join(stack)

It's immediately clear what happens and how it works.

Christos Lytras
  • 36,310
  • 4
  • 80
  • 113
Martin Thurau
  • 7,564
  • 7
  • 43
  • 80
  • @WouterHuysentruit: I iterate over the input string and manipulate the stack. – Martin Thurau May 17 '13 at 08:43
  • +1 I see, thanks. While I prefer this method for readability, I'll have to select KennyTM's answer because I explicitly asked for a regular expression. – huysentruitw May 17 '13 at 08:48
  • 2
    SO: Where get what you asked for and a dozen comments why it is a bad idea! – Martin Thurau May 17 '13 at 08:58
  • 1
    Well, our current code looks much like your proposal. But I was wondering how to do this with a single regular expression, which I couldn't figure out myself. That's the reason for my question. – huysentruitw May 17 '13 at 09:02
  • There's a bug in the pseudo code: If the stack is empty, you're pushing the backspace char on it. :) – huysentruitw Feb 18 '16 at 06:03
  • This simple logic works on any language and it's fast; we just have to check the stack/string after we check for the backspace char in a separate condition. Magnificent! – Christos Lytras Oct 28 '21 at 12:45
4

This is basically a variant of How can we match a^n b^n with Java regex?, so we could reuse its answer there:

var regex = new Regex(@"(?:[^\b](?=[^\b]*((?>\1?)[\b])))+\1");
Console.WriteLine(regex.Replace("Hello7\b World123\b\b\b", ""));

Additionally, the .NET regex engine supports balancing groups, so we could use a different pattern:

var regex = new Regex(@"(?<L>[^\b])+(?<R-L>[\b])+(?(L)(?!))");

(This means:

  1. Match one or more non-backspaces, assigning them with the name "L",
  2. then followed one or more backspaces, assigning them with the name "R", with the condition that every "R" must have one corresponding "L",
  3. if there are any "L"s left, abandon the match (as (?!) matches nothing).

)

Community
  • 1
  • 1
kennytm
  • 510,854
  • 105
  • 1,084
  • 1,005
  • In regex, doesn't `\b` mean word break? If you really want to match `\b` you would need to escape: `\\b`. – Buh Buh May 20 '13 at 07:43
  • @BuhBuh: The `\b` is inside a character class, which means `\u0008`. See http://msdn.microsoft.com/en-us/library/4edbef7e.aspx. – kennytm May 20 '13 at 09:27