
I have a file with multiple backspace characters (^H) in it. I'd like to be able to "apply" those backspaces in Perl. I found a few solutions, but none of them worked in my case. The critical line is this one:

test>>M^H ^HManagement.^H^H^H^H^H^H^H^H^H^Hanagement.F^H ^HFiles.^H^H^H^H^Hiles.s^H ^Hs.^H ^Hc^H ^H^H ^Hscript.^H ^H^H^H^H^Hripts^H ^H^H ^H^H ^H^H ^H^H ^H^H ^H^H ^Hscripts.^H.s^H ^Hshow_file ^H^H^H^H^H^H^H^H^Hhow_file = transform_factory_to_running^M

The result should look like this:

test>>Management.Files.scripts.show_file = transform_factory_to_running^M

In vi I am able to transform the text correctly, as suggested in https://stackoverflow.com/a/1298728/2837411. But the Perl solution that is also suggested in that question, https://stackoverflow.com/a/1298970/2837411, didn't work for me (using $_):

s{([^\x08]+)(\x08+)}{substr$1,0,-length$2}eg;

The output for this is:

test>>Management.Files.sscriptriptscripts.show_file = transform_factory_to_running^M

All the backspaces have vanished, but it looks as if a few of them were applied to another backspace?!

derhelge

2 Answers


This is simply done with a loop of substitutions.

Each pass removes every backspace at the start of the string (where it has no effect) and every non-backspace character that is followed by a backspace (emulating the deletion of the preceding character). The loop repeats until a pass finds nothing left to remove.

Note that I have had to use \cH instead of \b inside the regex pattern, because \b is a word-boundary anchor in that context.
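As a quick check of that note, here is a minimal sketch showing that "\b", "\cH", "\x08" and chr(8) all denote the same character in a double-quoted string, while \b inside a pattern matches zero-width word boundaries instead. The variable names are only illustrative and are not part of the answer's code.

```perl
use strict;
use warnings;

# Four spellings of the same character (code point 8) in string context.
my @spellings = ( "\b", "\cH", "\x08", chr(8) );
print join( ' ', map { ord($_) } @spellings ), "\n";

# Sample text: "ab", one backspace, "c".
my $text = "ab\bc";

# In a pattern, \b is a zero-width word boundary, so this counts boundaries.
my $boundaries = () = $text =~ /\b/g;

# \cH in a pattern matches the backspace character itself.
my $backspaces = () = $text =~ /\cH/g;

print "boundaries: $boundaries, backspaces: $backspaces\n";
```

Only the \cH pattern counts the single backspace in the sample text; the \b pattern counts word boundaries, which is not what is wanted here.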

use strict;
use warnings;
use v5.10;

my $s = 'M^H ^HManagement.^H^H^H^H^H^H^H^H^H^Hanagement.F^H ^HFiles.^H^H^H^H^Hiles.s^H ^Hs.^H ^Hc^H ^H^H ^Hscript.^H ^H^H^H^H^Hripts^H ^H^H ^H^H ^H^H ^H^H ^H^H ^H^H ^Hscripts.^H.s^H ^Hshow_file ^H^H^H^H^H^H^H^H^Hhow_file = transform_factory_to_running^M';
$s =~ s/\^H/\b/g; # convert `^H` to backspace

1 while $s =~ s/(?:^|[^\cH])\cH//g;

say $s;

output

Management.Files.scripts.show_file = transform_factory_to_running^M
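To see why the substitution has to be repeated, the following sketch records the string after each pass on a short input. The helper name trace_passes and the sample string are assumptions made for illustration; the pattern is the one used above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the state of the string after each pass
# in which the substitution removed at least one backspace.
sub trace_passes {
    my ($s) = @_;
    my @passes;
    while ( $s =~ s/(?:^|[^\cH])\cH//g ) {
        push @passes, $s;
    }
    return @passes;
}

# "abc", two backspaces, then "d": the intended result is "ad".
my @passes = trace_passes("abc\cH\cHd");

# Show backspaces as ^H so the intermediate states are readable.
for my $i ( 0 .. $#passes ) {
    ( my $shown = $passes[$i] ) =~ s/\cH/^H/g;
    print "pass ", $i + 1, ": $shown\n";
}
```

The first pass can only consume the backspace that directly follows a normal character, which leaves "ab^Hd". The second pass then removes "b^H" and yields "ad". A run of n backspaces therefore needs about n passes.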

Update

Here's a version that processes the string as a stream of characters, similar to simbabque's solution but going from left to right instead.

Essentially, any backspace removes a character from the end of the $result buffer, if there is one to remove, while any other character is simply appended.

The output is identical to that of the code above.

use strict;
use warnings;
use v5.10;

my $s = 'M^H ^HManagement.^H^H^H^H^H^H^H^H^H^Hanagement.F^H ^HFiles.^H^H^H^H^Hiles.s^H ^Hs.^H ^Hc^H ^H^H ^Hscript.^H ^H^H^H^H^Hripts^H ^H^H ^H^H ^H^H ^H^H ^H^H ^H^H ^Hscripts.^H.s^H ^Hshow_file ^H^H^H^H^H^H^H^H^Hhow_file = transform_factory_to_running^M';
$s =~ s/\^H/\b/g;

say apply_backspace_characters($s);

sub apply_backspace_characters {

  my $result = '';

  for my $c ( split //, shift ) {
    if ( $c eq "\b" ) {
      # delete the last character, if there is one to delete
      substr($result, -1) = '' if length $result;
    }
    else {
      $result .= $c;
    }
  }

  $result;
}
Borodin

Here is a very explicit solution that is probably not the fastest. However, it gets the job done.

use strict;
use warnings;
use v5.10;    # enables `say`

sub apply_backspace_characters {
    my $string = shift;

    # replace the ^H characters with one BS char
    $string =~ s/\^H/chr(8)/ge;

    my @output;
    my $backspace_count = 0; # keep track of how many BS we have seen in a row

    # iterate over string by char from the right
    foreach my $char ( reverse split //, $string ) {
        if ( $char eq chr(8) ) {
            # it's a backspace, increase counter and skip
            $backspace_count++;
            next;
        }
        if ($backspace_count) {
            # there are still backspaces on the 'stack', decrease counter and skip
            $backspace_count--;
            next;
        }
        # no backspaces left, keep this character and put at front
        # (because we are going backwards)
        unshift @output, $char;
    }

    return join '', @output;
}

say apply_backspace_characters(
    "test>>M^H ^HManagement.^H^H^H^H^H^H^H^H^H^Hanagement.F^H ^HFiles.^H^H^H^H^Hiles.s^H ^Hs.^H ^Hc^H ^H^H ^Hscript.^H ^H^H^H^H^Hripts^H ^H^H ^H^H ^H^H ^H^H ^H^H ^H^H ^Hscripts.^H.s^H ^Hshow_file ^H^H^H^H^H^H^H^H^Hhow_file = transform_factory_to_running^M"
);

This will output the following.

test>>Management.Files.scripts.show_file = transform_factory_to_running^M
simbabque
  • Cool! That's a clean answer. Thanks a lot! – derhelge Aug 03 '15 at 09:45
  • I would be happy to add stuff to the answer if the downvoter explains why this is not a useful answer. – simbabque Aug 03 '15 at 14:15
  • You've made things very difficult for yourself by processing the string backwards! See the update on my solution. Well done getting it going! – Borodin Aug 03 '15 at 14:36
  • @borodin I wanted to avoid the `substr` operation and chose to work on an array instead. It's a bit more complicated, but it's also more verbose. That's why I said explicit. – simbabque Aug 03 '15 at 14:50
  • @simbabque: You can use an array or a string -- either is good. The algorithm in my solution would have a `pop @result` if the character is a backspace, and a `push @result, $c` if not – Borodin Aug 03 '15 at 15:56
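
The array-based variant described in the last comment could look roughly like the sketch below. The function name apply_backspaces_with_array is made up for illustration, and the code is one reading of that comment rather than code posted by either answerer.

```perl
use strict;
use warnings;

# Sketch of the left-to-right algorithm using an array as the buffer:
# pop on a backspace, push on anything else.
sub apply_backspaces_with_array {
    my ($string) = @_;
    my @result;

    for my $c ( split //, $string ) {
        if ( $c eq "\b" ) {
            pop @result;    # popping an empty array does nothing
        }
        else {
            push @result, $c;
        }
    }

    return join '', @result;
}

# "abc", two backspaces, then "d".
print apply_backspaces_with_array("abc\b\bd"), "\n";
```

Because pop on an empty array is harmless, a backspace at the very start of the input needs no special handling in this version.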