
Let's say I'm trying to match a URL with regular expressions:

    my $text = 'http://www.google.com/';
    $text =~ /\bhttp:\/\/([^\/]+)/;
    print $1;  # prints www.google.com

I would like to replace the captured part of the match (here, the host name) with one space for each of its characters. For the example above, I would like to end up with this text:

    http://              /

Is there a simple way to do this, i.e. find out how many characters the matched text has and replace it with the same number of some other character?

Thank you.

calvillo
  • **This might not be a job for regexes, but for existing tools in your language of choice.** Regexes are not a magic wand you wave at every problem that happens to involve strings. You probably want to use existing code that has already been written, tested, and debugged. In PHP, use the [`parse_url`](http://php.net/manual/en/function.parse-url.php) function. Perl: [`URI` module](http://search.cpan.org/dist/URI/). Ruby: [`URI` module](http://www.ruby-doc.org/stdlib-1.9.3/libdoc/uri/rdoc/URI.html). .NET: [`Uri` class](http://msdn.microsoft.com/en-us/library/txt7706a.aspx) – Andy Lester Jul 25 '13 at 18:56
  • The URL thing was just an example. What I really needed to know was how to replace a matched pattern with as many characters as the match itself contains. For instance, this could also be used to mask a credit card number or something like that (xxxxxxxxxxxx1234). Thank you for your input. – calvillo Jul 25 '13 at 19:04
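For the masking case mentioned in this comment, one option is to capture the part to hide and rebuild it with Perl's `x` (repetition) operator inside an `s///e` substitution. This is a minimal sketch, assuming the card number is a bare string of digits; the variable names and the sample number are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical card number, for illustration only.
my $card = '4111111111111234';

# $1 captures every digit except the last four, $2 captures the last four.
# The /e modifier evaluates the replacement as Perl code:
# one 'x' per character of $1, followed by the untouched $2.
(my $masked = $card) =~ s/^(\d+)(\d{4})$/('x' x length($1)) . $2/e;

print "$masked\n";
```

Because `\d+` is greedy but must leave exactly four digits for `(\d{4})`, the masked string keeps the original length.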

1 Answer


One simple way is:

    $text =~ s!\b(http://)([^/]+)!$1 . " " x length($2)!e;

The regexp `\b(http://)([^/]+)` matches a word boundary, the literal string `http://`, and one or more non-slash characters, capturing `http://` in `$1` and the non-slash characters in `$2`. (Note that I've used `!` as the regexp delimiter above instead of the usual `/` to avoid leaning toothpick syndrome, i.e. having to escape every slash in the pattern as `\/`.)
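To see the two captures, and that switching the delimiter changes only how the pattern is written, here is a small sketch reusing the question's `$text`. The variable names `$host_slash`, `$scheme` and `$host_bang` are illustrative only:

```perl
use strict;
use warnings;

my $text = 'http://www.google.com/';

# With the default / delimiter, every slash inside the pattern must be escaped.
my ($host_slash) = $text =~ /\bhttp:\/\/([^\/]+)/;

# With ! as the delimiter, the slashes can be written as-is.
# In list context the match returns the captures ($1, $2).
my ($scheme, $host_bang) = $text =~ m!\b(http://)([^/]+)!;

print "$scheme\n";
print "$host_bang\n";
```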

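The `e` modifier at the end of the `s///` operator causes the replacement `$1 . " " x length($2)` to be evaluated as Perl code instead of being interpolated as a double-quoted string. Since the repetition operator `x` binds more tightly than the concatenation operator `.`, the expression evaluates to `$1` followed by as many spaces as there are characters in `$2`.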
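Putting the substitution above into a complete script, applied to the string from the question (the brackets in the output are only there to make the spaces visible):

```perl
use strict;
use warnings;

my $text = 'http://www.google.com/';

# /e turns the replacement into Perl code. Because x binds tighter than .,
# it is evaluated as $1 . (" " x length($2)).
$text =~ s!\b(http://)([^/]+)!$1 . " " x length($2)!e;

print "[$text]\n";
```

The host `www.google.com` is 14 characters long, so it is replaced by 14 spaces and the overall string length stays the same.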

Ilmari Karonen
  • Since Perl 5.10, you can use the `\K` regex escape to avoid an unnecessary capture: `s!\bhttp://\K([^/]+)!" " x length($1)!e`. This shortens the regex a bit. – amon Jul 25 '13 at 18:50
  • That's very impressive! I didn't even know the "x" operator existed. Thank you very much. – calvillo Jul 25 '13 at 18:50
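For comparison, here is amon's `\K` variant from the comments as a complete script. It assumes Perl 5.10 or later, and should produce the same string as the answer's substitution:

```perl
use strict;
use warnings;

my $text = 'http://www.google.com/';

# \K (Perl 5.10+) keeps everything matched before it out of the replaced
# text, so http:// no longer needs to be captured and put back; $1 is the host.
$text =~ s!\bhttp://\K([^/]+)!" " x length($1)!e;

print "[$text]\n";
```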