So I have this file, clip.txt, that only contains:
<a href="https://en.wikipedia.org/wiki/Kanye_West">Kanye West</a>,
<a href="http://en.wikipedia.org/wiki/Chris_Martin">Chris Martin</a>
Now I would like to remove everything between <...> so that I end up with
Kanye West, Chris Martin.
With Perl I have the current code:
#!/usr/local/bin/perl
$file = 'clip.txt';
open(FILE, $file);
@lines = <FILE>;
close(FILE);
$line = $lines[0];
while (index($line, "<") != -1) {
my $from = rindex($line, "<");
my $to = rindex($line, ">");
print $from;
print ' - ';
print $to;
print ' ';
print substr($line, $from, $to+1);
print '|'; # to see where the line stops
print "\n";
substr($line, $from, $to+1) = ""; # removes between lines
$counter += 1;
}
print $line;
All the "print" lines are rather redundant, but good for debugging.
Now the result becomes:
138 - 141 </a>
|
67 - 125 <a href="http://http://en.wikipedia.org/wiki/Chris_Martin">Chris Martin|
61 - 64 </a>, |
0 - 50 <a href="https://en.wikipedia.org/wiki/Kanye_West">|
Kanye West
First the script finds the positions 138 - 141 and removes that range. Then it finds 67 - 125, but it removes 67 - 137. Next it finds 61 - 64, but it removes 61 - 66.
Why does it do this? On the bottom line it finds 0 - 50, and it removes that range perfectly. So I cannot find the logic here.
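One thing that may be worth checking: perldoc documents substr as substr EXPR,OFFSET,LENGTH, so the third argument is a length counted from the offset, not an end position. That would fit the output above, since the only range removed exactly is the one that starts at 0. The sketch below is only an illustration under that reading; the sample string and the strip_tags helper are hypothetical and are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample string, not the contents of clip.txt.
my $line = 'ab<i>cd</i>ef';

my $from = rindex($line, '<');   # 7
my $to   = rindex($line, '>');   # 10

# Third argument passed as an end position, as in the script above:
# asks for 11 characters starting at 7, so it runs to the end of the string.
my $as_end = substr($line, $from, $to + 1);             # '</i>ef'

# Third argument computed as a length instead:
my $as_length = substr($line, $from, $to - $from + 1);  # '</i>'

print "as end position: $as_end\n";
print "as length:       $as_length\n";

# Hypothetical helper: remove every <...> span, last one first,
# using a length for the third argument of substr.
sub strip_tags {
    my ($text) = @_;
    while ((my $start = rindex($text, '<')) != -1) {
        my $end = index($text, '>', $start);   # first '>' after that '<'
        last if $end == -1;                    # unmatched '<': stop
        substr($text, $start, $end - $start + 1) = '';
    }
    return $text;
}

print strip_tags('<a href="x">Kanye West</a>, <a href="y">Chris Martin</a>'), "\n";
```

When $from is 0, $to + 1 and $to - $from + 1 are the same number, which may be why the last removal in the output looks correct while the earlier ones overshoot.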