I'm a Perl amateur. Recently I was given a Perl script that takes a text file and strips all formatting, leaving only the individual words, each followed by a space. The problem is that it isn't clear from the script how to give it an input file location. I've added some code to run it over an entire directory of files, but I haven't been able to get it to execute yet. I'll post the original code, followed by what I added. Thanks for the help!
Original:
while(<>) {
    chomp;
    s/\<[^<>]*\>//g; # eliminate markup
    tr/[A-Z]/[a-z]/; # downcase
    s/([a-z]+|[^a-z]+)/\1 /g; # separate letter strings from other types of sequences
    s/[^a-z0-9\$\% ]//g; # delete anything not a letter, digit, $, or %
    s/[0-9]+/\#/g; # map numerical strings to #
    s/\s+/ /g; # these three lines clean up white space (so it's always exactly one space between words, no newlines)
    s/^\s+//;
    s/\s+$/ /;
    print if(m/\S/); # print what's left
}
print "\n"; # final newline, so whole doc is on one line that ends in newline
My Changes:
#!/usr/local/bin/perl
$dirtoget="1999_txt/";
opendir(IMD, $dirtoget) || die("Cannot open directory");
@thefiles= readdir(IMD);
closedir(IMD);
foreach $f (@thefiles)
{
    unless ( ($f eq ".") || ($f eq "..") )
    {
        $fr="$dirtoget$f";
        open(FILEREAD, "< $fr");
        $x="";
        while($line = <FILEREAD>) { $x .= $line; } # read the whole file into one string
        close FILEREAD;
        print "$x/n";
        while(<$x>) {
            chomp;
            s/\<[^<>]*\>//g; # eliminate markup
            tr/[A-Z]/[a-z]/; # downcase
            s/([a-z]+|[^a-z]+)/\1 /g; # separate letter strings from other types of sequences
            s/[^a-z0-9\$\% ]//g; # delete anything not a letter, digit, $, or %
            s/[0-9]+/\#/g; # map numerical strings to #
            s/\s+/ /g; # these three lines clean up white space (so it's always exactly one space between words, no newlines)
            s/^\s+//;
            s/\s+$/ /;
            print if(m/\S/); # print what's left
        }
        print "\n"; # final newline, so whole doc is on one line that ends in newline
    }
}
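For comparison, here is a minimal sketch of one way the directory loop could be structured. It is not a definitive fix. Two things seem relevant. In the original script, `while(<>)` reads from any files named on the command line (or from standard input if none are given), so no file location appears in the code. In the modified version, `<$x>` treats `$x` as a filehandle rather than looping over the string it holds. The sketch below reads each file line by line through its own filehandle. The sub names `clean_line`, `clean_file`, and `clean_dir` are hypothetical, and taking the directory from the command line and printing each document on its own line are assumptions, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Clean one line using the same substitutions as the original script.
sub clean_line {
    my ($line) = @_;
    chomp $line;
    $line =~ s/<[^<>]*>//g;              # eliminate markup
    $line =~ tr/A-Z/a-z/;                # downcase
    $line =~ s/([a-z]+|[^a-z]+)/$1 /g;   # separate letter strings from other sequences
    $line =~ s/[^a-z0-9\$\% ]//g;        # delete anything not a letter, digit, $, %, or space
    $line =~ s/[0-9]+/\#/g;              # map numerical strings to #
    $line =~ s/\s+/ /g;                  # collapse runs of whitespace
    $line =~ s/^\s+//;                   # drop leading whitespace
    $line =~ s/\s+$/ /;                  # leave exactly one trailing space
    return $line;
}

# Read one file line by line and return the whole document as a single line.
sub clean_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $out = '';
    while (my $line = <$fh>) {
        my $cleaned = clean_line($line);
        $out .= $cleaned if $cleaned =~ /\S/;   # skip lines with nothing left
    }
    close $fh;
    return "$out\n";                     # final newline ends the document
}

# Clean every regular file in a directory (this also skips "." and "..").
sub clean_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @files = sort grep { -f "$dir/$_" } readdir($dh);
    closedir $dh;
    foreach my $file (@files) {
        my $text = clean_file("$dir/$file");
        print $text;
    }
}

# Assumption: the directory is passed as the first command-line argument.
clean_dir($ARGV[0]) if @ARGV;
```

Saved under a hypothetical name such as `clean_dir.pl`, this could be run as `perl clean_dir.pl 1999_txt`. The unmodified original script should also accept a single file directly, as in `perl original.pl somefile.txt`, where both names are placeholders.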