3

I am writing a ksh script. I have a pair of html files in a directory and I need to check whether or not the files contain one of two strings (the strings are mutually exclusive). I then rename the files based on which of the two strings they contain.

When testing I was able to use the following code on .txt files, but the functionality no longer works when testing for the strings in .html files:

outageString='Scheduled Outage List'
jobString='Scheduled Job List'

for file in `ls -1t $fileNameFormat | head -n 2`
do
    if grep -xq "$outageString" "$file"; then
        mv "$file" "$outageFileName"
    elif grep -xq "$jobString" "$file"; then
        mv "$file" "$jobFileName"
    fi
done

Note: I have tested the ls command above independently and it returns the appropriate files.

File Content:

<html>
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
 <title>
 OUS: Scheduled Outage List
 </title>
 </head>
 <body>
 <h3>
 OUS: Scheduled Outage List
 </h3>
 &nbsp; 
   .
   .
   .

Q: Does anyone have any insight as to why grep does not return the expected result when searching for the strings in the two files (i.e., why grep does not recognize that the string exists in the file)?

Similar Question: How to test if string exists in file with Bash shell?

Marcus Koz

3 Answers

6

The problem is your use of:

grep -x

With -x, grep only reports a match when the pattern matches an entire line. As per man grep:

-x, --line-regexp
    Only input lines selected against an entire fixed string or regular expression are 
    considered to be matching lines.

Just use grep -Fq instead of grep -xq. The -F option treats the pattern as a fixed string and, without -x, matches it anywhere in a line.
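To see the difference, here is a minimal sketch that runs both variants against a small sample file. The temporary directory, the file name sample.html and the variables x_result and f_result are made up for illustration; only the search string and the title line come from the question.

```shell
# Hypothetical sample file that mimics the <title> block from the question.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.html"
printf '%s\n' '<title>' ' OUS: Scheduled Outage List' '</title>' > "$sample"

outageString='Scheduled Outage List'

# -x needs the pattern to match the whole line, so the " OUS: " prefix defeats it.
if grep -xq "$outageString" "$sample"; then x_result=match; else x_result=no-match; fi

# -F treats the pattern as a fixed string and finds it anywhere in the line.
if grep -Fq "$outageString" "$sample"; then f_result=match; else f_result=no-match; fi

echo "grep -xq: $x_result"
echo "grep -Fq: $f_result"

rm -rf "$tmpdir"
```

The .txt test files presumably had the string on a line by itself, which is the only case where -x succeeds.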

PS: Parsing the output of ls like this is not recommended. It is better to use globbing directly in your for loop, like this:

for file in *.html; do
    echo "processing $file"
done
anubhava
  • That did the trick, thank you for the quick response. I should have looked more closely into the flags I had selected. – Marcus Koz Feb 25 '15 at 15:26
  • The reason I chose to use `ls` in the `for` loop was because I needed to access only the two most recently added files in the directory. – Marcus Koz Feb 25 '15 at 15:28
  • Only problem with using `ls` like this is when filenames have whitespace, newline etc. That is just fyi – anubhava Feb 25 '15 at 15:29
  • Yes, I have already run into this problem at a further point in the script and will attempt to use globbing directly to alleviate. Thanks for the insight – Marcus Koz Feb 25 '15 at 15:38
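For the "two most recent files" requirement raised in the comments, one possible way to stay with globbing is to compare modification times with the -nt test. This is only a sketch: -nt is not in POSIX but is supported by ksh, bash and most sh implementations, and the scratch directory, the file names and the variables newest and second are invented for the example.

```shell
# Hypothetical scratch directory with three files of known age.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch -t 202001010000 'old.html'
touch -t 202001020000 'mid file.html'   # whitespace in the name is handled
touch -t 202001030000 'new.html'

# Track the newest and second-newest file while walking the glob.
newest=''
second=''
for f in *.html; do
    [ -f "$f" ] || continue             # skip the literal pattern if nothing matches
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
        second=$newest
        newest=$f
    elif [ -z "$second" ] || [ "$f" -nt "$second" ]; then
        second=$f
    fi
done

printf 'newest: %s\nsecond: %s\n' "$newest" "$second"

cd / && rm -rf "$workdir"
```

The grep and mv steps from the question could then be run on "$newest" and "$second" directly, with no word splitting on the file names.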
2

With the -x option, grep only matches when the regexp matches a whole line, so because the line in the HTML document begins with "OUS:", it won't match.

I can only guess that the .txt file didn't have this prefix.

asimovwasright
0

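Try this:

# grep -l prints each matching file name once, even when the string
# occurs on several lines (as it does in the <title> and <h3> here).
# Note: the unquoted $(...) still splits file names containing whitespace.
for file in $(grep -lF "Scheduled Outage List" /path/to/files/*.html)
do
    echo "$file"
    # mv files around
done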
candymanuu