3

Is there any useful combination of commands (sed/grep/find etc.) I can use for detecting .php files not starting with a comment? I could write a little script of course, but I'd rather use shell commands.

Matching pattern:

<?php
/*

I'd like to search the contents of the files, not the file names.

I have to deal with a hacked website where code-injection follows a certain pattern.

<?php $code....
/*

or

<?php
$code....
/*
christian
  • [Get inspired for this solution](http://stackoverflow.com/questions/21368838/how-do-i-find-all-files-that-do-not-begin-with-a-given-prefix-in-bash) and give us more code next time. – ODelibalta Jul 20 '16 at 21:20
  • I am **not** searching for filenames. And there is no more code I can provide, since the pattern I'd like to grep all the files for is ` – christian Jul 20 '16 at 21:24
  • If you're not searching for the filenames, what is your expected output? – kenorb Jul 21 '16 at 00:24

5 Answers

3

Using gnu grep you can use this recursive search:

grep -rvlz $'^[[:space:]]*<?php\n/\*' --include='*.php'
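Since only the very beginning of each file matters here, a byte-oriented check is one possible alternative to the regex. The following is a minimal sketch, not part of the original answer: it assumes LF line endings, no whitespace before the tag, and a `head` that supports `-c` (GNU and BSD do, although the option is not POSIX). The function name `list_php_without_leading_comment` is made up for illustration.

```shell
# Hypothetical helper: print every *.php file under a directory (default ".")
# whose first 8 bytes are not "<?php", a newline, and "/*".
list_php_without_leading_comment() {
    find "${1:-.}" -type f -name '*.php' -exec sh -c '
        expected=$(printf "<?php\n/*")
        for file do
            # Only the first 8 bytes are read, so comments further down cannot match.
            [ "$(head -c 8 "$file")" = "$expected" ] || printf "%s\n" "$file"
        done
    ' sh {} +
}
```

Calling `list_php_without_leading_comment /path/to/site` would list the suspicious files one per line. Passing the names through `-exec ... {} +` rather than a pipe keeps file names with spaces intact.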
anubhava
  • I liked the approach but it didn't work out, since it is grepping the whole file, and only returns files which do not contain _any_ comment. I have to search for the first few bytes only. – christian Jul 21 '16 at 06:19
  • Awesome, exactly what I was looking for. Can you explain the `^[[:space:]]*`-magic, please? – christian Jul 21 '16 at 06:39
  • That is a POSIX character class that matches any whitespace, including newlines. The anchor ^ makes sure we match it at the start only. – anubhava Jul 21 '16 at 07:06
  • @EdMorton: Fair point. `-E` wasn't needed here and got added by mistake. I made an edit using my mobile this morning without testing. It is fixed now. – anubhava Jul 21 '16 at 14:04
1

This will detect all PHP files that start with a PHP tag:

find ./ -iname '*.php' | xargs head -v -n 1 | grep -B 1 '<?php'
  • Find all files with a .php extension.
  • Run head on each file to print its first line; -v adds the filename as a header.
  • grep this output for lines containing the PHP tag.
  • -B 1: keep one line before each match so we get the filename.

This is quick and dirty; you can get fancy to make the output nicer or make it more robust.
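One way a more robust variant might look is sketched below. It is an illustration rather than part of the answer above: it prints only the names of files whose first line is not exactly `<?php` or whose second line does not begin with `/*`, and it strips carriage returns so that CRLF files compare equal. The function name `list_php_missing_header` is hypothetical.

```shell
# Hypothetical helper: list *.php files whose first line is not exactly "<?php"
# or whose second line does not begin with "/*".
list_php_missing_header() {
    find "${1:-.}" -type f -iname '*.php' -exec sh -c '
        for file do
            # Read at most two lines; drop carriage returns so CRLF files compare equal.
            first=$(head -n 1 "$file" | tr -d "\r")
            second=$(sed -n "2{p;q;}" "$file" | tr -d "\r")
            case $second in
                "/*"*) comment=yes ;;
                *)     comment=no ;;
            esac
            if [ "$first" != "<?php" ] || [ "$comment" = no ]; then
                printf "%s\n" "$file"
            fi
        done
    ' sh {} +
}
```

Run as `list_php_missing_header ./`, this reports both injection patterns from the question: code on the same line as the tag, and code on the line directly after it.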

Dan
0

From the shell you could use this little awk script to find out whether a file does not start with the comment:

awk 'NR==2 && f$0!="<?php/*"{print FILENAME}NR>2{exit 1}{f=$0}' file.php

To apply the script recursively to a directory use:

find -name '*.php' \
  -exec awk 'NR==2 && f$0!="<?php/*"{print FILENAME}NR>2{exit 1}{f=$0}' {} \;

Possible whitespace is a limitation of the above solution, but it can easily be adapted by removing all whitespace before comparing against <?php/*
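A sketch of that adaptation could look like the following. It is an assumption about what such a change might be, not the author's code: it trims leading and trailing blanks and carriage returns from the first two lines, joins them, and checks that the result begins with `<?php/*`. Because it compares a prefix, text after `/*` (for example a `/**` docblock) is accepted. The helper name `starts_with_php_comment` is hypothetical.

```shell
# Hypothetical helper: exit status 0 if the file begins with "<?php" followed by "/*",
# ignoring leading/trailing blanks and carriage returns on the first two lines.
starts_with_php_comment() {
    awk '
        { gsub(/^[ \t\r]+|[ \t\r]+$/, ""); head = head $0 }  # trim, then join the lines
        NR == 2 { exit }                                     # never read past line 2
        END { exit (index(head, "<?php/*") == 1 ? 0 : 1) }   # 0 = comment found
    ' < "$1"
}
```

For a single file, `starts_with_php_comment file.php || echo file.php` prints the name only when the comment is missing. Reading the file through a redirect avoids awk treating an unusual file name as a variable assignment.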

hek2mgl
-1

awk is your friend:

 find /your/path/here -type f -iname "*.php" -exec \
 awk 'FNR==2{if($0~/^\/\*/){print FILENAME};exit}' {} \;

Notes

  1. {} is replaced by find with the path of the current file, which awk receives as its argument.
  2. The awk builtin FILENAME contains the name of the file currently being processed.
  3. $0~/^\/\*/ searches for /* at the beginning of the second line.
  4. FNR==2 selects the record (line) number to process; exit stops awk after processing that record.
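As written, the command above lists files whose second line does begin with `/*`, whereas the question asks for the files that do not. One possible inversion is sketched below; it is an illustration under assumptions, not the answer's own code. It lets awk report the result through its exit status and uses find's `! -exec ... -print` to print a file only when the check fails. It also requires the first line to be exactly `<?php` and tolerates a trailing carriage return. The function name `list_uncommented_php` is hypothetical.

```shell
# Hypothetical helper: print *.php files that do NOT start with "<?php" on line 1
# followed by "/*" at the beginning of line 2.
list_uncommented_php() {
    find "${1:-.}" -type f -iname '*.php' ! -exec awk '
        { sub(/\r$/, "") }                        # tolerate CRLF line endings
        FNR == 1 { if ($0 != "<?php") exit 1 }    # code on the same line as the tag
        FNR == 2 { exit ($0 ~ /^\/\*/ ? 0 : 1) }  # 0 only if line 2 opens a comment
        END { if (NR < 2) exit 1 }                # files shorter than two lines fail
    ' {} \; -print
}
```

Called as `list_uncommented_php /your/path/here`, it prints one suspicious path per line. awk never reads past the second line of any file, so large files are cheap to check.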
sjsam
-1

The UNIX tool to find files is the very appropriately named find, and the UNIX tool to do general-purpose text manipulation is awk:

find . -name '*.php' -print |
xargs awk -v RS='^$' 'index($0,"<?php\n/*")==1{print FILENAME}'

The above uses GNU awk for multi-char RS. We use index() to enforce a string rather than regexp search since your target string contains multiple regexp metacharacters so this saves us escaping them all.

Ed Morton
  • Thanks. Awk was always a mystery to me but your solution doesn't seem to work completely. I am not quite sure if this might be due to the different kind of line endings (\r\n vs \n). Also I was searching for files **not** matching the pattern. :) – christian Jul 21 '16 at 06:29
  • Yes if your line endings are `\r\n` then just change `\n` to `\r\n` in the script but that can't be it since the solution you accepted assumes `\n` line endings. If you want to search for files that do *not* start with the string, just change `==` to `!=`. – Ed Morton Jul 21 '16 at 11:50