I am trying to grep parts of an HTML form, specifically the action part, i.e. <form action = ….
I originally tried:
grep -E -e 'form\s*action\s*=.*[.]html' ./*
but it did not work, even though the files do contain such strings.
Then I tried the more basic grep -E -e 'form\s*action\s*=' ./* but this did not work either!
What am I doing wrong?

Jim

2 Answers

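This won't get you the action. At best it matches the part just before the action's value: on <form action="myFile.php"> it matches only form action=, and if another attribute comes first, as in <form id="myForm" action="myFile.php">, it does not match at all, because \s* only allows whitespace between form and action.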
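A quick way to check what the question's patterns really match is to add -o, which prints only the matched text. This is a sketch assuming GNU grep (where \s works inside -E patterns); the temporary file and its contents are hypothetical. In both cases the match starts at form, so the value is never isolated on its own.

```shell
# Assumes GNU grep (\s inside -E patterns is a GNU extension).
# Hypothetical sample file containing a single form tag.
dir=$(mktemp -d)
printf '%s\n' '<form action="contact.html" method="post">' > "$dir/page.html"

# The question's basic pattern: the match ends at "=", before the value.
grep -E -o -e 'form\s*action\s*=' "$dir/page.html"
# prints: form action=

# The question's first pattern: the value is included, but glued to "form action=".
grep -E -o -e 'form\s*action\s*=.*[.]html' "$dir/page.html"
# prints: form action="contact.html
```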

So try instead:

grep -E -o -i -e '<form\s+[^>]*action\s*=[^>]*>' ./*

[^>]* means any character except >, zero or more times.
-o means only get the matching part
-i means case insensitive
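To see those options at work, here is a sketch that runs the same command against a hypothetical page.html in a temporary directory (the file and its contents are made up for illustration), assuming GNU grep. The upper-case FORM tag is found thanks to -i, the id attribute before action is absorbed by [^>]*, and the closing </form> is not matched because <form must be followed directly by whitespace.

```shell
# Assumes GNU grep. Hypothetical test page; note the upper-case FORM tag.
dir=$(mktemp -d)
cat > "$dir/page.html" <<'EOF'
<html><body>
<FORM id="myForm" action="myFile.php" method="post">
<input name="q">
</form>
</body></html>
EOF

# Same command as above, pointed at the temporary directory instead of ./
grep -E -o -i -e '<form\s+[^>]*action\s*=[^>]*>' "$dir"/*
# prints: <FORM id="myForm" action="myFile.php" method="post">
```

Keep in mind that grep works line by line, so a <form> tag split across several lines will not be matched by this pattern.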

nl-x
  • Even this `grep -E -e 'form\\s*action\\s*=' ./*` does not give any results or `grep -E -e '\
    ' ./*` does not work either
    – Jim Aug 28 '13 at 11:38
  • @Jim I was wrong about double escaping. I edited my answer. I tested it, and it works – nl-x Aug 28 '13 at 11:43
  • `grep -E -e '
    ]*action\\s*=[^>]*>' ./*` does not work either
    – Jim Aug 28 '13 at 11:44
  • It needs `-r`! Your test was in the current directory probably! – Jim Aug 28 '13 at 11:57
  • I would add a `\b` after `form`, or replace the first `\s*` with `\s+`, else you can match ``. Maybe it's unnecessary... Else, works fine for me too, +1. – Bentoy13 Aug 28 '13 at 11:57
  • @Jim -r is recursive. You didn't mention you wanted to go recursive. My test was on a single file – nl-x Aug 28 '13 at 11:59
  • @Bentoy13 Tnx. It was indeed supposed to be \s+ ... I'll edit it – nl-x Aug 28 '13 at 12:00
  • All I couldn't manage yet is to only return the contents of the action attribute. I can only return the entire regexp match. I tried making a group with parenthesis `(...)` , but GREP just ignores that... Maybe I should name the group, or add some extra switch – nl-x Aug 28 '13 at 12:02
  • @nl-x I think that you can achieve returning only the action part using lookbehind and adding option `-P` for that. It's up to you! – Bentoy13 Aug 28 '13 at 12:08
  • According to http://stackoverflow.com/a/1891890/1209443 I can use Grep multiple times by piping it. I can see how that would work... – nl-x Aug 28 '13 at 12:19
  • @nl-x Yeah, with grep, only way to do that because you cannot specify variable-length pattern into lookbehind. So yes, pipe seems to be the only way! – Bentoy13 Aug 28 '13 at 12:29
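Putting the comment thread together: a sketch, assuming GNU grep and a hypothetical layout where the HTML sits in a subdirectory, showing that -r is what makes grep descend into it, and that piping grep into grep narrows the match down to the action value. The helper name form_actions is made up for illustration, and it only handles double-quoted values. The last command is an alternative not taken from the thread: grep -P with PCRE's \K, which drops everything matched so far from the reported match and so sidesteps the variable-length lookbehind limit mentioned above; it requires a grep built with PCRE support.

```shell
# Assumes GNU grep. Hypothetical layout: the HTML sits in a subdirectory.
root=$(mktemp -d)
mkdir -p "$root/site/contact"
printf '%s\n' '<form id="myForm" action="myFile.php">' > "$root/site/contact/index.html"

# Without -r the glob only matches the directory "site"; grep skips it
# (GNU grep typically reports "Is a directory" on stderr) and finds nothing.
grep -E -o -i -e '<form\s+[^>]*action\s*=[^>]*>' "$root"/*

# With -r grep descends into subdirectories; -h drops the file-name prefix.
grep -r -h -E -o -i -e '<form\s+[^>]*action\s*=[^>]*>' "$root"

# Hypothetical helper: pipe grep into grep to keep only the action value.
# Handles double-quoted values only.
form_actions() {
  grep -r -h -E -o -i -e '<form\s+[^>]*action\s*=[^>]*>' "$1" \
    | grep -E -o -i -e 'action\s*=\s*"[^"]*"' \
    | grep -E -o -e '"[^"]*"' \
    | tr -d '"'
}
form_actions "$root"
# prints: myFile.php

# PCRE alternative (grep -P): \K discards the text matched before it,
# so -o prints only the value.
grep -r -h -P -o -i -e '<form\s[^>]*\baction\s*=\s*"\K[^"]*' "$root"
```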

Why not use an HTML parser/XPath implementation, like my Xidel?

This returns the URL in the action attribute:

xidel ./* -e //form/@action

Or, with pattern matching instead of XPath:

xidel ./* -e '<form action="{.}"/>*'

You can even do all further processing in Xidel. For example, to get not only the action but also the URL-encoded values of all input elements, you can use:

xidel ./* -e '//form/form(.)'

BeniBela