I am trying to grep parts of an HTML form, specifically the action attribute, i.e. <form action = ….
I originally tried:
grep -E -e 'form\s*action\s*=.*[.]html' ./*
but it did not work, even though the files do contain such strings.
Then I tried the basic version:
grep -E -e 'form\s*action\s*=' ./*
but this did not work either!
What am I doing wrong?
Jim
By `./*` do you mean the current directory? And what is the error message? – Juto Aug 28 '13 at 11:32
So you want to match the content of the action or you want only ` – Ibrahim Najjar Aug 28 '13 at 11:32
`xs` is a file, e.g. an HTML file. – Jim Aug 28 '13 at 11:43
@nl-x Don't escape the backslash: it is meant for grep and is already protected by the single quotes. Otherwise the pattern will match a literal backslash followed by an 's'. – Bentoy13 Aug 28 '13 at 11:45
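To see what the escaping comment means in practice, here is a minimal sketch. It assumes GNU grep, which accepts `\s` inside `-E` patterns as an extension; the sample line is made up for illustration.

```shell
# Assumes GNU grep: \s inside an -E pattern is a GNU extension.
# Single quotes stop the shell from touching backslashes, so grep
# receives each pattern exactly as typed.
text='<form action = "x.html">'   # hypothetical sample line

# 'form\s*action': grep reads \s as "one whitespace character".
single=$(printf '%s\n' "$text" | grep -c -E -e 'form\s*action')

# 'form\\s*action': grep reads \\ as a literal backslash, so the line
# would need "form", a backslash, zero or more "s", then "action".
double=$(printf '%s\n' "$text" | grep -c -E -e 'form\\s*action')

echo "single backslash: $single matching line(s)"   # 1
echo "double backslash: $double matching line(s)"   # 0
```

The doubled backslash is only needed when the pattern is in double quotes or unquoted, where the shell would otherwise consume one level of escaping.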
2 Answers
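This won't get you the action. The pattern stops at the `=`, so at best it matches the part just before the action's value, and it only allows whitespace between form and action. For example, if you have <form id="myForm" action="myFile.php">
the regexp will not match at all, because id="myForm" sits between form and action.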
So try this instead:
grep -E -o -i -e '<form\s+[^>]*action\s*=[^>]*>' ./*
`[^>]*` means any character except `>`, zero or more times.
`-o` means print only the matching part.
`-i` means case-insensitive matching.
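A quick way to compare the question's pattern with this one is to run both on a small test file. This is a sketch assuming GNU grep (for `\s` in `-E` patterns); the file and its contents are hypothetical.

```shell
# Assumes GNU grep (\s in -E patterns is a GNU extension).
# Hypothetical sample file in a throwaway directory.
dir=$(mktemp -d)
cat > "$dir/sample.html" <<'EOF'
<html><body>
<form id="myForm" action="myFile.php" method="post">
<form action = "other.html">
<FORM ACTION="upper.html">
</body></html>
EOF

# Question's pattern: only whitespace may separate "form" and "action",
# and the match ends at "=".
old=$(grep -E -o -e 'form\s*action\s*=' "$dir"/*)

# Answer's pattern: other attributes may come first, the rest of the tag
# up to ">" is included, and -i also catches <FORM ACTION=...>.
new=$(grep -E -o -i -e '<form\s+[^>]*action\s*=[^>]*>' "$dir"/*)

printf 'question pattern:\n%s\n' "$old"
printf 'answer pattern:\n%s\n' "$new"
rm -r "$dir"
```

The question's pattern only finds `form action =` on the line where action is the first attribute, while the answer's pattern returns all three opening form tags, including the one with `id` before `action` and the upper-case one.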

nl-x
Even this `grep -E -e 'form\\s*action\\s*=' ./*` does not give any results or `grep -E -e '\ – Jim Aug 28 '13 at 11:38
@Jim I was wrong about double escaping. I edited my answer. I tested it, and it works. – nl-x Aug 28 '13 at 11:43
I would add a `\b` after `form`, or replace the first `\s*` with `\s+`, otherwise you can match `
`. Maybe it's unnecessary... Otherwise, it works fine for me too, +1. – Bentoy13 Aug 28 '13 at 11:57
@Jim `-r` means recursive. You didn't mention you wanted to search recursively; my test was on a single file. – nl-x Aug 28 '13 at 11:59
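Following up on the `-r` comment, here is a sketch of the recursive variant. It assumes GNU grep, where `-r` descends into directories and `--include` restricts the search to file names matching a glob; the directory tree is hypothetical.

```shell
# Assumes GNU grep: -r descends into directories and --include keeps only
# files whose names match the glob. The directory tree is hypothetical.
dir=$(mktemp -d)
mkdir -p "$dir/site/sub"
printf '<form id="f" action="a.php">\n' > "$dir/site/index.html"
printf '<form action="b.php">\n'        > "$dir/site/sub/page.html"
printf '<form action="c.php">\n'        > "$dir/site/sub/notes.txt"

# Search from the site root; with -r each match is prefixed by its file
# name, and notes.txt is skipped because it does not match *.html.
found=$(cd "$dir/site" && grep -r -E -o -i --include='*.html' \
    -e '<form\s+[^>]*action\s*=[^>]*>' . | sort)

printf '%s\n' "$found"
rm -r "$dir"
```

Unlike `./*`, this also reaches files in subdirectories, and the file-name prefix shows where each form was found.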
The only thing I haven't managed yet is returning just the contents of the action attribute; I can only return the entire regexp match. I tried making a group with parentheses `(...)`, but grep just ignores it. Maybe I should name the group, or add some extra switch. – nl-x Aug 28 '13 at 12:02
@nl-x I think you can return only the action part by using a lookbehind, which requires the `-P` option. It's up to you! – Bentoy13 Aug 28 '13 at 12:08
According to http://stackoverflow.com/a/1891890/1209443 I can run grep multiple times by piping its output. I can see how that would work... – nl-x Aug 28 '13 at 12:19
@nl-x Yes, with grep a pipe seems to be the only way, because you cannot use a variable-length pattern in a lookbehind. – Bentoy13 Aug 28 '13 at 12:29
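A sketch of returning only the attribute value. The first command follows the pipe approach discussed in these comments. The second is an alternative not mentioned in the thread: GNU grep's `-P` mode accepts PCRE's `\K`, which drops everything matched before it from the reported match and so sidesteps the lookbehind length restriction. Both assume GNU grep (the second one built with PCRE support) and only handle double-quoted values; the sample line is made up.

```shell
# Assumes GNU grep; the -P variant also needs PCRE support compiled in.
line='<form id="myForm" action="myFile.php" method="post">'   # hypothetical

# 1) Pipe approach: grab the whole action="..." attribute first, then keep
#    only the quoted part, then strip the quotes.
piped=$(printf '%s\n' "$line" \
    | grep -E -o -i -e 'action\s*=\s*"[^"]*"' \
    | grep -E -o -e '"[^"]*"' \
    | tr -d '"')

# 2) Single grep with -P: \K resets the start of the reported match, so
#    -o prints only what follows it (the value up to the closing quote).
pcre=$(printf '%s\n' "$line" | grep -o -i -P 'action\s*=\s*"\K[^"]*')

echo "$piped"   # myFile.php
echo "$pcre"    # myFile.php
```

Single-quoted or unquoted attribute values would need extra alternatives in the patterns, which is one reason an HTML parser is more robust for this.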
Why not use an HTML parser/XPath implementation, like my Xidel?
This returns the URL in the action attribute:
xidel ./* -e //form/@action
Or, with pattern matching instead of XPath:
xidel ./* -e '<form action="{.}"/>*'
You can even do all further processing in it. For example, to get not only the action but also the URL-encoded values of all input elements, you can use:
xidel ./* -e //form/form(.)

BeniBela