
I'm trying to print the content of an HTML table cell.

I thought the easiest way to do this was with grep, but for some reason the regex works on regexr.com and not within grep.

Maybe it's something with escaping? I tried escaping all the less-than and greater-than (<>) symbols.

This is the code I'm using:

wget -q -O login.html --save-cookies cookies.txt --keep-session-cookies --post-data 'username=sssss&password=fffff' http://ffffff/login

wget -q -O page.html --load-cookies cookies.txt http://ffffff/somepage |grep -P '(?<=<tr><td class=list2>www</td><td class=list2 align=center>A</td><td class=list2 >)(.*?)(?=</td><td class=list2 align=center><input type=checkbox name=arecs5)' |recode html...ascii 

Can anybody help me, please? I'm from the Netherlands, so sorry for my English.

I also tried adding the -c option, and it printed 0.

EDIT:

Added my full code. I found one mistake: I didn't have the -O parameter to output the page's HTML. But it still doesn't work; it prints nothing.

R. Leroi
  • Take care that regex and html are not good friends! – Toto Feb 04 '14 at 18:10
  • I've read that, yes, but I don't care about security, since it's not like the page I'm trying to wget is going to hack me... or is that not what you mean? – R. Leroi Feb 07 '14 at 13:35
  • I'd say that parsing HTML with regex is really hard; have a look at http://stackoverflow.com/a/4234491/372239 – Toto Feb 07 '14 at 13:38
  • Well, that's OK. I'm only using this part; it's a very simple task, it just doesn't work, haha. – R. Leroi Feb 07 '14 at 14:16
  • Regexes work fine until they don't, because the HTML changes. See http://htmlparsing.com/regexes for a longer explanation. – Andy Lester Feb 07 '14 at 15:22

3 Answers

Traditional grep (basic and extended regular expressions) doesn't support lookarounds like the ones you're using.

Try using grep -P (PCRE):

grep -P 'pattern' file
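A minimal sketch of the -P suggestion, assuming GNU grep built with PCRE support. The HTML fragment is a made-up stand-in for the downloaded page, and extract_cell is a hypothetical helper name. It also adds -o, which makes grep print only the matched text instead of the whole matching line; the command in the question omits it.

```shell
#!/usr/bin/env bash
# Made-up one-line HTML fragment standing in for the real page.
html='<tr><td class=list2>www</td><td class=list2 align=center>A</td><td class=list2 >192.0.2.10</td><td class=list2 align=center><input type=checkbox name=arecs5>'

# Hypothetical helper: reads HTML on stdin and prints the cell text.
# -P enables PCRE, so the (?<=...) lookbehind and (?=...) lookahead work.
# -o prints only the matched part, not the whole line.
extract_cell() {
  grep -oP '(?<=<td class=list2 >).*?(?=</td>)'
}

printf '%s\n' "$html" | extract_cell   # prints 192.0.2.10
```

The lookbehind here is a fixed literal string, which PCRE requires; a variable-length lookbehind such as `(?<=<td.*>)` would be rejected.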
anubhava
  • I tried adding -P but still nothing. My whole line is actually `wget --load-cookies cookies.txt url |grep -P 'regex'|recode html...ascii`; maybe that changes something? Maybe I have to give grep the file path instead of piping wget into it directly? – R. Leroi Feb 07 '14 at 09:18
  • You can do it as: `wget --load-cookies cookies.txt url | grep -P "$pattern"` and make sure `$pattern` is set earlier. – anubhava Feb 07 '14 at 10:30
  • Tried that too. Do I have to use the double quotes around $PATTERN? When I don't add them, it prints Grep: missing ')' or something like that. – R. Leroi Feb 07 '14 at 14:17
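  • Yes, definitely use double quotes, as I wrote in my comment. Without them the shell splits the expanded pattern at every space, so grep gets only the first fragment as the regex and treats the rest as file names. – anubhava Feb 07 '14 at 14:33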
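A small sketch of why the double quotes matter, using a shortened version of the question's regex. count_args is a hypothetical helper that only prints how many arguments it received; it stands in for grep's argument list.

```shell
#!/usr/bin/env bash
# Shortened version of the question's regex; note the spaces inside it.
pattern='(?<=<td class=list2 >).*?(?=</td>)'

# Hypothetical helper: prints the number of arguments it was given.
count_args() { echo "$#"; }

# Unquoted: the value is split at each space (and is also subject to
# glob expansion), so a command would receive several arguments. For
# grep, that means the first piece is the regex and the rest are files.
count_args $pattern

# Quoted: the whole regex arrives as one argument.
count_args "$pattern"   # prints 1
```

With the question's full pattern, the first piece is `(?<=<tr><td`, which has an unclosed parenthesis; that is consistent with the "missing ')'" error reported above.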

Consider using ack or ag, which natively support Perl-style regexes.

Yann Moisan

Finally, it works. I added -qO- to wget. I don't know why, but adding a - after the -O makes it work.
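Why this works: `-O page.html` tells wget to write the page into page.html, so nothing reaches standard output and grep, reading from the pipe, sees an empty stream (which is also why -c printed 0). `-O -`, written together with -q as `-qO-`, makes wget write the document to standard output instead. Below is a sketch of the full sequence, assuming the question's placeholder host and credentials; site_login and fetch_cell are hypothetical helper names, the login response is discarded to /dev/null rather than saved to login.html, and -o is added so grep prints only the cell text.

```shell
#!/usr/bin/env bash
# Placeholder host from the question; replace with the real one.
base='http://ffffff'

# The question's regex, unchanged.
cell_re='(?<=<tr><td class=list2>www</td><td class=list2 align=center>A</td><td class=list2 >)(.*?)(?=</td><td class=list2 align=center><input type=checkbox name=arecs5)'

# Hypothetical helper: log in once and store the session cookie.
site_login() {
  wget -q -O /dev/null --save-cookies cookies.txt --keep-session-cookies \
       --post-data 'username=sssss&password=fffff' "$base/login"
}

# Hypothetical helper: fetch a page and print the matched cell.
# -qO- sends the page to stdout, so grep receives it through the pipe.
fetch_cell() {
  wget -qO- --load-cookies cookies.txt "$1" | grep -oP "$cell_re"
}

# Usage:
#   site_login && fetch_cell "$base/somepage"
```

The result can still be piped through recode afterwards, as in the original command.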

R. Leroi