EDIT: I have found out how to post headers, and I know in principle how to get the value, but I need help actually doing it. The steps are:

  • Retrieve the web page (this part works).
  • Find the line on which "formulaire_action_args" appears and save it in a variable (I am looking for a better way than a loop to do this).
  • Retrieve the "value" attribute from that line.

I am trying to get part of the content of an HTML page, contained in a form like so:

<form id="formulaire_login" method="post" action="/spip.php?page=login&amp;lang=fr" enctype="multipart/form-data">
    <div>
        <input name="page" value="login" type="hidden">
        <input name="lang" value="fr" type="hidden">
        <input name="formulaire_action" type="hidden" value="login">
        <input name="formulaire_action_args" type="hidden" value="random_value">
    </div>
    <fieldset>
        <ul>
            <li class="editer_login obligatoire">
               <input type="text" class="text" name="var_login" id="var_login" value="" size="40">
            </li>
            <li class="editer_password obligatoire">
                <input type="password" class="password" name="password" id="password" value="" size="40">
            </li>
        </ul>
    </fieldset>
</form>

I want to get the content of the form with id="formulaire_login" and, inside this form, the value of the "value" attribute (random_value) of this input:

<input name="formulaire_action_args" type="hidden" value="random_value">

Second, I am also looking for a way to request a URL with POST header data.

ThePH

1 Answer

If the full text of your page is in pagetext, you can retrieve that value via a Lua pattern match without needing to loop over anything:

value = pagetext:match('name="formulaire_action_args"[^>]*value="([^"]+)"')

print(value) --> random_value

Lua patterns are similar to regular expressions, though simpler. They work as follows: (1) most characters match themselves, (2) character classes such as [^>] specify sets of characters to match, and (3) quantifiers such as * and + specify how many of a character or class to match. Broken down, the pattern above reads:
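As a small illustration of those three rules, here is a sketch that uses made-up input strings (they are not taken from the page in the question). It also shows that "magic" characters such as . have to be escaped with % when they should match literally:

```lua
-- (1) Plain characters match themselves.
local literal = ("lang=fr"):match("lang")

-- (2) Character classes: %d is any digit; [^"] is any character except a quote.
-- (3) Quantifiers: + is "one or more", * is "zero or more".
local digits = ("size=40"):match("%d+")
local quoted = ('value="login"'):match('"([^"]*)"')

-- Magic characters such as "." must be escaped with "%" to match literally.
local escaped = ("/spip.php?page=login"):match("spip%.php")

print(literal, digits, quoted, escaped)
```

Parentheses in a pattern mark a capture; when a pattern contains a capture, string.match returns the captured text rather than the whole match.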

name="formulaire_action_args" --> match this text exactly
[^>]*                         --> match 0 or more characters that are NOT a > character
value="                       --> match this text exactly
([^"]+)                       --> find 1 or more characters that are NOT a quote character and "capture" it
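One caveat: the pattern above assumes that name comes before value inside the tag. If that might not hold, a slightly more defensive sketch could first narrow the text down to the form and then inspect each input tag on its own. The helper name input_value and the shortened sample page are assumptions made for illustration; this does loop over tags, but not over lines:

```lua
-- Sample page text, shortened from the form in the question.
local pagetext = [[
<form id="formulaire_login" method="post" action="/spip.php?page=login&amp;lang=fr">
    <div>
        <input name="formulaire_action" type="hidden" value="login">
        <input name="formulaire_action_args" type="hidden" value="random_value">
    </div>
</form>
]]

-- Hypothetical helper: returns the value attribute of the first <input>
-- whose name attribute equals `name`, whatever the attribute order.
local function input_value(html, name)
    for tag in html:gmatch("<input[^>]*>") do
        if tag:match('name="([^"]*)"') == name then
            return tag:match('value="([^"]*)"')
        end
    end
    return nil
end

-- Restrict the search to the login form; .- is the non-greedy "any characters".
local form = pagetext:match('<form[^>]*id="formulaire_login"[^>]*>(.-)</form>')
local value = input_value(form or pagetext, "formulaire_action_args")

print(value) --> random_value
```

Comparing the captured name with == also avoids accidentally matching a longer or shorter field name that happens to share a prefix.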

More on Lua patterns.

Mud
  • @daurinimator: That post is true; you'd never want to actually parse HTML/XML/etc. using regular expressions, so if you wanted to do real web scraping regex are a recipe for pain. But in this particular instance, pulling one value out of one tag with a specific, likely-to-be-unique name, regex is more than up to the task. If you needed to do much more than that, you probably wouldn't even want to use Lua, given that it lacks libraries like Mechanize. – Mud May 13 '12 at 04:28