0

I need to input text into the text box on this website:

http://www.link.cs.cmu.edu/link/submit-sentence-4.html

I then need the return page's html to be returned. I have looked at other solutions. But i am aware that there is no solution for all. I have seen selenium, but im do not understand its documentation and how i can apply it. Please help me out thanks.

BTW i have some experience with beautifulsoup, if it helps.I had asked before but requests was the only solution.I don't know how to use it though

user3530968
  • 11
  • 1
  • 5

3 Answers3

1

First, imho automation via BeautifulSoup is overkill if you're looking at a single page. You're better off looking at the page source and get the form structure off it. Your form is really simple:

<FORM METHOD="POST"
ACTION="/cgi-bin/link/construct-page-4.cgi#submit">
<input type="text" name="Sentence" size="120" maxlength="120"></input><br>
<INPUT TYPE="checkbox" NAME="Constituents" CHECKED>Show constituent tree &nbsp;
<INPUT TYPE="checkbox" NAME="NullLinks" CHECKED>Allow null links &nbsp;
<INPUT TYPE="checkbox" NAME="AllLinkages" OFF>Show all linkages &nbsp;
<INPUT TYPE="HIDDEN" NAME="LinkDisplay" VALUE="on">
<INPUT TYPE="HIDDEN" NAME="ShortLength" VALUE="6">
<INPUT TYPE="HIDDEN" NAME="PageFile" VALUE="/docs/submit-sentence-4.html">
<INPUT TYPE="HIDDEN" NAME="InputFile" VALUE="/scripts/input-to-parser">
<INPUT TYPE="HIDDEN" NAME="Maintainer" VALUE="sleator@cs.cmu.edu">
<br>
<INPUT TYPE="submit" VALUE="Submit one sentence">
<br>
</FORM>

so you should be able to extract the fields and populate them.

I'd do it with curl and -X POST (like here -- see the answer too :)).

If you really want to do it in python, then you need to do something like POST using requests.

Community
  • 1
  • 1
Laur Ivan
  • 4,117
  • 3
  • 38
  • 62
  • yep. this is definitely better to use a simple curl rather than use any framework just to return html. +1 – ddavison Apr 14 '14 at 14:22
  • Well i have to do this in python. There is another answer below, which has made the input a cake walk. If only you could help with extracting the html source back. – user3530968 Apr 14 '14 at 16:26
0

Pulled straight from the docs and changed to your example.

from selenium import webdriver

# Create a new instance of the Firefox driver
driver = webdriver.Firefox()

# go to the page
driver.get("http://www.link.cs.cmu.edu/link/submit-sentence-4.html")

# the page is ajaxy so the title is originally this:
print driver.title

# find the element that's name attribute is Sentence
inputElement = driver.find_element_by_name("Sentence")

# type in the search
inputElement.send_keys("You're welcome, now accept the answer!")

# submit the form 
inputElement.submit()

This will at least help you input the text. Then, take a look at this example to retrieve the html.

Community
  • 1
  • 1
drez90
  • 814
  • 5
  • 17
  • Thanks Bud. Update. I can access the URL with ".current_url". However your example doesn't work for returning html. – user3530968 Apr 14 '14 at 16:32
0

Following OP's requirement of having the process in python.

I wouldn't use selenium, because it's launching a browser on your desktop and is overkill for just filling up a form and getting its reply (you could justify it if your page would have JS or ajax stuff).

The form request code could be something like:

import requests

payload = {
    'Sentence': 'Once upon a time, there was a little red hat and a wolf.',
    'Constituents': 'on',
    'NullLinks': 'on',
    'AllLinkages': 'on',
    'LinkDisplay': 'on',
    'ShortLegth': '6',
    'PageFile': '/docs/submit-sentence-4.html',
    'InputFile': "/scripts/input-to-parser",
    'Maintainer': "sleator@cs.cmu.edu"
}

r = requests.post("http://www.link.cs.cmu.edu/cgi-bin/link/construct-page-4.cgi#submit", 
                  data=payload)

print r.text

the r.text is the HTML body which you can parse via e.g. BeautifulSoup.

Looking at the HTML reply, I think your problem will be in processing the text within the <pre> tags, but that's an entirely different thing outside the scope of this question.

HTH,

Laur Ivan
  • 4,117
  • 3
  • 38
  • 62