
I am trying to write a script which reads a list of domains from a file and checks whether each domain is a WordPress site.

I get an error about form controls when trying to use the mechanize library, and after searching the web I was not able to find a similar solution.

The code I am using is as follows:

import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [("User-agent", "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13")]
base_url = br.open("http://www.isitwp.com/")

rowdict = {}
with open('domains') as f:
    for line in f:
        rowdict['website'] = str(line)
        br.select_form(nr=0)
        br['q'] = str(line)
        isitwp_response = br.submit()
        isitwp_response = isitwp_response.read()
        if "Good news everyone" in isitwp_response:
            rowdict['iswordpresswebsite'] = "yes"
        else:
            rowdict['iswordpresswebsite'] = "no"

The error is as follows:

File "./wp_checker.py", line 26, in <module>
br['q'] = str(line)

File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 796, in __setitem__
self.form[name] = val

File "/usr/local/lib/python2.7/dist-packages/mechanize/_form_controls.py", line 1956, in __setitem__
control = self.find_control(name)

File "/usr/local/lib/python2.7/dist-packages/mechanize/_form_controls.py", line 2348, in find_control
return self._find_control(name, type, kind, id, label, predicate, nr)

File "/usr/local/lib/python2.7/dist-packages/mechanize/_form_controls.py", line 2441, in _find_control
raise ControlNotFoundError("no control matching " + description)
mechanize._form_controls.ControlNotFoundError: no control matching name 'q'
chrysst
  • why do you search for br['q']? – MEdwin Feb 04 '19 at 15:47
  • @MEdwin the q parameter is used on http://www.isitwp.com/ to check if the domain is a WordPress site – chrysst Feb 04 '19 at 15:49
  • okay, I think for that site it is actually br['s'], and I found out that the form results are JavaScript-generated content, so mechanize might not be the best tool. I am suggesting selenium. – MEdwin Feb 04 '19 at 16:41
  • @MEdwin you are right about the 's', thank you for this! – chrysst Feb 04 '19 at 16:57
  • @MEdwin do you have any example of how to use selenium to check if a specific text appears in the response (like in my code above)? – chrysst Feb 04 '19 at 17:27
  • yes, see this example here. It uses selenium with PhantomJS to pull the site and BeautifulSoup to find a text; a sketch of that approach follows these comments. Really interesting and similar to what you are trying to achieve: https://stackoverflow.com/questions/13287490/is-there-a-way-to-use-phantomjs-in-python – MEdwin Feb 04 '19 at 17:46
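
For what it is worth, here is a minimal sketch of the selenium approach suggested above, assuming a headless Firefox driver with geckodriver on the PATH (PhantomJS from the linked answer is deprecated in newer selenium releases). The 's' field name and the "Good news everyone" phrase are taken from this discussion, and the fixed sleep is only a crude stand-in for a proper wait:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import time

options = Options()
options.headless = True
driver = webdriver.Firefox(options=options)

try:
    with open("domains") as f:
        for line in f:
            domain = line.strip()
            driver.get("https://www.isitwp.com/")
            # the search box is the "s" field identified in the comments
            box = driver.find_element_by_name("s")
            box.clear()
            box.send_keys(domain)
            box.submit()
            time.sleep(5)  # crude wait for the JavaScript-generated result
            if "Good news everyone" in driver.page_source:
                print("{0}: yes".format(domain))
            else:
                print("{0}: no".format(domain))
finally:
    driver.quit()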

1 Answer


I saw this was tagged python-requests, so I made this using requests instead of mechanize.

Nothing much to explain; the code is self-explanatory.

import requests

url = "https://www.isitwp.com/wp-admin/admin-ajax.php"

# change domains.txt to whatever your text file is named
with open("domains.txt", "r") as file:
    domains = file.read().splitlines()

def iswp(query):
    data = {
        "_ajax_nonce": "f7442b97c8",  # you can get it from the website, just do CTRL+F and search for "nonce" (see the helper sketched below)
        "action": "get_result",
        "dataType": "json",
        "q": query,
        "recapt": ""
    }
    r = requests.post(url, data=data).json()
    if r["data"]["iswp"] == 0:
        print("{0} is not powered by WordPress".format(query))
    else:
        print("{0} is powered by WordPress".format(query))

for domain in domains:
    iswp(domain)
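
One thing worth noting: the _ajax_nonce above is hard-coded from a snapshot of the page and may rotate. A hypothetical helper could scrape it first; the regex pattern below is a guess and has not been confirmed against the live page source:

import re
import requests

def get_nonce():
    # fetch the isitwp.com front page and pull out the first nonce-like token;
    # adjust the pattern after inspecting the actual page source
    html = requests.get("https://www.isitwp.com/").text
    match = re.search(r'nonce["\']?\s*[:=]\s*["\']([0-9a-f]{8,12})["\']', html)
    return match.group(1) if match else None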
snd
  • I can see that this works only if the WordPress installation is at the root of the domain. For example, the above script cannot detect this: www.mysite.com/blog. Do you know if this can work for all cases? – chrysst Feb 05 '19 at 20:41
  • The code above depends 100% on "www.isitwp.com", therefore I can't do anything about the detection itself; it's simply a function that sends a request to "isitwp.com" and checks whether the "iswp" JSON response is a 1 or a 0. – snd Feb 05 '19 at 21:26
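
To cover installs that are not at the domain root (the www.mysite.com/blog case raised above), one alternative not taken from this answer is to fetch the URL directly and look for common WordPress markers in the HTML. A rough heuristic sketch:

import requests

def looks_like_wordpress(url):
    # heuristic: WordPress pages normally reference wp-content/wp-includes
    # assets or carry a WordPress generator meta tag
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException:
        return False
    markers = ("wp-content", "wp-includes", 'content="WordPress')
    return any(marker in html for marker in markers)

print(looks_like_wordpress("https://www.mysite.com/blog"))  # placeholder URL from the comment above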