
I've got a file of about 2,500 domain names.

I would like to do a whois on these domain names.

The issue is that I've never done this and don't know where to start. If you have any ideas, I'm all ears.

TIA.

Andy K
  • Do you just want to download each whois record for each domain in your list? Or would you like to parse out registrant information? – Alex Riley Nov 08 '14 at 18:34
  • Hi Ajcr, what do you mean by parse out registrant information? – Andy K Nov 08 '14 at 22:57
  • Hi @AndyK, I just wondered what you were hoping to capture: if you were hoping to simply get the result of a whois lookup for each domain (and say, write the result of each lookup to its own text file) or if you were looking to automatically extract contact information for the domain owners (e.g., names, phones numbers) and store this information in, for instance, a CSV file. – Alex Riley Nov 09 '14 at 11:05
  • Hi @Ajcr, I would say the second option. But I have no clue on how to do it... – Andy K Nov 09 '14 at 11:28
  • Please give an example of a single record from that file and what answer do you expect to get for that record. – Jyrkka Nov 13 '14 at 16:08
  • Hi Jyrkka, example domain names would be `www.mediapost.com` or `coca-cola.fr`. If possible, I would need the address, a contact email and a telephone number. If that is not possible, a basic whois would do fine. – Andy K Nov 13 '14 at 16:17

5 Answers


You can also use the Linux command-line tool whois. The following code opens a subprocess and runs a whois lookup for the domain.

But you have to be careful with many requests in a short time: the servers will eventually block you. ;)

import subprocess


class WhoisError(Exception):
    """Raised when the whois command exits with an error code."""


def find_whois(domain):
    # Linux 'whois' command wrapper
    #
    # Executes a whois lookup with the linux command whois.
    # Returncodes from: https://github.com/rfc1036/whois/blob/master/whois.c

    domain = domain.lower().strip()
    d = domain.split('.')
    if d[0] == 'www':
        d = d[1:]
    domain = '.'.join(d)  # query the bare domain, e.g. coca-cola.fr

    # Run the command and capture stdout and stderr
    proc = subprocess.Popen(['whois', domain], stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    ans, err = proc.communicate()

    # The exit status (not the stderr text) tells us what went wrong
    if proc.returncode == 1:
        raise WhoisError('No Whois Server for this TLD or wrong query syntax')
    elif proc.returncode == 2:
        raise WhoisError('Whois has timed out (try again later)')
    return ans.decode('UTF-8', 'replace')


with open('domains.txt') as infile:
    with open('out.txt', 'a') as output:
        for line in infile:
            if line.strip():  # skip blank lines
                output.write(find_whois(line))

The `with open(...) as` statements close both files automatically, even if an error occurs. The 'a' mode opens the output file in append mode, so each run adds to out.txt instead of overwriting it.
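Following the warning about being blocked, here is a minimal sketch (an addition, not part of the original answer) of a gentler loop: it pauses between lookups and collects failures so they can be retried later instead of aborting the run. The name `bulk_lookup` and the 2-second default delay are assumptions; tune the delay to the servers you query.

```python
import time


def bulk_lookup(domains, lookup, delay=2.0):
    """Look up each domain in turn, pausing between requests.

    `lookup` is any callable that takes a domain and returns its WHOIS text
    (for example find_whois). Failures are collected rather than raised,
    so one bad domain does not stop the whole run.
    """
    results = {}
    failed = []
    for i, domain in enumerate(domains):
        domain = domain.strip()
        if not domain:
            continue  # skip blank lines
        if i > 0:
            time.sleep(delay)  # space out requests to avoid rate limits
        try:
            results[domain] = lookup(domain)
        except Exception as exc:  # e.g. WhoisError or a timeout
            failed.append((domain, str(exc)))
    return results, failed
```

You could pass the lines of domains.txt together with `find_whois`; the returned `failed` list holds (domain, error message) pairs that you can feed back in later.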

Alu
  • Hi Alu, thanks but ... :) I have a file `domains.txt`, which will contain domain names like `www.coca-cola.fr` or `www.apple.com`. Is there a way to process the file and save the results to another file, please? – Andy K Nov 17 '14 at 09:04
  • Hi Alu, I got that error `Traceback (most recent call last): File "", line 4, in File "", line 5, in find_whois File "C:\Program Files\python27\lib\subprocess.py", line 710, in __init__ errread, errwrite) File "C:\Program Files\python27\lib\subprocess.py", line 958, in _execute_child startupinfo)` – Andy K Nov 17 '14 at 13:17
  • I think I found the issue: there is no whois on Windows... -_- I need to double-check, but that sounds like a possible explanation. I'll keep you updated. – Andy K Nov 17 '14 at 13:28
  • Yep, Windows has no such command-line tool. Maybe look for the whois tool in the Windows Sysinternals toolkit. http://technet.microsoft.com/de-de/sysinternals/bb897435.aspx – Alu Nov 17 '14 at 15:03
  • Hi Alu, your solution works well, although it seems that fewer and fewer servers allow `whois` queries. Despite the disappointing results, you were the first to give me a detailed solution, so the 50 points are yours. – Andy K Nov 18 '14 at 10:49
  • Let me add one thing, because it caused me great pain: `cygwin` with the `wget`, `whois` and `pip for python` packages is a must. To everyone reading this: consider this option if you want a good Unix emulator on Windows and are using this script. – Andy K Nov 18 '14 at 10:58
  • From Python there is no need to use the shell to launch a whois query. There are whois libraries in python or at the bottom of it just open a socket towards port 43. – Patrick Mevzek Jan 02 '18 at 20:31
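As the last comment says, the protocol underneath is simple (RFC 3912): open a TCP connection to port 43, send the query followed by CRLF, and read until the server closes the connection. A minimal standard-library sketch, assuming you already know which server to ask (for .com that is whois.verisign-grs.com); the function name `whois_query` is hypothetical. It also sidesteps the missing whois binary on Windows.

```python
import socket


def whois_query(domain, server, port=43, timeout=10.0):
    """Send a raw WHOIS query (RFC 3912) and return the reply as text."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        # The query is just the name followed by CRLF; "idna" turns
        # internationalised names into their xn-- form.
        sock.sendall(domain.encode("idna") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break  # server closed the connection: reply is complete
            chunks.append(data)
    # Most servers answer in ASCII/UTF-8; replace anything undecodable.
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example: `print(whois_query("example.com", "whois.verisign-grs.com"))`.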

It looks like you've had some helpful answers already, but I thought it might be good to say a little more about the challenges of doing WHOIS lookups in bulk (and in general) and provide some alternative solutions.

The WHOIS lookup

Looking up a single domain name typically involves finding the relevant WHOIS server for that domain and then requesting the information via port 43. If you have access to a unix-like shell (e.g. Bash), you can use whois to do this easily (as noted by others):

$ whois example.com

Very similar WHOIS tools have also been made available as modules for a vast array of programming languages. The pywhois module for Python is one example.
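To find the relevant WHOIS server yourself, one common approach is to ask IANA's server (whois.iana.org) about the TLD; its reply normally contains a `refer:` line naming the registry's WHOIS server. A rough sketch using only the standard library; `tld_of`, `referral_server`, `ask` and `find_whois_server` are illustrative names, and the reply format is assumed to stay as described.

```python
import re
import socket


def tld_of(domain):
    """Return the top-level domain, e.g. 'fr' for 'www.coca-cola.fr'."""
    return domain.strip().rstrip(".").rsplit(".", 1)[-1].lower()


def referral_server(iana_reply):
    """Extract the server named on the 'refer:' line of an IANA reply, or None."""
    match = re.search(r"^refer:\s*(\S+)", iana_reply, re.MULTILINE | re.IGNORECASE)
    return match.group(1) if match else None


def ask(server, query, timeout=10.0):
    """Minimal RFC 3912 exchange: send the query plus CRLF, read until close."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("idna") + b"\r\n")
        # recv() returns b"" once the server has closed the connection
        reply = b"".join(iter(lambda: sock.recv(4096), b""))
    return reply.decode("utf-8", "replace")


def find_whois_server(domain):
    """Ask IANA which server handles this domain's TLD (None if unknown)."""
    return referral_server(ask("whois.iana.org", tld_of(domain)))
```

Caching the result per TLD avoids asking IANA again for every domain in a 2,500-name list.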

In its simplest form, a bulk WHOIS lookup is just looping over a list of domains, issuing a whois request for each domain and writing the record to an output.

Here is an example in Bash that reads domains from a file domains.txt and writes each WHOIS record into separate files (if you're using Windows, give Cygwin a try).

#!/bin/bash

domain_list="domains.txt"

while read -r name
do
    echo "Looking up ${name}..."
    whois "$name" > "${name}.txt"
    sleep 1
done < "$domain_list"

Beware of the following complications of WHOIS lookups in bulk:

  • Some WHOIS servers may not give you a full WHOIS record. This is especially true for country-specific domains (such as .de and .fr) and domains registered with certain registrars (such as GoDaddy).

    If you want the fullest possible record, you'll often have to go to the registry's website or to a third-party service which may have cached the record (e.g. DomainTools). This is much more difficult to automate and may have to be done manually. Even then, the record may not contain what you want (e.g. contact details for the registrant).

  • Some WHOIS servers impose restrictions on the number of requests you can make in a certain time frame. If you hit the limit, you might find that you have to return a few hours later to request the records again. For example, with .org domains you are limited to no more than three lookups a minute, and a few registrars will bar you for 24 hours.

    It's best to pause for a few seconds between lookups, or try to shuffle your list of domains by TLD so you don't bother the same server too many times in quick succession.

  • Some WHOIS servers are frequently down and the request will time out, meaning that you might need to go back and re-do these lookups. ICANN stipulates that whois servers must have a pretty decent uptime, but I've found one or two servers that are terrible at giving out records.
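The advice to shuffle your list by TLD can be automated. This sketch (function name hypothetical) groups domains by TLD and then takes one from each group in turn, so consecutive lookups usually go to different registries. It is only a heuristic: the TLD is a proxy for the server, and .com records may still end up on the same registrar's server.

```python
from itertools import zip_longest


def interleave_by_tld(domains):
    """Reorder domains round-robin by TLD, keeping input order within each TLD."""
    groups = {}  # dicts keep insertion order (Python 3.7+)
    for d in domains:
        d = d.strip().lower()
        if d:
            tld = d.rstrip(".").rsplit(".", 1)[-1]
            groups.setdefault(tld, []).append(d)
    result = []
    # Take the first domain of every TLD, then the second of every TLD, etc.
    for batch in zip_longest(*groups.values()):
        result.extend(d for d in batch if d is not None)
    return result
```

For example, `interleave_by_tld(["a.com", "b.com", "x.fr"])` returns `["a.com", "x.fr", "b.com"]`.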

Parsing the record

Parsing WHOIS records (e.g. for registrant contact information) can be a challenge because:

  • The records are not always in a consistent format. You'll find this with the .com domains in particular. A .com record might be held by any one of thousands of registrars worldwide (not by the .com registry, Verisign), and not all of them choose to present the records in the easy-to-parse format recommended by ICANN.

  • Again, the information you want to extract might not be in the record you get back from the lookup.
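Because labels vary between registrars, one workable approach is a small regex extractor with a list of label variants per field. A sketch, where the field names and label patterns are illustrative assumptions you would extend after looking at the records you actually receive (many records, especially for ccTLDs, will simply yield None):

```python
import re

# Hypothetical label variants; extend these after inspecting real records.
FIELD_PATTERNS = {
    "name": r"Registrant(?: Name)?",
    "email": r"Registrant Email|Registrant E-mail",
    "phone": r"Registrant Phone(?: Number)?",
}


def extract_fields(record, patterns=FIELD_PATTERNS):
    """Pull 'Label: value' pairs out of a WHOIS record.

    Returns a dict with one entry per field; missing fields map to None.
    Only the first matching line is used for each field.
    """
    result = {}
    for field, label in patterns.items():
        m = re.search(r"^[ \t]*(?:%s)[ \t]*:[ \t]*(.+?)\s*$" % label,
                      record, re.MULTILINE | re.IGNORECASE)
        result[field] = m.group(1) if m else None
    return result
```

Each returned dict can be written straight to a CSV file with `csv.DictWriter`.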

Since it's been mentioned already, pywhois is one option to parse WHOIS data. Here's a very simple Python script which looks up the WHOIS record for each domain and extracts the registrant name (where possible*), writing the results to a CSV file. You can include other fields too if you like:

import csv
import whois

with open("domains.txt", "r") as f:
    domains = [line.strip() for line in f if line.strip()]

# Python 3: open CSV files in text mode with newline="" (Python 2 used "wb")
with open("output_file.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["Domain", "Registrant Name"])
    for domain in domains:
        try:
            record = whois.whois(domain)
            r_name = record.registrant_name
        except Exception:  # lookup failed or the field is missing
            r_name = "error"
        writer.writerow([domain, r_name])

* When I tested this script quickly, pywhois wasn't very reliable in extracting the registrant name. Another similar library you could try instead is pythonwhois.

Alex Riley
  • Hi `ajcr`, many thanks for the depth of your answer. I've tried to do a whois on `google.com` or `oracle.com` but they won't let me do that. Damn. Thanks for everything. – Andy K Nov 18 '14 at 10:54
  • @AndyK - no problem at all. A few .com and .net domains (like google.com) can be tricky to query because the Verisign WHOIS server returns details for the domain _and_ for servers whose names contain the domain name. Some parsers don't know how to handle this and get confused (this might well be the case here). Let me know if there's anything else I can contribute. – Alex Riley Nov 18 '14 at 11:09
  • Nah `ajcr`, I should be good. I needed to see whether this approach could do the trick; since it can't, I've got a plan B. So we are good. Many thanks. – Andy K Nov 18 '14 at 11:11
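On the google.com problem discussed in these comments: the .com/.net registry server answers with a short "thin" record that names the registrar's own WHOIS server, and that second server usually holds the contact details. A sketch of that two-hop lookup, with two assumptions to check against current server behaviour: that prefixing the query with `domain ` asks Verisign for domain records only (filtering out the look-alike server entries), and that the referral appears on a `Registrar WHOIS Server:` line. Function names are hypothetical.

```python
import re
import socket

VERISIGN = "whois.verisign-grs.com"  # registry WHOIS server for .com and .net


def ask(server, query, timeout=10.0):
    """Minimal RFC 3912 exchange: send the query plus CRLF, read until close."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("idna") + b"\r\n")
        reply = b"".join(iter(lambda: sock.recv(4096), b""))
    return reply.decode("utf-8", "replace")


def registrar_server(registry_reply):
    """Return the registrar WHOIS server named in a thin-registry reply, or None."""
    m = re.search(r"^\s*Registrar WHOIS Server:\s*(\S+)", registry_reply,
                  re.MULTILINE | re.IGNORECASE)
    return m.group(1) if m else None


def com_whois(domain):
    # Assumption: "domain <name>" restricts Verisign's answer to domain records.
    registry_reply = ask(VERISIGN, "domain " + domain)
    server = registrar_server(registry_reply)
    # Second hop: the registrar usually holds the registrant details.
    return ask(server, domain) if server else registry_reply
```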

Assuming the domains are in a file named domains.txt and you have pywhois installed, then something like this should do the trick:

import whois

infile = "domains.txt"

# get domains from file, skipping blank lines
with open(infile, 'r') as f:
    domains = [line.strip() for line in f if line.strip()]

for domain in domains:
    print(domain)
    record = whois.whois(domain)

    # write each whois record to a file {domain}.txt
    with open("%s.txt" % domain, 'w') as f:
        f.write(record.text)

This will output each whois record to a file named {domain}.txt


Without pywhois:

import subprocess

infile = "domains.txt"

# get domains from file, skipping blank lines
with open(infile, 'r') as f:
    domains = [line.strip() for line in f if line.strip()]

for domain in domains:
    print(domain)
    # check_output returns bytes and raises CalledProcessError on a non-zero exit
    record = subprocess.check_output(["whois", domain])

    # write each whois record to a file {domain}.txt
    with open("%s.txt" % domain, 'wb') as f:
        f.write(record)
Céline Aussourd
Martin Ogden
  • Hi Martin, almost working well until I got that error `record = whois.whois(domain) AttributeError: 'module' object has no attribute 'whois'`. I've tried to change it with `pywhois` and I got that error instead `record = whois.whois(domain) NameError: name 'whois' is not defined` – Andy K Nov 14 '14 at 09:25
  • @AndyK Did you install pywhois? You can install it this way: `pip install python-whois` – Céline Aussourd Nov 14 '14 at 10:42
  • Hello Céline, I've installed `pywhois`. When using `pywhois`, I get this error `record = whois.whois(domain) NameError: name 'whois' is not defined`. What a pain... – Andy K Nov 14 '14 at 10:43
  • hmm you may have the same problem as this guy: https://code.google.com/p/pywhois/issues/detail?id=53 Two pywhois installed maybe? (Note that if you did `easy_install pywhois` you would get the CLI tool which is different from the python library) – Céline Aussourd Nov 14 '14 at 10:48
  • If you're having trouble with the whois package then you could use `record = subprocess.check_output(["whois", domain])` instead of `record = whois.whois(domain)`. Also change `import whois` to `import subprocess`. This should work on Unix but I'm not sure about Windows. – Martin Ogden Nov 14 '14 at 15:32
  • Hi Martin , almost there : Why do I have this error when running the script? `Traceback (most recent call last): File "./tst2.py", line 17, in f.write(record.text) AttributeError: 'str' object has no attribute 'text'` – Andy K Nov 16 '14 at 20:49
  • @AndyK if you use subprocess (instead of pywhois), you should also remove `.text` from the last line: `f.write(record)` – Céline Aussourd Nov 17 '14 at 17:20
  • Thanks, Céline. I owe you. – Andy K Nov 17 '14 at 17:21

Download and install Microsoft's whois tool from http://technet.microsoft.com/en-us/sysinternals/bb897435.aspx

Create a text file with the list of domain names, with a header row.

name
google.com
yahoo.com
stackoverflow.com

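Create a PowerShell script:

$domainname = Import-Csv -Path "C:\domains.txt"
foreach($domain in $domainname)
{
   # whois.exe prints plain text, so pipe it to Out-File rather than Export-Csv
   .\whois.exe $domain.name | Out-File -FilePath "C:\domain-info.txt" -Append
}

Run the PowerShell script from the folder that contains whois.exe.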

Paul
egemen

You can do this with a simple one-liner using xargs (the -a option is specific to GNU xargs):

xargs -a valid_dns.txt -I {} sh -c 'echo "Domain: {}"; whois {}'


nitnit