decoding a .txt - 'utf-8' codec can't decode byte 0xf3

Question

I am copying data (domain names) from an Excel file into a text file and then checking the availability of those domains. The problem appears when I try to use the text file after it has been filled from the Excel file.

This is the data in the Excel file:

arete.cl
cbsanbernardo.cl
ludala.cl
puntotactico.cl
sunriseskateboard.cl
ellegrand.cl
turismosantodomingo.cl
delotroladof.cl
produccionesmandala.cl

So, basically, if I type the domains into the text file manually, the script works fine. But if I copy the domains from the Excel file into the text file and then run the script, this error pops up:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 194: invalid continuation byte

The same happens if I try to check the domains directly from the Excel file. So should I decode the .txt or the .xlsx? How can I do it?
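
To make the error message concrete, here is a minimal sketch of what it means. It assumes the offending text contains an accented character such as "ó" stored as the single byte 0xf3 (Latin-1 / cp1252); the sample word is hypothetical, since none of the domains listed above contains a non-ASCII character.

```python
# Hypothetical sample: "producción" encoded as Latin-1, so "ó" becomes the single byte 0xf3.
raw = "producción".encode("latin-1")


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return str(exc)


# In UTF-8, 0xf3 starts a multi-byte sequence, but the next byte ("n") is not
# a continuation byte, so decoding fails.
utf8_result = try_decode(raw, "utf-8")
# Latin-1 maps every byte to one character, so the same bytes decode cleanly.
latin1_result = try_decode(raw, "latin-1")

print(utf8_result)
print(latin1_result)
```

So the message does not necessarily mean the file is corrupt; it may only mean the bytes were written in one encoding and read in another.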

#!/usr/bin/python

import gzip  # used below by gzip.open
import io    # used below by io.open
import os

import openpyxl
import pythonwhois

pathx = 'path'
filex = 'file.xlsx'

print('**Availability of domains**')
os.chdir(pathx)
workbook = openpyxl.load_workbook(filex, data_only = True)
sheet = workbook.get_sheet_by_name('Dic')


domainsz = io.open(pathx + '\\domains.txt', 'a')

for i in range(1, 10):
    domainx = sheet["A" + str(i * 2)].value
    if domainx is not None:
        domainsz.write(domainx + '\n')
        print(domainx)

domainsz.close()

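# 'plaintext' was never defined; this assumes the intent was to compress domains.txt
with io.open('domains.txt', 'rb') as infile, gzip.open('domains.txt' + ".gz", "wb") as outfile:
    outfile.write(infile.read())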

domains = []
available = []
unavailable = []


def getDomains():
    with io.open('domains.txt', 'r', encoding='latin-1') as f:
        for domainName in f.read().splitlines():
            domains.append(domainName)

def run():
    for dom in domains:
        if dom is not None and dom != '':
            details = pythonwhois.get_whois(dom)
            if details['contacts']['registrant'] is not None:
                unavailable.append(dom)
            else:
                available.append(dom)

def printAvailability():
    print ("-----------------------------")
    print ("Unavailable Domains: ")
    print ("-----------------------------")
    for un in unavailable:
        print (un)
    print ("\n")
    print ("-----------------------------")
    print ("Available Domains: ")
    print ("-----------------------------")
    for av in available:
        print (av)


if __name__ == "__main__":
    getDomains()
    run()
    printAvailability()

Check this question: http://stackoverflow.com/questions/491921/unicode-utf-8-reading-and-writing-to-files-in-python and maybe try adding this as the second line: `# -*- encoding: utf-8 -*-` — DanteVoronoi, Feb 01 '17 at 17:55
There is a lot of prose and code here, but there is no [Minimal, Complete, and Verifiable](http://stackoverflow.com/help/mcve) example, which makes it much harder to help. — Stephen Rauch, Feb 02 '17 at 02:06
It is easier to help if you know how to use pythonwhois properly. I just realized that whois can extract data only for popular TLDs (com, org, net, biz, info, pl, jp, uk, nz, …), and "cl" is not on the list. Thanks anyway, guys. — Hans Schmidt, Feb 03 '17 at 23:21
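
On the encoding side of the question, one possible approach is to name the encoding explicitly both when writing the text file and when reading it back, so the two steps cannot disagree. This is only a sketch: `write_domains`, `read_domains`, the temporary path, and the accented sample name are hypothetical and not part of the original script.

```python
import io
import os
import tempfile


def write_domains(path, names, encoding="utf-8"):
    """Write one domain per line using an explicit encoding."""
    with io.open(path, "w", encoding=encoding) as f:
        for name in names:
            f.write(name + "\n")


def read_domains(path, encoding="utf-8"):
    """Read the domains back with the same encoding, skipping empty lines."""
    with io.open(path, "r", encoding=encoding) as f:
        return [line for line in f.read().splitlines() if line]


# Hypothetical data: one ASCII name from the question, one made-up accented name.
sample = ["arete.cl", "producción.cl"]
path = os.path.join(tempfile.mkdtemp(), "domains.txt")

write_domains(path, sample)
round_trip = read_domains(path)
print(round_trip)
```

If the file is opened without an `encoding` argument, Python falls back to a platform-dependent default, which is one way a file written on Windows can end up with a 0xf3 byte that a later UTF-8 read rejects.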
