Python, Selenium, CSV, and UTF-8 (French) characters

Question

I have a CSV file that contains French words, such as "immédiatement". I'm using Python plus Selenium Webdriver to write those words into a text field. Basically, using the required Selenium packages plus csv:

Start up Selenium and go to the correct area.
Open the CSV file.
For each row:
- Get the cell that contains the French word.
- Write that word in the textarea.

The problem:

"UnicodeDecodeError: 'utf8' codec can't decode byte 0x82 in position 3: invalid start byte"

I've tried:

- declaring "coding: utf-8" at the top of the file, and leaving it out
- once I set a variable to the contents of the cell, appending .decode("utf-8")
- once I set a variable to the contents of the cell, appending .encode("utf-8")

No love.

(I can't set it to "ignore" or "replace", because I need to actually type the word out. It doesn't appear to be Selenium itself, because when I put the list directly in the script, typing goes fine. (I could put this in as a dict in the script, but jesus, why.))

What am I missing?

[edit] Sample CSV content:

3351,Payé/Effectué,Link1
45922,Plannifié,Link1
3693,Honoraires par Produit,Link2

And generalised code:

# -*- coding: utf-8 -*-

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
import unittest, time, re, csv

csvdoc = r"C:\path\to\sample.csv"  # raw string, so "\t" is not read as a tab

class Translations(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Firefox()
        self.driver.implicitly_wait(30)
        self.base_url = "https://baseURL.com/"
        self.verificationErrors = []
        self.accept_next_alert = True

    def test_translations(self):
        driver = self.driver
        driver.get(self.base_url + "login")
        driver.find_element_by_id("txtUsername").clear()
        driver.find_element_by_id("txtUsername").send_keys("username")
        driver.find_element_by_id("txtPassword").clear()
        driver.find_element_by_id("txtPassword").send_keys("password")
        driver.find_element_by_id("btnSubmit").click()
        # Navigate to the correct area.
        # - code goes here -
        # Open the file and get started.
        with open(csvdoc, 'r') as csvfile:
            csvreader = csv.reader(csvfile, delimiter=',', quotechar='"')
            for row in csvreader:
                elmID = row[0]
                phrase = row[1]
                arealink = row[2]
                driver.find_element_by_xpath("//a[text()='%s']" % arealink).click()
                time.sleep(1)
                driver.find_element_by_id(elmID).clear()
                driver.find_element_by_id(elmID).send_keys(phrase)
                driver.find_element_by_id("btnSavePhrase").click()

    def is_element_present(self, how, what):
        try:
            self.driver.find_element(by=how, value=what)
        except NoSuchElementException:
            return False
        return True

    def tearDown(self):
        self.driver.quit()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
    unittest.main()

I have a bit of experience in all these areas. I use Selenium regularly at work, I speak Norwegian, which has crazy letters too (æ ø å), and I've coded Selenium in Python, but I've only once parsed a CSV file, and that was in Java. Would you mind posting a copy of your CSV file's content, or a snippet of it? Also the code you use to retrieve the CSV data. — jumps4fun, Jul 22 '15 at 15:21
It would help if you posted the complete traceback. In what line of the code does the `UnicodeDecodeError` occur? — Zenadix, Jul 22 '15 at 17:02
Are you sure the source file is UTF-8-encoded? The error indicates it is not valid UTF-8. Also, what version of Python? 2.X and 3.X handle Unicode differently. — Mark Tolonen, Jul 23 '15 at 04:01
For example, `\x82` decodes as `LATIN SMALL LETTER E WITH ACUTE` in common OEM code pages (not UTF-8) and that letter is at offset 3 in the second field of the first line of your .CSV example, which suspiciously correlates with your error message. — Mark Tolonen, Jul 23 '15 at 04:07
I've been able to recreate the issue in a short 20-liner (including print statements and imports). The problem is definitely not loading the values from the CSV, but trying to input them into Selenium's element from the webdriver... Continuing — jumps4fun, Jul 23 '15 at 13:51
I don't have any more time to look at this today, but this post might be helpful: http://stackoverflow.com/questions/1846135/general-unicode-utf-8-support-for-csv-files-in-python-2-6?rq=1 — jumps4fun, Jul 23 '15 at 14:06
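
The point raised in the comments about byte 0x82 can be checked in isolation. This is a minimal sketch, assuming the file was saved in an OEM code page such as cp850; the thread never confirms the file's actual encoding, so treat that choice as a guess.

```python
# Bytes of the sample word "Payé" as an OEM code page (assumed: cp850) stores them.
raw = b"Pay\x82"

# Decoding with the assumed code page recovers the accented letter.
as_cp850 = raw.decode("cp850")

# Decoding the same bytes as UTF-8 fails, because 0x82 cannot start a character.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

print(repr(as_cp850))
print(utf8_error)
```

The failing offset is 3, the same position reported in the question's error message.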

score 0 · Answer 1 · answered Jul 24 '15 at 13:22

After several hours of trying, I found a way to do this, but I had to move away from csv.reader completely. The problem you are facing is a classic Python byte-string vs. unicode-string problem. I am not fluent in Python unicode vs. byte strings yet, and csv.reader used some kind of encoding in the background that I just could not figure out. However:

from selenium.webdriver.chrome.webdriver import WebDriver
import io

csvdoc = "your_path/file.csv"

driver = WebDriver("your_path/chromedriver.exe")
driver.get("http://google.com")
element = driver.find_element_by_id("lst-ib")
with io.open(csvdoc, 'r') as csvfile:
    csvcontent = csvfile.read()
    print(csvcontent)
    for l in csvcontent.splitlines():
        line = l.split(',')
        element.send_keys(line[0])
        element.send_keys(line[1])
        element.send_keys(line[2])

When I fetched the contents of the file without csv.reader, I got a predictable string to work with. Then it was just a matter of splitting it up in the right loops. Finally, my strings were accepted by Selenium's send_keys() method.

I also changed open() to io.open(). This was originally so that I could pass an encoding argument (I am using Python 2.7). When I removed that argument, the script still worked, but removing the io. prefix did not work.
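
A likely reason the io. prefix matters: on Python 2.7, io.open() in text mode returns unicode strings, while the built-in open() returns byte strings. Without an explicit encoding, io.open() falls back to the locale's preferred encoding, which may not match the file. The sketch below passes the encoding explicitly and keeps csv.reader. It is written for Python 3, where csv.reader accepts text; the names read_rows and ASSUMED_ENCODING are hypothetical, and cp850 is only a guess based on the 0x82 byte in the error.

```python
import csv
import io
import os
import tempfile

ASSUMED_ENCODING = "cp850"  # assumption: the thread never confirms the real encoding


def read_rows(path, encoding=ASSUMED_ENCODING):
    """Return the CSV rows as lists of text (unicode) strings."""
    with io.open(path, "r", encoding=encoding, newline="") as csvfile:
        return list(csv.reader(csvfile, delimiter=",", quotechar='"'))


# Build a small sample file in the assumed encoding so the sketch is self-contained.
sample = u"3351,Pay\xe9/Effectu\xe9,Link1\n45922,Plannifi\xe9,Link1\n"
fd, sample_path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
with io.open(sample_path, "w", encoding=ASSUMED_ENCODING, newline="") as f:
    f.write(sample)

rows = read_rows(sample_path)
os.remove(sample_path)
print(rows)
```

Each cell in rows is already decoded text, so a value such as rows[0][1] could be handed to send_keys() directly. Keeping csv.reader also preserves handling of quoted fields that contain commas, which a plain split(',') would break.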

I know this is a primitive way of solving your problem, but at least it works, and for now it is the only answer.
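
If staying with csv.reader on Python 2.7 is preferred, one alternative is to decode each cell after reading, using the file's real encoding instead of UTF-8. This is a rough sketch only: decode_cells is a hypothetical helper, cp850 is again an assumption, and the hard-coded byte row stands in for what csv.reader would return on Python 2.

```python
def decode_cells(row, encoding="cp850"):
    """Decode byte-string cells to text; leave cells that are already text alone."""
    return [
        cell.decode(encoding) if isinstance(cell, bytes) else cell
        for cell in row
    ]


# Stand-in for one row as Python 2's csv.reader would yield it (byte strings).
raw_row = [b"3351", b"Pay\x82/Effectu\x82", b"Link1"]
decoded = decode_cells(raw_row)
print(decoded)
```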
