Detect URL decode with URL parse in Python

Question

I'm trying to URL-decode each scraped link, but it doesn't seem to work.

Here is the error:

Traceback (most recent call last):
  File "E:\Users\Francbicon\Desktop\Bots\Master Copy\Shopee Endless Loop.py", line 185, in <module>
    clickpy()
  File "E:\Users\Francabicon\Desktop\Bots\Master Copy\Shopee Endless Loop.py", line 75, in clickpy
    print(all_urls)
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 878-878: Non-BMP character not supported in Tk

Here is the whole code:

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import csv
import time
import urllib.parse

url = 'https://shopee.com.my/search?keyword=mattress'

driver = webdriver.Chrome(executable_path=r'E:/users/Asashin/Desktop/Bots/others/chromedriver.exe')
driver.get(url)
time.sleep(0.8)

# Select language
driver.find_element_by_xpath('//div[@class="language-selection__list"]/button').click()
time.sleep(3)


# Scroll a few times to load all items 
def clickpy():
    for x in range(10):
        driver.execute_script("window.scrollBy(0,300)")
        time.sleep(0.1)

    # Get all links (without clicking)

    all_items = driver.find_elements_by_xpath('//a[@data-sqe="link"]')

    all_urls = []

    # Skip any listing whose URL contains one of these brand markers
    excluded = ["-Dr.Alstone-", "-Dr.-Alstone-", "/Dr.Alstone-", "-Simoni-", "-Lütfy-"]

    for item in all_items:
        # This gives you the whole URL of the anchor tag
        url = item.get_attribute('href')
        if any(marker in url for marker in excluded):
            continue
        # Remove the site prefix so the href can be matched later for clicking
        urlfinal = url.split('https://shopee.com.my')[1]
        c = urllib.parse.unquote(urlfinal)
        all_urls.append(c)
    print(all_urls)
    
    a= len(all_urls)

    print('len:' + str(a))

Here is what I've tried: try/except, if-else, and a plain loop, but none of them worked. The error keeps popping up.

How do I fix it?

If this is the full code, where do you call `clickpy()`? I don't see the call in the code, but the traceback shows that is where the problem occurs. — furas, Jan 07 '20 at 01:18
When I run `clickpy()` it works correctly on Linux. Maybe the problem is not the code but the tool you use to run it. I see `Tk` in the error. Do you use `Tk`, `tkinter`, or `IDLE`, which uses `tkinter` (which uses `Tk`)? It may work correctly if you run it directly in a console or in another editor/IDE (e.g. PyCharm). — furas, Jan 07 '20 at 01:22
@furas I do not use any Tk. This is a Selenium WebDriver robot for collecting URL links, but I recently added `import urllib.parse` and then the error appeared. — Francabicon franc, Jan 07 '20 at 02:05
It has a problem printing one of the links in the list `all_urls`. You could use a `for` loop to print every URL separately to see which one causes the problem. You could also print the URL before you call `urllib.parse.unquote()`; maybe it shows something more. You can also use `print(repr(url))`, which should use hex codes for non-ASCII characters. — furas, Jan 07 '20 at 02:17
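
The debugging suggestion in the last comment can be sketched roughly as follows. The helper name `find_non_bmp` and the sample paths are hypothetical and were not taken from the scraped page; they only illustrate how `urllib.parse.unquote` can turn a percent-encoded sequence into a character above U+FFFF. In Python 3, `repr()` leaves printable non-ASCII characters unescaped, so this sketch uses the built-in `ascii()` instead.

```python
import urllib.parse


def find_non_bmp(urls):
    """Return (index, url) pairs for URLs containing characters above U+FFFF."""
    return [(i, u) for i, u in enumerate(urls)
            if any(ord(ch) > 0xFFFF for ch in u)]


# Hypothetical sample data: %F0%9F%94%A5 is the UTF-8 encoding of U+1F525.
encoded = ['/Plain-Mattress-i.1.2', '/%F0%9F%94%A5-Hot-Deal-Mattress-i.3.4']
decoded = [urllib.parse.unquote(u) for u in encoded]

for index, url in find_non_bmp(decoded):
    # ascii() escapes every non-ASCII character, so the printed text is pure ASCII.
    print(index, ascii(url))
```

If this reports any entries for the real `all_urls`, those are the links that a Tk-based shell would refuse to print.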

score 0 · Answer 1 · edited Jul 20 '20 at 08:30

Your code works fine on my Ubuntu machine. With the following output:

['/-Exclusive-Free-Shipping-Sleeper-BASIC-8-Mattress-Queen-King-Reinforced-HD-Foam-Tech-US-Aus-Euro-Export-Quality-i.217131670.6614097006', '/Anti-Decubitus-Bubble-Ripple-Rehab-Mattress-Tilam-Bedsore-Prevention-With-Pump-i.182856617.6911828477', '/Anti-Decubitus-Bubble-Ripple-Rehab-Mattress-Tilam-Bedsore-Prevention-With-Pump-i.53160130.6011827660', '/(Limited-Deal)-Dr.Macio-Spinopedic-Queen-Size-Mattress-i.132909000.2005518495', '/Mattress-9CM-Thicker-Natural-Latex-Tilam-Tatami-Mattress-Foldable-Mattress-i.170335123.2765865311', '/Tatami-Mattress-Tilam-9CM-Thickness-Cotton-Mattress-Single-Queen-Mattress-Foldable-Mattress-i.170335123.2765935029', '/Tatami-Mattress-Tilam-Foldable-Mattress-Single-Queen-Mattress-i.155066435.2759640258', '/(Limited-Deal)-Dr.Macio-Spinopedic-Mattress-Direct-Export-to-Germany-Single-Size-i.132909000.2005659513', '/READY-STOCK-Solid-Color-Tatami-Mattress-Toppers-Queen-Size-150cm-x-200cm-i.22674668.1729882060', "/JFH-3'X-5'-SINGLE-FOAM-MATTRESS-TILAM-BUJANG-(FABRIC)-i.29936425.2633629020", '/Latex-Tilam-Home-Mattress-Home-Tatami-Tilam-Single-Queen-King-Size-Foldable-Mattress-i.155066435.2717643698', '/Tilam-Bujang-Gulung-Single-Mattress-i.112251620.1854465897', '/Tatami-Mattress-Tilam-9CM-Thickness-Cotton-Mattress-Single-Queen-Mattress-Foldable-Mattress-i.170335123.2765934533', '/Latex-Mattress-9cm-Thickness-Tatami-Mattress-Queen-Single-Foldable-Mattress-i.155066435.2718145468', '/Tatami-Mattress-Tilam-Foldable-Latex-Mattress-Queen-Single-Mattress-i.170335123.2863147813', '/Tilam-Tatami-Mattress-Foldable-Mattress-Thicker-9CM-Mattress-Queen-Single-Mattress-i.155066435.2770903993', '/Cassa-Sunpillo-S99-Foldable-Rubber-Foam-Thick-3inch-Single-Mattress-(Free-hand-carry-bag)-i.2572727.21085735', '/Tatami-Mattress-Tilam-Queen-Single-Mattress-Foldable-Mattress-i.155066435.2753157909', '/QUEEN-MATTRESS-TILAM-QUEEN-HIGHT-DENSITY-SYNTHETIC-LATEX-10-inch-i.99306039.1926542809', 
'/Tatami-Mattress-Tilam-Single-Queen-King-Mattress-Foldable-Mattress-Hotel-Mattress-i.155066435.2790149370', '/Summer-Cotton-Mattress-Topper-Tatami-Mattress-Tilam-Queen-Single-King-i.90635073.2321251942', '/Cartoon-mattresses-thicker-tatami-lazy-sand-bed-single-and-double-creative-bed-i.44244851.903412439', '/Cassa-Spinahealth-By-Goodnite-8.5-Inch-Gemilang-Queen-Posture-Spring-Mattress-(With-Yellow-QC-By-Goodnite)-i.2572727.309374991', '/Tatami-Mattress-Queen-Mattress-Non-slip-Natural-Latex-Tilam-Mattress-Protecetor-i.155066435.2718041010', '/TILAM-SINGLE-KEKABU-MIX-SINGLE-MATTRESS-READY-STOK-i.10261583.1326906670', '/Cassa-4D-All-Season-Flexible-Japanese-Tatami-Style-Single-Mattress-Topper-Only-Thick-4-cm-i.2572727.372586007', '/5D-Tatami-Mattress-Tilam-Foldable-Matress-Queen-Single-Thicker-Mattress-i.189732857.5403986851', '/3-Years-Warranty-Spinalhealth-by-Goodnite-4.5-inch-I-Foam-Single-High-Quality-Compact-Rubber-Foam-Rebond-Mattress-Only-i.2572727.309374979', '/Memory-Latex-Mattress-Topper-Thicker-Soft-Tilam-Single-Queen-King-Size-Matress-i.145155560.2352273747', '/JFH-Goodnite-Spinahealth-V3-Rebond-Foam-Foldable-Mattress-with-carry-bag-(2inch)-i.29936425.3108488054', '/Foldable-mattress-tilam-lipat-i.64444893.1941280212', '/(Limited-Stock)-Dr.Macio-Universo-Mattress-Direct-Export-to-Germany-Single-Size-i.132909000.2003283032', '/Thicker-10cm-Tatami-Matress-Tilam-Single-Queen-King-Size-Lamb-Cashmere-Mattress-Bed-Soild-Topper-Protector-i.172014519.6301580490', "/JFH-High-Density-5'x8''-Foam-Mattress-Queen-Size-(RANDOM-COLOR)-i.29936425.2352530598", '/Goodnite-Spinahealth-Limited-Edition-Royal-Grandeur-10-Posture-Spring-Queen-King-Single-Super-Single-Mattress-Only-i.2572727.1830626612', "/JFH-HIGH-DENSITY-3'x4''-SINGLE-FOAM-MATTRESS-TILAM-BUJANG-(COLOR-DESIGN-RANDOM)-i.29936425.433213855", '/Goodnite-SpinaHealth-Posture-Spring-Mattress-Euro-Top-10.5-Inch-Queen-Mattress-i.69535835.1831244756', 
'/LaFamille-PureFoam-8-inches-(20cm)-High-Density-Foam-Mattress-Queen-5ft-x-8--i.39909483.1649340831', '/Latex-Mattress-Tilam-Thickness-9cm-Tatami-Mattress-Single-Queen-Foldable-Mattress-i.170335123.2765883602', '/(Limited-Deal)-Dr.Macio-Spinopedic-Queen-Size-Mattress-i.132909000.2005518495', '/Cassa-Mimo-Foldable-Queen-6-Inch-Thick-Foam-Mattress-2-Seater-Sofa-Bed-4-In-1-(Blue-Red-Green-Stripe)-i.2572727.246503674', '/Single-Queen-King-Tilam-Thicker-Latex-Mattress-Foldable-Soft-Matress-Tatami-i.145155560.2347173409', '/5-Single-mattress-HIGH-DENSITY-FOAM-(LIMITED-OFFER)-Tilam-single-i.107251758.1683866775', '/(Limited-Stock)-Dr.Macio-Universo-Mattress-Direct-Export-to-Germany-Queen-Size-i.132909000.2003259828']
len:44

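The problem is specific to the Windows machine. The `Tk` in the error message suggests the script is being run from a Tk-based shell such as IDLE, which cannot display characters outside the Basic Multilingual Plane. You can get details of this error here.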
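
One possible workaround, assuming the script really is run from a Tk-based shell (the traceback hints at this, but the question does not confirm it), is to replace characters above U+FFFF before printing. The function name `bmp_safe` and the sample list below are illustrative assumptions, not part of the original code. Running the script from a regular console instead of IDLE may also avoid the error.

```python
def bmp_safe(text, replacement='?'):
    """Replace every character above U+FFFF with a placeholder."""
    return ''.join(ch if ord(ch) <= 0xFFFF else replacement for ch in text)


# Hypothetical decoded URLs; the second one contains U+1F525, a non-BMP character.
all_urls = ['/Plain-Mattress-i.1.2', '/\U0001f525-Hot-Deal-Mattress-i.3.4']

# Only the printed copy is sanitized; all_urls itself keeps the original characters.
print([bmp_safe(u) for u in all_urls])
```

Sanitizing only what is printed keeps the stored URLs intact, which matters if they are compared against `href` values later.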

What version of Ubuntu? What version of Python? — Peter Mortensen, Jul 20 '20 at 08:31
