2

I am trying to scrape the Western Union send-money website in order to get the current "euro-blue" exchange rate for the Argentine peso. Western Union is the only company that gives you the real exchange rate, the one that is also traded on the streets. Look up "Dollar Blue" in case you are interested in how a second market for trading currencies developed in Argentina.

My goal is to get the current exchange rate from euros to Argentine pesos. On the website, you first have to click the Accept button, then type in the name of the country you would like to send money to; only after that step can you see the exchange rate.

I first tried it with requests. Since that doesn't handle JavaScript, I switched to Selenium and am pretty close now.

My code looks as follows:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

WesternUnion = 'https://www.westernunion.com/de/en/web/send-money'

# create a new Chrome session
driver = webdriver.Chrome()
driver.implicitly_wait(30)
driver.get(WesternUnion)

# accept the fraud warning (find_element_by_id was removed in Selenium 4.3)
accept_button = driver.find_element(By.ID, 'button-fraud-warning-accept')
accept_button.click()

time.sleep(0.25)
# focus the destination-country field, then type the country name
country_input = driver.find_element(By.ID, 'country')
country_input.click()
time.sleep(0.15)
country_input.send_keys("Argentina")

soup = BeautifulSoup(driver.page_source, 'lxml')

div = soup.find('div', id="OptimusApp")
div2 = soup.find('div', class_="text-center")

The problem is that the exchange rate is not shown when I do this with Python (screenshot: navigated automatically with Python), whereas it is shown when I do exactly the same thing by hand (screenshot: navigated by hand).

I am very new to scraping and Python. Does anyone have a simple solution for this problem?

Guy
Kev
  • @Guy Not sure why you have to remove _Chrome_ and _ChromeDriver_ tags, because fundamentally this question is all about _ChromeDriver_ driven _Chrome_ getting detected. Not sure why you don't want _Chrome_ and _ChromeDriver_ contributors to look at this question. – undetected Selenium Jan 08 '20 at 06:51
  • @DebanjanB It's not about Chrome or ChromeDriver; the OP could use FF or IE and have the exact same problem. The only reason you even know he is using Chrome is because he posted the driver initialization code. – Guy Jan 08 '20 at 07:25
  • Any news on this? I was searching exactly the same information and didn't find a solution. I would like to build a time series for EUR:ARS exchange rates. WU also has an API, see https://developer.westernunion.com/#/swagger/Western%20Union/61548a402f2bc6429793e702 > Fee Survey. This seems to contain the necessary information. But I can't find how to get an api-key/ how to register. – LarS Oct 09 '21 at 02:57
  • OK, it seems API is not an option, see https://stackoverflow.com/a/7240948/880188 > Western Union allows API but their API is not for free. You need to setup business account with them, and it roughly costs you around USD 15,000. – LarS Oct 09 '21 at 03:02

3 Answers

2

I modified your code a bit, adding a couple of optional arguments, and on execution I got the following result:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    # Chrome options that drop some of the usual automation switches
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    # Selenium 4 takes the driver path via Service (executable_path is no longer accepted in recent releases)
    driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'), options=options)
    driver.get('https://www.westernunion.com/de/en/web/send-money')
    # accept the fraud warning as soon as it is clickable
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#button-fraud-warning-accept"))).click()
    # focus the destination-country field and type the country name
    python_button = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input#country")))
    python_button.click()
    python_button.send_keys("Argentina")
    # print the text of the exchange-rate element once it is visible
    print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span#smoExchangeRate"))).text)
    
  • Console Output:

    1.00 EUR = Argentine Peso (ARS)
    
  • Observation: My observation was similar to yours: the exchange rate wasn't shown:

snapshot
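
A script can tell the blocked case from the working one by checking whether a number follows the "=" sign in that text. The sketch below is only an illustration: parse_rate is a hypothetical helper, and the format "1.00 EUR = <rate> Argentine Peso (ARS)" for the unblocked page is an assumption based on the console output above, not something confirmed against the live site.

```python
import re
from typing import Optional

# Assumed format when the rate IS shown: "1.00 EUR = 64.1234 Argentine Peso (ARS)".
# The blocked output shown above has no number after the "=" sign.
RATE_PATTERN = re.compile(r"=\s*([0-9]+(?:[.,][0-9]+)?)\s")


def parse_rate(text: str) -> Optional[float]:
    """Return the number that follows '=', or None when it is missing."""
    match = RATE_PATTERN.search(text)
    if match is None:
        return None
    # Accept a decimal comma as well as a decimal point.
    return float(match.group(1).replace(",", "."))
```

With Selenium, the argument would be the .text of the span#smoExchangeRate element; a None result then means the rate was withheld.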


Deep Dive

While inspecting the DOM tree of the webpage, you will find that some of the <script> and <link> tags refer to resources whose paths contain the keyword dist. For example:

  • <script src="/content/wucom/dist/2.7.1.8f57d9b1/js/smo-configs/smo-config.de.js"></script>
  • <link rel="stylesheet" type="text/css" href="/content/wucom/dist/2.7.1.8f57d9b1/css/responsive_css.min.css">
  • <link rel="stylesheet" href="https://nebula-cdn.kampyle.com/resources/dist/assets/css/liveform-web-vendor-f84dfc85d6.css">
  • <link rel="stylesheet" href="https://nebula-cdn.kampyle.com/resources/dist/assets/css/kampyle/liveform-web-style-a4ce961d15.css">
  • <script src="https://nebula-cdn.kampyle.com/resources/dist/assets/js/liveform-web-vendor-919a2c71c3.js"></script>
  • <script src="https://nebula-cdn.kampyle.com/resources/dist/assets/js/liveform-web-app-2c4e3adeb6.js"></script>

This suggests that the website is protected by the bot-management service provider Distil Networks, and that navigation by ChromeDriver gets detected and subsequently blocked.
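
The inspection described above can be repeated programmatically. The following is a minimal sketch using only the standard library; AssetCollector and find_dist_assets are hypothetical names introduced here. Note that many build pipelines also write their output to a directory called dist, so a match is a hint to investigate further rather than proof of a particular vendor.

```python
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collect the src/href URLs of <script> and <link> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "link"):
            return
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.urls.append(value)


def find_dist_assets(html):
    """Return the asset URLs whose path contains a 'dist' segment."""
    collector = AssetCollector()
    collector.feed(html)
    return [url for url in collector.urls if "/dist/" in url]
```

With a live session, the input would be driver.page_source.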


Distil

As per the article There Really Is Something About Distil.it...:

Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.

Further,

"One pattern with Selenium was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium, the tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".


Reference

You can find a couple of detailed discussions in:

undetected Selenium
  • I think you did not quite get my problem; this is exactly what I am getting. However, I would like to get the amount in Argentine pesos. Have a look at the 2 screenshots I attached: one shows the amount, and the other one has the same error message as your approach. – Kev Jan 07 '20 at 22:44
  • @Kev Check out the updated answer and let me know the status. – undetected Selenium Jan 07 '20 at 23:11
1

The exchange rate comes from https://www.westernunion.com/wuconnect/prices/catalog via a POST request. E.g.:

  • Assuming a $payload variable containing:
{
  "header_request": {
    "version": "0.5",
    "request_type": "PRICECATALOG",
    "correlation_id": "web-x",
    "transaction_id": "web-x"
  },
  "sender": {
    "client": "WUCOM",
    "channel": "WWEB",
    "cty_iso2_ext": "DE",
    "curr_iso3": "EUR",
    "funds_in": "*",
    "send_amount": 300,
    "air_requested": "Y",
    "efl_type": "STATE",
    "efl_value": "CA"
  },
  "receiver": {
    "curr_iso3": "ARS",
    "cty_iso2_ext": "AR",
    "cty_iso2": "AR"
  }
}
  • And assuming an innocent user-agent
  • Then curl -s 'https://www.westernunion.com/wuconnect/prices/catalog' --data-raw "$payload" | jq '.services_groups[0].pay_groups[0] | .fx_rate' would get it.
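
The same call can be sketched in Python with the standard library. This is an illustration under assumptions: build_payload, build_request and extract_fx_rate are hypothetical helper names, the user-agent string is a placeholder, and the response shape is inferred only from the jq path above. Nothing is sent until the request is passed to urllib.request.urlopen.

```python
import json
import urllib.request

CATALOG_URL = "https://www.westernunion.com/wuconnect/prices/catalog"


def build_payload(send_amount=300):
    """Build the same JSON body as the $payload variable above."""
    return {
        "header_request": {
            "version": "0.5",
            "request_type": "PRICECATALOG",
            "correlation_id": "web-x",
            "transaction_id": "web-x",
        },
        "sender": {
            "client": "WUCOM",
            "channel": "WWEB",
            "cty_iso2_ext": "DE",
            "curr_iso3": "EUR",
            "funds_in": "*",
            "send_amount": send_amount,
            "air_requested": "Y",
            "efl_type": "STATE",
            "efl_value": "CA",
        },
        "receiver": {"curr_iso3": "ARS", "cty_iso2_ext": "AR", "cty_iso2": "AR"},
    }


def build_request(payload, user_agent="Mozilla/5.0"):
    """Prepare (but do not send) the POST request; the user agent is a placeholder."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        CATALOG_URL, data=body, headers={"User-Agent": user_agent}, method="POST"
    )


def extract_fx_rate(response):
    """Mirror the jq filter: .services_groups[0].pay_groups[0] | .fx_rate"""
    return response["services_groups"][0]["pay_groups"][0]["fx_rate"]
```

To send it, one would call urllib.request.urlopen(build_request(build_payload())) and pass the decoded JSON body to extract_fx_rate.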

This request used to work (until a couple of weeks ago).

But the endpoint is now protected: it expects a custom set of crypto headers that are computed in the browser and rely heavily on obfuscated, involved JavaScript. Here is what they look like:

X-NYUPe9Cs-a: IExHQTfwEnWwuyWbWjmR2fyBEQW9X9nnqFqIio78zzCKFA78iBDudN=NnOpQd=725d_urqfAN2sKK7UOdTnkCpUqFvQ9TF2nK=M1jDmrMBYy-4iq5kUqSdEN1PjBjEC=Nx742P1np7qAKK8q8qWd5UQIQ8Wqnqx51np7kIavPFenB9dSvnKou0A2nfv7qE-q7k_2EdNyuKffAYxcqbnjnCYIDfe=IKCc8JdPzpDecynafP1fVKq=z2SJCKiaMXu-Dxp2z5CpfznOPcs4WFH2D4C5JTTnDDUQ7vOPFVKnKCdcamPqOnK8wOQb9FYoxWs=Pksn4vmeC5Ia9EoVReH8uj0q_PRu2q522kk-9jnRTYJIP9VWP_50hhxPMds9eX_kAC2DbBnKzy24sICkO7bkkyAT82s5YuKECP=fnzXixxC8=81WX4jqnNBJ_qxbbqV=InUWmKYWimbUaB5qwOCA2iqSXNDw25PmHq8_2XEAx7nTnjkwYS2qvNBa8sAjxxHU8ibNFr_iiZH=4JuS2Q=RJrnTDonA1vFxKe812s-CMJ8HFay0VqrC2kQZVzCV2w0bqZyEuJksehxE22W8-Smd5V5XnvENHFcn72wkeN=boc=PIbv=XYNqEknrCyEX2r8BJvYCipnKdnkohrIvPovqfJMB7emybSTy2Eeu9h9VBrqYMW2NrXb2wc1kxC5WJAFv_cXE_vqsvRqeS-wYJ9vD1Y-1Cvo8RRqkFWAXuq1CBYXndSQ_A1e0aqO7sTB=nyKFd1=rJ4=z15z-qFMEQfy_x=qedJTzvWf8SE9yMqVCYUuSrhMnpEFdeJYiEdX-KS2In0-uZ0zzrn2qn27zY-jo7qkrvrq8V8v2aACd7PFEnMbCyUUUI-MdTcD8nCDiC2yuPOpbUcwID7Y1d=2aIubdAhErSn82C9FnSm9IVj8Z_WHwBvBPCI_o=_2pdRVk0jS5qYb_OjyVrrxqXnZOp9TVnAVnWZOWn798a8qhX-hYuFjJ-z84rzQRo2M70vHAMuNSMT_8yqkrujEr7JcyU2CmY1NKpev0w8R19227=qVqdemsq00nx-UAYz0=UYA2hT2IaqoqRie7Jbzjikb2snnnQynoHUpnYxRVs9ORc7I2MVhqqCVonnVk5Pi1xns2--iqqSKH8Rhium-nRcWurBu=TFiZ-5Qq-_WDiMQ5n7BqmAZkjWZM97MNkqakw8nq9CXav2fq4OqUok997VTOFkP7DEm-W5ckkwInQNMBNqTrK25DnSHRiyP5m5zqh1RjWp48f_9QCO2HiPS9A8j58zoF_8abn0H1qUERd_Cq8-7zqOnkEeAAWCywi18wUD5qfbQd22BJDNq90sMSbNVsJy0P2CBf-hq9fjSCB=uA5y8xT2-CJunFwUCx85ujxiq-bu5BAbSpqUCAXDP8iq02ET5-xRq7CD22n=E4keqVnKpzq2=RUKWP_jDnsiKRn4xxsRM0QYnbCC=m2KjCE9BjJ1nrn8EDvUS52bmaixqosRq5SNOPEHKyrQy8nqI9E9OAMYm5=TpVNvn-oqeDF_-jkcqIdyHqn1QYxaZbn4xVFqIOzQ9eV7A9QbC5zPcPeD=qqpqqK=YxNzKwTSCOnA70SrhiB2r1VkqKuuBJQYZoIC_87Mmuo8znpQnH29fI7Oh99sKO5aoEQIMOrIDwQDZvWqwwH=ZKnnn8T=5o9MTdDkpr472DPdqOEq8Ffii0q00r8OwkZX_oXY2UEKdCaX88zZamSqaY8iZzqiIYdeMjqMFKqVAv-82PxBWQv1Kr1OibYSh0QTp14BqBhEf-WKrVECI_y7517nZa8ndFpjznkfcnY2KufY0iFwnx2zx99iuUbF84nerZH88Rxx=pKBbsjeqJZ-0xZScnrn9hReJ--oh40mcxMXn1V0PzwcMaEACo0dWouDZeZYHViqd9RQAnso2DIF-wI-Pe_q5srKK8nmCZNI2hZqwjzOM7bwF4_4-S=9BzYF
DaYw0SknMJTq9VReaM297ir-CYsdM9VN29TpDRnC=8aQ5o9yXZpEDyfqmJuwzs7N7he8FPrfIdDVK5iaW8Jm8YcHnqnno7EHSqKeTRNuzkeHqcn0u87OX=ByhQMQJ4QacaxqqFVmPqQEHSVbx1PsQDq780PWDKbvK5PBMnZksBZm0VIOHxu_q2xnfPWsixuqaIm2sXn2Jz2yByvdNeT5r2F14zEaiiEFfNqICZ_DHCXpr2K4HURNd5n_vyJTe2UVakZE_9T01W9cFUxBOur0xfN0=h4vmOoUAnwISSDxc5EmAefWviW2PvqevpnnS7YuMPMY5aHi2c2RrP=i-mfPpKzRSHpAn82sJ9izMdWcWq=qI5O_UBm==vFHrFOzHQK8AH9qcRM8=KHpwyoV-b0WzuErxZhZmMV_iKors2JCAeWn-jn-q_Mrqau1Xz88nTBQFO=vnKPfFoqY9Z81KUqyAn2N5dwbnKWHUZh4Ke4OnyOr=22=rKZneB9PmQDUDq=97vOSqqNq=bHNriSf=xT48cXy7AqWOnncwEqwbVcA25ds8O8S0WI9=ipEfIyiiJ7qSMoHY=kn7rwiE94jsVx5n7Syj=m58Fqvi=HCFI0Bwf8byFhWbeJsAK5UaDqchCY5qC9n-OUqmeJHay8OAqm-HQPnP9qBfyd08nini0FsrdvHmru4qA=sK4OKmzcY_wSj8D8D2jBQWHF2avq4UP8-D2Ysh4C_bXXhqmqK9RPyuXRoeC5Oad-FmUXy_5F_r0OKEnrAMC
X-NYUPe9Cs-f: A_v7kP18AQAAbfq9_kCtmTqfX2Eq0otHnwqUQCck5dPjX88Nxz2rTVnAnVxYAcmzs1ScuAA7wH8AADQwAAAAAA==
X-NYUPe9Cs-b: -8qa21q
X-NYUPe9Cs-c: AOBWjv18AQAAqntYtdrBc9F0C0KawiRISfcOH_ruhEoV4NNn-IemnXnq5vi1
X-NYUPe9Cs-d: AAaixIihDKqOocqASZAQjICihCKHpi15Rub4tUEPqzn1Pxi1AAd7zRXqBBDKOTmM_r5nbhq
X-NYUPe9Cs-z: q

This set of headers is valid only for a limited time (no more than 24h AFAICT).

I'm curious whether anyone can further pinpoint where the logic lives (some crypto initialization vector may be provided by the cookie set during the initial page load). If so, Node.js could compute that set of headers.

drzraf
0

Just to add a little bit of information about this: I managed to make it work for a little while by using the westernunion.ru endpoint, which apparently wasn't protected (I could get this information without all these headers). Unfortunately, the westernunion.ru endpoint has been taken down, or at least is not working anymore. So a solution could be to find an API endpoint that is not protected yet.

dantebarba