
I'm trying to automatically pull information from this website for a set of values. I have a list of start and destination ports, e.g. THEODOSIA and KERCH, and I need to extract the calculated distance, speed and days for each start-destination combination. Can someone please advise on how to achieve this in Python? Another potential hurdle is that the ports in my list have 'short names', e.g. THEODOSIA, which stands for Port of Theodosia, Ukraine. When you enter THEODOSIA in the search box, the website offers an auto-complete suggestion, so that's fine for a manual search. However, I'm not sure how that would work in automated searches.

I'm completely inexperienced in web scraping, so after reading a few online articles I started writing the code below, but I've reached a dead end and don't think it's of any use.

from bs4 import BeautifulSoup
import pandas as pd
import requests

# Example start and destination port values
df = pd.DataFrame({'StartPort': ['THEODOSIA', 'ROSTOV'],
                   'DestinationPort': ['KERCH', 'MARSEILLE']})

r = requests.get('http://ports.com/sea-route/')
soup = BeautifulSoup(r.content, 'html.parser')
rows = soup.find_all('tr', {'class': 'span-7 prepend-top'})

startport = []
for a in soup.find_all('a', href=True, attrs={'class': 'span-7 prepend-top'}):
    # collect every match instead of overwriting the list on each pass
    startport.append(a.find('div', attrs={'class': 'span-7 title ac_input'}))

1 Answer


You can use their auto-complete API to resolve the short names to full port names, then use those names to obtain the distance, speed and days at sea.

For example:

import requests

from_ = 'Theodosia'
to_ = 'Kerch'

# Endpoints the page itself calls (visible in the browser's network tab):
# findport backs the auto-complete box, sea-route returns the route data.
find_port_url = 'http://ports.com/aj/findport/'
route_url = 'http://ports.com/aj/sea-route/'

def find_port(port_name):
    # The auto-complete endpoint returns pipe-delimited text;
    # the first field is the full port name.
    return requests.get(find_port_url, params={'q': port_name, 'limit': 1}).text.split('|')[0]

def find_route(f, t):
    # Keep only the part before the first comma (drops the country suffix)
    # and request the route as an XHR call, which returns JSON.
    data = requests.get(route_url,
                        params={'a': 0, 'b': 0,
                                'c': f.split(',')[0],
                                'd': t.split(',')[0]},
                        headers={'X-Requested-With': 'XMLHttpRequest'}).json()
    return data['cost']['nauticalmiles'], data['default_speed'], data['days_at_sea']

f = find_port(from_)
t = find_port(to_)

nm, speed, days = find_route(f, t)
print('Distance: {} nm Speed: {} Days at sea: {:.1f}'.format(nm, speed, days))

Prints:

Distance: 70 nm Speed: 10 Days at sea: 0.3
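
To run this over the whole DataFrame from the question, you could loop over the port pairs and collect the results. Below is an untested sketch reusing the find_port/find_route helpers above; the time.sleep pause is just an assumption to stay gentle with the site, since it isn't known whether the API is rate-limited.

import time
import pandas as pd

df = pd.DataFrame({'StartPort': ['THEODOSIA', 'ROSTOV'],
                   'DestinationPort': ['KERCH', 'MARSEILLE']})

results = []
for row in df.itertuples(index=False):
    # resolve the short names to full port names first
    f = find_port(row.StartPort)
    t = find_port(row.DestinationPort)
    nm, speed, days = find_route(f, t)
    results.append({'StartPort': row.StartPort,
                    'DestinationPort': row.DestinationPort,
                    'DistanceNm': nm,
                    'Speed': speed,
                    'DaysAtSea': days})
    time.sleep(1)  # assumed courtesy pause between requests

print(pd.DataFrame(results))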
  • Awesome thanks! I'll try this out for all the ports in my dataframe. Hopefully there's no limit on using the API and I don't need to pay to run my larger dataset! – Chipmunk_da Jul 06 '20 at 22:34
  • Out of curiosity, where did you find the `find_port_url` and `route_url`? – Chipmunk_da Jul 07 '20 at 08:26
  • @Python_newbieash I observed the URLs in the Firefox developer tools -> Network tab, which lists all the requests the page makes. Chrome has something similar. – Andrej Kesely Jul 07 '20 at 08:28