The problem:
Inspecting the HTML at the URL you showed, it appears that most of the page content is loaded dynamically. To extract the relevant information, you need a tool that can run the page's JavaScript and generate that content first. The requests library won't do this for you; you can use the selenium library instead.
Using Selenium:
Firstly, observe that the HTML for the vessel names looks like this:
<div data-v-5c859f2b="" class="names">
    <div data-v-5c859f2b="">中海太平洋</div>
    <div data-v-5c859f2b="">CSCL PACIFIC OCEAN</div>
</div>
The code below uses find_elements_by_class_name() to find the elements with the class names (which holds the vessel names). Then find_elements_by_tag_name() is used to find the child div tags, which contain the Chinese and English names.
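from selenium import webdriver
import textwrap

# Written for Selenium 3; Selenium 4.3+ replaced the find_elements_by_*
# helpers with find_elements(By.CLASS_NAME, ...).
url = 'http://lines.coscoshipping.com/home/Services/ship/0'
driver = webdriver.Firefox(executable_path='YOUR PATH')  # or Chrome
driver.get(url)

# Each element with class "names" holds one vessel.
for vessel in driver.find_elements_by_class_name('names'):
    # The two child divs are the Chinese and English names, in that order.
    chinese, english = vessel.find_elements_by_tag_name('div')
    print(textwrap.dedent(f'''
        Chinese: {chinese.text}
        English: {english.text}
    '''))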
I've also used textwrap.dedent() to prettify the output.
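To see what textwrap.dedent() contributes on its own, here is a minimal sketch that needs no browser. The helper name format_vessel is hypothetical and only used for illustration; the sample names are taken from the page output.

```python
import textwrap


def format_vessel(chinese, english):
    """Build the two-line label for one vessel (hypothetical helper)."""
    # The f-string is indented to match the surrounding code; dedent()
    # strips that common leading whitespace from every line.
    return textwrap.dedent(f'''
        Chinese: {chinese}
        English: {english}
    ''')


label = format_vessel('中海之星', 'CSCL STAR')
print(label)
```

Without dedent(), each printed line would keep the eight leading spaces from the source indentation.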
Example output:
Chinese: 中海太平洋
English: CSCL PACIFIC OCEAN
Chinese: 中海印度洋
English: CSCL INDIAN OCEAN
Chinese: 中海大西洋
English: CSCL ATLANTIC OCEAN
Chinese: 中海之星
English: CSCL STAR
...
See also this post about how to download a driver (either Chrome or Firefox) and add it to your $PATH.
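On newer Selenium 4 releases the find_elements_by_* helpers are no longer available, so the same lookup has to go through find_elements() with a By locator. The sketch below shows one way this might look; it also adds an explicit wait, since the names are rendered by JavaScript. The function name fetch_vessel_names and the 10-second timeout are assumptions for illustration, and the code assumes the Firefox driver can be located without an explicit path.

```python
def fetch_vessel_names(url, timeout=10):
    """Return a list of (chinese, english) pairs (hypothetical helper)."""
    # Imported inside the function so this file loads without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: driver is discoverable
    try:
        driver.get(url)
        # Block until at least one element with class "names" exists,
        # or raise TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CLASS_NAME, 'names'))
        )
        pairs = []
        for vessel in driver.find_elements(By.CLASS_NAME, 'names'):
            divs = vessel.find_elements(By.TAG_NAME, 'div')
            # Skip blocks that do not have exactly two child divs.
            if len(divs) == 2:
                pairs.append((divs[0].text, divs[1].text))
        return pairs
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling fetch_vessel_names('http://lines.coscoshipping.com/home/Services/ship/0') should return the same Chinese/English pairs that the loop above prints, provided the page still uses the names class.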
An alternative way:
Using splitlines(), we can extract the Chinese and English names of the vessels more succinctly from the text of each of the names divs:
from selenium import webdriver
import textwrap

url = 'http://lines.coscoshipping.com/home/Services/ship/0'
driver = webdriver.Firefox(executable_path='YOUR PATH')  # or Chrome
driver.get(url)

for vessel in driver.find_elements_by_class_name('names'):
    # vessel.text is the rendered text of both child divs, one per line.
    chinese, english = vessel.text.splitlines()
    print(textwrap.dedent(f'''
        Chinese: {chinese}
        English: {english}
    '''))
This presumes a bit more about the page (that each element's text is exactly two lines, with the Chinese name first), but it often works, as it does in this case.
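If that two-line assumption ever fails, the tuple unpacking raises a ValueError. A more defensive variant might look like the sketch below; split_names is a hypothetical helper, and returning None for unexpected input is just one possible choice.

```python
def split_names(text):
    """Split a vessel block's text into (chinese, english), else None."""
    # Drop blank lines and surrounding whitespace before counting.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 2:
        return None  # unexpected layout; let the caller decide what to do
    return lines[0], lines[1]


print(split_names('中海太平洋\nCSCL PACIFIC OCEAN'))
print(split_names('CSCL STAR'))
```

Inside the loop, this would be called as split_names(vessel.text), skipping any element for which it returns None.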