
I'd like to scrape the values from the table at this URL:

https://www.trademap.org/Product_SelCountry_MQ_TS.aspx?nvpm=1%7c076%7c%7c%7c%7cTOTAL%7c%7c%7c2%7c1%7c1%7c2%7c2%7c3%7c1%7c1%7c1

Image of table:

After inspecting the source code, this is essentially where I need to scrape from:

Source code of target values:


Here's the code that I've written:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://www.trademap.org/Product_SelCountry_MQ_TS.aspx?nvpm=1%7c076%7c%7c%7c%7cTOTAL%7c%7c%7c2%7c1%7c1%7c2%7c2%7c3%7c1%7c1%7c1"

# Download the raw HTML of the page
uClient = uReq(my_url)
brazil_monthly_exports_html = uClient.read()
uClient.close()

# Parse the HTML and walk down to the first table inside the form
brazil_monthly_export_soup = soup(brazil_monthly_exports_html, "html.parser")
brazil_monthly_export_soup.body.form.div.table

If I go any deeper, I don't get much. If I append `.tbody`, I get nothing at all. Adding `.tr` or `findAll("tr")` doesn't show me any of the values either.


moron
  • I'm not seeing the table ! – αԋɱҽԃ αмєяιcαη Dec 14 '19 at 03:54
  • This page uses JavaScript to add items, and `urllib` and `BeautifulSoup` can't run JavaScript, so that may be the problem. Besides, DevTools in Firefox found 10 `` - which one do you want to get? There are also many `` - which one do you want to get?
    – furas Dec 14 '19 at 03:58
  • @furas are you able to see the table he's talking about? – αԋɱҽԃ αмєяιcαη Dec 14 '19 at 04:00
  • I don't see the table you show in the image - you probably have to fill in the form and click a button first to get it. – furas Dec 14 '19 at 04:00
  • @furas that's what I'm talking about. – αԋɱҽԃ αмєяιcαη Dec 14 '19 at 04:01
  • @αԋɱҽԃαмєяιcαη first I saw `form.div.table` in the code, so I thought the OP was asking for the `<table>` tag in the code - which I can find - but later I saw the image with the table - which I can't find on this page. I can see the table after clicking any button in the form - but it has a different URL.
    – furas Dec 14 '19 at 04:05
  • You should check the URL in your code - it redirects me to a page with a form. That may be your problem - you get the wrong page, so you can't find your table. – furas Dec 14 '19 at 04:11
  • If you print brazil_monthly_export_soup = soup(brazil_monthly_exports_html, "html.parser").encode('utf8') print(brazil_monthly_export_soup) you will see that your table, id="ctl00_PageContent_MyGridView1", is not there. I think you'll have to use selenium as an option. – GiovaniSalazar Dec 14 '19 at 04:19
  • @furas I've found what the `OP` is talking about by moving around manually like a `rat`, haha. – αԋɱҽԃ αмєяιcαη Dec 14 '19 at 04:21
  • @GiovaniSalazar no need for selenium. – αԋɱҽԃ αмєяιcαη Dec 14 '19 at 04:22
  • When I try to get monthly data, it shows me a message that these data are restricted and that I would have to create an account and log in. So your script would also have to log in to the server to get monthly data. – furas Dec 14 '19 at 04:22
  • Please include information as text in your post, not in images. – AMC Dec 14 '19 at 06:36
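
The two checks suggested in the comments (did the request end up on a different page, and is the table actually in the returned HTML?) can be sketched roughly as below. This is only an illustration using the standard library: `has_element_id` and `diagnose` are hypothetical helper names, the table id is the one quoted in the comments, and `geturl()` only reveals HTTP-level redirects, so a redirect done by JavaScript would not show up here.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Table id quoted in the comments above; treat it as an assumption.
TABLE_ID = "ctl00_PageContent_MyGridView1"


class _IdCollector(HTMLParser):
    """Collect the id attribute of every element in a document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def has_element_id(html, element_id):
    """Return True if any element in html carries the given id."""
    collector = _IdCollector()
    collector.feed(html)
    return element_id in collector.ids


def diagnose(url):
    """Fetch url; return the final URL and whether the table id is present."""
    with urlopen(url) as response:
        final_url = response.geturl()  # differs from url after an HTTP redirect
        html = response.read().decode("utf-8", errors="replace")
    return final_url, has_element_id(html, TABLE_ID)
```

Calling `diagnose(my_url)` and comparing the returned URL with `my_url` should show whether the server sent back the form page instead of the table page.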

1 Answer

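import requests
from bs4 import BeautifulSoup

# Pass the query as plain pipe-delimited text; requests URL-encodes it.
payload = {'nvpm': '1|076||||TOTAL|||2|1|1|2|2|1|1|1|1'}
r = requests.get(
    "https://www.trademap.org/Product_SelCountry_TS.aspx", params=payload)
r.raise_for_status()  # fail early on an HTTP error
soup = BeautifulSoup(r.text, 'html.parser')

# Select the <font color="#002B54"> elements, which hold the table values.
for item in soup.find_all('font', {'color': '#002B54'}):
    print(item.get_text(strip=True))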

Output:


TOTAL
All products
225,098,405
191,126,886
185,235,399
217,739,218
239,889,210

12
Oil seeds and oleaginous fruits; miscellaneous grains, seeds and fruit; industrial or medicinal. . .
23,500,132
21,207,738
19,557,938
26,008,460
33,517,529

27
Mineral fuels, mineral oils and products of their distillation; bituminous substances; mineral. . . 
25,202,959
16,553,500
11,581,278
21,222,938
29,670,809

26
Ores, slag and ash
28,402,213
16,693,435
15,816,099
22,397,927
23,663,011

84
Machinery, mechanical appliances, nuclear reactors, boilers; parts thereof
12,727,864
11,361,268
11,647,181
13,848,545
14,791,209

02
Meat and edible meat offal
15,417,191
13,077,586
12,655,793
13,953,388
13,292,305

87
Vehicles other than railway or tramway rolling stock, and parts and accessories thereof
9,808,166
9,604,507
10,971,033
14,724,004
12,652,789

72
Iron and steel
9,605,030
8,927,018
7,892,012
10,761,292
11,804,871

47
Pulp of wood or of other fibrous cellulosic material; recovered (waste and scrap) paper or. . .
5,298,146
5,603,405
5,575,279
6,355,349
8,360,265

23
Residues and waste from the food industries; prepared animal fodder
7,363,381
6,171,801
5,538,918
5,394,736
7,168,012

17
Sugars and sugar confectionery
9,616,253
7,781,310
10,585,665
11,566,378
6,672,492

89
Ships, boats and floating structures
2,167,168
1,985,490
3,841,358
932,484
5,765,291

09
Coffee, tea, maté and spices
6,536,042
6,046,077
5,228,087
5,010,002
4,699,592

10
Cereals
4,438,189
5,724,924
4,109,624
4,980,607
4,621,016

28
Inorganic chemicals; organic or inorganic compounds of precious metals, of rare-earth metals,. . .
3,346,932
3,403,950
3,301,028
3,852,041
4,185,991

88
Aircraft, spacecraft, and parts thereof
4,050,744
4,503,206
4,803,093
4,045,347
3,973,881

85
Electrical machinery and equipment and parts thereof; sound recorders and reproducers, television. . .     
4,216,053
3,649,815
3,239,912
3,435,462
3,458,453

39
Plastics and articles thereof
3,610,243
3,483,327
3,501,806
3,656,340
3,426,433

71
Natural or cultured pearls, precious or semi-precious stones, precious metals, metals clad. . .
2,875,116
2,797,462
3,375,746
3,335,568
3,346,363

44
Wood and articles of wood; wood charcoal
2,243,112
2,271,395
2,361,478
2,779,920
3,182,251

20
Preparations of vegetables, fruit, nuts or other parts of plants
2,258,080
2,150,306
2,209,211
2,273,080
2,516,669

29
Organic chemicals
3,214,660
2,263,941
1,855,794
2,366,033
2,241,528

99
Commodities not elsewhere specified
180,703
149,803
2,124,969
140,501
2,222,181

48
Paper and paperboard; articles of paper pulp, of paper or of paperboard
1,922,180
2,020,963
1,871,020
1,913,082
2,072,495

24
Tobacco and manufactured tobacco substitutes
2,501,868
2,186,217
2,123,366
2,092,161
1,988,179
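
The flat output above can be regrouped into one record per product. The sketch below assumes, based only on the output shown, that every row yields a code, a label and five numeric values, and that empty strings separate the rows; `group_rows` and `sample_cells` are hypothetical names, and a row with a missing value would break the fixed-width grouping.

```python
def group_rows(cells, values_per_row=5):
    """Group a flat list of cell texts into code/label/values records."""
    # Drop the empty cells that show up as blank lines in the output.
    cleaned = [c.strip() for c in cells if c.strip()]
    width = 2 + values_per_row  # code + label + values
    rows = []
    for start in range(0, len(cleaned) - width + 1, width):
        code, label, *values = cleaned[start:start + width]
        rows.append({
            "code": code,
            "label": label,
            # "225,098,405" -> 225098405
            "values": [int(v.replace(",", "")) for v in values],
        })
    return rows


# Two rows copied from the output above. With the answer's code, the list
# could be built as:
# [item.get_text(strip=True)
#  for item in soup.find_all('font', {'color': '#002B54'})]
sample_cells = [
    "", "TOTAL", "All products",
    "225,098,405", "191,126,886", "185,235,399", "217,739,218", "239,889,210",
    "", "26", "Ores, slag and ash",
    "28,402,213", "16,693,435", "15,816,099", "22,397,927", "23,663,011",
]

for row in group_rows(sample_cells):
    print(row["code"], row["label"], row["values"])
```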
  • Oh wow, interesting use of the font colour... do you know why my approach didn't work? Also, I intend to scrape the pages well into the past, meaning going through each page. However, as I toggle through pages, the URL does not change, which leads me to believe this is some kind of embedded table? – moron Dec 14 '19 at 04:34
  • @moron provide me with the steps to reach the table. I've only reached it via a programmatic request, because the link you shared includes parameters which need to be generated locally. – αԋɱҽԃ αмєяιcαη Dec 14 '19 at 04:36
  • Oh, I see what you mean... I tried loading the URL again in my browser and I was brought back to the homepage with the form. The parameters are (from top to bottom, left to right): "Exports", "Product", "TOTAL - All products", "Brazil", NIL, Yearly Time Series. I actually intend to grab the monthly time series, but you will require an existing account to access it. Monthly is free – moron Dec 14 '19 at 04:44
  • Actually... I'm trying to learn how I can re-apply this to 65 other countries so I'm looking to learn how to fetch all the data based on given parameters. Apologies for my incompetence, I've just learnt about BeautifulSoup today – moron Dec 14 '19 at 04:48
  • @moron it's 92 countries, not 65 – αԋɱҽԃ αмєяιcαη Dec 14 '19 at 05:00
  • Not for all countries – moron Dec 14 '19 at 05:02
  • Can you include some explanations in this answer? I do agree with you that @moron needs to do some work themselves, and open a new question if necessary. – AMC Dec 14 '19 at 06:35
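
For re-applying the answer's request to other countries, one possible starting point is to rebuild the `nvpm` value per country. The sketch below assumes that the second pipe-delimited field (`076` in the answer) is the reporting country's code and that the other fields can stay as they are; neither is confirmed anywhere in this thread. `NVPM_TEMPLATE` and `build_url` are hypothetical names, and any code other than `076` would have to be taken from the site itself.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.trademap.org/Product_SelCountry_TS.aspx"

# The answer's nvpm value with the second field turned into a placeholder.
# Assumption: that field is the reporting country's code.
NVPM_TEMPLATE = "1|{country}||||TOTAL|||2|1|1|2|2|1|1|1|1"


def build_url(country_code):
    """Return the yearly time series URL for one country code."""
    query = urlencode({"nvpm": NVPM_TEMPLATE.format(country=country_code)})
    return f"{BASE_URL}?{query}"


# "076" is the code used in the answer; extend the list with codes
# looked up on the site.
for code in ["076"]:
    print(build_url(code))
```

Each URL could then be fetched and parsed the same way as in the answer; the monthly series would additionally need a logged-in session, as noted in the comments.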