Python loop in beautiful soup regex list

Question

When I run the code below, I get three lists printed vertically, one below the other. I want the values printed horizontally, separated by commas (like the output of the last print statement, where the data is comma-separated). I tried rearranging the for loops and got all sorts of combinations, but never the layout described above. Please help!

import bs4 as bs
import urllib.request
import re

sauce = urllib.request.urlopen('http://www5.statcan.gc.ca/cimt-cicm/topNCountryCommodities-marchandises?lang=eng&chapterId=27&sectionId=0&refMonth=2&refYr=2017&freq=6&countryId=999&usaState=0&provId=1&arrayId=9900000&commodityId=271111&commodityName=Natural+gas%2C+liquefied&topNDefault=10&tradeType=3').read()
soup = bs.BeautifulSoup(sauce,'lxml')

regexQ = re.compile('.*Date1 Qty.*')
regexC = re.compile('.*Footnote.*')
regexV = re.compile('.*Date1 Val.*')

for countryPart in soup.findAll("a",{"href":regexC}):
        Country = countryPart.text.strip()
        print(Country)
for DatePart in soup.findAll("td",{"headers":regexQ}):
        Quantity = DatePart.text.strip()
        print(Quantity)
for ValPart in soup.findAll("td",{"headers": regexV}):
        Value = ValPart.text.strip()
        print(Value)

list = [Country,Quantity,Value]
print(list)

score 1 · Answer 1 · answered Mar 03 '18 at 07:36

Have a look at List Comprehensions.
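
In case the term is new, here is a minimal sketch of how a loop that strips text maps onto a list comprehension. The sample strings are hypothetical stand-ins for the tag.text values in your code.

```python
# Hypothetical raw cell texts with stray whitespace, standing in for tag.text values.
raw_cells = ["  World ", "\nJapan\n", " Equatorial Guinea"]

# Loop version: build the list by appending one stripped value at a time.
stripped_loop = []
for cell in raw_cells:
    stripped_loop.append(cell.strip())

# Comprehension version: the same list in a single expression.
stripped_comp = [cell.strip() for cell in raw_cells]

print(stripped_comp)
```

Both versions produce the same list; the comprehension simply keeps every value instead of overwriting one variable on each pass.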

Also, you don't need .* around the pattern when passing a regex to BeautifulSoup: it filters with the pattern's search() method, which matches anywhere in the attribute value.
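
A small sketch of why the wildcards are redundant, using only the re module. The attribute values below are made up for illustration and are not taken from the page.

```python
import re

# Hypothetical attribute values, loosely modeled on href/headers values.
values = ["row1 Date1 Qty", "Date1 Val", "#Footnote1", "unrelated"]

with_wildcards = re.compile(".*Date1 Qty.*")
plain = re.compile("Date1 Qty")

# search() scans the whole string, so both patterns select the same values.
hits_wild = [v for v in values if with_wildcards.search(v)]
hits_plain = [v for v in values if plain.search(v)]

# match() anchors at the start of the string; only there would ".*" matter.
starts_plain = [v for v in values if plain.match(v)]

print(hits_wild, hits_plain, starts_plain)
```

With search(), the plain pattern already finds the text in the middle of a value, so the leading and trailing .* add nothing.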

Use this to get what you want:

regexQ = re.compile('Date1 Qty')
regexC = re.compile('Footnote')
regexV = re.compile('Date1 Val')

country = [x.text.strip() for x in soup.find_all("a", {"href": regexC})]
quantity = [x.text.strip() for x in soup.find_all("td", {"headers": regexQ})]
value = [x.text.strip() for x in soup.find_all("td", {"headers": regexV})]

total_list = [list(x) for x in zip(country, quantity, value)]
for item in total_list:
    print(item)

Output:

['World', '282,911,404', '67,284,637']
['Equatorial Guinea', '146,027,530', '40,493,766']
['Trinidad and Tobago', '136,883,464', '26,790,695']
['Japan', '410', '176']
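
To see what zip does here without hitting the site, this sketch uses values copied from the output above in place of the find_all results. The short variable is only there to illustrate one caveat and is not part of the original code.

```python
# Sample values copied from the output above; on the real page they come from find_all.
country = ["World", "Equatorial Guinea", "Trinidad and Tobago", "Japan"]
quantity = ["282,911,404", "146,027,530", "136,883,464", "410"]
value = ["67,284,637", "40,493,766", "26,790,695", "176"]

# zip pairs the i-th item of every list, turning three columns into rows.
total_list = [list(row) for row in zip(country, quantity, value)]
for item in total_list:
    print(item)

# zip stops at the shortest input, so a missing cell silently drops trailing rows.
short = list(zip(country, quantity[:2]))
print(len(short))
```

If the three lists can differ in length on other pages, it may be worth comparing their lengths before zipping.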

score 1 · Answer 2 · answered Mar 03 '18 at 10:21

You can do it without regex. The approach below gets the same result by reading each result row directly and using a list comprehension to collect its cells.

Using urllib:

from urllib.request import urlopen
from bs4 import BeautifulSoup

res = urlopen("http://www5.statcan.gc.ca/cimt-cicm/topNCountryCommodities-marchandises?lang=eng&chapterId=27&sectionId=0&refMonth=2&refYr=2017&freq=6&countryId=999&usaState=0&provId=1&arrayId=9900000&commodityId=271111&commodityName=Natural+gas%2C+liquefied&topNDefault=10&tradeType=3")
soup = BeautifulSoup(res.read(),"lxml")
for items in soup.find_all(class_="ResultRow"):
    data = [item.get_text(" ",strip=True) for item in items.find_all(["th","td"])[1:4]]
    print(data)

Using requests:

import requests
from bs4 import BeautifulSoup

res = requests.get("http://www5.statcan.gc.ca/cimt-cicm/topNCountryCommodities-marchandises?lang=eng&chapterId=27&sectionId=0&refMonth=2&refYr=2017&freq=6&countryId=999&usaState=0&provId=1&arrayId=9900000&commodityId=271111&commodityName=Natural+gas%2C+liquefied&topNDefault=10&tradeType=3")
soup = BeautifulSoup(res.text,"lxml")
for items in soup.find_all(class_="ResultRow"):
    data = [item.get_text(" ",strip=True) for item in items.find_all(["th","td"])[1:4]]
    print(data)

Output:

['World', '282,911,404', '67,284,637']
['Equatorial Guinea', '146,027,530', '40,493,766']
['Trinidad and Tobago', '136,883,464', '26,790,695']
['Japan', '410', '176']
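
For trying the row-based idea offline, here is a sketch against a small inline table. The markup is a guess at the page's structure (only the ResultRow class name and the headers values come from the code in this thread), and it uses the built-in html.parser so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating two result rows; the real page may differ.
html = """
<table>
  <tr class="ResultRow">
    <th>1</th>
    <th><a href="#Footnote1">World</a></th>
    <td headers="Date1 Qty">282,911,404</td>
    <td headers="Date1 Val">67,284,637</td>
    <td>extra</td>
  </tr>
  <tr class="ResultRow">
    <th>2</th>
    <th><a href="#Footnote2">Japan</a></th>
    <td headers="Date1 Qty">410</td>
    <td headers="Date1 Val">176</td>
    <td>extra</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for row in soup.find_all(class_="ResultRow"):
    # Collect header and data cells in document order, then keep cells 1 to 3.
    cells = row.find_all(["th", "td"])
    rows.append([cell.get_text(" ", strip=True) for cell in cells[1:4]])

for data in rows:
    print(data)
```

Because each row is read as a unit, the country, quantity, and value stay together even if some other row is missing a cell.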

score 0 · Answer 3 · metamemelord · 2018-03-03T06:45:51.477

Try collecting your Country and other results into lists, and put those lists into one list (named lists below).

Then try this:

for mylist in lists:
    print(*mylist, end=", ")
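
A possible variation, assuming the goal is a comma between the items of each row: print's sep argument controls what goes between the items, while end controls what follows the whole call. The sample rows and the join_row helper are illustrative only.

```python
# Sample rows taken from the output shown in the other answers.
lists = [
    ["World", "282,911,404", "67,284,637"],
    ["Japan", "410", "176"],
]


def join_row(row):
    """Return the items of one row as a single comma-separated string."""
    return ", ".join(row)


for mylist in lists:
    # sep=", " puts the comma between items; each row still ends with a newline.
    print(*mylist, sep=", ")

# The same text built with str.join, which is easier to reuse elsewhere.
lines = [join_row(mylist) for mylist in lists]
```

With end=", " instead, the items of a row are separated by spaces and all rows end up on one line.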

edited Mar 03 '18 at 06:45

answered Mar 03 '18 at 06:37
