I'm trying to extract the text from a tag whose href contains a certain string. Below is part of my sample code:

Experience = soup.find_all(id='background-experience-container')

Exp = {}

for element in Experience:
    Exp['Experience'] = {}


for element in Experience:
    role = element.find(href=re.compile("title").get_text()
    Exp['Experience']["Role"] = role


for element in Experience:
    company = element.find(href=re.compile("exp-company-name").get_text()
    Exp['Experience']['Company'] = company

Python doesn't like the syntax I've used for Exp['outer_key']['inner_key'] = value; it raises a SyntaxError.

I'm trying to build a nested dict (a dict of dicts) that holds the role and company. I also plan to include dates for each, but I haven't got that far yet.

Can anyone spot any glaringly obvious mistakes in my code?

Really appreciate any help with this!

D_usv
  • Possible duplicate of [How to define two-dimensional array in python](http://stackoverflow.com/questions/6667201/how-to-define-two-dimensional-array-in-python) – Lupinity Labs Nov 23 '16 at 00:01
  • It seems like `Exp['Experience']["Role"] = role` does not work because it's essentially uninitialized. From the other ticket, it seems that you may use `.append(...)` instead or initialize the array beforehand. – Lupinity Labs Nov 23 '16 at 00:02
  • @Lupinity mine is a slightly different ask - I want to build an output like the following: {Experience : {role: role_name, company: company_name}, {role: role_name, company: company_name},....} – D_usv Nov 23 '16 at 15:41

1 Answer

find_all can return several elements (even when you search by id), so it is better to use a list to keep all the values: Exp = [].

Experience = soup.find_all(id='background-experience-container')

# create empty list
Exp = []

for element in Experience:
    # create empty dictionary
    dic = {}

    # add elements to dictionary
    dic['Role'] = element.find(href=re.compile("title")).get_text()
    dic['Company'] = element.find(href=re.compile("exp-company-name")).get_text()

    # add dictionary to list
    Exp.append(dic)

# display

print(Exp[0]['Role'])
print(Exp[0]['Company'])

# only valid if find_all returned at least two elements
print(Exp[1]['Role'])
print(Exp[1]['Company'])

# or

for x in Exp:
    print(x['Role'])
    print(x['Company'])
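The output format requested in the comments under the question (one 'Experience' key holding several role/company records) can be built as a dict whose value is a list of dicts. Below is a minimal sketch under stated assumptions: the HTML, the `position` class, and the helper name `text_or_none` are hypothetical, since the real page markup is not shown. It also guards against `find` returning `None` when no tag matches, which would otherwise make `.get_text()` raise an AttributeError.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; the real page structure is not shown in the question.
html = """
<div id="background-experience-container">
  <div class="position">
    <a href="/search?title=Engineer">Engineer</a>
    <a href="/company/exp-company-name-acme">Acme</a>
  </div>
  <div class="position">
    <a href="/search?title=Analyst">Analyst</a>
  </div>
</div>
"""


def text_or_none(parent, pattern):
    """Return the text of the first tag whose href matches pattern, else None."""
    tag = parent.find(href=re.compile(pattern))
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(html, "html.parser")
container = soup.find(id="background-experience-container")

# One outer key holding a list of per-position dictionaries.
Exp = {"Experience": []}
for position in container.find_all("div", class_="position"):
    Exp["Experience"].append({
        "Role": text_or_none(position, "title"),
        "Company": text_or_none(position, "exp-company-name"),
    })

for entry in Exp["Experience"]:
    print(entry["Role"], entry["Company"])
```

The second position has no company link, so its 'Company' value is None instead of the loop stopping with an exception.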

If you are sure that find_all gives you only one element (and you need the key 'Experience'), then you can do

Experience = soup.find_all(id='background-experience-container')

# create main dictionary
Exp = {}

for element in Experience:
    # create empty dictionary
    dic = {}

    # add elements to dictionary
    dic['Role'] = element.find(href=re.compile("title")).get_text()
    dic['Company'] = element.find(href=re.compile("exp-company-name")).get_text()

    # add dictionary to main dictionary
    Exp['Experience'] = dic

# display

print(Exp['Experience']['Role'])
print(Exp['Experience']['Company'])

or

Experience = soup.find_all(id='background-experience-container')

# create main dictionary
Exp = {}

for element in Experience:
    Exp['Experience'] = {
        'Role': element.find(href=re.compile("title")).get_text(),
        'Company': element.find(href=re.compile("exp-company-name")).get_text(),
    }

# display

print(Exp['Experience']['Role'])
print(Exp['Experience']['Company'])
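
A note on the SyntaxError from the question: it most likely comes from the unbalanced parenthesis in the `element.find(...)` line, not from the `Exp['Experience']["Role"] = role` assignment. Python keeps reading past the unclosed `(` and so tends to report the error on the following line. A minimal sketch that checks this with the standard-library `ast` module (the two source strings mirror the question's line and its corrected form; `parses` is a hypothetical helper name):

```python
import ast

# The line from the question: re.compile( is closed, element.find( is not.
broken = (
    'role = element.find(href=re.compile("title").get_text()\n'
    'Exp["Experience"]["Role"] = role\n'
)

# Same code with the missing ")" added before .get_text().
fixed = (
    'role = element.find(href=re.compile("title")).get_text()\n'
    'Exp["Experience"]["Role"] = role\n'
)


def parses(source):
    """Return True if source is syntactically valid Python, else False."""
    try:
        ast.parse(source)  # parses only; nothing is executed
        return True
    except SyntaxError:
        return False


print(parses(broken))
print(parses(fixed))
```

Because `ast.parse` only parses the text, undefined names such as `element` and `Exp` do not matter here; only the syntax is checked.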
furas
  • Thanks for your response, I've tried amending my code using your first resolution however I'm getting an error dic['Company'] = element.find(href=re.compile("exp-company name").get_text() ^ SyntaxError: invalid syntax – D_usv Nov 23 '16 at 15:11
  • I forgot `)` before `.get_text()` - ie. `element.find(href=re.compile("exp-company-name")).get_text()` – furas Nov 23 '16 at 22:06