Extracting link text and writing to file

Question

I have a crawler that extracts links from a page only if the link text includes a given string, and I'm writing the output to an HTML file. It's working, but I would like to add the whole link text next to each link, like this: "Junior Java developer - https://www.jobs.cz/junior-developer/". How can I do this?

Thanks

import requests
from bs4 import BeautifulSoup
import re

def jobs_crawler(max_pages):
    page = 1
    file_name = 'links.html'

    # Open the output file once, so later pages do not overwrite earlier ones
    with open(file_name, 'w') as file:
        while page < max_pages:
            url = 'https://www.jobs.cz/prace/praha/?field%5B%5D=200900011&field%5B%5D=200900012&field%5B%5D=200900013&page=' + str(page)
            source_code = requests.get(url)
            plain_text = source_code.text
            # Name the parser explicitly to avoid the "no parser specified" warning
            soup = BeautifulSoup(plain_text, 'html.parser')
            page += 1

            # Keep only job links whose text contains "IT" (case-insensitive)
            for link in soup.find_all('a', {'class': 'search-list__main-info__title__link'}, text=re.compile('IT', re.IGNORECASE)):
                href = link.get('href')
                file.write('<a href="' + href + '">' + 'LINK TEXT HERE' + '</a>' + '<br />\n')
                print(href)

    print('Saved to %s' % file_name)

jobs_crawler(5)

Possible duplicate of [Python: BeautifulSoup extract text from anchor tag](https://stackoverflow.com/questions/11716380/python-beautifulsoup-extract-text-from-anchor-tag) — GPhilo, Mar 19 '18 at 13:28

score 1 · Accepted Answer · answered Mar 19 '18 at 13:47


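This should help. link.text holds the anchor's visible text, so you can format it into the tag together with the href:

file.write('''<a href="{0}">{1}</a><br />'''.format(link.get('href'), link.text))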
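To get the exact "text - URL" output asked for in the question, the formatting could be pulled into a small helper. This is only a sketch: format_link is a hypothetical name, not part of BeautifulSoup, and escaping the values with html.escape is an added assumption about what the output file should tolerate.

```python
from html import escape


def format_link(href, text):
    """Return one HTML anchor line whose label is "<link text> - <URL>"."""
    # Assumption: href may be None and text may carry stray whitespace,
    # so both are normalised before formatting.
    href = (href or '').strip()
    text = ' '.join((text or '').split())
    # escape() keeps characters such as & or " from breaking the markup.
    safe_href = escape(href, quote=True)
    safe_text = escape(text)
    return '<a href="{0}">{1} - {0}</a><br />\n'.format(safe_href, safe_text)


if __name__ == '__main__':
    # Sample values taken from the question; no network access is needed.
    print(format_link('https://www.jobs.cz/junior-developer/',
                      '  Junior Java developer '), end='')
```

Inside the crawler loop this would be called as file.write(format_link(link.get('href'), link.text)).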

Rakesh

Thanks, this is amazing!! – user1994521 Mar 20 '18 at 15:00

score 0 · Answer 2 · answered Mar 19 '18 at 13:39


Try this:

 href = link.get('href') + '\n'
 txt = link.get_text()  # returns the link text; the first argument of get_text() is a separator, not an attribute name

Narendra
