
I used two methods to get the page source of an internal application link.

  1. First method: the Robot Framework keyword, ${html_page} = Get Source
  2. Second method: plain Python, using either of
    • urllib: visit_url_content = urllib.request.urlopen(url).read().decode('utf-8')
    • requests: visit_url_content = requests.get(url, 'html.parser').text

After getting the page source, I extract all links (tag a, attribute 'href') using BeautifulSoup: soup = BeautifulSoup(html_page, "html.parser")

With the first method I get about 20 links, but with the second method I get only 2. I need to process this in Python, so I cannot use the Robot Framework option. Any idea why this might be happening?

Mahak Malik

1 Answer


It is a bit unclear what your code looks like exactly, since you only posted a few snippets. I assume it looks something like this:

import urllib.request
from bs4 import BeautifulSoup

URL = "your-url"

html = urllib.request.urlopen(URL).read().decode('utf-8')

soup = BeautifulSoup(html, "html.parser")

for a in soup.find_all('a', href=True):
    print(a["href"])

Based on StackOverflow: BeautifulSoup getting href

Does this code differ in some way from yours? Can you share your complete crawling code and, if possible, the URL you want to crawl? Otherwise it is hard to find out what the problem is.
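In the meantime, it may help to see exactly which links are missing from the second page source. Below is a minimal sketch of that comparison, not a fix. It uses only the standard library's html.parser module as a stand-in for BeautifulSoup so it runs without extra installs. The names HrefCollector, extract_hrefs and missing_links are hypothetical helpers, and the two HTML strings are made-up placeholders for the source returned by Get Source and the source returned by urllib or requests.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lower case
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html):
    """Return all href values of <a> tags, in document order."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


def missing_links(full_html, partial_html):
    """Return hrefs found in full_html but not in partial_html."""
    partial = set(extract_hrefs(partial_html))
    return [href for href in extract_hrefs(full_html) if href not in partial]


# Placeholder inputs (assumptions): replace these with the two real page sources.
robot_html = '<a href="/a">A</a><a href="/b">B</a><a href="/c">C</a>'
python_html = '<a href="/a">A</a><a>no href</a>'

print(missing_links(robot_html, python_html))
```

If the missing hrefs do not appear anywhere in the text returned by urllib or requests, the two methods are receiving different HTML, and the difference lies in how the page is fetched rather than in the link extraction.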

t4khosu