Background:
I am learning web scraping and decided to use Python and Beautiful Soup. This program asks the user for a link and then lets them narrow down their HTML search within the page.
Problem:
When I ask the user to define their own extension for the soup page (e.g. .div.div.a), append it to the base string, and try to execute that string inside a print call, it always prints None. How can I run the extension gathered from user input and print its result? In this example I am scraping a Newegg search for graphics cards.
Example link: https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20cards
Keep in mind that in the code below I had already used findAll for div class="item-info", so the extension is applied inside that block.
I have already tried passing the string to exec(), but that does not seem to work.
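The None comes from exec() itself: it runs code as statements and always returns None, whatever that code evaluates to. eval() evaluates a single expression and returns its value, which is what the print call needs. A minimal sketch, assuming a SimpleNamespace stand-in in place of the real BeautifulSoup container (so it runs without a network request):

```python
from types import SimpleNamespace

# Hypothetical stand-in for the BeautifulSoup element called `container`
# in the question; SimpleNamespace just allows container.a attribute access.
container = SimpleNamespace(a="<a class='item-brand'>")

route = "container.a"

# exec() runs the string as a statement and always returns None...
print(exec(route))   # None
# ...while eval() evaluates one expression and returns its value.
print(eval(route))   # <a class='item-brand'>
```

Note that eval() executes whatever Python the user types, so it is only reasonable for local experimentation, not for untrusted input.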
isdone = ""
while isdone != "done":
    try:
        route = "container"
        userinput = input("what extensions would you like to search for?\n seperate each denotion with a space \n ex: div div img[\"title\"]\n: ")
        inputRoute = userinput.split(' ')
        for i in range(len(inputRoute)):
            route += "." + inputRoute[i]
        print("---\n" + route + "\n---")
        print("Current Route ^\n---")
        print("output:\n", exec(route), "\n---")  # actual result if the user entered a
        print(container.a)  # what I actually want to output (if the user only entered a)
        # add the ability to add extensions, e.g. container.div.a.img["foo"] (ignore this, Stack Overflow)
        isdone = input("are you happy with these extensions? \n type 'done' when happy\n or enter to change extension\n: ")
    except Exception as e:
        print(e)
        input("Make sure there are no leftover spaces\npress enter to continue")
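The "no leftover spaces" warning points at a real pitfall of split(' '): every extra space yields an empty step, so an input of "a " builds the route "container.a.", which is not valid Python and makes exec()/eval() raise SyntaxError. A small sketch of the difference between split(' ') and split() with no argument:

```python
# str.split(' ') keeps an empty string for every extra space,
# so trailing or doubled spaces produce empty route steps.
print("div a ".split(' '))    # ['div', 'a', '']
print("div  a".split(' '))    # ['div', '', 'a']

# str.split() with no argument splits on runs of whitespace
# and ignores leading/trailing whitespace, so no empty steps appear.
print("div a ".split())       # ['div', 'a']
print("div  a".split())       # ['div', 'a']

# Building the route the same way as the loop in the question:
steps = "div a ".split()
route = "container" + "".join("." + step for step in steps)
print(route)                  # container.div.a
```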
This is the console output ('#' marks my comments):
what extensions would you like to search for?
seperate each denotion with a space
ex: div div img["title"]
: a # <--what I put in the input
---
container.a #what
---
Current Route ^
---
output:
None # <-- what actually outputs when I use exec()
---
<a class="item-brand" href="https://www.newegg.com/EVGA/BrandStore/ID-1402">
<img alt="EVGA" class="lazy-img" data-effect="fadeIn" data-src="//c1.neweggimages.com/Brandimage_70x28//Brand1402.gif" src="//c1.neweggimages.com/WebResource/Themes/2005/Nest/blank.gif" title="EVGA">
</img></a>
are you happy with these extensions?
type 'done' when happy
or enter to change extension
:
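Because the route is just a chain of attribute lookups with an optional ["key"] subscript, one alternative to eval() is to walk it step by step with getattr(), so no user-typed code is ever executed. A minimal sketch under these assumptions: follow_route and STEP are hypothetical names, the only step syntax supported is name or name["key"], and a SimpleNamespace/dict stand-in replaces the real container. On a BeautifulSoup Tag, getattr(tag, "div") behaves like tag.div (shorthand for tag.find("div")) and returns None when there is no such child, which the None check handles; tag["title"] reads the attribute.

```python
import re
from types import SimpleNamespace

# One route step: a name, optionally followed by ["key"] or ['key'].
STEP = re.compile(r'^(\w+)(?:\[["\'](.+?)["\']\])?$')

def follow_route(root, user_input):
    """Walk root.<name>[...] for each space-separated step, without eval()."""
    node = root
    for step in user_input.split():      # split() ignores extra spaces
        match = STEP.match(step)
        if match is None:
            raise ValueError(f"invalid step: {step!r}")
        name, key = match.groups()
        node = getattr(node, name)       # same as writing node.name
        if node is None:
            return None                  # e.g. a Tag with no such child
        if key is not None:
            node = node[key]             # same as writing node["key"]
    return node

# Hypothetical stand-in for the container from the question.
container = SimpleNamespace(
    div=SimpleNamespace(div=SimpleNamespace(img={"title": "EVGA"})),
    a="<a class='item-brand'>",
)

print(follow_route(container, 'div div img["title"]'))  # EVGA
print(follow_route(container, "a"))                     # <a class='item-brand'>
```

Inside the loop, print(follow_route(container, userinput)) would then take the place of the exec() call.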