
As an example, let's say I wanted to record the bios of all users on SO.

Let's say I loaded up: How to click an element in Selenium WebDriver using JavaScript

I clicked all users: .user-details a (11 of them)

I wrote the extracted text to a CSV file.

driver.get('Version compatibility of Firefox and the latest Selenium IDE (2.9.1.1-signed)')

I read the users back from the CSV.

user: Ripon Al Wasim [is present again, do not click him] ??? How can this be achieved, since all I have is text?

Is something like this achievable, or is this a limitation of Selenium with Python?

You could click all of them, but let's say you had to scrape 200 pages and the common name Bob popped up 430 times. I feel like it is unnecessary to click his name every time. Is something like this possible with Selenium?

I feel like I'm missing something and this is achievable, but I am unaware how.

You could compare the text in the text file with the output of print(elem.get_attribute("href")): write the hrefs to a file and compare the two. If elements were already present you would delete them, but this is text. You could (maybe) put the text in an Excel file. I'm not entirely sure this is possible, but you could write the CSS selectors individually beside the text in the Excel file, delete the rows where the strings match, and then get Selenium to load what remains into WebDriver.

I'm not entirely convinced even this would work.

Is there a sane way of clicking the elements matched by a CSS selector while ignoring the names in a text file that you have already clicked?

1 Answer


There's nothing special here with Selenium. That is your tool for interacting with the browser. It is your program that needs to decide how to do that interaction, and what you do with the information from it.

It sounds like you want to build a database of users, so why not use a database? Something like SQLite or PostgreSQL might work nicely for you. Among the user details, store the name as it appears in the link (assuming it will be unique for each user), and index that name. When scraping your page, pull that link text, then use an SQL query to check whether a record with that name already exists; if not, click the link and add a new record.
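A rough sketch of that lookup with Python's built-in sqlite3 module, in case it helps. The table layout and the helper names (open_store, is_known, remember, links_to_visit) are made up for illustration, and it assumes the link text is unique per user, as above. The (name, href) pairs stand in for whatever you pull off the page with Selenium.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the user table; pass a file path to keep it between runs."""
    conn = sqlite3.connect(path)
    # PRIMARY KEY on name gives both uniqueness and an index for fast lookups.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "name TEXT PRIMARY KEY, href TEXT, bio TEXT)"
    )
    return conn


def is_known(conn, name):
    """Return True if a user with this link text is already recorded."""
    row = conn.execute("SELECT 1 FROM users WHERE name = ?", (name,)).fetchone()
    return row is not None


def remember(conn, name, href, bio=""):
    """Store a user; a second insert for the same name is silently ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO users (name, href, bio) VALUES (?, ?, ?)",
        (name, href, bio),
    )
    conn.commit()


def links_to_visit(conn, links):
    """Filter (name, href) pairs down to users that are not in the table yet.

    Names repeated within the same page are only returned once.
    """
    todo = []
    queued = set()
    for name, href in links:
        if name in queued or is_known(conn, name):
            continue
        queued.add(name)
        todo.append((name, href))
    return todo
```

With Selenium 4 the pairs could come from driver.find_elements(By.CSS_SELECTOR, ".user-details a"), reading .text and get_attribute("href") from each element. Collecting the hrefs first and then calling driver.get on the ones returned by links_to_visit avoids clicking elements that go stale once the page changes. Call remember after each profile has been scraped.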

Breaks Software
  • Alright, that makes sense. Very simple approach. Hmm... yes, so since it's an href, you can get the link text as well as the link itself. Then you can use a database as you mention, or Excel (you'd delete rows that exist elsewhere in Excel), and I can sort of see how that can be done. Good thing it's an href. Sounds like a pain to set up as I'm unfamiliar with this approach, but in terms of cutting down the number of pages to click it's a help. I like it! :D –  Nov 30 '17 at 12:38
  • I believe SQLite is local, right? No need to buy a server? I will look into this further. –  Nov 30 '17 at 12:45
  • Glad you like it. Any sort of database back end will do. If you're familiar with Excel, you can go with that. I would just recommend that you design your solution to be modular so that you can easily swap in another back end if Excel doesn't work for you later on. For instance, I don't know what the performance will be like. – Breaks Software Nov 30 '17 at 12:46
  • Excellent point. I'd rather avoid heavy CPU usage. I don't think SQLite or PostgreSQL are heavy on CPU though. Do they require Excel? I must admit I've been using VBScript and I'm learning pandas to get away from CPU-intensive methods. –  Nov 30 '17 at 12:49
  • TinyDB? If you're trying to use pandas, then perhaps just using a DataFrame within pandas will be sufficient (as long as you've got enough memory to work with). – Breaks Software Nov 30 '17 at 12:53
  • Oh wow, the choice is overwhelming :). I feel like there are a lot of good approaches for this and all are quite interesting. –  Nov 30 '17 at 12:57
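
To illustrate the "modular" suggestion from the comments, here is one possible shape for it: the scraping code only talks to a store through has() and add(), so the back end can be swapped later. SetStore, CsvStore and names_to_click are hypothetical names, and a plain set and a CSV file stand in here for the Excel, pandas or TinyDB options mentioned above.

```python
import csv
import os


class SetStore:
    """In-memory back end: forgets everything when the script exits."""

    def __init__(self):
        self._names = set()

    def has(self, name):
        return name in self._names

    def add(self, name):
        self._names.add(name)


class CsvStore:
    """CSV-backed back end: one name per row, reloaded on start-up."""

    def __init__(self, path):
        self._path = path
        self._names = set()
        if os.path.exists(path):
            with open(path, newline="", encoding="utf-8") as f:
                self._names = {row[0] for row in csv.reader(f) if row}

    def has(self, name):
        return name in self._names

    def add(self, name):
        if name in self._names:
            return
        self._names.add(name)
        # Append so that names recorded by earlier runs are kept.
        with open(self._path, "a", newline="", encoding="utf-8") as f:
            csv.writer(f).writerow([name])


def names_to_click(store, names):
    """Return the names not seen before, recording them in the store."""
    fresh = []
    for name in names:
        if not store.has(name):
            store.add(name)
            fresh.append(name)
    return fresh
```

Because names_to_click only relies on has() and add(), moving to SQLite or another back end later means writing one more small class with those two methods, not rewriting the scraper.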