
I need to copy thousands of 'mini tables' into a CSV. Essentially, every 'mini table' should become a single row in a CSV table. The issue is, the code from the website looks like this:

<li class="searchResult"> <div> <strong> <a href="www.link.com/">Junior Sales Rep</a> </strong> 
</div> <div class="tableTable"> <div class="tableRow"> <div class="tableCell"> Date of notification: 2022-09-23 <br> End date of waiting period: 2022-09-28 <br> Company Name <br> Toronto (Ontario) 
</div> <div class="tableCell"> PB-78 <br> Selection process: <span>22-563-ZB-B7S</span> </div> 
</div> </div> <div> <br><strong> Name of person being considered: </strong> Samuel Adams </div> <hr class="searchJobHrLine"> </li>

Just from your expertise, is this something that requires custom extensive code to scrape and convert to CSV, or is there a premade way of doing this? I was considering using Beautiful Soup, but before I proceeded I would want a smart person's guidance on the direction I should take, or if this is a lost cause?

Aedam
  • "The issue is, the code from the website looks like this:" Okay; and **why is this an issue**? If we have this input, **what should be the corresponding row** in the output, and **what difficulty do you encounter** in creating that row? Please read [ask] and note well that this is **not a discussion forum**. We do not offer "a smart person's guidance"; we answer a **specific question**. – Karl Knechtel Sep 30 '22 at 21:01
  • @KarlKnechtel the specific question is " I was considering using Beautiful Soup, but before I proceeded I would want a smart person's guidance on the direction I should take, or if this is a lost cause?" aka would BS4 be capable of extracting this information – Aedam Sep 30 '22 at 21:05
  • This is **specifically** what BS4 is for, yes. But we don't take questions about selecting a tool to use. We take questions about how to use a tool that you have already chosen. For the kind of guidance you seek, please try Reddit or Quora. – Karl Knechtel Sep 30 '22 at 21:06
  • The classes such as `class="tableTable"` are a good way to go. Look into getting values by CSS selector instead of just trying to find elements. – tdelaney Sep 30 '22 at 21:07
  • If doing this with BeautifulSoup, `for table in doc.find_all(class_="tableTable"):` gets you the tables, `for row in table.find_all(class_="tableRow"):` its rows, and `for cell in row.find_all(class_="tableCell"):` gets you its cells. Add some code to put that into Python lists and write via the `csv` module, and you are there. It's pretty straightforward. – tdelaney Sep 30 '22 at 21:26

2 Answers


How about:

  1. View the webpage in a browser
  2. Copy the text into a code editor like Sublime Text or VS Code
  3. Use multi-line select (or "tall cursor") to put a cursor at the end (or beginning) of every line
  4. Put a comma at the end of each line, then delete the newline
  5. If there's already an extra newline between the tables, then you have your record separator (you might have to delete some commas). Or you might have to locate some part of the data which you can find/replace to add newlines.
John Skiles Skinner
  • Stack Overflow is about programming. If this were considered a valid solution to the problem, that would mean the question is off topic. – Karl Knechtel Sep 30 '22 at 21:03

I ended up successfully using BS4.
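
The exact script is not shown here. The following is only a minimal sketch of one way this could be done with BeautifulSoup and the `csv` module, run against the sample markup from the question (with the closing `</span>` repaired). The column names, the helper functions (`cell_lines`, `after_colon`, `parse_result`, `results_to_csv`) and the assumption that every result has the same fields in the same order are all illustrative guesses, not something the site guarantees.

```python
import csv
import io

from bs4 import BeautifulSoup

# Sample markup copied from the question (closing </span> repaired).
SAMPLE_HTML = """
<li class="searchResult"> <div> <strong> <a href="www.link.com/">Junior Sales Rep</a> </strong>
</div> <div class="tableTable"> <div class="tableRow"> <div class="tableCell"> Date of notification: 2022-09-23 <br> End date of waiting period: 2022-09-28 <br> Company Name <br> Toronto (Ontario)
</div> <div class="tableCell"> PB-78 <br> Selection process: <span>22-563-ZB-B7S</span> </div>
</div> </div> <div> <br><strong> Name of person being considered: </strong> Samuel Adams </div> <hr class="searchJobHrLine"> </li>
"""

BR_MARK = "\x00"  # placeholder character substituted for each <br>

# Hypothetical column names; the question does not say what the CSV should contain.
COLUMNS = ["title", "link", "date_of_notification", "end_of_waiting_period",
           "company", "location", "code", "selection_process", "person"]


def cell_lines(cell):
    """Return the text of a tableCell as a list of pieces split at <br> tags."""
    for br in cell.find_all("br"):
        br.replace_with(BR_MARK)
    pieces = cell.get_text().split(BR_MARK)
    # Collapse runs of whitespace and drop empty pieces.
    return [" ".join(p.split()) for p in pieces if p.strip()]


def after_colon(text):
    """Return the part after the first colon, or the whole text if there is none."""
    return text.split(":", 1)[1].strip() if ":" in text else text.strip()


def parse_result(li):
    """Turn one <li class="searchResult"> into a dict keyed by COLUMNS."""
    link = li.find("a")
    cells = li.find_all(class_="tableCell")
    # Assumption: the first cell holds four pieces, the second holds two.
    first, second = cell_lines(cells[0]), cell_lines(cells[1])
    person = ""
    for strong in li.find_all("strong"):
        if "Name of person being considered" in strong.get_text():
            person = str(strong.next_sibling or "").strip()
    return {
        "title": link.get_text(strip=True),
        "link": link.get("href", ""),
        "date_of_notification": after_colon(first[0]),
        "end_of_waiting_period": after_colon(first[1]),
        "company": first[2],
        "location": first[3],
        "code": second[0],
        "selection_process": after_colon(second[1]),
        "person": person,
    }


def results_to_csv(html):
    """Parse every search result in the HTML and return the CSV as a string."""
    soup = BeautifulSoup(html, "html.parser")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for li in soup.find_all("li", class_="searchResult"):
        writer.writerow(parse_result(li))
    return buffer.getvalue()


if __name__ == "__main__":
    print(results_to_csv(SAMPLE_HTML))
```

On real pages some results will probably be missing a field or have them in a different order, so the positional indexing in `parse_result` would need guarding before running this over thousands of entries.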

Aedam