I'm scraping a website to store data in a database that has 3 columns. The part of the website I'm scraping looks like one of the three examples below:
# Example 1:
<div>
<a href="sample1">text1</a>
</div>
# Example 2:
<div>
<a href="sample1">text1</a>
<a href="sample2">text2</a>
</div>
# Example 3:
<div>
<a href="sample1">text1</a>
<a href="sample2">text2</a>
<a href="sample3">text3</a>
</div>
I'm trying to assign
- "text1" to var1,
- either an empty string or "text2" to var2,
- either an empty string or "text3" to var3.
What is the best way to do this?
Here are a few things I've tried:
#### FIRST ATTEMPT
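var1, var2, var3 = '', '', ''
# could also do var1, var2, var3 = ('',) * 3
links = soup.find_all('a')  # `links` rather than `all`, which is a Python builtin
var1 = links[0].text
try:
    var2 = links[1].text
except IndexError:  # the div has only one link
    pass
try:
    var3 = links[2].text  # index 2 is the third link
except IndexError:  # the div has fewer than three links
    pass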
#### SECOND ATTEMPT
links = [s.text for s in soup.find_all('a')]
# This is where I get stuck. This could return a list of length 1, 2, or 3, and I need
# a list of exactly length 3 so that I can use the following line to assign the variables.
# As written, it raises ValueError whenever there are fewer than three links.
var1, var2, var3 = links
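What I think the second attempt is missing is a padding step, roughly like the sketch below. The helper name `pad_to_three` and the `fill` parameter are made up for illustration, and the plain lists stand in for what `[s.text for s in soup.find_all('a')]` would return for each example, so this is untested against the real page.

```python
def pad_to_three(texts, fill=''):
    """Return exactly three items: the given texts, padded on the right with `fill`.

    Hypothetical helper; extra items beyond the third are dropped.
    """
    return (list(texts) + [fill] * 3)[:3]


# Stand-ins for the scraped link texts in Examples 1, 2 and 3
example1 = ['text1']
example2 = ['text1', 'text2']
example3 = ['text1', 'text2', 'text3']

# Unpacking always works because the padded list has length 3
var1, var2, var3 = pad_to_three(example2)
```

With `example2`, `var3` ends up as an empty string, which is the behaviour described in the list above.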
#### THIRD ATTEMPT
links = [s.text for s in soup.find_all('a')]
var1, var2, var3 = '', '', ''
n = len(links)
var1 = links[0]  # links already holds strings, so there is no .text to call
if n >= 2:
    var2 = links[1]
if n >= 3:
    var3 = links[2]
EDIT: The reason I'm trying to have three fields in my DB is that I want to be able to filter by each of these variables. var1 is the most accurate label, var2 is slightly less accurate, and var3 is accurate only at a high level. Think of it like clothing: var1 could be grey-slacks, var2 could be business-slacks, and var3 could be pants.
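To make the filtering goal concrete, here is a rough sketch using an in-memory SQLite database. The table name `items`, the helper `filter_by`, and the sample rows are all invented for illustration; my real database and data differ.

```python
import sqlite3

# In-memory database with the three columns described above (hypothetical schema)
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE items (var1 TEXT NOT NULL, var2 TEXT NOT NULL, var3 TEXT NOT NULL)'
)

# Made-up rows: missing labels are stored as empty strings
rows = [
    ('grey-slacks', 'business-slacks', 'pants'),
    ('black-slacks', 'business-slacks', ''),
    ('wool-socks', '', ''),
]
conn.executemany('INSERT INTO items VALUES (?, ?, ?)', rows)


def filter_by(column, value):
    """Return the var1 labels of rows whose `column` equals `value`."""
    # Column names cannot be bound as parameters, so check against a whitelist
    if column not in ('var1', 'var2', 'var3'):
        raise ValueError(f'unknown column: {column}')
    cursor = conn.execute(
        f'SELECT var1 FROM items WHERE {column} = ? ORDER BY var1', (value,)
    )
    return [row[0] for row in cursor]


business = filter_by('var2', 'business-slacks')
pants = filter_by('var3', 'pants')
```

Here `business` would contain both slacks rows, while `pants` would contain only the row that had all three links.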