I made a small program in Python that searches through a music website and collects music data. Each entry has the format [artist] - [music name] [music file format]. At first I used re.search to find a certain artist (I used regex because the music info contains some other characters and irregularities, and the only reliable indicator of the artist is the - that follows it).
Somehow it didn't work, so I changed it to re.findall just in case, but it still didn't work. Since I'm a beginner at Python, I thought I did something wrong, so I wrote some test code to study what was wrong. And this is what I got.
When I changed the x string (which would be the music info) and ran re.findall again, it gave me a different result (none). I was 100% sure the result would be the same. Why does it behave like this? And could this be the reason why re.search and re.findall weren't working in my original code?
I've included the code just in case (it uses Selenium).
idx = 1
while True:
    try:
        hxp1 = "(//h3[@class='entry-title td-module-title']/a)[" + str(idx) + "]"
        text = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, hxp1)))
        # info = eg) 'Michael Jackson - Beat it [FLAC, MP3, WAV]'
        info = text.get_attribute('title')  # get 'info' as string
        # ARTIST = eg) 'Michael Jackson'
        regex = ARTIST + ' - '
        match = re.findall(regex, info)  # or use re.search
        # do something with 'match'...
        idx += 1
    except:
        # do something...
        break
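
For reference, here is a minimal sketch of the matching step on its own, without Selenium, so it can be run and compared directly. The sample titles and the helper name `find_artist` are made up for illustration and are not part of my real code. Using `re.escape` is an assumption: it only matters if the artist name contains regex metacharacters such as `$`, `.`, `(` or `+`, and I can't say for sure that this is what goes wrong in the loop above.

```python
import re


def find_artist(artist, info):
    """Return True if `info` starts with '<artist> - ' (hypothetical helper)."""
    # re.escape makes characters like '$', '.', '(' or '+' match literally
    # instead of being interpreted as regex syntax.
    pattern = re.escape(artist) + r"\s*-\s*"
    return re.match(pattern, info) is not None


# Made-up sample titles in the '[artist] - [music name] [formats]' layout.
titles = [
    "Michael Jackson - Beat it [FLAC, MP3, WAV]",
    "P!nk - So What [MP3]",
    "Ke$ha - Tik Tok [FLAC]",
]

print(find_artist("Michael Jackson", titles[0]))  # plain name
print(find_artist("Ke$ha", titles[2]))            # name containing '$'

# Without escaping, '$' is an end-of-string anchor, so nothing is found.
print(re.search("Ke$ha - ", titles[2]))
```

One difference worth keeping in mind when comparing results: `re.search` returns a match object or `None`, while `re.findall` returns a list that is empty when nothing matches, so the two "no match" results look different even for the same pattern and string.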