I'm trying to extract text from a forum website. It works well, but there's a problem with splitting the text.

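import re

import requests
from bs4 import BeautifulSoup

# headers and cookies are defined elsewhere in my script
s = "Username[^\"]+<br"
r = requests.get("https://example.com/threads/73956/page2", headers=headers, cookies=cookies)
soup = BeautifulSoup(r.content, "html.parser")
comments = soup.find_all('div', {'class': 'wwCommentBody'})
for div in comments:
    bq = div.find('blockquote', {'class': 'postcontent restore'})
    # the search runs on the whole page text (r.text), not on bq
    result = re.findall(s, r.text)
    print(result)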

It prints like this:

['Username: Marvel<br']

How can I make it extract only up to the last letter of the username value, without the <br and the ['']?

<div class="wwCommentBody">
     <blockquote class="postcontent restore " style="padding: 10px;">Username: 
     leetibrahim<br>
    Number: 2       
     </blockquote>
</div>
Marvel

1 Answer

To get rid of the captured <br, change your regex to the following:

s = '(Username[^"]+)(?:<br)'

Or, with the manual escape:

s = "(Username[^\"]+)(?:<br)"

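The two are equivalent. With one capturing group in the pattern, re.findall returns only the captured text, so the <br inside the non-capturing group (?:...) is matched but left out of the result.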
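
As a rough illustration of the difference, here is a minimal sketch that runs both patterns on a hard-coded string. The `html` value is an assumption modeled on the blockquote from the question, not the real page, and the variable names are only for illustration.

```python
import re

# Sample markup modeled on the blockquote in the question (assumed, not fetched).
html = (
    '<blockquote class="postcontent restore " style="padding: 10px;">Username: \n'
    '     leetibrahim<br>\n'
    '    Number: 2\n'
    '</blockquote>'
)

# No group: findall returns the whole match, including the trailing "<br".
whole = re.findall(r'Username[^"]+<br', html)

# One capturing group: findall returns only the text inside the parentheses.
captured = re.findall(r'(Username[^"]+)(?:<br)', html)

print(whole)
print(captured)
```

Note that the captured text still contains the line break and indentation between "Username:" and the name; something like `" ".join(captured[0].split())` would collapse that whitespace.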

It prints as ['Username: leetibrahim'] because re.findall returns a list. To get the first element of the list:

print(result[0])

To get all elements of the list:

for item in result:
    print(item)

To get just the last character ('m') of the username:

result[0][-1]

That works by getting the first element (index 0) of the list and then the last character (index -1) of that string.
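
A small sketch of the indexing steps above, using a hard-coded list in place of the real `re.findall` result (the value is assumed for illustration). Keep in mind that `result[0]` raises an IndexError when the pattern matched nothing, so a check such as `if result:` may be worth adding in real code.

```python
# Stand-in for the list returned by re.findall (assumed value).
result = ['Username: leetibrahim']

first = result[0]      # first (and here only) string in the list
last_char = first[-1]  # a negative index counts from the end of the string

print(first)
print(last_char)
```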

Ezra
  • You saved me!! Thank you so much. One last little question: if I want to get another string together with its value, like `Username: leetibrahim Number: 2`, how can I do it? – Marvel Jun 28 '21 at 17:19
  • If I understand you correctly: `print(result[0])` – Ezra Jun 28 '21 at 17:20
  • That didn't work, actually; it prints only `Username: leetibrahim`. Is there something I should change in my regex? – Marvel Jun 28 '21 at 17:23
  • You can edit your regex like this: `s = "(Username[^\"]+)(?:)"` and then remove the `<br>` like this: `print(result[0].replace("<br>", ""))`. – Ezra Jun 28 '21 at 17:27
  • You can also do it with bs4: `print(bq.get_text())` – Ezra Jun 28 '21 at 17:28
  • Bro I don't know what to tell you but thank you!! ❤ – Marvel Jun 28 '21 at 17:29
  • No problem! Just upvote my comments/answer if they were helpful – Ezra Jun 28 '21 at 17:32
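
For the follow-up in the comments (getting `Username: leetibrahim Number: 2` as one string), here is one possible regex-only sketch. It is a variation on the pattern suggested above, not the same pattern: it uses a lazy match that stops at the closing `</blockquote>` tag, then strips the `<br>` and collapses whitespace. The `html` string is again an assumption modeled on the snippet in the question.

```python
import re

# Sample markup modeled on the question's snippet (assumed, not fetched).
html = (
    '<div class="wwCommentBody">\n'
    '     <blockquote class="postcontent restore " style="padding: 10px;">Username: \n'
    '     leetibrahim<br>\n'
    '    Number: 2\n'
    '     </blockquote>\n'
    '</div>'
)

# Lazy match from "Username" up to (but not including) the closing tag.
# re.DOTALL lets "." cross the line breaks inside the blockquote.
match = re.search(r'Username.*?(?=</blockquote>)', html, re.DOTALL)

clean = ''
if match:
    # Turn the <br> into a space, then collapse all runs of whitespace.
    text = match.group(0).replace('<br>', ' ')
    clean = ' '.join(text.split())

print(clean)
```

The `bq.get_text()` route mentioned in the comments avoids running a regex over raw HTML, which is usually the more robust choice when the markup varies.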