1

I'm trying to extract digits from a unicode string. The strings look like "raised by 64 backers" and "raised by 2062 backers". I tried many different things, but the following code is the only approach that actually worked.

backers = browser.find_element_by_xpath('//span[@gogo-test="backers"]').text
match = re.search(r'(\d+)', backers)
print(match.group(0))

Since I'm not sure how often I'll need to extract substrings from strings, and I don't want to be creating tons of extra variables and lines of code, I'm wondering if there's a shorter way to accomplish this?

I know I could do something like this.

def extract_digits(string):
    # group(0) returns the matched text rather than the match object
    return re.search(r'(\d+)', string).group(0)

But I was hoping for a one-liner, so that I could structure the script like this, without an additional function.

backers = ...
title = ...
description = ...
...

I'd like to do something similar to the following, but it doesn't work as intended.

backers = re.search(r'(\d+)', browser.find_element_by_xpath('//span[@gogo-test="backers"]').text)

And the output looks like this.

<_sre.SRE_Match object at 0x000000000542FD50>

Any way to deal with this?!

Andrei Suvorkov
oldboy
  • where are `title` and `description` coming from? It'd be better if you can give some input and expected output. – Ashish Acharya Jul 06 '18 at 04:59
  • @AshishAcharya you don't need to worry about `title` and `description`, that was to simply show how i'd like to structure my code without the use of an additional function, ideally – oldboy Jul 06 '18 at 05:03
  • @Anthony, What about Regex `raised by (.*) backers` to extract only digit ? may be like `import re re.match(r"raised by (.*) backers", string)` – NarendraR Jul 06 '18 at 05:05

2 Answers

2

As an option, you can skip the regex and use Python's built-in str.isdigit() method (no additional imports needed):

digit = [sub for sub in browser.find_element_by_xpath('//span[@gogo-test="backers"]').text.split() if sub.isdigit()][0]
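
To see what the comprehension does without a browser, here is a minimal sketch on a plain string. The variable names and the use of next() with a default are illustrative assumptions, not part of the original answer; next() avoids an IndexError when no token qualifies.

```python
text = "raised by 2062 backers"  # sample string from the question

# next() returns the first whitespace-separated token made only of digits,
# or None if there is no such token.
digits = next((word for word in text.split() if word.isdigit()), None)

# Caveat: a token such as "2,062" is not all digits, so it is skipped.
formatted = next(
    (word for word in "raised by 2,062 backers".split() if word.isdigit()),
    None,
)

print(digits)
print(formatted)
```

If the page ever formats the number with separators, the regex approach is probably the safer choice.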
Andersson
  • Why doesn't `backers = re.search(r'(\d+)', browser.find_element_by_xpath('//span[@gogo-test="backers"]').text)[0]` or `backers = re.search(r'(\d+)', browser.find_element_by_xpath('//span[@gogo-test="backers"]').text)(0)` or something like that work, but yours does and the other buddy's whose answer is so similar to that? – oldboy Jul 06 '18 at 05:34
  • AFAIK `backers = re.search(r'(\d+)', browser.find_element_by_xpath('//span[@gogo-test="backers"]').text)[0]` actually should work also... – Andersson Jul 06 '18 at 05:46
  • @Andersson No it should be `group(0)` in the end, please check my answer – Andrei Suvorkov Jul 06 '18 at 05:47
  • i'm gonna try `backers = re.search(r'(\d+)', browser.find_element_by_xpath('//span[@gogo-test="backers"]').text).group(0)` tomo when i wake up n c if that works! – oldboy Jul 06 '18 at 05:49
  • @AndreiSuvorkov , `[0]` should work as well – Andersson Jul 06 '18 at 05:55
  • @AndreiSuvorkov it doesn't. i get a `TypeError` – oldboy Jul 06 '18 at 19:39
  • i actually prefer this answer because then i don't need to import yet another module – oldboy Jul 07 '18 at 17:30
  • hey, would you mind breaking down the syntax for me? `for sub in browser.element.text.split()` obviously breaks the string up into chunks where there are spaces and creates a list of the items. the very first `sub` is what is returned `if sub.isdigit()`??? also, how come you have to contain it all in a list??? – oldboy Jul 07 '18 at 21:45
  • Yeah, it breaks the string into list of sub-strings and check each sub-string in list. If it's a digit `sub.isdigit()` returns `True` - the sub-string is added to new list. If there are more than one digit in a string and you want to get all of them in a list - simply remove element index as `digits = [sub for sub in browser.find_element_by_xpath('//span[@gogo-test="backers"]').text.split() if sub.isdigit()]` – Andersson Jul 08 '18 at 06:04
  • @Anthony it is your choice to mark an answer you like. Andersson has provided a good answer, which I have of course upvoted – Andrei Suvorkov Jul 08 '18 at 07:44
1

You can try this:

number = re.findall(r'\b\d+\b', 'raised by 64 backers')

output:

['64']

So a helper function could look like this (it returns a list of all matches):

def extract_digits(string):
    return re.findall(r'\b\d+\b', string)


EDIT: since you want everything in one line, try this:

import re

backers = re.findall(r'\b\d+\b', browser.find_element_by_xpath('//span[@gogo-test="backers"]').text)[0]

PS:

search ⇒ finds the first match anywhere in the string and returns a match object (or None if there is no match)
findall ⇒ finds all non-overlapping matches in the string and returns them as a list.
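
The difference in return types can be checked on a plain string. This is a small sketch using the sample text from the question; the variable names are only illustrative.

```python
import re

text = "raised by 2062 backers"  # sample string from the question

match = re.search(r"\d+", text)         # a match object (None if nothing matches)
first = match.group(0)                  # the matched text as a string
all_numbers = re.findall(r"\d+", text)  # a list of every matched string

print(first)
print(all_numbers)
```

Printing match itself shows only the object's repr, which is the output seen in the question.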

Documentation for search:

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

Documentation link: docs.python.org/2/library/re.html
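
Because search returns None when nothing matches, chaining .group(0) directly raises AttributeError on text without digits. A hedged sketch of a guard follows; first_number and its default parameter are hypothetical names, not from the original answer.

```python
import re


def first_number(text, default=None):
    """Return the first run of digits in text, or default if there is none."""
    match = re.search(r"\d+", text)
    # search() gives None on no match, so test it before calling group().
    return match.group(0) if match else default


print(first_number("raised by 64 backers"))
print(first_number("no backers yet"))
```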

So, to do the same with search, call group(0) on the returned match object:

backers = re.search(r'(\d+)', browser.find_element_by_xpath('//span[@gogo-test="backers"]').text).group(0)
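
On the [0] question raised in the comments: match objects only gained subscript support in Python 3.6, so match[0] fails with a TypeError on the Python 2 interpreter used in the question, while group(0) works on both. A short sketch, assuming Python 3.6 or later:

```python
import re

text = "raised by 64 backers"  # sample string from the question

match = re.search(r"\d+", text)
via_group = match.group(0)  # works on Python 2 and Python 3
via_index = match[0]        # same result, but needs Python 3.6+

print(via_group, via_index)
```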
Andrei Suvorkov
  • i'll try that right away. i don't need to `import re` for that do i? – oldboy Jul 06 '18 at 05:12
  • i'd prefer to stick it ALL in one statement, though. something like: `backers = browser.find_element_by_xpath('//span[@gogo-test="backers"]').text.findall(r'\b\d+\b', string)` even though i know that isn't a thing – oldboy Jul 06 '18 at 05:14
  • I think you can do this as well, if it is neccesary to do this in one string. But it will be not very good for reader – Andrei Suvorkov Jul 06 '18 at 05:15
  • i've edited my question. i tried doing that, but it doesn't work as intended – oldboy Jul 06 '18 at 05:16
  • the output of `backers = re.search(r'(\d+)', browser.find_element_by_xpath('//span[@gogo-test="backers"]').text)` is `<_sre.SRE_Match object at 0x000000000542FD50>` :..( – oldboy Jul 06 '18 at 05:18
  • I have edited my code please have a look – Andrei Suvorkov Jul 06 '18 at 05:20
  • `<_sre.SRE_Match object at 0x000000000542FD50>` its return a list of elements, so at the end of statement just add `[0]` to get first element. The answer is edited – Andrei Suvorkov Jul 06 '18 at 05:23
  • i've tried adding `[0]` and even `(0)` to the end of `backers = re.search(r'(\d+)', browser.find_element_by_xpath('//span[@gogo-test="backers"]').text)`, but i get the following error: `TypeError: '_sre.SRE_Match' object has no attribute '__getitem__'`. i tried yours and it works, but i don't like the idea of using `findall` instead of `search` since there will only ever be one match... why does yours (`findall`) work and mine (`search`) doesn't? explain that in your answer and i'll mark it as correct! – oldboy Jul 06 '18 at 05:27
  • `search ⇒ find something anywhere in the string and return a match object` and `findall ⇒ find something anywhere in the string and return a list`. Docu: Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string. Documentation https://docs.python.org/2/library/re.html – Andrei Suvorkov Jul 06 '18 at 05:36
  • yeah i read the docs. so how would i access the substring/digits from that MatchObject?!?! – oldboy Jul 06 '18 at 05:39
  • I have added info in the answer, please have a look – Andrei Suvorkov Jul 06 '18 at 05:44
  • i actually prefer the other answer since i don't have to import yet another module – oldboy Jul 07 '18 at 17:31