I've recently gotten into web scraping, and the data I collect mixes numbers and strings that I can't seem to separate. With this in mind, how can I check whether one value is greater than, equal to, or less than another? For example: 500 cars, 7000 rocket parts, etc.
-
Use a regular expression to extract the numbers, e.g. `re.findall(r'\d+', text)` – Epsi95 Feb 16 '21 at 15:24
-
So let's say I want the numbers only for one of the examples above; how would I format it? – Hussein Feb 16 '21 at 15:36
-
Can you be clearer, maybe with an example or with the required input and output? – Leonardo Scotti Feb 16 '21 at 15:38
-
Let's say I want to separate "100 cars": I want the number only. – Hussein Feb 16 '21 at 15:40
2 Answers
1
After extracting your number strings, convert them to integers or floats. You can then compare them and do arithmetic on them as with any other number. As an example, the following code populates a dictionary with each item found in your text and its integer count:
sample = "For example 500 cars, 7000 rocket parts, etc... you get the point"
words = sample.split()  # split once instead of on every iteration
items = {}
# Pair each numeric token with the word that follows it.
for idx, word in enumerate(words):
    if word.isnumeric() and idx + 1 != len(words):
        items[words[idx + 1]] = int(word)
print(items)
Output:
{'cars,': 500, 'rocket': 7000}
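Once the counts are integers, the usual comparison operators apply. A minimal sketch, assuming the dictionary produced by the loop above; the `describe` helper and the `threshold` value are hypothetical, chosen only for illustration:

```python
# Counts as produced by the loop above (keys keep their trailing punctuation).
items = {'cars,': 500, 'rocket': 7000}


def describe(count, threshold):
    """Return how count relates to threshold (hypothetical helper)."""
    if count > threshold:
        return "greater"
    if count == threshold:
        return "equal"
    return "less"


threshold = 1000  # arbitrary value, assumed for this example
results = {name: describe(count, threshold) for name, count in items.items()}
print(results)  # {'cars,': 'less', 'rocket': 'greater'}
```

The same operators (`<`, `<=`, `==`, `>=`, `>`) work directly between two extracted counts, for example `items['rocket'] > items['cars,']`.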
1
You can easily find numbers followed by a word, like "500 cars", and extract them with a regex. Then, using a dict comprehension, you can store them in a dictionary:
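import re
# Named "text" rather than "str" to avoid shadowing the built-in type.
text = "For example 500 cars, 7000 rocket parts, etc... you get the point"
# Raw string so that \d and \s reach the regex engine unchanged.
pattern = r'(\d+)\s([^\s,.;]+)[^\s]*\s'
yourdict = {obj: int(num) for num, obj in re.findall(pattern, text)}
print(yourdict)
Output: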
{'cars': 500, 'rocket': 7000}
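If you only want the integers instead, you can do:
import re
# Named "text" rather than "str" to avoid shadowing the built-in type.
text = "For example 500 cars, 7000 rocket parts, etc... you get the point"
# Raw string so that \d and \s reach the regex engine unchanged.
pattern = r'(\d+)\s([^\s,.;]+)[^\s]*\s'
yourlist = [int(num) for num, obj in re.findall(pattern, text)]
print(yourlist)
Output: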
[500, 7000]
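If the item names are not needed at all, the plain `\d+` pattern suggested in the comments should be enough. It also picks up a number whose item is the last word of the string, such as "100 cars", which the pattern above skips because it requires whitespace after the item. A small sketch; `extract_numbers` is a hypothetical helper name, and it assumes plain non-negative integers without thousands separators or decimals:

```python
import re


def extract_numbers(text):
    """Return every run of digits in text as an int (hypothetical helper)."""
    return [int(num) for num in re.findall(r'\d+', text)]


print(extract_numbers("100 cars"))  # [100]

numbers = extract_numbers(
    "For example 500 cars, 7000 rocket parts, etc... you get the point"
)
print(numbers)                   # [500, 7000]
print(numbers[0] < numbers[1])   # True
```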

Leonardo Scotti