
So basically, I've gotten into web scraping and started collecting data that mixes numbers and strings, which I can't seem to separate. With this in mind, how does one check whether something is greater than, equal to, or less than something else? For example: 500 cars, 7000 rocket parts, etc. You get the point.

Tomerikoo
Hussein

2 Answers


After extracting your number strings, convert them to integers or floats. You can then compare them and do numerical operations on them as with any other number. As an example, the following code populates a dictionary with each item found in your text and its integer count:

sample = "For example 500 cars, 7000 rocket parts, etc... you get the point"

words = sample.split()  # split once instead of on every loop iteration
items = {}
# Pair each numeric token with the word that immediately follows it
for word, next_word in zip(words, words[1:]):
    if word.isnumeric():
        items[next_word] = int(word)
print(items)

Output:

{'cars,': 500, 'rocket': 7000}
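Since the question is about greater/equal/less comparisons, here is a short sketch of why the conversion step matters and what it enables. The strings `"90"` and `"500"` are hypothetical examples, not taken from the scraped text; the `items` dictionary is the output shown above, copied in so the snippet runs on its own.

```python
# Why convert first: strings compare character by character, not by value
print("90" > "500")            # True  ('9' sorts after '5')
print(int("90") > int("500"))  # False (90 < 500)

# The dictionary produced by the example above (values are already ints)
items = {'cars,': 500, 'rocket': 7000}

# The usual comparison operators now work on the counts
print(items['rocket'] > items['cars,'])  # True
print(items['cars,'] == 500)             # True
print(items['cars,'] < 100)              # False

# So do ordinary numerical operations
print(items['rocket'] - items['cars,'])  # 6500
```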
Tomerikoo
pakpe

You can easily find numbers followed by a word, like "500 cars", and extract them with a regex. Then, using a dict comprehension, you can store them in a dictionary:

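import re

text = "For example 500 cars, 7000 rocket parts, etc... you get the point"
# Raw string so \d and \s reach the regex engine unchanged;
# captures a number, whitespace, then the following word without trailing punctuation
pattern = r'(\d+)\s([^\s,.;]+)[^\s]*\s'
yourdict = {obj: int(num) for num, obj in re.findall(pattern, text)}
print(yourdict)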

Output:

{'cars': 500, 'rocket': 7000}
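Once the counts are integers, comparisons can drive filtering and ranking as well. A minimal sketch that reuses the pattern above; the threshold of 1000 is an arbitrary example value, and the names `counts`, `large`, and `ranked` are hypothetical.

```python
import re

text = "For example 500 cars, 7000 rocket parts, etc... you get the point"
pattern = r'(\d+)\s([^\s,.;]+)[^\s]*\s'
counts = {obj: int(num) for num, obj in re.findall(pattern, text)}

# Keep only items whose count meets a threshold (1000 is an example value)
large = {obj: n for obj, n in counts.items() if n >= 1000}
print(large)  # {'rocket': 7000}

# Sort items from largest to smallest count
ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
print(ranked)  # [('rocket', 7000), ('cars', 500)]

# Item with the highest count
print(max(counts, key=counts.get))  # rocket
```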

If you only want the integers, you can do:

import re

text = "For example 500 cars, 7000 rocket parts, etc... you get the point"
pattern = r'(\d+)\s([^\s,.;]+)[^\s]*\s'
# Keep only the number from each (number, word) match
yourlist = [int(num) for num, _ in re.findall(pattern, text)]
print(yourlist)

Output:

[500, 7000]
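The `\d+` patterns above only recognise plain whole numbers, so scraped values such as "1,200 cars" or "2.5 tons" are misread (with the pattern above, "1,200 cars" yields 200). A minimal sketch, assuming English-style formatting where commas are thousands separators and a dot is the decimal point; the sample text, `pattern`, and the helper `to_number` are hypothetical and should be adapted to your data.

```python
import re

text = "Listed: 1,200 cars, 2.5 tons of steel, 7000 rocket parts"

# Assumed format: optional comma thousands separators, optional ".decimal" part,
# then whitespace and the word that follows (trailing punctuation excluded)
pattern = r'(\d[\d,]*(?:\.\d+)?)\s+([^\s,.;]+)'


def to_number(raw):
    """Strip thousands separators; return a float if there is a decimal point, else an int."""
    cleaned = raw.replace(',', '')
    return float(cleaned) if '.' in cleaned else int(cleaned)


amounts = {word: to_number(num) for num, word in re.findall(pattern, text)}
print(amounts)  # {'cars': 1200, 'tons': 2.5, 'rocket': 7000}

# ints and floats compare with each other directly
print(amounts['tons'] < amounts['cars'])  # True
```

Data using a comma as the decimal mark (for example "2,5") would need a different pattern and cleanup step.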
Leonardo Scotti