
I am currently trying to use regex to isolate the numeric values within the strings in a list and append only the numbers to a new list. Yes, I am aware of this post (Regular Expressions: Search in list) and am using one of the answers from it, but for some reason the new list still includes the text part of the values.

[IN]:
['0.2 in', '1.3 in']

import re

snowamt = ['0.2 in', '1.3 in']
r = re.compile(r"\d*\.\d*")
newlist = list(filter(r.match, snowamt))
print(newlist)

[OUT]:
['0.2 in', '1.3 in']

I have tried so many combinations of regex and I just can't get it. Can someone please correct what I know is a stupid mistake? Here are just a few of the regexes I've tried:

"(\d*\.\d*)"
"\d*\.\d*\s"
"\d*\.\d*\s$"
"^\d*\.\d*\s$"
"^\d*\.\d*\s"
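As the comments below point out, `filter` only decides whether to keep each string; it never changes it. A quick check of the patterns above makes that visible (a small sketch; the loop and the `results` dict are illustrative only). Patterns that match at the start of `'0.2 in'` keep the full strings unchanged, while the two ending in `\s$` match nothing, because `in` still follows the space.

```python
import re

snowamt = ['0.2 in', '1.3 in']
patterns = [
    r"(\d*\.\d*)",
    r"\d*\.\d*\s",
    r"\d*\.\d*\s$",
    r"^\d*\.\d*\s$",
    r"^\d*\.\d*\s",
]

# filter() only tests each string with r.match; it never extracts anything,
# so every result is either the original strings or an empty list.
results = {}
for pattern in patterns:
    r = re.compile(pattern)
    results[pattern] = list(filter(r.match, snowamt))
    print(pattern, results[pattern])
```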

My end goal is to sum up all the values in the list generated above, and I was initially able to get around using re.compile by using re.split:

import re

snowamt = ['0.2 in', '1.3 in']
inches_n = []
for n in snowamt:
    # Split on the space and keep the numeric part before " in"
    split = re.split(" ", n)
    inches_n.append(split[0])

print(inches_n)

The problem is that the value '-- in' may show up in the original list, since I am getting the numbers by scraping a website (Weather Underground, which is okay to scrape). It would take fewer steps if I could just select the numbers initially with regex, because with re.split I have to add an extra step that iterates through the new list again and keeps only the numbers.
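A minimal sketch of doing the extraction and the sum in one pass with a regex, assuming every usable entry looks like `'<number> in'` and placeholders such as `'-- in'` should simply be skipped. `SNOW_RE` and `total_inches` are hypothetical names chosen for illustration:

```python
import re

# Hypothetical pattern: a number (optional decimal part) at the start, then "in".
SNOW_RE = re.compile(r"(\d+(?:\.\d+)?)\s*in")

def total_inches(values):
    """Sum the numeric parts, skipping entries such as '-- in' that don't match."""
    total = 0.0
    for value in values:
        m = SNOW_RE.match(value)
        if m:  # re.match returns None when the string doesn't start with a number
            total += float(m.group(1))
    return total

snowamt = ['0.2 in', '1.3 in', '-- in']
print(total_inches(snowamt))
```

Skipping non-matches inside the same loop avoids the second pass over the list that the re.split approach needs.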

Anyway, can someone please correct my regex so I can move on with my life from this problem? Thank you!

Rachel Cyr
  • In the first code example, what do you want the output to be instead? – Karl Knechtel Mar 31 '21 at 22:23
  • Just to explain the correct answer below, what YOUR code is doing is asking "Does this string CONTAIN a number? If so, keep it". You aren't EXTRACTING the number. – Tim Roberts Mar 31 '21 at 22:23
    So your basic problem was not your regex but your use of filter which passed all strings that contained a number. To get just the numbers from the string you could use map rather than filter as in `list(map(lambda x: r.match(x).group(), snowamt))`. (using your definition of r). But, it's simpler to use list comprehension as in the posted answer. – DarrylG Mar 31 '21 at 22:40
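The `map` variant from the comment above works for the sample list, but with the question's value of `r`, `r.match` returns `None` for a string like `'-- in'`, and calling `.group()` on `None` raises `AttributeError`. A small sketch of both cases (the `clean` and `raised` names are illustrative only):

```python
import re

r = re.compile(r"\d*\.\d*")

# The comment's version works when every string starts with a number...
clean = list(map(lambda x: r.match(x).group(), ['0.2 in', '1.3 in']))
print(clean)

# ...but r.match returns None for '-- in', and None has no .group() method.
raised = False
try:
    list(map(lambda x: r.match(x).group(), ['0.2 in', '-- in']))
except AttributeError as exc:
    raised = True
    print("AttributeError:", exc)
```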

1 Answer


To get only the numbers from each string in the list, you can use this example:

import re

snowamt = ["0.2 in", "1.3 in"]
r = re.compile(r"(\d+\.?\d*)")

newlist = [m.group(1) for i in snowamt if (m := r.match(i))]
print(newlist)

Prints:

['0.2', '1.3']
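The `:=` (assignment expression) in the comprehension requires Python 3.8 or later. On older versions, an equivalent sketch maps `r.match` over the list first and keeps only the successful matches. The `'-- in'` entry is added here only to show that non-numeric values are skipped:

```python
import re

snowamt = ["0.2 in", "1.3 in", "-- in"]
r = re.compile(r"(\d+\.?\d*)")

# map() yields a match object or None per string; "if m" drops the Nones.
newlist = [m.group(1) for m in map(r.match, snowamt) if m]
print(newlist)
```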
Andrej Kesely