I have the code below:

import re
age = []

txt = ('9', "10y", "4y",'unknown')
for t in txt:
    if t.isdigit() is True:
        age.append(re.search(r'\d+',t).group(0))
    else:
        age.append('unknown')
print(age)

and I get: ['9', 'unknown', 'unknown', 'unknown']

So I get the 9, but I also need 10 in the second position, 4 in the third, and 'unknown' for the last.
Can anyone point me in the right direction? Thank you for your help!

pauliec
  • I don't know why this was marked as a duplicate; I don't see any duplication. – Mehdi Golzadeh Oct 30 '20 at 00:09
  • I think the flag is correct. I spent about an hour searching Stack Overflow before submitting the question and kept getting hung up on different details, so I didn't see the question mine duplicates. The answer there is similar to @Erfan's pandas solution; I must have missed it. Thank you all for the help – pauliec Oct 30 '20 at 11:41

3 Answers

We can use the fact that re.search returns None when it finds no digits:

import re

txt = ('9', "10y", "4y", 'unknown')
age = []
for t in txt:
    num = re.search(r'\d+', t)  # a Match object, or None if t contains no digits
    if num:
        age.append(num.group(0))
    else:
        age.append('unknown')
print(age)
['9', '10', '4', 'unknown']
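
The same None-check can be folded into a list comprehension. This is a minimal sketch, assuming Python 3.8+ for the assignment expression (:=), which binds the match so each string is searched only once:

```python
import re

txt = ('9', "10y", "4y", 'unknown')

# re.search returns a Match object (truthy) or None; the conditional
# expression evaluates the search first, then reads group(0) only on a match.
age = [m.group(0) if (m := re.search(r'\d+', t)) else 'unknown' for t in txt]
print(age)  # ['9', '10', '4', 'unknown']
```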

Since you tagged pandas: if your data is in a column, use str.extract:

import pandas as pd

pd.Series(txt).str.extract(r'(\d+)', expand=False)
0      9
1     10
2      4
3    NaN
dtype: object

Erfan
  • Thank you!!! That's it! Geesh... I should have put more context in the question. I have been doing some volunteer work for a pet shelter, and one of the columns in the data frame is formatted as xY yM (years and months). I only need the age to do some analysis, so the pandas idea is probably the way to go. Thanks again! – pauliec Oct 29 '20 at 23:22
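
For the xY yM column described in the comment above, a sketch of turning each value into an age in years might look like the following. It assumes values such as "2Y 3M", "10Y" or "5M" (case-insensitive, years before months); the pattern name AGE_RE and the function age_in_years are hypothetical, and anything that does not match (including 'unknown' or a bare "9") yields None:

```python
import re

# Hypothetical pattern for the assumed "<n>Y <n>M" format; either part may be missing.
AGE_RE = re.compile(
    r'^\s*(?:(?P<years>\d+)\s*y)?\s*(?:(?P<months>\d+)\s*m)?\s*$',
    re.IGNORECASE,
)

def age_in_years(value):
    """Return the age in fractional years, or None if it cannot be parsed."""
    m = AGE_RE.match(value)
    # No match, or an empty/whitespace string where both groups are absent.
    if m is None or (m.group('years') is None and m.group('months') is None):
        return None
    years = int(m.group('years') or 0)
    months = int(m.group('months') or 0)
    return years + months / 12

print([age_in_years(v) for v in ("2Y 3M", "10Y", "5M", "unknown")])
```

The same named-group pattern could likely be reused with pandas' Series.str.extract (passing flags=re.IGNORECASE), which names the resulting columns years and months.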
import re
age = []

txt = ('9', "10y22", "4y", 'unknown')

for t in txt:
    res = re.findall('[0-9]+', t)  # every run of digits in t; an empty list if none
    if res:
        age.append(res[0])  # keep only the first number
    else:
        age.append("unknown")
print(age)
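
To see why taking res[0] works, it may help to compare re.findall with re.search on the "10y22" input used above. This is a small illustrative check, not part of the original answer:

```python
import re

t = "10y22"
all_runs = re.findall(r'[0-9]+', t)           # every run of digits: ['10', '22']
first_run = re.search(r'[0-9]+', t).group(0)  # only the first run: '10'
no_runs = re.findall(r'[0-9]+', 'unknown')    # []: an empty list is falsy, so the else branch runs

print(all_runs, first_run, no_runs)  # ['10', '22'] 10 []
```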
sahasrara62
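import re


age = []

txt = ('9', "10y", "4y", 'unknown')
for t in txt:
    # Drop a trailing non-digit character, e.g. "10y" -> "10".
    # Slicing removes only the last character; t.replace(t[-1], '')
    # would remove every occurrence of that character.
    if len(t) > 1 and not t[-1].isdigit():
        t = t[:-1]
    if t.isdigit():
        age.append(re.search(r'\d+', t).group(0))
    else:
        age.append('unknown')
print(age)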

Check this out. The len check makes sure the string has more than one character; then, if its last character is not a digit, that character is removed (replaced with an empty string). After that, the rest of your algorithm runs unchanged. You can adapt it further to fit your requirements, since the question doesn't specify the input format in much detail.
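
A variation on the same idea, offered as a sketch rather than the answerer's method, is to strip every trailing letter with str.rstrip instead of exactly one character, so a longer unit suffix would also be handled. The "10yrs" value below is a hypothetical extra input, not from the question:

```python
import string

txt = ('9', "10y", "4y", 'unknown', "10yrs")  # "10yrs" is a hypothetical extra case

age = []
for t in txt:
    # Remove any run of trailing ASCII letters: "10y" -> "10", "10yrs" -> "10".
    # A purely alphabetic value such as 'unknown' becomes '', and ''.isdigit() is False.
    stripped = t.rstrip(string.ascii_letters)
    age.append(stripped if stripped.isdigit() else 'unknown')
print(age)  # ['9', '10', '4', 'unknown', '10']
```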

todovvox