I have a dataset with many columns. I would like to search one of them for any numbers:

Column_to_look_at

10 days ago I was ...
How old are you?
I am 24 years old
I do not know. Maybe 23.12 or 23.14?
I could21n  .... 

I need to create two columns: one that holds the numbers extracted from that column, and another that holds a boolean value indicating whether or not the row contains a number.

Output I would expect

Column_to_look_at                      Numbers          Bool

10 days ago I was ...                  [10]             1
How old are you?                       []               0
I am 24 years old                      [24]             1
I do not know. Maybe 23.12 or 23.14?   [23.12, 23.14]   1
I could21n  ....                       [21]             1

The code I used to select numbers is this:

df[df.applymap(np.isreal).all(1)]

but this does not give me the expected output (at least for number selection). Any suggestions on how to extract digits from that column would be appreciated. Thanks

  • You need a regex pattern match to fetch the numerical data from each line. – Serial Lazer Nov 03 '20 at 13:27
  • Thanks. Something like this? `df.Column_to_look_at.str.extract('(\d+)')`. How could I assign a boolean value? –  Nov 03 '20 at 13:32
  • This would be helpful: https://stackoverflow.com/questions/4289331/how-to-extract-numbers-from-a-string-in-python/4289415#4289415 – Serial Lazer Nov 03 '20 at 13:34

1 Answer

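This will do:

import re

def checknum(x):
    # Find every integer or decimal (optionally signed) in the row's text
    return re.findall(r"[+-]?\d+(?:\.\d+)?", x['Column_to_look_at'])

# One list of matched number strings per row
df['Numbers'] = df.apply(checknum, axis=1)
# 1 if the row contains at least one number, otherwise 0
df['Bool'] = df.apply(lambda x: 1 if len(x['Numbers']) > 0 else 0, axis=1)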
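
For a self-contained check, here is a minimal sketch of the same idea using pandas' vectorized string methods instead of a row-wise `apply`. It reuses the pattern from the answer. The sample rows are copied from the expected-output table in the question, and the constant name `NUMBER_PATTERN` is only an illustrative choice.

```python
import pandas as pd

# Sample rows taken from the question's expected-output table.
df = pd.DataFrame({
    "Column_to_look_at": [
        "10 days ago I was ...",
        "How old are you?",
        "I am 24 years old",
        "I do not know. Maybe 23.12 or 23.14?",
        "I could21n  ....",
    ]
})

# Same pattern as above: optional sign, digits, optional decimal part.
# NUMBER_PATTERN is a hypothetical name used only in this sketch.
NUMBER_PATTERN = r"[+-]?\d+(?:\.\d+)?"

# Series.str.findall returns a list of all matches for each row.
df["Numbers"] = df["Column_to_look_at"].str.findall(NUMBER_PATTERN)

# A row contains a number when its list of matches is non-empty.
df["Bool"] = df["Numbers"].str.len().gt(0).astype(int)

print(df)
```

Note that the matches are strings (for example `'23.12'`), not numeric values, so a conversion with `float` would still be needed if numbers are required. Also, because of the optional sign, a hyphenated string such as a date would probably yield negative matches; dropping `[+-]?` from the pattern avoids that if signs are not expected in the data.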