-1

I want to extract specific content from a string. Consider the following dataframe:

import pandas as pd

data = {'time': [0, 1, 2, 3, 4], 'id': ["bike0", "bike10", "veh0", "veh10", "moto100"]}
df = pd.DataFrame(data)

I would like to use a regular expression to extract the numeric part of each string in the id column. The final result should be:

data = {'time': [0, 1, 2, 3, 4], 'id': [0, 10, 0, 10, 100]}  
df = pd.DataFrame(data)

The difficulty here is that the length of the string and the number of digits to extract are variable.

Thanks for your help.

  • 1
    Does this answer your question? [How to extract numbers from a string in Python?](https://stackoverflow.com/questions/4289331/how-to-extract-numbers-from-a-string-in-python) – Kylian Jan 17 '22 at 14:43

2 Answers

0

You can grab the sequence of digits at the end of each string in the id column, then convert them to integers and reassign the result to the id column.

df['id'] = df.id.str.extract(r'(\d+)$').astype(int)
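If some ids might not end in digits, a slightly more defensive variant could look like the sketch below. It is only an illustration, not part of the original answer: it assumes the same sample data as the question, and the intermediate name `digits` is made up for the example. `expand=False` makes `str.extract` return a Series, and the nullable `Int64` dtype keeps missing matches as `<NA>` instead of raising an error.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "time": [0, 1, 2, 3, 4],
    "id": ["bike0", "bike10", "veh0", "veh10", "moto100"],
})

# expand=False returns a Series rather than a one-column DataFrame.
digits = df["id"].str.extract(r"(\d+)$", expand=False)

# Strings without trailing digits come back as NaN, so go through
# to_numeric and the nullable Int64 dtype instead of a plain astype(int).
df["id"] = pd.to_numeric(digits).astype("Int64")
print(df)
```

With the sample data every row matches, so the id column ends up holding 0, 10, 0, 10 and 100.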
James
  • are you sure? it raises an `AttributeError: 'dict' object has no attribute 'id'` – XxJames07- Jan 17 '22 at 14:50
  • Yes, I am sure. The error indicates you are trying access the `id` attribute of a dictionary, not a data frame. Did you run it on `data` by mistake? – James Jan 17 '22 at 14:52
  • @XxJames07-, the answer from @James also works (python3.6) so I've upvoted it. For sure you have used `data.id` instead of `df.id`. Maybe you can post also the line above `AttributeError: 'dict' object has no attribute 'id'` from stack trace. – Marcel Preda Jan 17 '22 at 17:54
  • yes, sorry for my mistake, i forgot to say i've tested it and it works, thanks for your time. – XxJames07- Jan 17 '22 at 18:39
-1

The code below removes all alphabetic characters from the id column, ignoring case, so that only the digits remain. You can extend the character class to cover special characters too.

import pandas as pd
data = {'time': [0, 1, 2, 3, 4], 'id': ["bike0", "biKe10", "veh0", "veh10", "moto100"]}  
df = pd.DataFrame(data)
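# regex=True is needed on pandas >= 2.0, where str.replace treats the pattern as a literal by default.
# astype(int) turns the remaining digit strings into integers, as the question asks.
df["id"] = df["id"].str.replace(r"[a-z]", "", case=False, regex=True).astype(int)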
print(df)
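As a sketch of the "extend it to special chars" idea, one option is to invert the pattern and strip everything that is not a digit with `\D`. The sample ids below, with hyphens, underscores and spaces, are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical ids containing letters plus a few special characters.
df = pd.DataFrame({"id": ["bike0", "biKe10", "veh-0", "veh_10", "moto 100"]})

# \D matches any character that is not a digit, so letters, punctuation
# and whitespace are all removed in one pass.
df["id"] = df["id"].str.replace(r"\D+", "", regex=True).astype(int)
print(df)
```

Note that this concatenates every digit in the string, so an id such as "a1b2" would become 12. If only the trailing number is wanted, anchoring a capture group at the end of the string is the safer choice.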
Marcel Preda