I am trying to extract only the numbers and only the letters from the same kind of data into two separate DataFrames. I am using regular expressions for both extractions.
import pandas as pd
df_num = pd.DataFrame({
    'Colors': ['lila1.5', 'rosa2.5', 'gelb3.5', 'grün4', 'rot5', 'schwarz6', 'grau7', 'weiß8', 'braun9', 'hellblau10'],
    'Animals': ['hu11nd', '12welpe', '13katze', 's14chlange', 'vo15gel', '16papagei', 'ku17h', '18ziege', '19pferd',
                'esel20']
})
# Keep only the numeric part of every column
for column in df_num.columns:
    df_num[column] = df_num[column].str.extract(r'(\d+)').astype(float)
print(df_num)
I have also tried the patterns '([\d+][\d+\.\d+])' and '([\d+\.\d+])'.
The code runs, but the output is not what I expect. I expect float numbers, yet I do not get 1.5 or 2.5.
The output looks like the image below:
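For reference, here is a minimal sketch of what seems to be happening with the numbers, and one pattern that may be worth trying. The pattern `(\d+)` stops at the first non-digit, so the `.5` is never captured. Inside square brackets, `+` is a literal plus sign rather than a quantifier, so `[\d+\.\d+]` matches only a single character. The alternative pattern below, digits optionally followed by a dot and more digits, is an assumption about the desired format (no signs, no exponents, no thousands separators), and the names `colors`, `digits_only`, and `with_decimals` are only illustrative.

```python
import pandas as pd

# A shortened sample of the 'Colors' column from the question
colors = pd.Series(['lila1.5', 'rosa2.5', 'gelb3.5', 'grün4', 'hellblau10'])

# Original pattern: one run of digits, so the fractional part is lost
digits_only = colors.str.extract(r'(\d+)', expand=False).astype(float)

# Candidate pattern: digits, optionally followed by '.' and more digits
with_decimals = colors.str.extract(r'(\d+(?:\.\d+)?)', expand=False).astype(float)

print(digits_only.tolist())    # [1.0, 2.0, 3.0, 4.0, 10.0]
print(with_decimals.tolist())  # [1.5, 2.5, 3.5, 4.0, 10.0]
```

With `expand=False` and a single capture group, `str.extract` returns a Series, which makes it convenient to assign back to a column inside the loop.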
df_str = pd.DataFrame({
    'Colors': ['lila1.5', 'rosa2.5', 'gelb3', 'grün4', 'rot5', 'schwarz6', 'grau7', 'weiß8', 'braun9', 'hellblau10'],
    'Animals': ['hu11nd', '12welpe', '13katze', 's14chlange', 'vo15gel', '16papagei', 'ku17h', '18ziege', '19pferd',
                'esel20']
})
# Keep only the alphabetic part of every column
for column in df_str.columns:
    df_str[column] = df_str[column].str.extract(r'([a-zA-Z]+)')
print(df_str)
In this case, when the number is at the beginning or at the end of the value, I get the string I want. When the number is in the middle or anywhere else, I do not get the result I expect. The current output looks like the image below:
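A minimal sketch of the likely cause and one possible workaround: `str.extract` returns only the first match, so a value such as 'hu11nd' yields just the letters before the number. The class `[a-zA-Z]` also excludes characters like `ü` and `ß`. One option, shown here as an assumption about what "only strings" should mean, is to delete the numbers with `str.replace` instead of extracting the letters. The names `words`, `first_run`, and `letters_only` are illustrative.

```python
import pandas as pd

# A mixed sample of values from the question
words = pd.Series(['hu11nd', '12welpe', 's14chlange', 'grün4', 'lila1.5'])

# Original approach: only the first run of ASCII letters is returned
first_run = words.str.extract(r'([a-zA-Z]+)', expand=False)

# Alternative: remove every number (with an optional decimal part) and keep the rest
letters_only = words.str.replace(r'\d+(?:\.\d+)?', '', regex=True)

print(first_run.tolist())     # ['hu', 'welpe', 's', 'gr', 'lila']
print(letters_only.tolist())  # ['hund', 'welpe', 'schlange', 'grün', 'lila']
```

Removing the digits keeps non-ASCII letters intact without having to list them in a character class. It also keeps any other non-digit characters, which may or may not be wanted.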
I think my regular expressions are not correct. What are the right regular expressions for these two cases? Or is there another way to extract only the numbers and only the strings from a pandas DataFrame?