-2

I have a pandas dataframe, which contains some pretty infiltered data

df['Q53']
OUTPUT:
0      Hvor mange timer træner din virksomhed medarbe...
3                                                    NaN
4                                                    NaN
5                                                    NaN
6                                                      2
7                                                    NaN
8                                                     10
9                                                    NaN
10                                                    50
11                                                   NaN
12                                                     ?
13                                                     ?
14                                                     8
15                                                   NaN
16                                                     2
17                                                     0
18                                                     1
19                                                     1
20                                                     5
21                                             7x3 timer
22                                                   NaN
23                                               8 timer
24                                                   NaN
25                                                     0
26                                                     8
27                                                   NaN

the issue now, is that i want to just have the integers left in the column, and then cast them as a float, so i can do some data visualization with the column.

I was wondering if i could do some standardized filtering, but i could not get get it to work.

Is there an operation, where i can filter out all NaN and String values, and just be left with a value, that could be casted into a float or int

Kristoffer Tølbøll
  • 3,157
  • 5
  • 34
  • 69

2 Answers2

2

You can check if isdigit to select only True columns.

df[df['Q53'].apply(lambda x: str(x).isdigit())]
BernardL
  • 5,162
  • 7
  • 28
  • 47
  • 1
    Based on OP's additional request in the comments it is obvious this will not work, rather than using `eval`, I think its better to understand all the criteria in your combinations before updating the answer. – BernardL Sep 26 '19 at 08:58
2

Convert values to numeric and then remove misisng values:

df['Q53'] = pd.to_numeric(df['Q53'], errors='coerce')
df = df.dropna(subset=['Q53'])
print (df)
     Q53
6    2.0
8   10.0
10  50.0
14   8.0
16   2.0
17   0.0
18   1.0
19   1.0
20   5.0
25   0.0
26   8.0

EDIT: I sugggest not use eval, here is solution for extract numeric with x between, convert to floats and multiple between.

Then replace missing values with all numeric extracted by \d+ patter in regex and converted to numbers.

Last remove missing values by DataFrame.dropna:

dfx = df['Q53'].astype(str).str.extract('(\d+)x(\d+)').astype(float).prod(axis=1, min_count=1)

num = df['Q53'].astype(str).str.extract('(\d+)', expand=False)

df['Q53'] = dfx.fillna(pd.to_numeric(num, errors='coerce'))
df = df.dropna(subset=['Q53'])
print (df)
     Q53
6    2.0
8   10.0
10  50.0
14   8.0
16   2.0
17   0.0
18   1.0
19   1.0
20   5.0
21  21.0
23   8.0
25   0.0
26   8.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252