Since pd.to_numeric is primarily used to convert strings to numeric values, I'm going to work under the assumption that you want to convert strings that spell out boolean literals (such as 'True' and 'False') into actual bool values.
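Here is a quick check of why pd.to_numeric doesn't cover this case, assuming a reasonably recent pandas. A boolean literal such as 'True' is not a numeric string, so parsing raises ValueError, and errors="coerce" just turns it into NaN. The series s is made up for illustration.

```python
import pandas as pd

s = pd.Series(["True", "False", "1"])

try:
    pd.to_numeric(s)
except ValueError as exc:
    # 'True' is not a numeric string, so the default errors="raise" fails
    print("to_numeric raised:", exc)

# errors="coerce" replaces unparseable strings with NaN instead of raising
coerced = pd.to_numeric(s, errors="coerce")
print(coerced.tolist())
```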
Consider the dataframe df:
import pandas as pd

df = pd.DataFrame([
    ['1', None, 'True'],
    ['False', 2, True]
])
print(df)
0 1 2
0 1 NaN True
1 False 2.0 True
My Choice
This is what I'd propose. Further below, I break it down in an attempt to explain what is going on.
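from ast import literal_eval

import numpy as np
import pandas as pd

def try_eval2(x):
    # Parse strings such as 'True' or '1' into Python objects
    if isinstance(x, str):
        try:
            x = literal_eval(x)
        except (ValueError, SyntaxError, TypeError):
            x = np.nan
    # Keep genuine booleans only; everything else becomes NaN
    if type(x) is not bool:
        x = np.nan
    return x

vals = df.values
v = vals.ravel()
a = np.array([try_eval2(x) for x in v.tolist()], dtype=object)
pd.DataFrame(a.reshape(vals.shape), df.index, df.columns)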
0 1 2
0 NaN NaN True
1 False NaN True
Timing
On this small example, the proposed solution is roughly eight times faster than the alternatives below (to_boolean is not defined in this answer):
%%timeit
vals = df.values
v = vals.ravel()
a = np.array([try_eval2(x) for x in v.tolist()], dtype=object)
pd.DataFrame(a.reshape(vals.shape), df.index, df.columns)
10000 loops, best of 3: 149 µs per loop
%timeit df.astype(str).applymap(to_boolean)
1000 loops, best of 3: 1.28 ms per loop
%timeit df.astype(str).stack().map({'True':True, 'False':False}).unstack()
1000 loops, best of 3: 1.27 ms per loop
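%%timeit and %timeit are IPython magics. Outside IPython, a rough equivalent uses the standard timeit module. This sketch compares the proposed approach with the stack/map one (to_boolean is left out because it isn't defined here). The names ravel_approach and stack_map_approach are hypothetical, and absolute timings will depend on your machine and pandas version.

```python
import timeit
from ast import literal_eval

import numpy as np
import pandas as pd

df = pd.DataFrame([["1", None, "True"], ["False", 2, True]])

def try_eval2(x):
    if isinstance(x, str):
        try:
            x = literal_eval(x)
        except (ValueError, SyntaxError, TypeError):
            x = np.nan
    if type(x) is not bool:
        x = np.nan
    return x

def ravel_approach(frame):
    # Flatten, convert cell by cell in plain Python, then restore the shape
    vals = frame.to_numpy()
    a = np.array([try_eval2(x) for x in vals.ravel().tolist()], dtype=object)
    return pd.DataFrame(a.reshape(vals.shape), frame.index, frame.columns)

def stack_map_approach(frame):
    # Map the string forms through a lookup dict; anything else becomes NaN
    return frame.astype(str).stack().map({"True": True, "False": False}).unstack()

n = 1000
for name, func in [("ravel", ravel_approach), ("stack/map", stack_map_approach)]:
    seconds = timeit.timeit(lambda: func(df), number=n)
    print(f"{name}: {seconds / n * 1e6:.1f} µs per call")
```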
Explanation
Step 1
First, create a simple function that uses ast.literal_eval to convert strings into Python values, leaving anything it can't parse unchanged.
from ast import literal_eval

def try_eval(x):
    try:
        x = literal_eval(x)
    except (ValueError, SyntaxError, TypeError):
        pass  # not a valid literal, so keep the original value
    return x
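To see what try_eval does with different kinds of input, here is a small check on some made-up sample values. Like the version above, it catches only the exceptions ast.literal_eval raises for input that isn't a valid literal; non-strings such as 2.0 or None also raise ValueError and therefore pass through unchanged.

```python
from ast import literal_eval

def try_eval(x):
    try:
        x = literal_eval(x)
    except (ValueError, SyntaxError, TypeError):
        pass  # not a valid Python literal: keep the original value
    return x

samples = ["True", "False", "1", "2.5", "hello", 2.0, None]
converted = [try_eval(s) for s in samples]
for before, after in zip(samples, converted):
    print(f"{before!r:>8} -> {after!r} ({type(after).__name__})")
```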
Step 2
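Use applymap with the new function. The printed result looks the same, but the strings '1', 'True' and 'False' have become Python int and bool values. (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.)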
d1 = df.applymap(try_eval)
print(d1)
0 1 2
0 1 NaN True
1 False 2.0 True
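The printed frames look identical, but the cell types differ. This sketch prints the type name of every cell before and after the conversion. elementwise is a hypothetical helper that calls DataFrame.map where it exists (pandas 2.1+) and falls back to applymap otherwise.

```python
from ast import literal_eval

import pandas as pd

def try_eval(x):
    try:
        x = literal_eval(x)
    except (ValueError, SyntaxError, TypeError):
        pass
    return x

def elementwise(frame, func):
    # DataFrame.map supersedes DataFrame.applymap from pandas 2.1 onward
    if hasattr(pd.DataFrame, "map"):
        return frame.map(func)
    return frame.applymap(func)

def cell_type(v):
    return type(v).__name__

df = pd.DataFrame([["1", None, "True"], ["False", 2, True]])
d1 = elementwise(df, try_eval)

print(elementwise(df, cell_type))  # column 0 holds str values
print(elementwise(d1, cell_type))  # after literal_eval, column 0 holds int and bool
```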
Step 3
Use where, together with applymap(type), to keep only the values that are actually bool; everything else becomes NaN.
d2 = d1.where(d1.applymap(type).eq(bool))
print(d2)
0 1 2
0 NaN NaN True
1 False NaN True
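The boolean mask that where receives can be inspected on its own. This sketch builds it with type(v) is bool, which should select the same cells as .applymap(type).eq(bool); elementwise is the same hypothetical map/applymap helper. Checking the type, rather than comparing values, matters because 1 == True in Python, so the int parsed from '1' would otherwise pass as a boolean.

```python
from ast import literal_eval

import pandas as pd

def try_eval(x):
    try:
        x = literal_eval(x)
    except (ValueError, SyntaxError, TypeError):
        pass
    return x

def elementwise(frame, func):
    # DataFrame.map supersedes DataFrame.applymap from pandas 2.1 onward
    if hasattr(pd.DataFrame, "map"):
        return frame.map(func)
    return frame.applymap(func)

df = pd.DataFrame([["1", None, "True"], ["False", 2, True]])
d1 = elementwise(df, try_eval)

# True only where the parsed cell is a genuine bool (the int 1 is rejected)
mask = elementwise(d1, lambda v: type(v) is bool)
d2 = d1.where(mask)  # cells outside the mask become NaN
print(mask)
print(d2)
```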
Step 4
Optionally, drop the columns that are entirely NaN (here, column 1):
print(d2.dropna(axis=1, how='all'))
0 2
0 NaN True
1 False True
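Steps 1 to 4 can be chained into one reusable function. This is a sketch, not part of the original answer: literal_bools and elementwise are hypothetical names, and drop_empty is an illustrative flag that toggles Step 4.

```python
from ast import literal_eval

import pandas as pd

def try_eval(x):
    try:
        x = literal_eval(x)
    except (ValueError, SyntaxError, TypeError):
        pass
    return x

def elementwise(frame, func):
    # DataFrame.map supersedes DataFrame.applymap from pandas 2.1 onward
    if hasattr(pd.DataFrame, "map"):
        return frame.map(func)
    return frame.applymap(func)

def literal_bools(frame, drop_empty=True):
    evaluated = elementwise(frame, try_eval)                  # Steps 1-2
    mask = elementwise(evaluated, lambda v: type(v) is bool)  # Step 3
    result = evaluated.where(mask)
    if drop_empty:
        result = result.dropna(axis=1, how="all")             # Step 4
    return result

df = pd.DataFrame([["1", None, "True"], ["False", 2, True]])
print(literal_bools(df))
```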