In pandas, how to do string cleaning on all columns in a data frame

Question

I have a data frame with many columns; some are of object dtype and contain text. I want to clean all of the text columns with operations like lower(), strip(), etc. How can I do this with a loop over all text columns?

I have written this which works as I expect:

for column in t1.loc[:, t1.dtypes == object].columns:
    t1.loc[:, column] = t1[column].str.lower().str.strip()

I was just wondering if there is a better way to write this. I am trying to improve my skills in pandas.

A better way is to use [`pandas.DataFrame.select_dtypes`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.select_dtypes.html). Usage given in the docs. Something like `df.select_dtypes(include=['object']).columns` — jpp, Mar 27 '18 at 14:35
If you want to avoid the loop, you can use `apply`, which makes it syntactically simpler — Quickbeam2k1, Mar 27 '18 at 14:36
@jpp wouldn't that fail when trying to apply string functions to an `object` column that did not in fact contain strings? — Karl Anka, Mar 27 '18 at 14:45
@KarlAnka. Potentially, but OP should know this about their data. As such, looking for object type seems to work, this is just a potentially better way. — jpp, Mar 27 '18 at 14:46
@novice_007, I've marked as duplicate as I struggle to see a materially different answer from the one 4 years ago. And if there is one, it should be added to that post. — jpp, Mar 27 '18 at 14:52
@jpp I'm not sure I agree with the duplicate mark here, as OP wants to apply string functions. Applying `str` to an `object` column would replace all non-string values with `NaN`. However, I would say that this question is more of a duplicate of: https://stackoverflow.com/questions/43191832/checking-if-a-data-series-is-strings — Karl Anka, Mar 27 '18 at 15:14
@KarlAnka, but OP says `t1.loc[:, t1.dtypes == np.object].columns` *works* with his data. However, I'll add this dup as well to the list. — jpp, Mar 27 '18 at 15:17
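
To make the point discussed in these comments concrete, here is a minimal sketch. The frame `t1_demo` and its values are made up for illustration, and the text columns are forced to `object` dtype with `dtype=object`, which is an assumption about the data rather than something the question states. It shows `select_dtypes` picking the object columns and the `.str` accessor turning a non-string element into `NaN`.

```python
import pandas as pd

# Hypothetical data: "mixed" is an object column that holds a non-string value.
t1_demo = pd.DataFrame({
    "name": pd.Series(["  Alice ", "BOB"], dtype=object),
    "mixed": pd.Series([" Foo", 42], dtype=object),
    "n": [1, 2],
})

# Names of the object-dtype columns, as suggested in the comments.
text_cols = t1_demo.select_dtypes(include=["object"]).columns

cleaned = t1_demo.copy()
for column in text_cols:
    # The .str accessor yields NaN for elements that are not strings.
    cleaned[column] = cleaned[column].str.lower().str.strip()
```

Here the integer `42` in `mixed` ends up as `NaN`, so whether this loop is safe depends on knowing that the object columns really contain only text.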

score 0 · Answer 1 · answered Mar 27 '18 at 14:44

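You can use applymap to apply a function to every element of the selected columns:

# select the object (text) columns
col = df.select_dtypes(include=['object']).columns

# apply the function to each element of those columns, leaving non-strings untouched
df[col] = df[col].applymap(lambda x: x.lower().strip() if isinstance(x, str) else x)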
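
A possible variant, offered only as a sketch: as far as I know, `DataFrame.applymap` has been deprecated since pandas 2.1 in favor of `DataFrame.map`, so the version below uses `DataFrame.apply` with `Series.map`, which should work on older and newer releases alike. The helper `clean_text` and the frame `df_demo` are hypothetical names, and the columns are forced to `object` dtype for the illustration.

```python
import pandas as pd

def clean_text(value):
    """Lower-case and strip strings; return any other value unchanged."""
    return value.lower().strip() if isinstance(value, str) else value

# Hypothetical data with a missing value and a non-string value.
df_demo = pd.DataFrame({
    "city": pd.Series([" Paris", "ROME ", None], dtype=object),
    "code": pd.Series(["Ab ", 7, "cD"], dtype=object),
    "count": [1, 2, 3],
})

col = df_demo.select_dtypes(include=["object"]).columns

# DataFrame.apply runs the lambda once per column;
# Series.map then calls clean_text on each element.
df_demo[col] = df_demo[col].apply(lambda s: s.map(clean_text))
```

Unlike the `.str` accessor, this keeps non-string values such as `7` as they are instead of replacing them with `NaN`.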

answered Mar 27 '18 at 14:44

YOLO
