I have a data frame with many columns, some of which have object dtype and contain text. I want to clean all of the text columns with operations such as lower() and strip(). How can I do this with a loop over all text columns?

I have written this which works as I expect:

for column in t1.loc[:, t1.dtypes == np.object].columns:
    t1.loc[:,column] = t1[column].str.lower().str.strip()

I was just wondering if there is a better way to write this. I am trying to improve my skills in pandas.

  • A better way is to use [`pandas.DataFrame.select_dtypes`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.select_dtypes.html). Usage is given in the docs. Something like `df.select_dtypes(include=['object']).columns` – jpp Mar 27 '18 at 14:35
  • If you want to avoid the loop, you can use `apply`, which makes it syntactically simpler – Quickbeam2k1 Mar 27 '18 at 14:36
  • @jpp wouldn't that fail when trying to apply string functions to an `object` column that did not in fact contain strings? – Karl Anka Mar 27 '18 at 14:45
  • @KarlAnka. Potentially, but OP should know this about their data. As such, looking for object type seems to work, this is just a potentially better way. – jpp Mar 27 '18 at 14:46
  • @novice_007, I've marked as duplicate as I struggle to see a materially different answer from the one 4 years ago. And if there is one, it should be added to that post. – jpp Mar 27 '18 at 14:52
  • @jpp I'm not sure I agree with the duplicate mark here, as OP wants to apply string functions. Applying `str` to an `object` column would replace all non-string values with `NaN`. However, I would say that this question is more of a duplicate of: https://stackoverflow.com/questions/43191832/checking-if-a-data-series-is-strings – Karl Anka Mar 27 '18 at 15:14
  • @KarlAnka, but OP says `t1.loc[:, t1.dtypes == np.object].columns` *works* with his data. However, I'll add this dup as well to the list. – jpp Mar 27 '18 at 15:17

1 Answer

You can use applymap to apply a function to every element of the selected columns:

# select all object (text) columns
col = df.select_dtypes(include=['object']).columns

# apply the function to every element of those columns
df[col] = df[col].applymap(lambda x: x.lower().strip())
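
As a possible variation on the above, here is a minimal sketch that combines `select_dtypes` with `apply` and the `.str` accessor, as suggested in the comments. The sample frame and its column names are hypothetical, made up only for illustration. The plain lambda above raises an AttributeError if a cell holds NaN or a number, whereas the `.str` methods leave missing values alone and turn non-string values into NaN, which may or may not be what you want. Also note that `applymap` is deprecated in pandas 2.1 and later in favour of `DataFrame.map`, and that `np.object` has been removed from recent NumPy releases (plain `object` works instead).

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
# dtype=object is forced so the example behaves the same across pandas versions.
df = pd.DataFrame({
    "name": pd.Series(["  Alice ", "BOB", None], dtype=object),
    "city": pd.Series([" Paris", "ROME ", "Oslo"], dtype=object),
    "age": [30, 25, 41],
})

# Columns with object dtype (usually, but not necessarily, strings).
text_cols = df.select_dtypes(include=["object"]).columns

# Apply the vectorised string methods column by column.
# Missing values are passed through instead of raising an error.
df[text_cols] = df[text_cols].apply(lambda s: s.str.lower().str.strip())
```

The numeric `age` column is never touched, because it is not selected by `select_dtypes`.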
YOLO