Removing spaces from a column in pandas

Question

This is very closely related to "Removing space from columns in pandas", so I wasn't sure whether to add it as a comment there. The difference is that my question specifically concerns using a loc locator to slice out a subset.

df['py'] = df['py'].str.replace(' ','')

-- this works fine; but when I only want to apply it to the subset of rows where 'column' is 'foo':

df.loc[df['column'] == 'foo']['py'] = df.loc[df['column'] == 'foo']['py'].str.replace(' ','')

...doesn't work.

What am I doing wrong? I can always slice out the group and re-append it, but curious where I'm going wrong here.

A dataset for trials:

df = pd.DataFrame({'column':['foo','foo','bar','bar'], 'py':['a b','a b','a b','a b']})

Thanks

You should be getting a huge red warning explaining that the issue is chained **assignment** `][`. You need to assign properly with `df.loc[df['column'] == 'foo', 'py'] = ` (Since on the RHS you are just _selecting_ the chaining is _okay_ and doesn't cause problems, but still for best practices just select within the one loc call there too) — ALollz, Oct 06 '21 at 14:41

score 2 · Accepted Answer · edited Oct 06 '21 at 14:46

You want:

df.loc[df['column'] == 'foo', 'py'] = df.loc[df['column'] == 'foo', 'py'].apply(lambda x: x.replace(' ',''))

Note the notation of loc.
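As a minimal sketch of that single-`loc` notation on the question's sample data: the row condition and the column label go inside the same `.loc[...]` call on both sides. This version uses `.str.replace` instead of `apply`, and the name `mask` is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'column': ['foo', 'foo', 'bar', 'bar'],
                   'py': ['a b', 'a b', 'a b', 'a b']})

# Boolean mask for the rows to edit (the name `mask` is illustrative).
mask = df['column'] == 'foo'

# Row selector and column label sit in one .loc call, so the assignment
# writes into df itself instead of into a temporary copy.
df.loc[mask, 'py'] = df.loc[mask, 'py'].str.replace(' ', '', regex=False)

print(df)
```

Only the 'foo' rows should lose their spaces; the 'bar' rows are left as they were.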

edited Oct 06 '21 at 14:46

ALollz

answered Oct 06 '21 at 14:42

vtasca

I don't like `apply()` for performance reasons. – Freek Wiekmeijer Oct 06 '21 at 14:46

@FreekWiekmeijer the `.str` accessor operations themselves are essentially loops so there's little difference between an apply and the Series.str operations (in contrast to most of the vectorized math operations where `.apply` is to be avoided at all costs). For reference: https://stackoverflow.com/questions/54028199/are-for-loops-in-pandas-really-bad-when-should-i-care – ALollz Oct 06 '21 at 14:48

score 0 · Answer 2 · answered Oct 06 '21 at 14:43

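The pandas `.str` accessor also supports regex (pass `regex=True`, since recent pandas versions treat the pattern as a literal string by default):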

>>> pd.DataFrame({"column_1": ["hello ", " world", "space in the middle", "two  spaces", "one\ttab"]}).column_1.str.replace(r"\s+", "", regex=True)

0               hello
1               world
2    spaceinthemiddle
3           twospaces
4              onetab

Combine that with numpy.where() and I think you have what you need.

df[column_name] = np.where(
   <condition>,  # boolean mask selecting which rows to edit
   df[column_name].str.replace(r"\s+", "", regex=True),  # the substitution used on those rows
   df[column_name]  # the unchanged value kept on all other rows
)
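A runnable sketch of the same idea on the question's sample data, assuming the condition is `df['column'] == 'foo'` and the target column is `py` (the name `mask` is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column': ['foo', 'foo', 'bar', 'bar'],
                   'py': ['a b', 'a b', 'a b', 'a b']})

# Boolean mask selecting the rows to edit (the name `mask` is illustrative).
mask = df['column'] == 'foo'

# np.where picks the cleaned value where mask is True and the original
# value elsewhere; the result is assigned back to the whole column.
df['py'] = np.where(
    mask,
    df['py'].str.replace(r'\s+', '', regex=True),
    df['py'],
)

print(df)
```

Note that the substitution is computed for every row and then discarded where the mask is False, so on large frames the `.loc` assignment may do less work.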
