Remove unwanted parts from strings in a range of columns

Question

Part 1

Similar to this question but i have a range of colums named A - J all of which have hyphens in place of NaN on Series with numeric values (see example below). How do I replace all of them rather than go through each column at a time as demonstrated in that answer.

Sample column A
1000
2000
3000
-
1000

Part 2

Is there a way to use regex to remove all '-', ',' , '.' from a dataframe?

Showing a sample of your data will help with understanding what you want — Nobi, Apr 23 '16 at 16:33
try this: `df.replace(r'[\s\-,\.]+', '', regex=True, inplace=True)` or just use [pd.to_numeric()](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_numeric.html#pandas.to_numeric) — MaxU - stand with Ukraine, Apr 23 '16 at 16:42

Thanos · Accepted Answer · 2016-04-24T01:10:07.043

1

Part #1

You can define '-' to be a NaN value when reading in the data to your DataFrame. More specifically by use of na_values in your pd.read_csv() call.

See docs here

Part #2

As earlier suggested by MaxU you can use .replace() like this:

df.replace(r'[\s\-,\.]+', r'', regex=True, inplace=True)

Note that this will not have any effect on non-strings.

Hope this helps!

edited Apr 24 '16 at 01:10

answered Apr 23 '16 at 21:18

Thanos

2,472
1
16
33

If that helped you can accept the answer or vote it up or both. Good luck on your project. – Thanos Apr 23 '16 at 22:13

score 0 · Answer 2 · answered Apr 23 '16 at 16:52

You can iterate over the columns.

Selected columns:

>>> df = pd.DataFrame({'a': ['a-b', 'c-d'], 'b': ['x-y', 'z-z'], 'c': ['x-y', 'z-z']})
>>> df
     a    b    c
0  a-b  x-y  x-y
1  c-d  z-z  z-z
>>> for col_name in 'ab':
        df[col_name] = df[col_name].str.replace('-', '0')
>>> df
     a    b    c
0  a0b  x0y  x-y
1  c0d  z0z  z-z

All columns:

>>> df = pd.DataFrame({'a': ['a-b', 'c-d'], 'b': ['x-y', 'z-z'], 'c': ['x-y', 'z-z']})

>>> for col_name in df.columns:
        df[col_name] = df[col_name].str.replace('-', '0')
>>> df
     a    b    c
0  a0b  x0y  x0y
1  c0d  z0z  z0z

Remove unwanted parts from strings in a range of columns

2 Answers2