0

Part 1

Similar to this question but i have a range of colums named A - J all of which have hyphens in place of NaN on Series with numeric values (see example below). How do I replace all of them rather than go through each column at a time as demonstrated in that answer.

Sample column A
1000
2000
3000
-
1000

Part 2

Is there a way to use regex to remove all '-', ',' , '.' from a dataframe?

Community
  • 1
  • 1
mapping dom
  • 1,737
  • 4
  • 27
  • 50
  • 1
    Showing a sample of your data will help with understanding what you want – Nobi Apr 23 '16 at 16:33
  • try this: `df.replace(r'[\s\-,\.]+', '', regex=True, inplace=True)` or just use [pd.to_numeric()](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_numeric.html#pandas.to_numeric) – MaxU - stand with Ukraine Apr 23 '16 at 16:42

2 Answers2

1

Part #1

You can define '-' to be a NaN value when reading in the data to your DataFrame. More specifically by use of na_values in your pd.read_csv() call.

See docs here

Part #2

As earlier suggested by MaxU you can use .replace() like this:

df.replace(r'[\s\-,\.]+', r'', regex=True, inplace=True)

Note that this will not have any effect on non-strings.

Hope this helps!

Thanos
  • 2,472
  • 1
  • 16
  • 33
0

You can iterate over the columns.

Selected columns:

>>> df = pd.DataFrame({'a': ['a-b', 'c-d'], 'b': ['x-y', 'z-z'], 'c': ['x-y', 'z-z']})
>>> df
     a    b    c
0  a-b  x-y  x-y
1  c-d  z-z  z-z
>>> for col_name in 'ab':
        df[col_name] = df[col_name].str.replace('-', '0')
>>> df
     a    b    c
0  a0b  x0y  x-y
1  c0d  z0z  z-z

All columns:

>>> df = pd.DataFrame({'a': ['a-b', 'c-d'], 'b': ['x-y', 'z-z'], 'c': ['x-y', 'z-z']})

>>> for col_name in df.columns:
        df[col_name] = df[col_name].str.replace('-', '0')
>>> df
     a    b    c
0  a0b  x0y  x0y
1  c0d  z0z  z0z
Mike Müller
  • 82,630
  • 20
  • 166
  • 161