
I'm trying to create a function that removes the ' #1' from a column within a dataframe:

def formatSignalColumn(df):
    for i,signal in enumerate(df['Signal list']):
        df = df.set_value(i, 'Signal list', signal.replace(" #1", ""))
        df = df.set_value(i, 'Signal list', signal.replace(" #2", ""))
    return df

However, when I pass my DataFrame through this, it does not change anything.

tlog = formatSignalColumn(tlog)

Interestingly, when I run the for loop outside the function, it doesn't work either, but when I specifically choose the i and signal values it works...

i = 0
signal = tlog['Signal list'][i]
tlog= tlog.set_value(i, 'Signal list', signal.replace(" #1", ""))
tlog= tlog.set_value(i, 'Signal list', signal.replace(" #2", ""))

This doesn't make any sense to me. Anyone have any ideas?

Julien Marrec
G Lockwood
  • You can just do `df['Signal list']=df['Signal list'].str.replace(' #1| #2','')` – EdChum Dec 21 '16 at 16:35
  • The problem here is that you're modifying your data as you're iterating, so it looks like you're working on a copy. – EdChum Dec 21 '16 at 16:36

1 Answer


You can just use vectorised str.replace and pass a regex pattern to do this in a single line:

In [231]:    
df = pd.DataFrame({'something':[' #1blah', ' #2blah', '#3blah']})
df

Out[231]:
  something
0    #1blah
1    #2blah
2    #3blah

In [232]:
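# regex=True is needed on pandas >= 2.0, where str.replace treats the pattern as a literal by default
df['something'] = df['something'].str.replace(' #1| #2', '', regex=True)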
df

Out[232]:
  something
0      blah
1      blah
2    #3blah

What you discovered was that you were operating on a copy of the passed-in df. Additionally, modifying the data object while you are iterating over it is not a good idea.
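Separately from the copy question, the loop body in the question also appears to overwrite its own result: the second call starts again from the original `signal`, so a value containing " #1" is written back unchanged. A plain-Python sketch of just those two assignments, using a made-up cell value (`"speed #1"` is an assumption, not data from the question):

```python
# Plain-Python sketch of the two assignments made inside the loop body.
signal = "speed #1"  # hypothetical cell value, for illustration only

cell = signal.replace(" #1", "")  # first write: "speed"
cell = signal.replace(" #2", "")  # second write starts from the original `signal` again
print(cell)  # speed #1

# Chaining the calls keeps the result of the first replacement.
chained = signal.replace(" #1", "").replace(" #2", "")
print(chained)  # speed
```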

On top of this, you should always look for a vectorised method first and avoid loops, as a loop is rarely the only option available.
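Applied to the function in the question, a minimal sketch might look like the following. The name `format_signal_column`, the `column` parameter and the sample rows are assumptions made for illustration. It also avoids `set_value`, which was deprecated and later removed in pandas 1.0.

```python
import pandas as pd


def format_signal_column(df, column="Signal list"):
    """Return a copy of df with ' #1' and ' #2' removed from one column."""
    out = df.copy()  # leave the caller's DataFrame untouched
    # One vectorised call; the regex matches " #1" or " #2" anywhere in the string.
    out[column] = out[column].str.replace(r" #[12]", "", regex=True)
    return out


# Hypothetical sample data standing in for the real `tlog`.
tlog = pd.DataFrame({"Signal list": ["speed #1", "speed #2", "temp #3"]})
tlog = format_signal_column(tlog)
print(tlog["Signal list"].tolist())  # ['speed', 'speed', 'temp #3']
```

Returning a new DataFrame rather than mutating the argument keeps the `tlog = formatSignalColumn(tlog)` calling style from the question.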

EdChum