I have a column that may contain values like abc,def or abc,def,efg, or ab,12,34, and so on. As you can see, some values end with a comma and some don't. I want to remove all values that end with a comma.

Assume the data is loaded and a data frame has been created. This is what I do:

df[c] = df[c].astype('unicode').str.replace("/,*$/", '').str.strip()

But it doesn't do anything.

What am I doing wrong?

Souvik Ray

2 Answers

What you were trying to do, stripping the trailing commas, would look something like this:

df[c] = df[c].str.rstrip(',')

rstrip(',') removes commas only from the end of the string.

strip(',') removes them from both the start and the end.
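As a quick illustration of the difference, here is a minimal sketch on a made-up Series (the sample values are assumptions chosen to show a leading comma and repeated trailing commas; they are not from the question's data):

```python
import pandas as pd

# Hypothetical sample values, including a leading comma and doubled trailing commas.
s = pd.Series(["abc,def", "abc,def,efg,", ",ab,12,34,,"])

# rstrip(",") strips every trailing comma but leaves leading ones alone.
trailing_only = s.str.rstrip(",")

# strip(",") strips commas from both ends.
both_ends = s.str.strip(",")

print(trailing_only.tolist())
print(both_ends.tolist())
```

Commas inside the value are untouched in both cases; only the ends are affected.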

The above only modifies the text; it does not drop any rows from the dataframe. If you want to drop the rows whose value ends with a comma, filter them out instead:

Use str.endswith:

df[~df['col'].str.endswith(',')]

Consider below df:

In [1547]: df
Out[1547]: 
         date id  value  rolling_mean   col
0  2016-08-28  A      1           nan    a,
1  2016-08-28  B      1           nan    b
2  2016-08-29  C      2           nan    c,
3  2016-09-02  B      0          0.50    d
4  2016-09-03  A      3          2.00    ee,ff
5  2016-09-06  C      1          1.50    gg,
6  2017-01-15  B      2          1.00    i,
7  2017-01-18  C      3          2.00    j
8  2017-01-18  A      2          2.50    k,

In [1548]: df = df[~df['col'].str.endswith(',')]    
In [1549]: df                               
Out[1549]: 
         date id  value  rolling_mean    col
1  2016-08-28  B      1           nan      b
3  2016-09-02  B      0          0.50      d
4  2016-09-03  A      3          2.00  ee,ff
7  2017-01-18  C      3          2.00      j
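If the column can contain missing values, the boolean mask may need a little care. The sketch below assumes a hypothetical frame with a None entry (not part of the example above) and uses the na parameter of str.endswith so that missing values are treated as "does not end with a comma" and the mask stays purely boolean:

```python
import pandas as pd

# Hypothetical data: one missing value mixed in with the strings.
df = pd.DataFrame({"col": ["a,", "b", None, "ee,ff", "gg,"]})

# na=False makes missing entries evaluate to False instead of NaN,
# so the mask can be negated safely with ~.
mask = df["col"].str.endswith(",", na=False)

# Keep only the rows whose value does not end with a comma.
kept = df[~mask]

print(kept)
```

Whether rows with missing values should be kept or dropped depends on your data; pass na=True instead if they should be removed along with the comma-terminated ones.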
Mayank Porwal

Your regex is wrong because it contains regex delimiter characters (the leading and trailing /). Python regex patterns are plain strings, not regex literals, so the slashes are matched as literal characters.

Use

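df[c] = df[c].astype('unicode').str.replace(",+$", '', regex=True).str.strip()

The ,+$ will match one or more commas at the end of the string. Passing regex=True tells pandas to treat the pattern as a regular expression; recent pandas versions no longer do so by default.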
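To see why the delimiters matter, here is a small sketch using Python's re module and a made-up Series (the sample strings are assumptions for illustration). It passes regex=True explicitly, since str.replace treats the pattern as a literal string by default in pandas 2.0 and later:

```python
import re
import pandas as pd

value = "abc,def,efg,,"

# In Python the slashes are ordinary characters, so this pattern requires a
# literal "/" after the end of the string and therefore never matches.
with_delimiters = re.sub("/,*$/", "", value)

# Without the delimiters, the trailing commas are matched and removed.
without_delimiters = re.sub(r",+$", "", value)

# The same pattern applied to a whole column of hypothetical values.
s = pd.Series(["abc,def", "abc,def,efg,", "ab,12,34,,"])
cleaned = s.str.replace(r",+$", "", regex=True).str.strip()

print(with_delimiters)
print(without_delimiters)
print(cleaned.tolist())
```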

See proof.

Also, see Regular expression works on regex101.com, but not on prod

Ryszard Czech