I'm using pandas to remove all but the last period in each string, like so:

import pandas as pd

s = pd.Series(['1.234.5','123.5','2.345.6','678.9'])
counts = s.str.count(r'\.')
target = counts==2
target
0     True
1    False
2     True
3    False
dtype: bool

s = s[target].str.replace(r'\.', '', n=1, regex=True)
s
0    1234.5
2    2345.6
dtype: object

My desired output, however, is:

0    1234.5
1    123.5
2    2345.6
3    678.9
dtype: object

The replace call combined with the mask target seems to drop the unreplaced values, and I can't see how to remedy this.

cs95
seanysull

2 Answers

Regex-based with str.replace

This regex pattern with str.replace should do nicely.

s.str.replace(r'\.(?=.*?\.)', '', regex=True)

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

The idea is to remove every period that is followed, somewhere later in the string, by another period. Only the last period fails that test, so it is the one that survives. Here's a breakdown of the regular expression used.

\.     # a literal '.'
(?=    # positive lookahead: assert, without consuming, that what follows is
.*?    #   any characters (as few as possible)
\.     #   and then another literal '.'
)
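
To see the lookahead in isolation, here is a minimal sketch that uses Python's built-in re module rather than pandas. The helper name strip_extra_periods and the extra sample '1.2.3.4' are made up for illustration and are not part of the original answer.

```python
import re

# A period that is followed, somewhere later in the string, by another period.
PATTERN = re.compile(r'\.(?=.*?\.)')

def strip_extra_periods(text):
    """Remove every period except the last one (hypothetical helper)."""
    return PATTERN.sub('', text)

samples = ['1.234.5', '123.5', '2.345.6', '678.9', '1.2.3.4']
cleaned = [strip_extra_periods(x) for x in samples]
print(cleaned)
```

Because the lookahead consumes nothing, each period is tested independently, so strings with three or more periods are handled the same way as those with two.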

Fun with np.vectorize

If you want to do this using count, it isn't impossible, but it is a challenge. You can make this easier with np.vectorize. First, define a function,

def foo(r, c):
    return r.replace('.', '', c)

Vectorize it,

import numpy as np
v = np.vectorize(foo)

Now, call the function v, passing s and the counts to replace.

pd.Series(v(s, s.str.count(r'\.') - 1))

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

Keep in mind that this is basically a glorified loop.


Loopy/List Comprehension

The pure-Python equivalent of the vectorized function would be:

r = []
for x, y in zip(s, s.str.count(r'\.') - 1):
    r.append(x.replace('.', '', y))

pd.Series(r)

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

Or, using a list comprehension:

pd.Series([x.replace('.', '', y) for x, y in zip(s, s.str.count(r'\.') - 1)])

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object
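
All of the count-based variants rest on the third argument of Python's str.replace, which caps the number of replacements. The following plain-Python sketch checks the edge cases; the helper name drop_leading_periods and the clamp with max are assumptions added for illustration, not something the answer above uses.

```python
def drop_leading_periods(text):
    """Remove the first n-1 periods, where n is the number of periods in text."""
    n = text.count('.')
    # Clamp at 0: a negative count means "replace all" to str.replace,
    # which is harmless when there are no periods but easy to misread.
    return text.replace('.', '', max(n - 1, 0))

samples = ['1.234.5', '123.5', '12345', '1.2.3.4']
results = {x: drop_leading_periods(x) for x in samples}
print(results)
```

A count of 0 leaves the string unchanged, which is why entries with a single period pass through untouched.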
cs95
  • So this replaces any periods as long as another is matched ahead of it? I can accept in 3 minutes haha, you're too quick. – seanysull Dec 14 '17 at 12:24
  • @seanysull Hmm, with `replace` and regex? Hmm, whatever the case, you need to know whether there is a character in front, so the lookahead cannot be avoided. – cs95 Dec 14 '17 at 12:25
  • I mean without regex just using str.count and a mask etc – seanysull Dec 14 '17 at 12:26
  • @seanysull It can be done, but it will feel like going from London to NY through Moscow. You could count the dots in every entry (say `n`), and replace the first `n-1` instances of it with `''`. – Ma0 Dec 14 '17 at 12:28
  • @Ev.Kounis Yep, that would basically be falling back to pure python at that point. – cs95 Dec 14 '17 at 12:35
  • @cᴏʟᴅsᴘᴇᴇᴅ Great job as always! – Ma0 Dec 14 '17 at 12:38
  • Thanks guys, that was an enlightening post and discussion – seanysull Dec 14 '17 at 12:56

You want to replace the masked items and keep the rest untouched. That's exactly what Series.where does, except that it keeps values where the condition is True and replaces them where it is False, so you need to negate the mask.

s.where(~target, s.str.replace(r'\.', '', n=1, regex=True))

Alternatively, make the change in place by assigning to the masked values. This is probably cheaper, but it overwrites the original s.

s[target] = s[target].str.replace(r'\.', '', n=1, regex=True)
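
As a self-contained check of the Series.where approach, the sketch below rebuilds the question's data and mask. It assumes a reasonably recent pandas and, as a simplification of my own, passes regex=False with a literal '.' instead of the escaped pattern.

```python
import pandas as pd

s = pd.Series(['1.234.5', '123.5', '2.345.6', '678.9'])
target = s.str.count(r'\.') == 2  # rows that contain exactly two periods

# where() keeps the original value where the condition is True and takes
# the replacement where it is False, hence the negated mask.
fixed = s.where(~target, s.str.replace('.', '', n=1, regex=False))
print(fixed.tolist())
```

The replacement is computed for every row, but where() only picks it up for the masked rows, so the index and the untouched values are preserved. Note that this mask only covers strings with exactly two periods.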
Stop harming Monica