I have a dataframe like this:

df = 

number
+123 1234
+123 0123
+123+123 01234
+123 0123 0023

I want to remove a 0 only when it comes right after the first space, and store the result in a new column. Desired output:

number            filtered
+123 1234         +123 1234
+123 0123         +123 123
+123+123 01234    +123+123 1234
+123 0123 0023    +123 123 0023

My try is:

df['filtered'] = df['number'].replace(r'\s(.)', '', regex=True)

But I realized that it removes whatever character follows the space, not only a zero, and it removes the space as well.


I am fine with a different approach too; it does not have to be regex.

Mamed

2 Answers

You can use this regex to match from the beginning of the string up to and including a 0 that immediately follows the first whitespace character, and then replace the match with capture group 1 (everything before that 0):

^(\S*\s)0

Regex demo on regex101

In python:

df['filtered'] = df['number'].replace(r'^(\S*\s)0', r'\1', regex=True)

Output:

    +123 1234
     +123 123
+123+123 1234
+123 123 0023
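
Since the question also allows non-regex approaches, the same result can be sketched with plain string methods. This is only an illustration: the helper name `drop_zero_after_first_space` is made up here, and it assumes every value is a string and that the separator is a literal space (the regex above uses `\s`, which accepts any whitespace).

```python
def drop_zero_after_first_space(number: str) -> str:
    """Remove a single 0 that directly follows the first space, if there is one."""
    # partition splits at the FIRST space only: (before, " ", after).
    # If there is no space, sep and tail are empty and the input is returned unchanged.
    head, sep, tail = number.partition(" ")
    if tail.startswith("0"):
        tail = tail[1:]  # drop exactly one leading zero
    return head + sep + tail


# Sample values taken from the question.
numbers = ["+123 1234", "+123 0123", "+123+123 01234", "+123 0123 0023"]
filtered = [drop_zero_after_first_space(n) for n in numbers]
print(filtered)
```

With the question's DataFrame this could be applied as `df['filtered'] = df['number'].map(drop_zero_after_first_space)`, provided the column holds no missing values.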
Nick
  • Thanks! I found that regex is a powerful thing. Where can I read documentation on how to combine these elements to get a desired result? – Mamed Oct 14 '22 at 06:16
  • @Mamed there's a good reference q&a [here](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) and a good tutorial [here](https://www.regexbuddy.com/tutorial.html) and I highly recommend playing with regexes on sandbox sites such as [regex101](https://regex101.com) – Nick Oct 14 '22 at 06:20

Try this

\s0

Python

df['filtered'] = df['number'].replace(r'\s0', ' ', regex=True)
  • Notice that your regex filters out one extra `0` in the bottom row: `initial: +123 0123 0023`, `expected: +123 123 0023`, `yours: +123 123 023`. This happens because the pattern removes every `0` that is preceded by whitespace, not only the one after the first space. – Nikita Shabankin Oct 18 '22 at 01:30
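
The difference described in the comment above can be checked with the standard `re` module. This is a minimal sketch on the question's sample values rather than on a DataFrame; it assumes pandas' `replace(..., regex=True)` treats these two patterns the same way `re.sub` does.

```python
import re

# Sample values taken from the question.
numbers = ["+123 1234", "+123 0123", "+123+123 01234", "+123 0123 0023"]

# Unanchored pattern: every whitespace character followed by 0 is a match,
# so a 0 after the second space is removed as well.
unanchored = [re.sub(r"\s0", " ", n) for n in numbers]

# Anchored pattern: ^ pins the match to the start of the string, \S* consumes
# the run of non-whitespace characters, and \s is therefore the first whitespace.
# Group 1 is kept, and only the 0 right after it is dropped.
anchored = [re.sub(r"^(\S*\s)0", r"\1", n) for n in numbers]

for original, u, a in zip(numbers, unanchored, anchored):
    print(f"{original!r:20} unanchored={u!r:20} anchored={a!r}")
```

Only the last row differs between the two: the unanchored pattern yields `+123 123 023`, while the anchored one yields the expected `+123 123 0023`.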