Replace consecutive repeated characters with one - Column-wise operation - `pandas.DataFrame`

Question

How do I remove consecutive repeated characters in a string and leave just one of them?

For example:

"Bertuggggg Mete"

to

"Bertug Mete"

I've just read data like this:

dataFrame = pd.read_excel("C:\\Users\\Bertug\\Desktop\\example.xlsx")

              Name
0  Bertuggggg Mete

The input is read from an .xlsx file. I have tried the split and strip functions, but they don't seem to work as expected.

How can I solve this problem?

Have a look here: http://stackoverflow.com/questions/18799036/python-best-way-to-remove-duplicate-character-from-string – Gurupad Hegde, Mar 30 '17 at 06:39
Check this post and see if it helps: http://stackoverflow.com/questions/9841303/removing-duplicate-characters-from-a-string – hasanzuav, Mar 30 '17 at 06:40
I've looked at that, but it only handles two repeated characters. My question is about more than two. – Bertug, Mar 30 '17 at 06:40
Possible duplicate of [Python: Best Way to remove duplicate character from string](http://stackoverflow.com/questions/18799036/python-best-way-to-remove-duplicate-character-from-string) – Gurupad Hegde, Mar 30 '17 at 06:41
@Bertug, you can use the idea from stackoverflow.com/questions/18799036/ . Also, from stackoverflow.com/questions/9841303 : if you look at the regex in the solution, you will get the answer. Hint: you need to use `\1` instead of `\1\1` – Gurupad Hegde, Mar 30 '17 at 06:45

Devi Prasad Khatua · Answer 1 · 2017-03-30T07:00:28.240

4

Check this out:

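Set column_name to the name of the column you want to apply the replacement to.

min_threshold_rep = 2
column_name = 'Name'
# regex=True is needed on pandas 2.0 and later, where str.replace treats the pattern as a literal string by default
dataFrame[column_name] = dataFrame[column_name].str.replace(r'(\w)\1{%d,}' % (min_threshold_rep - 1), r'\1', regex=True)

NOTE: this replaces every run of min_threshold_rep or more consecutive identical word characters (letters, digits, underscore) with a single character.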
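To see what the threshold does without needing a DataFrame, here is a minimal sketch using the standard `re` module on plain strings. The helper name `collapse_runs` and the extra sample strings are assumptions for illustration; the pattern is built the same way as above.

```python
import re

def collapse_runs(text, min_threshold_rep=2):
    """Collapse runs of at least min_threshold_rep identical word characters."""
    # (\w) captures one word character; \1{n,} requires n or more further copies of it
    pattern = r'(\w)\1{%d,}' % (min_threshold_rep - 1)
    return re.sub(pattern, r'\1', text)

print(collapse_runs('Bertuggggg Mete'))      # Bertug Mete
print(collapse_runs('Bertuggggg Meete', 3))  # Bertug Meete ("ee" is shorter than the threshold)
print(collapse_runs('Wait!!!'))              # Wait!!! ("!" is not matched by \w)
```

Note that a threshold of 2 also shortens legitimate double letters (for example "Anna" becomes "Ana"), so a higher threshold may be safer for real names.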

edited Mar 30 '17 at 07:00

answered Mar 30 '17 at 06:50

Devi Prasad Khatua


It works, thank you so much. Can you explain (r'(\w)\1*', r'\1') here? How did you solve this? :) – Bertug Mar 30 '17 at 06:59
`\1` is a backreference to the first captured group, here `(\w)`. The pattern matches a character followed by repeats of that same character, and the replacement `\1` puts back a single copy. – Devi Prasad Khatua Mar 30 '17 at 07:04
Just go to the official docs : https://docs.python.org/2/library/re.html#regular-expression-syntax – Devi Prasad Khatua Mar 30 '17 at 07:05
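As a rough illustration of the backreference discussed in these comments, the sketch below uses `re.finditer` from the standard library to show what the pattern actually matches. The sample string is the one from the question; the variable names are only for illustration.

```python
import re

text = 'Bertuggggg Mete'

# (\w) captures a single word character as group 1;
# \1+ then matches one or more repeats of that same character.
runs = [(m.group(0), m.group(1)) for m in re.finditer(r'(\w)\1+', text)]
print(runs)  # [('ggggg', 'g')]

# Replacing each whole match with group 1 leaves one copy of the character.
result = re.sub(r'(\w)\1+', r'\1', text)
print(result)  # Bertug Mete
```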

chile · Answer 2 · 2017-03-30T06:53:02.997

0

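Python code:

if __name__ == '__main__':
    s = 'Bertuggggg Mete'
    if not s:
        # an empty string has no first character to start from
        print('wrong!')
        exit()
    r = s[0]  # the result starts with the first character
    for c in s:
        # append a character only when it differs from the last one kept
        if r[-1] != c:
            r += c
    print(r)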
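The same loop can be wrapped in a function so that it handles empty strings and can be reused for each value. This is only a sketch, not part of the original answer, and the name `squeeze_repeats` is hypothetical. Assuming every value in the column is a string, a function like this could be passed to `Series.map` in pandas.

```python
def squeeze_repeats(s):
    """Return s with each run of identical consecutive characters reduced to one."""
    kept = []
    for c in s:
        # keep the character only if it differs from the last one kept
        if not kept or kept[-1] != c:
            kept.append(c)
    return ''.join(kept)

names = ['Bertuggggg Mete', '', 'aabbcc']
print([squeeze_repeats(n) for n in names])  # ['Bertug Mete', '', 'abc']
```

Unlike the `\w`-based regex, this version also collapses repeated spaces and punctuation, since it compares every character.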

Java code:

public class Test {

    public static void main(String[] args) {
        String s = "Bertuggggg Mete";
        StringBuffer sb = new StringBuffer();
        for (int i = 0, j = s.length(); i < j; i++) {
            if (i == 0) {
                sb.append(s.charAt(0));
            }
            if (s.charAt(i) != sb.charAt(sb.length() - 1)) {
                sb.append(s.charAt(i));
            }
        }
        System.out.println(sb);
    }

}

edited Mar 30 '17 at 06:53

answered Mar 30 '17 at 06:43

chile


you just gave java solution to a Python problem :P – Gurupad Hegde Mar 30 '17 at 06:44
So, I think. Now you can move this code somewhere in gist for your future reference and delete from here :P – Gurupad Hegde Mar 30 '17 at 06:49
with python 3.5 ? – chile Mar 30 '17 at 06:57
