The goal is to remove full stops, commas, and quotation marks when they appear at the beginning or end of a string in a Pandas column.

Given a df as below

data = {'Name': ['"Tom hola.', '"nick"', 'krish here .','oh my *']}

The expected output is

Tom hola
nick
krish here
oh my

I tried the following code, but it did not work as intended

import pandas as pd
df = pd.DataFrame(data)
df['Name'] = df['Name'].str[-1:].replace({"\. ": "Na"},regex=True)

May I know how this objective can be achieved?

Also, can the approach be extended so that it applies across several columns?

mpx

2 Answers

You can use pd.Series.str.replace if you want to replace values in a single column only; otherwise use df.replace.

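# Using `pd.Series.str.replace`
# regex=True is needed on pandas >= 2.0, where the pattern is treated as a literal string by default
df['Name'] = df['Name'].str.replace(r'\.$', '', regex=True)
df
          Name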
0     Tom hola
1   secondx //
2         nick
3  krish here

# Using `df.replace`
df.replace(r'\.$', '', regex=True)
          Name
0     Tom hola
1   secondx //
2         nick
3  krish here

EDIT:

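You can use pd.Series.str.strip to strip ", ., * and spaces from both ends. It takes a plain set of characters rather than a regex, so no escaping is needed.

df['Name'].str.strip('".* ')

0      Tom hola
1          nick
2    krish here
3         oh my
Name: Name, dtype: object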

# OR
df.Name.str.replace(r'^\W+|(.*?)\W+$', r'\1', regex=True)  # Replaces only values in `Name`
# df.replace(r'^\W+|(.*?)\W+$', r'\1', regex=True)  # Replaces across the whole df
  • More about regex pattern used in second case here
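
To address the follow-up about several columns, here is a minimal sketch using the data from the question. The second column (`City`) and its values are hypothetical and only there for illustration, and the character set assumes that commas and spaces should be stripped as well.

```python
import pandas as pd

# 'Name' comes from the question; 'City' is a hypothetical extra column.
df = pd.DataFrame({
    'Name': ['"Tom hola.', '"nick"', 'krish here .', 'oh my *'],
    'City': ['"Paris",', 'Rome.', ' Oslo *', 'Lima'],
})

# str.strip takes a plain set of characters (not a regex), so nothing is escaped.
# The space is included so that 'krish here .' does not keep a trailing blank.
chars = '".,* '

# Apply the same strip to every column (all columns here hold strings).
cleaned = df.apply(lambda col: col.str.strip(chars))
print(cleaned)
```

This only touches the two ends of each string; punctuation in the middle of a value is left alone.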
Ch3steR
  • Thanks @Ch3steR. I would like to extend this approach so that it also removes the `"` symbol, but replacing `r'\.$'` with `r'\"$'` does nothing. Also, if `$` indicates the end of a string, is there a special symbol for the start of the string? – mpx Jul 18 '20 at 08:21
  • Yes, use `^` for start of the string. @balandongiv – Ch3steR Jul 18 '20 at 08:22
  • Thanks for the prompt reply @Ch3steR. But I am still unable to remove the quote at the end of the string with `df.replace(r'\"$', '', regex=True)`. The same applies to removing the same character at the front with `df.replace(r'\"^', '', regex=True)`. – mpx Jul 18 '20 at 08:39
  • @balandongiv Can you add an example to the question showing exactly what you want to do? Make it cover all test cases, such as ending with `.` and starting with `"`. – Ch3steR Jul 18 '20 at 08:45
  • I have slightly edited the expected output above. Thanks for your time. – mpx Jul 18 '20 at 08:47
  • @balandongiv Edited the answer. See if it helps. ;) Feel free to ask if you have any queries. – Ch3steR Jul 18 '20 at 09:20
  • @balandongiv Added one more alternative. I am not a regex expert, so there might be a much better regex than the one I posted, but this should get you started. – Ch3steR Jul 18 '20 at 09:32
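
As a small illustration of the anchor placement discussed in these comments, the sketch below uses only the standard `re` module on one sample string from the question. It shows why a pattern such as `\"^` cannot match: `^` asserts the start of the string, so it has to come before the character, just as `$` has to come after it.

```python
import re

s = '"nick"'

# ^ asserts the start of the string, so it goes before the quote.
no_lead = re.sub(r'^"', '', s)

# $ asserts the end of the string, so it goes after the quote.
no_trail = re.sub(r'"$', '', s)

# '"^' asks for a quote followed by the start of the string,
# which can never happen, so nothing is replaced.
unchanged = re.sub(r'"^', '', s)

# Both ends at once, using alternation.
both = re.sub(r'^"|"$', '', s)

print(no_lead, no_trail, unchanged, both)
```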

Use (\W)*$ if you want to match all special characters at the end of the string.

df = pd.DataFrame({'Name': ['Tom hola.', 'secondx //', 'nick', 'krish here .']})
df['Name'] = df['Name'].replace({r'(\W)*$': ""}, regex=True)

Output :

         Name
0     Tom hola
1    secondx 
2        nick
3  krish here

You can use https://regex101.com to test and better understand what your regex is doing

AlexisG
  • Thanks for the response. To take this approach further, I replaced `{r'(\W)*$': ""}` with `{r'(\W)*^': ""}` to remove special characters at the beginning, but it does not work as intended. – mpx Jul 18 '20 at 08:41
  • To remove them at the beginning you can use `r'^(\W)*'`. And if you want to do both at the same time: `^(\W)*|(\W)*$` – AlexisG Jul 18 '20 at 10:41
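
A minimal sketch of the combined pattern from the last comment, applied to the whole DataFrame with `df.replace`. The `Note` column is hypothetical and only shows that every string column is processed. Keep in mind that `\W` matches any non-word character, so spaces and other punctuation at either end are removed too.

```python
import pandas as pd

# 'Name' comes from the question; 'Note' is a hypothetical extra column.
df = pd.DataFrame({
    'Name': ['"Tom hola.', '"nick"', 'krish here .', 'oh my *'],
    'Note': ['"first"', 'second.', '* third', 'fourth'],
})

# Leading run of non-word characters OR trailing run of non-word characters.
pattern = r'^(\W)*|(\W)*$'

# With regex=True, df.replace applies the pattern to every string value.
cleaned = df.replace(pattern, '', regex=True)
print(cleaned)
```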