How to remove constant prefix and suffix character

Question

I have a data frame where numeric data is stored in String with some Prefix character which I need to remove. On top of this it has double quotes inside the quotes i.e. ' "" '.

dict_1 = {"Col1" : [1001, 1002, 1003, 1004, 1005], 
          "Col2" : ['"Rs. 5131"', '"Rs. 0"', '"Rs 351157"', '"Rs 535391"', '"Rs. 6513"']}
a = pd.DataFrame(dict_1)
a.head(6)

|    | Col1     |  Col2       |       
|----|----------|-------------|
| 0  |1001      |"Rs. 5131"   |
| 1  |1002      |"Rs. 0"      | 
| 2  |1003      |"Rs 351157"  |    
| 3  |1004      |"Rs 535391"  |    
| 4  |1005      |"Rs. 6513"   |

As you can see I want to remove Quotes defined inside Col2 and along with this I have to remove Rs.

I tried following code to subset

b = a['Col2'][0]
b = b[5:]
b = b[:-1]
b

But the issue in some observation it is defined as Rs. and in some Rs without period.

The result should be a column of integers.

All of the existing answers are to focused on prefix / suffix. The easiest solution is to extract the digits, and convert to int: `a['Col2'] = a['Col2'].str.extract('(\d+)').astype(int)` — Trenton McKinney, Apr 21 '22 at 17:17

ROOP AMBER · Answer 1 · 2022-04-21T18:23:02.820

2

You can simply use removeprefix and removesuffix methods for string after you get the value of the particular columns For a complete answer as comments are demanding

col3=[]
lis = dic['col2']
for b in lis:
    b=b.removeprefix('"').removesuffix('"').removeprefix("Rs.").removeprefix("Rs ")
    col3.append(int(b))
dic['col2']=col3

By this even if there will be Rs. with a period or without period both will be removed without any error. Edit: Change suggested by @Jhanzaib Humayun. I found an easier answer out there on this link for whole of the series alltogether extract number from string

edited Apr 21 '22 at 18:23

answered Apr 21 '22 at 13:03

ROOP AMBER

330
1
8

This is not correctly implemented. Check your code. (1) These are `.str` methods, (2) `[0]` should not be on `b = a['Col2'][0]`, (3) the strings are not correct, (4) `'Rs '` and `'Rs. '` are both prefixes, and (5) this should be assigned back to the same column in the dataframe or a new dataframe column, not a separate Series. – Trenton McKinney Apr 21 '22 at 17:04
1

The current implementation results in `AttributeError: 'Series' object has no attribute 'removeprefix'` – Trenton McKinney Apr 21 '22 at 17:05
@TrentonMcKinney All the code I have written is after the person asking the question gets the particular cell and hence its value would be a string. As 'Rs ' and 'Rs. ' are both prefixes it will still work as removeprefix only removes and there is a prefix and doesn't written an error for clarity i have edited my answer – ROOP AMBER Apr 21 '22 at 17:48
Test your code, it does not work. Also _after the person asking the question gets the particular cell_ is not the correct way. It should be applied to the Series, not the cell. – Trenton McKinney Apr 21 '22 at 17:52
@TrentonMcKinney any of the answers prescribed will not work on a series this is because you will have to implement a for loop to go in every cell of the column 1 by 1 and than remove its prefix – ROOP AMBER Apr 21 '22 at 17:56
Your comment is not correct, and demonstrates a lack of understand of vectorized operations and how to work with pandas. – Trenton McKinney Apr 21 '22 at 17:57
@TrentonMcKinney I am sorry sir, but I still can't get what you mean to say the questionaire just wants us to tell how to remove prefix and not how to go in each and every column – ROOP AMBER Apr 21 '22 at 17:58
`a['Col2'] = a['Col2'].str.removeprefix('"').str.removesuffix('"').str.removeprefix("Rs. ").str.removeprefix("Rs ").astype(int)` is the correct implementation using `removeprefix` and `removesuffix`. Your answer is not correct. – Trenton McKinney Apr 21 '22 at 18:02
@TrentonMcKinney You can convert it to int later by b = int(b), if you want – ROOP AMBER Apr 21 '22 at 18:03

Jonathan · Answer 2 · 2022-04-25T12:08:28.017

1

Or use .str.replace():

a["Col2"] = a["Col2"].str.replace('Rs. ', '').replace('"', '')

Update use replace:

a["Col2"].replace(r"Rs\.?\s+", '', regex=True, inplace=True).astype(int)

edited Apr 25 '22 at 12:08

answered Apr 21 '22 at 13:04

Jonathan

748
3
20

score 1 · Answer 3 · edited Apr 21 '22 at 17:26

1

Given the sample data in the OP, use .replace

a['Col2'] = a['Col2'].replace({'"': ''}, regex=True)
a['Col2'] = a['Col2'].replace({'Rs.': ''}, regex=True)
a['Col2'] = a['Col2'].replace({'Rs': ''}, regex=True)
a['Col2'] = a['Col2'].replace({' ': ''}, regex=True)

edited Apr 21 '22 at 17:26

Trenton McKinney

56,955
33
144
158

answered Apr 21 '22 at 13:06

Dan Green

76
4

How to remove constant prefix and suffix character

3 Answers3