How to remove the last numerical value from a string, if there is one

Question

I have some files, say A.atdf, B.atdf, etc. When I parse these files into a dataframe, I get a column called "TEST_TXT" whose values look like this:

0    100 Continuity_PPMU_mV XSCI
1    100 Continuity_PPMU_mV XSCI
2    100 Continuity_PPMU_mV XSCI
3    100 Continuity_PPMU_mV XSCI
4    101 Continuity_PPMU_mV XSCO
5    101 Continuity_PPMU_mV XSCO
.....  .....  ...... ...........

But for some ".atdf" files, the TEST_TXT values look like this instead:

0    100 Continuity_PPMU_mV XSCI 140
1    100 Continuity_PPMU_mV XSCI 12
2    100 Continuity_PPMU_mV XSCI 76
3    100 Continuity_PPMU_mV XSCI 204
4    101 Continuity_PPMU_mV XSCO 139

i.e., random numerical values are appended to the name.

I want to delete the appended numbers ("140", "12", "76", etc.) from the names so that they become plain "100 Continuity_PPMU_mV XSCI", "101 Continuity_PPMU_mV XSCO", etc. in the files where a number has been appended.

I have written Python code that does this, but it loops through the series and is not very elegant.

How can I remove the appended numerical values from the series wherever they occur? (They do not occur in all the parsed files, only in a few.)

score 3 · Accepted Answer · edited Dec 04 '20 at 06:01


Use str.replace:

df['TEST_TXT'] = df['TEST_TXT'].str.replace(r'\s+\d+$', '', regex=True)

Here is a regex demo which shows the replacement logic working.
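As a quick check of the one-liner above, here is a minimal sketch on a small made-up frame (the sample rows are copied from the question; the frame itself and the name `cleaned` are only illustrative). It assumes a reasonably recent pandas, where `regex=True` has to be passed explicitly because the default became a literal match in pandas 2.0.

```python
import pandas as pd

# Hypothetical sample mixing rows with and without an appended number
df = pd.DataFrame({
    "TEST_TXT": [
        "100 Continuity_PPMU_mV XSCI 140",
        "100 Continuity_PPMU_mV XSCI",
        "101 Continuity_PPMU_mV XSCO 139",
    ]
})

# Strip "whitespace + digits" only when it sits at the very end of the value
cleaned = df["TEST_TXT"].str.replace(r"\s+\d+$", "", regex=True)
print(cleaned.tolist())
```

Rows without a trailing number come back unchanged, so the same line can be applied to every parsed file without first checking whether numbers were appended. The leading "100" and "101" are untouched because the pattern is anchored to the end of the string.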

edited Dec 04 '20 at 06:01

Nick


answered Dec 04 '20 at 05:59

Tim Biegeleisen


This is really great. I applied it to my code and it works like a charm. I understand the df["TEST_TXT"].str.replace() part. Can you please explain a bit more about the regex and how you came up with it, such that it only deletes the numerical value appended at the end? – Kartik Mehra Dec 04 '20 at 06:07
The regex `\s+\d+$` simply says to match one or more whitespace characters followed by one or more digits, which are immediately followed by the end of the string (here, the end of the column value). – Tim Biegeleisen Dec 04 '20 at 06:08
Very elegant. Thanks for redirecting me to the regex demo website. Great resource. – Kartik Mehra Dec 04 '20 at 06:09
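The pattern discussed in these comments can also be tried outside pandas with Python's standard `re` module. The sketch below is only an illustration: the third sample string is hypothetical (it does not appear in the question) and is there to show that digits glued directly to the name are left alone.

```python
import re

# \s+  one or more whitespace characters
# \d+  one or more digits
# $    end of the string
pattern = re.compile(r"\s+\d+$")

samples = [
    "100 Continuity_PPMU_mV XSCI 140",  # trailing number after a space
    "100 Continuity_PPMU_mV XSCI",      # nothing appended
    "100 Continuity_PPMU_mV XSCI140",   # hypothetical: digits with no space before them
]

results = [pattern.sub("", s) for s in samples]
for before, after in zip(samples, results):
    print(repr(before), "->", repr(after))
```

Only the first string changes. The other two contain no run of whitespace followed by digits at the very end, so the pattern does not match them.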

score 1 · Answer 2 · answered Dec 04 '20 at 06:01


You can either use str.replace to remove the digits at the end of the string, or split on the same pattern and take the first element.

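# Option 1: df['TEST_TXT'] = df['TEST_TXT'].str.replace(r'\s+\d+$', '', regex=True)
# Option 2: split on the trailing " <digits>" and keep the part before it
df['TEST_TXT'] = df['TEST_TXT'].str.split(r'\s\d+$').str[0]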
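To see how the two approaches behave, here is a small sketch on a made-up series (the variable names are illustrative). It shows one detail worth watching: a pattern that matches only the digits removes the number but leaves the space in front of it, whereas splitting on "whitespace + digits" does not. The sketch relies on `str.split` treating a multi-character pattern as a regular expression by default, which is an assumption about the installed pandas version.

```python
import pandas as pd

s = pd.Series([
    "100 Continuity_PPMU_mV XSCI 140",
    "101 Continuity_PPMU_mV XSCO",
])

# Digits-only pattern: the number goes, but the space before it stays
digits_only = s.str.replace(r"\d+$", "", regex=True)

# Split on "whitespace + digits + end of string" and keep the first piece
split_first = s.str.split(r"\s\d+$").str[0]

print(digits_only.tolist())
print(split_first.tolist())
```

If the digits-only form is preferred, chaining `.str.rstrip()` afterwards is one way to drop the leftover space.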

answered Dec 04 '20 at 06:01

wwnde



Hi. I think the logic is pretty much the same as Tim's answer, and I believe it would work just fine too. Thanks a lot. Upvoted. – Kartik Mehra Dec 04 '20 at 06:10
