
I have some files, say A.atdf, B.atdf, etc. When I parse these files into a DataFrame, I get a column called "TEST_TXT" whose values look like this:

0    100 Continuity_PPMU_mV XSCI
1    100 Continuity_PPMU_mV XSCI
2    100 Continuity_PPMU_mV XSCI
3    100 Continuity_PPMU_mV XSCI
4    101 Continuity_PPMU_mV XSCO
5    101 Continuity_PPMU_mV XSCO
.....  .....  ...... ...........

But for some ".atdf" files, TEST_TXT looks like this instead:

0    100 Continuity_PPMU_mV XSCI 140
1    100 Continuity_PPMU_mV XSCI 12
2    100 Continuity_PPMU_mV XSCI 76
3    100 Continuity_PPMU_mV XSCI 204
4    101 Continuity_PPMU_mV XSCO 139

i.e. random numerical values are appended to the name.

I want to delete the appended numbers ("140", "12", "76", etc.) from the names, leaving plain "100 Continuity_PPMU_mV XSCI", "101 Continuity_PPMU_mV XSCO", etc., for the files that have numbers appended.

I have written Python code that does this, but it loops through the series, which is not very elegant.

How can I remove the appended numerical values from the series wherever they occur (they do not occur in all the parsed files, only in a few)?

Kartik Mehra

2 Answers


Use str.replace:

df['TEST_TXT'] = df['TEST_TXT'].str.replace(r'\s+\d+$', '', regex=True)

Here is a regex demo which shows the replacement logic working.
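A self-contained run of the replacement on a small sample built from the values in the question (the particular mix of suffixed and unsuffixed rows is an assumption for illustration). `regex=True` is passed explicitly because, since pandas 2.0, `Series.str.replace` treats the pattern as a literal string by default.

```python
import pandas as pd

# Sample values from the question: some rows carry a trailing number, some do not.
df = pd.DataFrame({
    "TEST_TXT": [
        "100 Continuity_PPMU_mV XSCI 140",
        "100 Continuity_PPMU_mV XSCI 12",
        "100 Continuity_PPMU_mV XSCI",
        "101 Continuity_PPMU_mV XSCO 139",
        "101 Continuity_PPMU_mV XSCO",
    ]
})

# Remove whitespace followed by one or more digits at the end of each value.
# Rows without a trailing number do not match and are left unchanged.
df["TEST_TXT"] = df["TEST_TXT"].str.replace(r"\s+\d+$", "", regex=True)
print(df["TEST_TXT"].tolist())
# ['100 Continuity_PPMU_mV XSCI', '100 Continuity_PPMU_mV XSCI', '100 Continuity_PPMU_mV XSCI',
#  '101 Continuity_PPMU_mV XSCO', '101 Continuity_PPMU_mV XSCO']
```

Because the operation is vectorized over the whole column, it can be applied to every parsed file without first checking whether that file has the numeric suffix.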

Nick
Tim Biegeleisen
  • This is really great. I applied it to my code and it works like a charm. I understand the df["TEST_TXT"].str.replace() part. Can you please explain a bit more about the regex and how you came up with it, so that it deletes only the appended numerical value? – Kartik Mehra Dec 04 '20 at 06:07
  • The regex `\s+\d+$` simply says to match one or more whitespace characters followed by one or more digits, which are immediately followed by the end of the value. – Tim Biegeleisen Dec 04 '20 at 06:08
  • Very elegant. Thanks for redirecting me to the regex demo website. Great resource. – Kartik Mehra Dec 04 '20 at 06:09
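To see each part of `\s+\d+$` at work, here is a small sketch using Python's standard `re` module on individual strings. The third sample (`XSC1`) is hypothetical, added only to show that digits glued directly to a name are not removed, because `\s+` requires at least one whitespace character before the digits. The `$` anchor is what keeps the leading `100` untouched.

```python
import re

# \s+ : one or more whitespace characters
# \d+ : one or more digits
# $   : the end of the string
pattern = re.compile(r"\s+\d+$")

samples = [
    "100 Continuity_PPMU_mV XSCI 140",  # trailing number: removed
    "100 Continuity_PPMU_mV XSCI",      # no trailing number: unchanged
    "100 Continuity_PPMU_mV XSC1",      # hypothetical: no whitespace before the digit, unchanged
]

cleaned = [pattern.sub("", s) for s in samples]
for s in cleaned:
    print(repr(s))
```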

You can either use str.replace to remove the digits at the end of the string, or split on the same pattern and select the first element.

# df['TEST_TXT'] = df['TEST_TXT'].str.replace(r'\s\d+$', '', regex=True)
# the regex= keyword of str.split requires pandas >= 1.4
df['TEST_TXT'] = df['TEST_TXT'].str.split(r'\s\d+$', regex=True).str[0]
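A runnable sketch of the split approach on sample values from the question (assuming pandas 1.4 or later for the `regex=` keyword). Splitting at the trailing ` <digits>` yields the cleaned text plus an empty string, while a value with no trailing number comes back as a one-element list, so `.str[0]` works for both. The last value, with two spaces before the number, is hypothetical and shows one difference in design: `\s` matches exactly one whitespace character, so an extra space survives, whereas `\s+` would remove it.

```python
import pandas as pd

s = pd.Series([
    "100 Continuity_PPMU_mV XSCI 140",
    "100 Continuity_PPMU_mV XSCI",
    "101 Continuity_PPMU_mV XSCO 139",
    "101 Continuity_PPMU_mV XSCO  139",  # hypothetical: two spaces before the number
])

# Each value becomes a list, e.g. ['100 Continuity_PPMU_mV XSCI', ''] or ['100 Continuity_PPMU_mV XSCI'].
parts = s.str.split(r"\s\d+$", regex=True)

# Keep the text before the trailing number.
result = parts.str[0]
print(result.tolist())
# ['100 Continuity_PPMU_mV XSCI', '100 Continuity_PPMU_mV XSCI',
#  '101 Continuity_PPMU_mV XSCO', '101 Continuity_PPMU_mV XSCO ']
```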
wwnde
  • Hi. I think the logic is pretty much the same as Tim's answer, and I believe it would work just fine too. Thanks a lot. Upvoted. – Kartik Mehra Dec 04 '20 at 06:10