
Code before rstrip

column_names = lh_Area_Base_V2.columns.tolist()
for i, val in enumerate(column_names[1:]):
    column_names[i+1] += '_Base_V2'
column_names[0] = 'Subj_ID'
# Replace the column names with a new name
lh_Area_Base_V2.columns = column_names
lh_Area_Base_V2.head()

Initial DF

Code with rstrip (to drop "_V2" from the end of the first column's values):

column_names = lh_Area_Base_V2.columns.tolist()
for i, val in enumerate(column_names[1:]):
    column_names[i+1] += '_Base_V2'
column_names[0] = 'Subj_ID'
lh_Area_Base_V2['Subj_ID'] = lh_Area_Base_V2['Subj_ID'].map(lambda x: x.lstrip().rstrip('_V2'))
# Replace the column names with a new name
lh_Area_Base_V2.columns = column_names
lh_Area_Base_V2.head()

Resulting DF After rstrip

Problem: why does the ID at index 1 lose the 2 at the end of its value? I only asked rstrip to drop "_V2", not the digit before it.

I would love to hear any suggestions for fixes.

jpp
arkadiy

1 Answer


This is expected behavior of rstrip:

The chars argument is a string specifying the set of characters to be removed

rstrip does not remove the literal suffix _V2. It strips any trailing run of the characters _, V, and 2, in any order, and that includes the 2 at the end of the ID in your second row.
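To make the difference concrete, here is a minimal sketch in plain Python. The IDs and the `drop_suffix` helper are made up for illustration and are not taken from the question's data.

```python
# Hypothetical IDs for illustration only.
ids = ["SILVA001_V2", "SILVA002_V2", "SILVA012_V2"]

# rstrip("_V2") treats its argument as a set of characters: it keeps
# removing '_', 'V' or '2' from the right until it hits another character.
stripped = [s.rstrip("_V2") for s in ids]
print(stripped)  # ['SILVA001', 'SILVA00', 'SILVA01']


def drop_suffix(s, suffix="_V2"):
    """Remove `suffix` only when the string actually ends with it (hypothetical helper)."""
    return s[: -len(suffix)] if s.endswith(suffix) else s


cleaned = [drop_suffix(s) for s in ids]
print(cleaned)  # ['SILVA001', 'SILVA002', 'SILVA012']
```

Only IDs whose last real character is one of `_`, `V`, or `2` get damaged, which is why the problem shows up in some rows and not others.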

Instead, you may use a regular expression to replace a trailing _V2:

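# regex=True is required on pandas 2.0+, where str.replace matches literally by default
df.assign(Subj_ID=df.Subj_ID.str.replace(r'_V2$', '', regex=True))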

    Subj_ID  lh_bankssts_area_base_V2
0  SILVA001                       861
1  SILVA002                      1051
2  SILVA004                      1127
3  SILVA005                      1346
4  SILVA007                      1209
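The `$` in the pattern is what restricts the replacement to the end of the value. The sketch below shows the same pattern with the standard-library `re` module, again on made-up IDs, so the anchoring can be checked without a DataFrame.

```python
import re

# Hypothetical IDs: one with the suffix, one that also contains "_V2"
# in the middle, and one without the suffix at all.
ids = ["SILVA002_V2", "SILVA_V2_012_V2", "SILVA007"]

# '$' anchors the match to the end of the string, so only a trailing
# "_V2" is removed and earlier occurrences are left alone.
pattern = re.compile(r"_V2$")
cleaned = [pattern.sub("", s) for s in ids]
print(cleaned)  # ['SILVA002', 'SILVA_V2_012', 'SILVA007']
```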
user3483203