1

I have a dataframe with a column of state names:

Alabama[edit]            8
Alaska[edit]             1
Arizona[edit]            3
Arkansas[edit]  

I want to remove the [edit] from the end of each string.

I tried: unit['State'] = unit['State'].str.rstrip('[edit]') But this code ends up removing the letters e, d, i and t from the end of the state names as well, e.g. Delaware -> Delawar.

How can I remove exactly [edit]?
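
The behaviour described above follows from how `str.rstrip` is defined: its argument is a set of characters to strip, not a suffix. A minimal plain-Python sketch of that, using a few sample state names chosen only for illustration:

```python
# str.rstrip removes any trailing characters that appear in its argument,
# in any order, and keeps going until it meets a character outside that set.
chars = "[edit]"
examples = ["Alabama[edit]", "Delaware[edit]", "Connecticut[edit]"]

results = {name: name.rstrip(chars) for name in examples}
for name, stripped in results.items():
    print(f"{name!r} -> {stripped!r}")
# 'Alabama[edit]' -> 'Alabama'
# 'Delaware[edit]' -> 'Delawar'
# 'Connecticut[edit]' -> 'Connecticu'

# The order of the characters in the argument makes no difference.
same_set = "Delaware[edit]".rstrip("tide[]")
print(same_set)  # Delawar
```

Names that end in a letter outside the set (such as Alabama) come out as intended, which is why the problem only shows up for some rows.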

user2629628

4 Answers

2

Try this out:

unit['State'] = unit['State'].apply(lambda state : state[:state.index('[edit]')])
Balaji Ambresh
  • Great, works! Could you just briefly explain this part?: state[:state.index('[edit]')] Many thanks. – user2629628 May 15 '20 at 19:36
  • Sure. We're taking the characters of the state from index 0 up to just before the start of the string "[edit]". In an example like `Arkansas[edit]`, the last character that gets picked up is the final `s` before the `[`. Clear? – Balaji Ambresh May 15 '20 at 19:42
  • Check this post on [substring](https://stackoverflow.com/questions/663171/how-do-i-get-a-substring-of-a-string-in-python) if you're lost on indexing. – Balaji Ambresh May 15 '20 at 19:48
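
The slicing explained in the comments can be traced step by step in plain Python. This is only a sketch: the value "Texas" is a made-up example of a string without the marker, and `str.partition` is shown as one possible fallback, not as part of the answer above.

```python
state = "Arkansas[edit]"
marker = "[edit]"

# str.index returns the position where the marker starts.
cut = state.index(marker)
print(cut)  # 8

# Slicing up to that position keeps everything before the "[".
trimmed = state[:cut]
print(trimmed)  # Arkansas

# str.index raises ValueError when the marker is missing.
try:
    "Texas".index(marker)
    missing_raises = False
except ValueError:
    missing_raises = True

# str.partition returns the whole string as its first element
# when the separator is absent, so it does not raise.
safe = "Texas".partition(marker)[0]
print(safe)  # Texas
```

If some rows might lack the marker, the `apply` version would fail on them with a ValueError, so a guard of this kind may be worth considering.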
0

You can do it like this:

unit.loc[:, 'State'] = [value.split('[edit]')[0].strip() for value in unit.loc[:, 'State']]

replace also works as the others have mentioned:

unit.loc[:, 'State'] = [value.replace('[edit]', '') for value in unit.loc[:, 'State']]

Both assume your dataframe is a valid pandas dataframe called "unit" and the desired column label is "State".

For the record, in the single timed run below, both methods were faster than the accepted answer:

import timeit

start_time = timeit.default_timer()
unit["State"] = unit["State"].apply(lambda state: state[: state.index("[edit]")])
print("ACCEPTED ANSWER -> The time difference is :", timeit.default_timer() - start_time)

start_time = timeit.default_timer()
unit.loc[:, 'State'] = [value.split('[edit]')[0] for value in unit.loc[:, 'State']]
print("SPLIT -> The time difference is :", timeit.default_timer() - start_time)

start_time = timeit.default_timer()
unit.loc[:, 'State'] = [value.replace('[edit]', '') for value in unit.loc[:, 'State']]
print("REPLACE -> The time difference is :", timeit.default_timer() - start_time)

ACCEPTED ANSWER -> The time difference is : 0.0015293000000000667
SPLIT -> The time difference is : 0.0010911999999998478
REPLACE -> The time difference is : 0.0007515000000002381
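
A single timed pass like the one above is noisy, and each statement overwrites the column, so the later ones run on strings that no longer contain [edit]. A sketch of a more repeatable comparison follows, assuming a made-up Series of 4,000 values (the data and the helper names `with_index`, `with_split`, `with_replace` are hypothetical) and using `timeit.repeat` on untouched input:

```python
import timeit

import pandas as pd

# Hypothetical sample data; every value ends with the marker.
raw = pd.Series(
    ["Alabama[edit]", "Alaska[edit]", "Arizona[edit]", "Arkansas[edit]"] * 1000
)


def with_index(s):
    # Slice up to the position of the marker (raises if it is missing).
    return s.apply(lambda state: state[: state.index("[edit]")])


def with_split(s):
    # Keep the part before the first occurrence of the marker.
    return pd.Series([v.split("[edit]")[0] for v in s], index=s.index)


def with_replace(s):
    # Remove every occurrence of the marker.
    return pd.Series([v.replace("[edit]", "") for v in s], index=s.index)


candidates = {"INDEX": with_index, "SPLIT": with_split, "REPLACE": with_replace}

timings = {}
for label, func in candidates.items():
    # Each call receives the original Series, so no run sees cleaned data.
    runs = timeit.repeat(lambda: func(raw), number=10, repeat=5)
    timings[label] = min(runs)

for label, seconds in timings.items():
    print(f"{label} -> best of 5 runs (10 calls each): {seconds:.6f} s")
```

The absolute numbers depend on the machine, the pandas version and the data size, so the ranking may differ from the figures quoted above.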
Rexovas
0

Does this work?

unit['State'] = unit['State'].str[:-6]
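
The fixed-width slice drops the last six characters of every value, whether or not they are "[edit]". A minimal sketch of a guarded version, assuming a small made-up frame in which one row ("Auburn") lacks the suffix:

```python
import pandas as pd

# Hypothetical data: the last row has no "[edit]" suffix.
unit = pd.DataFrame({"State": ["Alabama[edit]", "Alaska[edit]", "Auburn"]})

suffix = "[edit]"

# Boolean mask of the rows that really end with the suffix.
has_suffix = unit["State"].str.endswith(suffix)

# Series.where keeps the original value where the condition is True
# and takes the replacement elsewhere, so only suffixed rows are sliced.
unit["State"] = unit["State"].where(
    ~has_suffix, unit["State"].str[: -len(suffix)]
)

print(unit["State"].tolist())  # ['Alabama', 'Alaska', 'Auburn']
```

An unguarded `.str[:-6]` on the same data would empty out "Auburn". Newer pandas releases (1.4 and later, as far as I know) also offer `Series.str.removesuffix`, which would cover the same case in one call.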
Quang Hoang
-1

One alternative could be to replace the string '[edit]' with ''

"Alabama[edit]".replace('[edit]', '')

or use slicing

"Alabama[edit]"[:-6]

Both of these options produce

"Alabama"

To map it to your request above:

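unit['State'] = unit['State'].str.replace('[edit]', '', regex=False)

and

unit['State'] = unit['State'].str[:-6]

Both should produce the needed output. Passing regex=False makes pandas treat '[edit]' as literal text rather than as a regular-expression character class, and the slice assumes every value ends with [edit].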
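
One caveat with `Series.str.replace` is that '[edit]' is also a valid regular expression, where it means "any one of the characters e, d, i, t". The sketch below sets the `regex` flag explicitly in each call so that the difference is visible regardless of the default in the installed pandas version; the two sample values are made up for illustration.

```python
import pandas as pd

states = pd.Series(["Delaware[edit]", "Alabama[edit]"])

# Literal replacement: only the exact text "[edit]" is removed.
literal = states.str.replace("[edit]", "", regex=False)
print(literal.tolist())  # ['Delaware', 'Alabama']

# Regex replacement: "[edit]" is a character class, so every e, d, i and t
# is removed and the brackets themselves are left behind.
char_class = states.str.replace("[edit]", "", regex=True)
print(char_class.tolist())  # ['Dlawar[]', 'Alabama[]']

# Escaping the brackets and anchoring at the end gives a regex
# that matches only the trailing marker.
escaped = states.str.replace(r"\[edit\]$", "", regex=True)
print(escaped.tolist())  # ['Delaware', 'Alabama']
```

Being explicit about `regex` avoids depending on the default, which I believe changed between pandas major versions.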

S.D.