2

There is a column (spot_categories_name) in my dataframe like the one shown below. My goal is to get rid of the 'name' at the beginning and the closing brackets (}]") at the end. In short, I want the column to contain only the following values:

Craftsman

BBQ

Theatre

Coffee Shop

...

[Image of the dataframe column]

drorhun
  • 1
    `df['spot_categories_name'] = df['spot_categories_name'].map(lambda x: x.lstrip('\'name\': '))` see [here](https://stackoverflow.com/questions/13682044/remove-unwanted-parts-from-strings-in-a-column). Also, instead of pasting a picture, it would be helpful to see the dataframe pasted directly. – Life is Good Jan 21 '21 at 18:06
  • 1
    It seems like this dataframe was generated inefficiently. You should try to generate a dataframe correctly in the first place. – Mitchell Olislagers Jan 21 '21 at 18:08

2 Answers

3

Use .str.extract():

df['spot_categories_name'] = df['spot_categories_name'].str.extract(r'\'name\': \'([^\']*)\'')
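
To see what the extraction does, here is a minimal sketch on made-up data. The two sample strings are assumptions modelled on the format described in the question, not the asker's real dataframe, and `expand=False` is an optional addition so that a Series is returned rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical sample values, modelled on the strings described in the question
df = pd.DataFrame(
    {"spot_categories_name": ["'name': 'Craftsman'}]", "'name': 'Coffee Shop'}]"]}
)

# Capture whatever sits between the pair of quotes that follows 'name':
pattern = r"'name': '([^']*)'"

# With a single capture group, expand=False gives back a Series
df["spot_categories_name"] = df["spot_categories_name"].str.extract(pattern, expand=False)

print(df["spot_categories_name"].tolist())
# ['Craftsman', 'Coffee Shop']
```

Rows that do not match the pattern come back as NaN, so it may be worth checking for missing values after the extraction.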
noah
1

If you use the pandas .str.split method, it splits each string into a list wherever it finds the given separator.

You can then use .str[n] to get the nth entry of each list. In your case you can split on ": '" and take the last entry, then split on "'}" and take the first entry, which seems to match your test cases. Here is an example.

import pandas as pd

# Sample data mimicking the raw column values
test = pd.DataFrame(data=["'name': 'Craftman'}]", "'name': 'BBQ'}]"], columns=['spot_categories_name'])
# Keep the text after ": '" and before "'}", then assign it back to the column
test['spot_categories_name'] = test.spot_categories_name.str.split(": '").str[-1].str.split("'}").str[0]
print(test.to_dict())
# {'spot_categories_name': {0: 'Craftman', 1: 'BBQ'}}
oli5679