I currently have a Pandas DataFrame that contains many backslashes used as escape characters. For example, some strings are of the form 'Michael\'s dog'.

When I save this DataFrame to a CSV file using pandas.DataFrame.to_csv, I would like to get rid of these backslashes so that the entry in the CSV file would simply be "Michael's dog".

Is there a simple way to do this, for example with a built-in function or method? I've tried going through the original DataFrame and making the changes manually, but I can't shake the feeling that there must be a more efficient way.

Thank you.

Edit

Sorry for the confusion; I should have been more specific in my original question.

The data that I'm having trouble with is of the form:

[' [\'Mazda\', \'it\', "Mazda \'s", \'its\', \'its\', "Mazda \'s"]',
 " ['the 2019 Mazda3', 'the 2019 Mazda3', 'it', 'the 2019 Mazda3', 'The 2019 Mazda3', 'its']",
 " ['the car', 'its']",
 ' [\'the Japanese automaker\', "the brand \'s"]']

As you can see, the data is technically a list and not a string, which means that simply using replace won't work.
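A quick check on the sample data above (plain Python, no pandas needed) suggests that each element is an ordinary `str` whose text merely looks like a list, and that printing it shows no backslashes. If actual Python lists are wanted, `ast.literal_eval` from the standard library can parse each string. This is a sketch assuming the values are exactly as pasted.

```python
import ast

# The question's sample data, pasted as a Python list of four strings.
data = [
    ' [\'Mazda\', \'it\', "Mazda \'s", \'its\', \'its\', "Mazda \'s"]',
    " ['the 2019 Mazda3', 'the 2019 Mazda3', 'it', 'the 2019 Mazda3', 'The 2019 Mazda3', 'its']",
    " ['the car', 'its']",
    ' [\'the Japanese automaker\', "the brand \'s"]',
]

# Each element is a str; the \' sequences exist only in the literal/repr.
print(all(isinstance(item, str) for item in data))  # True
print(data[0])  # shows ['Mazda', 'it', "Mazda 's", ...] with no backslashes

# Optional: turn each string into a real list. strip() drops the leading space,
# which literal_eval may reject on Python versions before 3.10.
parsed = [ast.literal_eval(item.strip()) for item in data]
print(parsed[3])  # ['the Japanese automaker', "the brand 's"]
```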

Sean
  • Don't use `str.replace`; it will also remove backslash characters that are part of the actual data. Instead use https://stackoverflow.com/a/14820462/6741053 – รยקคгรђשค Jul 15 '19 at 07:16
  • @Suparshva I don't think `replace` will cause any problem in this case. – Divyanshu Srivastava Jul 15 '19 at 07:19
  • Thanks for the data, I added tests for your data in my answer. – รยקคгรђשค Jul 15 '19 at 07:33
  • The requirements aren't specific enough: should anything else be modified, besides changing a sequence of `\'` into `'`? Also, the data shown is just a list of strings - neither CSV file content nor a Dataframe, so it's hard to relate to the question being asked - and the strings in that list **do not actually have backslashes in them** (that's just something Python is inserting while showing a **representation of** the list). – Karl Knechtel Aug 05 '22 at 02:05
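To illustrate the last comment's point, here is a sketch that writes a few of these values with Python's standard `csv` module. It assumes pandas' `to_csv` defaults (`quoting=csv.QUOTE_MINIMAL`, doubled quote characters, no escape character) behave like the `csv` writer configured here, which is worth confirming against your pandas version. No backslashes appear in the output; fields containing commas or double quotes are simply wrapped in quotes.

```python
import csv
import io

# One value per row: the question's example plus two entries from the sample data.
rows = [
    ["Michael's dog"],
    [" ['the car', 'its']"],
    [' [\'the Japanese automaker\', "the brand \'s"]'],
]

buffer = io.StringIO()
# QUOTE_MINIMAL with doublequote=True: quote only fields that need it,
# and write an embedded " as "" rather than escaping it with a backslash.
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, doublequote=True, lineterminator="\n")
writer.writerows(rows)
print(buffer.getvalue())
# Michael's dog
# " ['the car', 'its']"
# " ['the Japanese automaker', ""the brand 's""]"
```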

1 Answer

Don't use `str.replace`; it will replace every `\` character, including backslashes that are part of the data.

Use this instead:

df.ColumnName.str.decode('unicode_escape')
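One caveat worth knowing: as far as I can tell, when `unicode_escape` is applied to a `str`, Python first encodes it as UTF-8 and then reads the non-escape bytes as Latin-1, so non-ASCII text (e.g. `é`) gets garbled. A commonly used workaround, sketched below with only the standard library, is to encode with `latin-1` plus `backslashreplace` before decoding. The helper name `unescape` is hypothetical, not part of pandas.

```python
import codecs


def unescape(text: str) -> str:
    # Hypothetical helper. backslashreplace turns characters outside Latin-1
    # into \uXXXX escapes, and unicode_escape reads plain bytes as Latin-1,
    # so non-ASCII text survives the round trip.
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


raw = "Michael\\'s dog"  # the characters Michael\'s dog, with a real backslash
print(unescape(raw))  # Michael's dog

accented = "café \\u2192 ok"
print(unescape(accented))  # café → ok

# Decoding the str directly mangles the é (its UTF-8 bytes are read as Latin-1).
print(codecs.decode("café", "unicode_escape"))  # cafÃ©
```

With pandas, assuming the column holds only strings (no NaN), the helper could be applied per element with `df['ColumnName'].map(unescape)`.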

Tests:

>>> data = {'Name':['Tom\\\\\'', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]} 
>>> df = pd.DataFrame(data)
>>> df.Name.str.decode('unicode_escape')
0    Tom\'
1     nick
2    krish
3     jack
Name: Name, dtype: object
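To make the first sentence of this answer concrete, the sketch below compares the two approaches on the same test value used above (`Tom` followed by an escaped backslash and a quote). It uses plain `str.replace` and `codecs.decode` rather than pandas so it runs on its own.

```python
import codecs

value = "Tom\\\\'"  # the characters Tom\\' : an escaped backslash, then a quote

# Removing every backslash also throws away the one that is real data.
print(value.replace("\\", ""))  # Tom'

# unicode_escape collapses \\ to a single \ and keeps it.
print(codecs.decode(value, "unicode_escape"))  # Tom\'
```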

Tests with the question author's data:

>>> data
{'Name': [' [\'Mazda\', \'it\', "Mazda \'s", \'its\', \'its\', "Mazda \'s"]', " ['the 2019 Mazda3', 'the 2019 Mazda3', 'it', 'the 2019 Mazda3', 'The 2019 Mazda3', 'its']", " ['the car', 'its']", ' [\'the Japanese automaker\', "the brand \'s"]']}
>>> df = pd.DataFrame(data)
>>> df.Name.str.decode('unicode_escape')
0     ['Mazda', 'it', "Mazda 's", 'its', 'its', "Ma...
1     ['the 2019 Mazda3', 'the 2019 Mazda3', 'it', ...
2                                   ['the car', 'its']
3           ['the Japanese automaker', "the brand 's"]
Name: Name, dtype: object
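The output above matches the input because, as noted in the comments, these strings contain no backslash characters at all; the `\'` sequences are only part of how Python displays them. A standard-library sketch (assuming the data is exactly as pasted) that checks this:

```python
import codecs

# The question's sample data, pasted as a Python list of four strings.
data = [
    ' [\'Mazda\', \'it\', "Mazda \'s", \'its\', \'its\', "Mazda \'s"]',
    " ['the 2019 Mazda3', 'the 2019 Mazda3', 'it', 'the 2019 Mazda3', 'The 2019 Mazda3', 'its']",
    " ['the car', 'its']",
    ' [\'the Japanese automaker\', "the brand \'s"]',
]

# No element contains an actual backslash character.
print(any("\\" in item for item in data))  # False

# So unicode_escape decoding has nothing to change.
decoded = [codecs.decode(item, "unicode_escape") for item in data]
print(decoded == data)  # True
```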

Source: https://stackoverflow.com/a/14820462/6741053

รยקคгรђשค