How to remove special characters in a Python dictionary?

output = [{'title': 'title 1\u200c',
  'subject': ['subject1\u200c', 'a']},
{'title': 'title 1\u200c',
  'subject': ['subject1\u200c', 'a', 'b']}]

This is what I tried:

output['title'] = s.replace("\u200c", "") for s in output['title']
user12217822
  • I changed your `output` dictionary based on what you have written in your code, because the dictionary as you had written it wasn't valid (`SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 7-11: truncated \uXXXX escape`). Please check that it is correct. – Pranav Hosangadi Sep 27 '21 at 21:30
  • I get syntax error because of "for" – user12217822 Sep 27 '21 at 21:32
  • Does this answer your question? [Replace non-ASCII characters with a single space](https://stackoverflow.com/questions/20078816/replace-non-ascii-characters-with-a-single-space) – Woodford Sep 27 '21 at 21:34

2 Answers

There is no need to iterate here. You just need to remove the character from the string using str.replace():

output['title'] = output['title'].replace("\u200c", "")

This only changes the value of the 'title' key of output:

{'title': 'title 1', 'subject': 'subject1\u200c'}

If you want to remove the character from all items in output, you need a loop:

for key, value in output.items():
    output[key] = value.replace("\u200c", "")

Or, as a dict comprehension:

output = {key: value.replace("\u200c", "") for key, value in output.items()}
 {'title': 'title 1', 'subject': 'subject1'}

Addressing your comments

I got this error for part one: list indices must be integers or slices, not str

I got this error for the second answer: 'list' object has no attribute 'items'

It's an array of objects

Let's say output looks like this:

output = [{'title': 'title 1\u200c', 'subject': 'subject1\u200c'},
          {'title': 'title 2\u200c', 'subject': 'subject2\u200c'}]

You want to do what I showed above to each dict in output. Just replace output from before with elem:

for elem in output:
    elem['title'] = elem['title'].replace("\u200c", "")
[{'title': 'title 1', 'subject': 'subject1\u200c'},
 {'title': 'title 2', 'subject': 'subject2\u200c'}]

Or, using a list and dict comprehension:

output = [
    {key: value.replace("\u200c", "") for key, value in elem.items()}
    for elem in output
    ]
[{'title': 'title 1', 'subject': 'subject1'},
 {'title': 'title 2', 'subject': 'subject2'}]
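
If some values are lists of strings rather than plain strings (as in the second dict of the question), value.replace fails because lists have no replace method. Here is a minimal sketch of a recursive helper for that case, assuming the values are only strings, lists, or dicts. The name strip_char is made up for illustration and is not part of any library:

```python
def strip_char(value, char="\u200c"):
    """Return a copy of value with `char` removed from every string value."""
    if isinstance(value, str):
        return value.replace(char, "")
    if isinstance(value, list):
        return [strip_char(item, char) for item in value]
    if isinstance(value, dict):
        # Only the values are cleaned; keys are kept as they are.
        return {key: strip_char(item, char) for key, item in value.items()}
    return value  # leave other types (numbers, None, ...) untouched


# Sample data (an assumption): one value is a list of strings.
output = [
    {'title': 'title 1\u200c', 'subject': 'subject1\u200c'},
    {'title': 'title 2\u200c', 'subject': ['subject2\u200c', 'a', 'b']},
]
cleaned = strip_char(output)
print(cleaned)
```

This builds a new structure instead of modifying output in place, so the original data stays available for comparison.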
Pranav Hosangadi

This isn't just a special character: \u200c is a non-ASCII Unicode character (ZERO WIDTH NON-JOINER). To remove non-ASCII characters such as this one, you can use the str.encode() method with the "ignore" error handler. encode() returns a bytes object, which you can turn back into a string with the decode() method.

In [1]: title = "subject1\u200c"

In [2]: title.encode("ascii", "ignore")
Out[2]: b'subject1'

In [3]: title.encode("ascii", "ignore").decode()
Out[3]: 'subject1'
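
One thing to keep in mind is that encode("ascii", "ignore") drops every non-ASCII character, not only \u200c. A small comparison, assuming a made-up sample string that also contains an accented letter:

```python
# Hypothetical sample: "café" followed by a zero-width non-joiner.
text = "caf\u00e9\u200c"

# Drops everything outside ASCII, including the accented letter.
ascii_only = text.encode("ascii", "ignore").decode()

# Removes only the zero-width non-joiner and keeps other characters.
targeted = text.replace("\u200c", "")

print(ascii_only)  # caf
print(targeted)    # café
```

If the text may contain accented or non-Latin letters that should be kept, the targeted str.replace() is probably the safer choice.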

For your list of dicts, what you need is something like:

In [15]: output = [{'title': 'title 1\u200c',
    ...:   'subject': 'subject1\u200c'}, {'title': 'title 1\u200c',
    ...:   'subject': 'subject1\u200c'}]

In [16]: decoded_output = [value["title"].encode("ascii", "ignore").decode() for value in output]

In [17]: decoded_output
Out[17]: ['title 1', 'title 1']

EDIT:

In [20]: for i in output:
    ...:     for key, value in i.items():
    ...:         i[key] = value.encode("ascii", "ignore").decode()
    ...:         print(i[key])
    ...: 
title 1
subject1
title 1
subject1

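Because you have a list of dicts, you have to iterate over the list and, for each item (a dict), iterate again using the dict's items() method. Strings are immutable, so the result of encode(...).decode() has to be assigned back to the dict for the change to stick.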
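
The same two-level iteration can also be written as a nested comprehension that builds a new list. This is only a sketch, assuming every value in every dict is a string:

```python
# Sample data (an assumption): every value is a plain string.
output = [
    {'title': 'title 1\u200c', 'subject': 'subject1\u200c'},
    {'title': 'title 2\u200c', 'subject': 'subject2\u200c'},
]

# The outer comprehension walks the list; the inner one rebuilds each dict
# with its values stripped of non-ASCII characters.
decoded_output = [
    {key: value.encode("ascii", "ignore").decode() for key, value in elem.items()}
    for elem in output
]
print(decoded_output)
```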

Andressa Cabistani