How to remove special characters in a Python dictionary?

Question

output = [{'title': 'title 1\u200c',
  'subject': ['subject1\u200c', 'a']},
{'title': 'title 1\u200c',
  'subject': ['subject1\u200c', 'a', 'b']}]

This is what I tried:

output['title'] = s.replace("\u200c", "") for s in output['title']

I changed your `output` dictionary based on what you have written in your code, because the dictionary as you had written it wasn't valid (`SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 7-11: truncated \uXXXX escape`) Please check that it is correct — Pranav Hosangadi, Sep 27 '21 at 21:30
Does this answer your question? [Replace non-ASCII characters with a single space](https://stackoverflow.com/questions/20078816/replace-non-ascii-characters-with-a-single-space) — Woodford, Sep 27 '21 at 21:34

Pranav Hosangadi · Answer 1 · 2021-09-27T21:43:48.093

What are you iterating for? If output is a single dictionary whose values are strings, you just need to remove the character from the string using str.replace().

output['title'] = output['title'].replace("\u200c", "")

This only changes the value of the 'title' key of output:

{'title': 'title 1', 'subject': 'subject1\u200c'}

If you want to remove the character from all items in output, you need a loop:

for key, value in output.items():
    output[key] = value.replace("\u200c", "")

Or, as a dict comprehension:

output = {key: value.replace("\u200c", "") for key, value in output.items()}

{'title': 'title 1', 'subject': 'subject1'}

Addressing your comments

I got this error for part one list indices must be integers or slices, not str

I got this error for second answer: 'list' object has no attribute 'items'

Its array of objects

Let's say output looks like this:

output = [{'title': 'title 1\u200c', 'subject': 'subject1\u200c'},
          {'title': 'title 2\u200c', 'subject': 'subject2\u200c'}]

You want to do what I showed above to each dict in output. Just replace output from before with elem:

for elem in output:
    elem['title'] = elem['title'].replace("\u200c", "")

[{'title': 'title 1', 'subject': 'subject1\u200c'},
 {'title': 'title 2', 'subject': 'subject2\u200c'}]

Or, using a list and dict comprehension:

output = [
    {key: value.replace("\u200c", "") for key, value in elem.items()}
    for elem in output
    ]

[{'title': 'title 1', 'subject': 'subject1'},
 {'title': 'title 2', 'subject': 'subject2'}]
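
The comments on this answer report 'list' object has no attribute 'replace', which suggests that some values (such as 'subject' in the question) are lists of strings, not plain strings. A minimal sketch of one way to handle that, assuming values can be strings, lists, or dicts nested to any depth. The helper name strip_char is hypothetical and not part of the original answer:

```python
def strip_char(value, char="\u200c"):
    """Return a copy of value with char removed from every string inside it."""
    if isinstance(value, str):
        return value.replace(char, "")
    if isinstance(value, list):
        # Clean each element; elements may themselves be lists or dicts.
        return [strip_char(item, char) for item in value]
    if isinstance(value, dict):
        # Keys are kept as they are; only the values are cleaned.
        return {key: strip_char(item, char) for key, item in value.items()}
    return value  # numbers, None, etc. are returned unchanged


# Sample data shaped like the question's list of dicts (an assumption).
output = [
    {"title": "title 1\u200c", "subject": ["subject1\u200c", "a"]},
    {"title": "title 1\u200c", "subject": ["subject1\u200c", "a", "b"]},
]
cleaned = strip_char(output)
print(cleaned)
```

Because the helper builds new containers, the original output is left unmodified; assign the result back to output if you want to replace it.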

edited Sep 27 '21 at 21:43

answered Sep 27 '21 at 21:33

Pranav Hosangadi

I got this error for part one list indices must be integers or slices, not str – user12217822 Sep 27 '21 at 21:37
I got this error for second answer: 'list' object has no attribute 'items' – user12217822 Sep 27 '21 at 21:37
@user12217822 Then `output` is a _list_, not a _dictionary_ contrary to what you have shown in your question – Pranav Hosangadi Sep 27 '21 at 21:37
Its array of objects – user12217822 Sep 27 '21 at 21:39
@user12217822 In that case, please edit your question so that the details are correct. I will edit my answer to handle lists of dicts once you have done so – Pranav Hosangadi Sep 27 '21 at 21:40
I fixed it @user12217822 – user12217822 Sep 27 '21 at 21:43
@user12217822 See the edited answer – Pranav Hosangadi Sep 27 '21 at 21:46
I got this error: 'list' object has no attribute 'replace' – user12217822 Sep 27 '21 at 21:48
On what line? What `list` object is it trying to run `replace()` on? @user12217822 – Pranav Hosangadi Sep 27 '21 at 21:49
In this line: elem['title'] = elem['title'].replace("\u200c", "") – user12217822 Sep 27 '21 at 21:52
@user12217822 then `elem['title']` is a _list_, not a _string_ as you have shown in your question. Please take the time to understand what your inputs look like, and include _all relevant information_ in your question. – Pranav Hosangadi Sep 27 '21 at 21:52
now I didn't get error but it doesn't work – user12217822 Sep 27 '21 at 21:55

Andressa Cabistani · Accepted Answer · 2021-09-27T22:34:37.357

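\u200c isn't just a "special" character; it is a non-ASCII Unicode character (ZERO WIDTH NON-JOINER). To remove non-ASCII characters you can use the str.encode() method with the error handler set to "ignore". encode() returns a bytes object, which you can turn back into a string with the decode() method. Note that this drops every non-ASCII character in the string, not only \u200c.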

In [1]: title = "subject1\u200c"

In [2]: title.encode("ascii", "ignore")
Out[2]: b'subject1'

In [3]: title.encode("ascii", "ignore").decode()
Out[3]: 'subject1'
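
To illustrate the trade-off, here is a small sketch comparing the encode/decode approach with str.replace() on a string that also contains a legitimate accented letter. The sample string and variable names are made up for illustration:

```python
# "caf\u00e9" is "café"; the trailing \u200c is the zero-width non-joiner.
text = "caf\u00e9\u200c"

# encode/decode with errors="ignore" drops every non-ASCII character,
# so the accented letter is lost along with the zero-width non-joiner.
ascii_only = text.encode("ascii", "ignore").decode()

# str.replace() removes only the character that is named explicitly.
zwnj_removed = text.replace("\u200c", "")

print(repr(ascii_only))
print(repr(zwnj_removed))
```

If the data may contain non-English text, str.replace() is likely the safer choice; the encode/decode route suits data that is expected to be pure ASCII.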

For your list of dicts, what you need is something like:

In [15]: output = [{'title': 'title 1\u200c',
    ...:   'subject': 'subject1\u200c'}, {'title': 'title 1\u200c',
    ...:   'subject': 'subject1\u200c'}]

In [16]: decoded_output = [value["title"].encode("ascii", "ignore").decode()
    ...:                   for value in output]

In [17]: decoded_output
Out[17]: ['title 1', 'title 1']

EDIT:

In [20]: for i in output:
    ...:     for key, value in i.items():
    ...:         i[key] = value.encode("ascii", "ignore").decode()
    ...:         print(repr(i[key]))
    ...: 
'title 1'
'subject1'
'title 1'
'subject1'

Because output is a list of dicts, you have to iterate over the list and then, for each dict in it, iterate again over its key/value pairs using the dict's items() method.

edited Sep 27 '21 at 22:34

answered Sep 27 '21 at 21:57

Andressa Cabistani

You're welcome, glad I could help :) – Andressa Cabistani Sep 27 '21 at 21:59
Can you please guide me to the how can I edit subject In addition title? – user12217822 Sep 27 '21 at 22:12
Sorry, English is not my first language and I didn't understand your question, can you write it in a different way for me? – Andressa Cabistani Sep 27 '21 at 22:18
In my question I have subject and title I learned how can I fix the title but what should I do for modify subject – user12217822 Sep 27 '21 at 22:22
Ahh okay, I'll edit the answer with the subject part – Andressa Cabistani Sep 27 '21 at 22:28
was the edit helpful? – Andressa Cabistani Sep 27 '21 at 22:40
you're welcome :D can you mark my answer as the right one if it was helpful for you? thank you – Andressa Cabistani Sep 27 '21 at 22:44
Yes, Of course :) – user12217822 Sep 27 '21 at 22:49
