Question mark symbols instead of CP-1252 characters in Python

Question

I would like to receive "Aragón" instead of "Arag�n".

My project consists of a backend with a method that calls the Earth Engine API in Python (ee) to retrieve a list of region names for a country (in this case Spain), using the following code:

regionList = (ee.FeatureCollection('FAO/GAUL_SIMPLIFIED_500m/2015/level1').filter(ee.Filter.eq('ADM0_NAME','Spain')))

The problem is that I receive the following:

['Andaluc�a', 'Arag�n', 'Canarias', 'Cantabria', 'Castilla-La Mancha', 'Castilla y Le�n', 'Catalu�a/Catalunya', 'Ciudad Aut�noma de Ceuta', 'Ciudad Aut�noma de Melilla', 'Comunidad de Madrid', 'Comunidad Foral de Navarra', 'Comunitat Valenciana', 'Extremadura', 'Galicia', 'Illes Balears', 'La Rioja', 'Pa�s Vasco/Euskadi', 'Principado de Asturias', 'Regi�n de Murcia']

To solve this problem, I can think of two options:

Is there any function in the Earth Engine API (ee) to receive characters directly in CP-1252?
Is there any Python API that converts these words containing � into CP-1252 characters?

Thanks in advance.

Edit: minimal reproducible example:

import ee

ee.Initialize()
collection = (ee.FeatureCollection('FAO/GAUL_SIMPLIFIED_500m/2015/level1').filter(ee.Filter.eq('ADM0_NAME','Spain')))
regionList = ee.List(collection.aggregate_array('ADM1_NAME')).getInfo()
print(regionList)
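One way to narrow this down, as a rough sketch: check whether the returned strings really contain U+FFFD (the Unicode replacement character) or whether only the console renders it that way. If U+FFFD is stored in the string itself, the original byte was already lost before the data reached Python, so re-encoding on the client side cannot bring the accented letter back. The sample list is hard-coded from the output above, and `has_replacement_char` is a hypothetical helper name, not part of `ee`.

```python
# Sample values copied from the output shown in the question.
region_names = ['Andaluc\ufffda', 'Arag\ufffdn', 'Canarias', 'Cantabria']

REPLACEMENT_CHAR = '\ufffd'  # U+FFFD, usually rendered as a question mark in a diamond


def has_replacement_char(name: str) -> bool:
    """Return True if the string itself contains U+FFFD."""
    return REPLACEMENT_CHAR in name


# Names whose stored text is already damaged.
damaged = [name for name in region_names if has_replacement_char(name)]
print(damaged)

# Code points of one damaged name, to confirm what is actually stored.
print([hex(ord(ch)) for ch in 'Arag\ufffdn'])
```

If `damaged` comes back empty for the real `regionList`, the strings are intact and the problem is more likely in how the terminal or response is being displayed.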

Please share a [mcve]. How do you convert `regionList` to the array you have received? — JosefZ, Apr 05 '21 at 12:54
Thanks for your response @JosefZ, I have edited the question. To reproduce the example you need a Google Earth Engine account. I hope this helps. — Rodrigo Alberto Guerrero Bermú, Apr 05 '21 at 14:08
Get the `ADM1_CODE` list instead of `ADM1_NAME`; [here is a UTF-8 encoded CSV](https://data.apps.fao.org/catalog/dataset/2e3ec5af-2f1a-4068-87d0-36bd09fd9383/resource/cfdaf156-26b9-46c2-aab2-eb437fc16622/download/g2015_2005_1.csv) for converting back… A lame workaround, I know… Sorry. Ask at https://gis.stackexchange.com/ — JosefZ, Apr 05 '21 at 17:01
@JosefZ how did you get this CSV? I mean, for the provinces of the regions, how can I create such a CSV? Thanks for your answer. — Rodrigo Alberto Guerrero Bermú, Apr 06 '21 at 07:13
[Download the file](https://stackoverflow.com/a/7244263/3439404); then `import pandas as pd; df = pd.read_csv(file_path, encoding = 'utf-8', sep=','); mydict = df.set_index( ['ADM1_CODE']).to_dict( orient='dict')['ADM1_NAME']` and you have a dictionary, use it as `mydict[2724]` (returns `'Cataluña/Catalunya'` ). — JosefZ, Apr 06 '21 at 15:44
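The same lookup can be sketched with only the standard library. The inline CSV here is a one-row stand-in for the downloaded file, assumed to be UTF-8; only the `ADM1_CODE` and `ADM1_NAME` column names and the 2724 example come from the comment above, and `load_code_to_name` is a hypothetical helper name.

```python
import csv
import io

# Illustrative stand-in for the downloaded CSV (assumption: UTF-8 text
# with at least the ADM1_CODE and ADM1_NAME columns).
sample_csv = "ADM1_CODE,ADM1_NAME\n2724,Cataluña/Catalunya\n"


def load_code_to_name(text_stream):
    """Build a dict mapping integer ADM1_CODE to ADM1_NAME."""
    reader = csv.DictReader(text_stream)
    return {int(row["ADM1_CODE"]): row["ADM1_NAME"] for row in reader}


code_to_name = load_code_to_name(io.StringIO(sample_csv))

# Codes as they might come back from aggregate_array('ADM1_CODE').
codes = [2724]
print([code_to_name[code] for code in codes])
```

For a file on disk, passing `open(file_path, encoding='utf-8', newline='')` in place of the `io.StringIO` object should work the same way.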
Thank you for this workaround @JosefZ, just one more question: I receive the data as UTF-8, but I would like to receive it in ISO 8859 format. For example, I receive a province as "CÃŸdiz" instead of the correct word "Cádiz". How can I convert these UTF-8 words into proper Spanish text? Thank you in advance. — Rodrigo Alberto Guerrero Bermú, Apr 07 '21 at 09:55
The `encoding = 'utf-8'` argument in `pd.read_csv` is important. The `g2015_2005_1.csv` file does not contain a [Byte order mark](https://en.wikipedia.org/wiki/Byte_order_mark), so you could encounter a [mojibake](https://en.wikipedia.org/wiki/Mojibake) case in some apps, something like `"Cádiz".encode( 'utf-8').decode( 'cp1252')`, which returns `'CÃ¡diz'`. — JosefZ, Apr 07 '21 at 10:49
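The mojibake from the last comment can be reproduced and, for this particular mix-up, reversed. A minimal sketch, assuming the text was UTF-8 bytes wrongly decoded as CP-1252; `fix_mojibake` is a hypothetical helper name, not a library function.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mistakenly decoded as CP-1252.

    Returns the input unchanged if the round trip is not possible,
    for example when the text was never garbled in the first place.
    """
    try:
        return text.encode('cp1252').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Reproduce the garbling described in the comment above.
garbled = "Cádiz".encode('utf-8').decode('cp1252')
print(garbled)

# Reverse it by re-encoding as CP-1252 and decoding as UTF-8.
print(fix_mojibake(garbled))
```

This only helps when the wrong bytes are still all there. It cannot repair strings such as 'Arag�n' from the question, where the accented letter has already been replaced by U+FFFD.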
