I am trying to read a csv file with information about the bills/issues voted on by the Ukrainian parliament into a pandas dataframe. The csv file has a column 'name_question' that holds the titles of the bills/issues in Ukrainian. I read it into a dataframe:
import pandas as pd
url = 'https://data.rada.gov.ua/ogd/zal/ppz/skl9/plenary_agenda-skl9.csv'
bills = pd.read_csv(url)
bills.head()
And I got this result. It seems that all Cyrillic characters are replaced with '?':
 | date_agenda | id_question | number_question | init_question | name_question |
---|---|---|---|---|---|
0 | 2019-08-29 | 201908291 | 0 | 0 | ?????????????????? ???????????????????????????... |
1 | 2019-08-29 | 201908292 | 0 | 0 | ?????????????????????????????????? |
2 | 2019-08-29 | 201908293 | 0 | 0 | ??????????????????????????????????????????'???... |
3 | 2019-08-29 | 201908294 | 0 | 0 | ????????,????????????????????????????????? |
4 | 2019-08-29 | 201908295 | 1001 | ? | ??????????????????????????????????????????????... |
After downloading the csv file, I checked its encoding using this advice. The output was as follows:
<_io.TextIOWrapper name='C:\\Users\\dryingmouth\\data\\bills\\plenary_agenda-skl9.csv' mode='r' encoding='cp1251'>
I then edited the code to specify the encoding as a parameter of the read_csv() function:
bills = pd.read_csv(url, encoding='cp1251')
bills.head()
But the output was the same. What can I do to correctly display the Cyrillic (Ukrainian) characters in the dataframe created from this csv file?
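In case it helps narrow things down, here is a minimal sketch of a check on the raw bytes, done before pandas is involved. The function name `diagnose` and the two sample byte strings are hypothetical and only for illustration; they are not taken from the real file. The idea is that if the bytes already contain literal '?' characters (0x3F) and nothing above 0x7F, then no `encoding` argument can bring the text back, whereas bytes that decode cleanly under one of the candidate encodings point to a decoding problem. As far as I understand, the `encoding='cp1251'` shown by the file object above may just be the default encoding of my system rather than something detected from the file contents, so I am not sure that check was conclusive.

```python
def diagnose(raw: bytes, encodings=("utf-8", "cp1251")):
    """Summarise raw CSV bytes without assuming an encoding up front."""
    report = {
        # 0x3F is the ASCII code for '?'; a high count suggests the text was
        # already replaced by question marks before it reached pandas.
        "literal_question_marks": raw.count(b"?"),
        # Any byte >= 0x80 means the data is not plain ASCII.
        "has_non_ascii_bytes": any(b >= 0x80 for b in raw),
    }
    for enc in encodings:
        try:
            report[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            report[enc] = None  # these bytes are not valid in this encoding
    return report


# Hypothetical samples standing in for the first bytes of the downloaded file.
intact = "Про порядок денний".encode("cp1251")
damaged = b"??? ??????? ??????"

print(diagnose(intact))
print(diagnose(damaged))
```

For the real file, `raw` could be the first few hundred bytes read with `open(path, 'rb')` or with `urllib.request.urlopen(url).read()`, so that nothing is decoded along the way.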