
I am trying to read a CSV file with information about the bills and issues voted on by the Ukrainian parliament into a pandas dataframe. The file has a column, 'name_question', that holds the titles of the bills/issues in Ukrainian. I read it into a dataframe as follows:

import pandas as pd

url = 'https://data.rada.gov.ua/ogd/zal/ppz/skl9/plenary_agenda-skl9.csv'
bills = pd.read_csv(url)
bills.head()

This is the result I got. All Cyrillic characters appear to have been replaced with '?':

   date_agenda  id_question  number_question  init_question  name_question
0   2019-08-29    201908291                0              0  ?????????????????? ???????????????????????????...
1   2019-08-29    201908292                0              0  ??????????????????????????????????
2   2019-08-29    201908293                0              0  ??????????????????????????????????????????'???...
3   2019-08-29    201908294                0              0  ????????,?????????????????????????????????
4   2019-08-29    201908295             1001              ?  ??????????????????????????????????????????????...

After downloading the CSV file, I checked its encoding by following this advice. The output was as follows:

<_io.TextIOWrapper name='C:\\Users\\dryingmouth\\data\\bills\\plenary_agenda-skl9.csv' mode='r' encoding='cp1251'>
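
As far as I understand, the encoding in this repr is the one Python used to open the file (the platform default when none is passed to open()), so it may not reflect what the file actually contains. A more direct check would be to read the raw bytes and see which encoding decodes them cleanly. Below is a minimal sketch of that idea. `guess_encoding` is a hypothetical helper, the candidate list is an assumption, and the file name in the commented usage is a placeholder. UTF-8 is tried first because cp1251 accepts almost any byte sequence:

```python
from typing import Optional, Sequence


def guess_encoding(raw: bytes, candidates: Sequence[str] = ("utf-8", "cp1251")) -> Optional[str]:
    """Return the first candidate encoding that decodes `raw` without errors."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes; try the next
        return name
    return None


# Hypothetical usage on the downloaded file (the path is a placeholder):
# with open("plenary_agenda-skl9.csv", "rb") as f:
#     print(guess_encoding(f.read()))

sample = "Про порядок денний"
print(guess_encoding(sample.encode("utf-8")))   # utf-8
print(guess_encoding(sample.encode("cp1251")))  # cp1251
```

If the raw bytes already consist of ASCII '?' characters, this check reports "utf-8", which would suggest the text was lost before it reached pandas.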

I then edited the code to specify the encoding as a parameter of the read_csv() function:

bills = pd.read_csv(url, encoding='cp1251')
bills.head()

But the output was the same. What can I do to correctly display the Cyrillic (Ukrainian) characters in the dataframe created from this csv file?
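
To rule out a pure display problem in the console or notebook, the characters stored in the column can be counted directly. This is only a sketch: `describe_text` is a hypothetical helper (not part of pandas), and treating the range U+0400 to U+04FF as "Cyrillic" is a simplifying assumption:

```python
def describe_text(text: str) -> dict:
    """Count literal question marks and Cyrillic letters in a string."""
    return {
        "question_marks": sum(ch == "?" for ch in text),
        # Basic Cyrillic Unicode block; covers Ukrainian letters such as і, ї, є, ґ
        "cyrillic": sum("\u0400" <= ch <= "\u04FF" for ch in text),
    }


# Hypothetical usage with the dataframe from the question:
# print(describe_text(bills["name_question"].iloc[0]))

print(describe_text("??? ??"))
print(describe_text("Про порядок?"))
```

A non-zero "question_marks" count with zero "cyrillic" would mean the dataframe really holds '?' characters, so the problem lies in the data being read rather than in how it is displayed.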
