I have a df that consists of 100 rows and 24 columns. All columns are of type string. I get the following error when I try to append the data frame to KDB:

UnicodeEncodeError: 'ascii' codec can't encode character '\xd3' in position 9: ordinal not in range(128)

Here is an example of the first row in my df.

                        AnnouncementDate AuctionDate    BBT  \
_id
00000067   2012-12-11T00:00:00.000+00:00         NaN   FHLB

           CouponDividendRate DaysToSettle  \
_id
00000067                 0.61            1

                                        Description  \
_id
00000067                         FHLB 0.61 12/28/16

                     FirstSettlementDate           ISN IsAgency IsWhenIssued  \
_id
00000067   2012-12-28T00:00:00.000+00:00  US313381K796     True        False


           ...  OnTheRunTreasury OperationalIndicator  \
_id        ...
00000067   ...               NaN                False


          OriginalAmountOfPrincipal OriginalMaturityDate  \
_id
00000067                 13000000.0                  NaN


          PrincipalAmountOutstanding       SCSP       SMCP  \
_id
00000067                         0.0  313381K79   76000000

           SecurityTypeLevel1 SecurityTypeLevel2   TCK
_id
00000067          US-DOMESTIC                NaN   NaN

My question: is there an easy way to convert my df to UTF-8?

Possibly something like df = df.encode('utf-8')?

Thanks

Chris Johnson
  • At some point you populated the dataframe, how? The easiest way to solve this is to send the right values from the start. Also, share a small sample with df.head().to_dict(). – Anton vBR Jul 31 '17 at 20:06
  • The df is populated from a json object. It's done automatically using json_normalize. – Chris Johnson Jul 31 '17 at 20:08
  • try this https://stackoverflow.com/questions/33699343/convert-every-dictionary-value-to-utf-8-dictionary-comprehension before you use json_normalize or use https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html and set encoding to utf8 – Mohamed Ali JAMAOUI Jul 31 '17 at 21:23

1 Answer

It depends on how you're outputting the data. If you're simply using csv files, which you then import to KDB, then you can specify that easily:

df.to_csv('df_output.csv', encoding='utf-8')

Or, you can set the encoding when you import the data to Pandas originally, using the same syntax.
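As a rough sketch of the CSV route in both directions: the one-row frame below is made up for illustration (only the column names and the '\xd3' character come from the question), and the temporary file path is an assumption, so adapt both to your own data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the real df; "\xd3" is the character
# reported in the UnicodeEncodeError.
df = pd.DataFrame({"Description": ["FHLB 0.61 12/28/16 \xd3"], "BBT": ["FHLB"]})

# Assumed output location: a throwaway temp directory.
path = os.path.join(tempfile.mkdtemp(), "df_output.csv")

# Write the file as UTF-8 so non-ASCII characters are encoded explicitly.
df.to_csv(path, index=False, encoding="utf-8")

# Read it back using the same encoding argument.
round_trip = pd.read_csv(path, encoding="utf-8")
```

If the file is written and read with the same encoding, the non-ASCII character should survive the round trip unchanged.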

If you're connecting directly to KDB using SQLAlchemy or something similar, you should try specifying this in the connection itself - see this question: Another UnicodeEncodeError when using pandas method to_sql with MySQL

Ricky McMaster