python - converting unicode in list to dataframe

Question

I am using an API to get some data. The data returned is in Unicode (not a dictionary / json object).

# get data
data = []
for urls in api_call_list:
    data.append(requests.get(urls))

the data looks like this:

>>> data[0].text
u'Country;Celebrity;Song Volume;CPP;Index\r\nus;Taylor Swift;33100;0.83;0.20\r\n'

>>> data[1].text
u'Country;Celebrity;Song Volume;CPP;Index\r\nus;Rihanna;28100;0.76;0.33\r\n'

I want to put this in a DataFrame with Country, Celebrity, Song Volume, CPP and Index as column names.

First I tried to split it on \r\n like this:

x = [i.text.split('\r\n') for i in data]

and got:

[[u'Country;Celebrity;Song Volume;CPP;Index',
  u'us;Taylor Swift;33100;0.83;0.20',
  u''],
 [u'Country;Celebrity;Song Volume;CPP;Index',
  u'us;Rihanna;28100;0.76;0.33',
  u'']]

I'm not sure where to go from here.

What should the final result be: key:value pairs as a dict, then converted to a DataFrame? – bhansa Mar 14 '17 at 15:24

Psidom · Accepted Answer · 2017-03-14T15:31:49.517

3

You can use pandas.read_csv to read each string into its own data frame and then concatenate them:

# On Python 2, io.StringIO accepts only unicode text; for byte strings (str), use io.BytesIO instead
from io import StringIO
import pandas as pd

pd.concat([pd.read_csv(StringIO(d), sep=";") for d in data])

Since your actual data is a list of responses, you may need to access the text of each response first:

pd.concat([pd.read_csv(StringIO(d.text), sep=";") for d in data])

Sample data used for the first snippet (a plain list of strings):

data = [u'Country;Celebrity;Song Volume;CPP;Index\r\nus;Taylor Swift;33100;0.83;0.20\r\n',
        u'Country;Celebrity;Song Volume;CPP;Index\r\nus;Rihanna;28100;0.76;0.33\r\n']
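For reference, here is a minimal end-to-end sketch of the approach above on Python 3, using the two sample strings from the question. The names `payloads`, `frames` and `df` are only illustrative, and passing `ignore_index=True` is an optional addition (not part of the original snippet) that renumbers the rows instead of repeating index 0 for every response.

```python
from io import StringIO

import pandas as pd

# Sample payloads copied from the question; with real responses
# these would be the response.text values.
payloads = [
    "Country;Celebrity;Song Volume;CPP;Index\r\nus;Taylor Swift;33100;0.83;0.20\r\n",
    "Country;Celebrity;Song Volume;CPP;Index\r\nus;Rihanna;28100;0.76;0.33\r\n",
]

# Parse each payload on its own (each one carries its own header row),
# then stack the resulting one-row frames.
frames = [pd.read_csv(StringIO(text), sep=";") for text in payloads]

# ignore_index=True renumbers the rows 0..n-1 (an optional choice made here).
df = pd.concat(frames, ignore_index=True)

print(df)
```

Note that the header yields five columns, with "Song Volume" as a single column name.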

edited Mar 14 '17 at 15:31

answered Mar 14 '17 at 15:26

Psidom


1

exactly that :) It used to be `from StringIO import StringIO`; see a nice import version here http://stackoverflow.com/questions/22604564/how-to-create-a-pandas-dataframe-from-string – Roelant Mar 14 '17 at 15:26
It converts the string object to a file-like handle so that `pandas.read_csv` can read it. – Psidom Mar 14 '17 at 16:12
what about something like `pd.read_csv(StringIO('\n'.join(data)), sep=';')`... unverified. – piRSquared Mar 14 '17 at 19:49
@piRSquared Each element in the list contains a header, so it will be better to read them separately and then concatenate. – Psidom Mar 14 '17 at 19:53
@Psidom I missed that. Good call then – piRSquared Mar 14 '17 at 20:41
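To illustrate the point made in the comments about repeated headers, the sketch below joins the sample strings into one blob before parsing. It is a small variation on the suggestion above: it uses `"".join` because each sample string already ends with a line break, and the names `payloads`, `joined` and `naive` are only illustrative.

```python
from io import StringIO

import pandas as pd

# Sample payloads copied from the question.
payloads = [
    "Country;Celebrity;Song Volume;CPP;Index\r\nus;Taylor Swift;33100;0.83;0.20\r\n",
    "Country;Celebrity;Song Volume;CPP;Index\r\nus;Rihanna;28100;0.76;0.33\r\n",
]

# Join everything into one CSV blob; each payload already ends with \r\n.
joined = "".join(payloads)

# Only the first line is treated as the header, so the second payload's
# header line is parsed as an ordinary data row.
naive = pd.read_csv(StringIO(joined), sep=";")

print(naive)
```

The stray header row also keeps the numeric columns from being parsed as numbers, which is why reading each string separately and concatenating is the safer route here.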
