
I am making API calls and collecting the results as rows in a DataFrame object. The first two rows are text while the rest are numbers. Is there any way to have different data types within each column, or, said differently, can we set a data type for each row? I have tried `convert_objects`, `astype`, et al. to convert each row before adding it to the DataFrame, but they don't work.

Example: Sample DataFrame

   0     1     2
0  text1 text2 text3
1  text1 text2 text3
2  no1   no2   no3
...
adastra21
  • Sorry, why would you want a column with different data types? – WoodChopper Nov 05 '15 at 05:12
  • @WoodChopper That's the way I am collecting the results of API calls. It's more robust to fix the columns and append rows, rather than keep increasing columns and fix rows. – adastra21 Nov 05 '15 at 05:23

3 Answers


No, it's not possible. Somewhat simplistically, you can think of a DataFrame as something like a dict mapping column names to numpy arrays, and those arrays are homogeneously typed.
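As a minimal sketch of what that implies, mixing strings and numbers in one column coerces the whole column to the generic `object` dtype:

>>> import pandas as pd
>>> df = pd.DataFrame({'c1': ['text1', 1, 2.5]})
>>> df['c1'].dtype
dtype('O')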

You write

That's the way I am collecting the results of API calls. It's more robust to fix the columns and append rows, rather than keep increasing columns and fix rows.

Given this usage pattern and these types, you might consider whether DataFrames are right for you at all. In my experience, DataFrames have horrible performance for dynamic row-by-row appending. You might consider using regular Python dicts and lists for the aggregation phase, then processing the data and sticking it into a DataFrame at the end.
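A minimal sketch of that aggregate-then-build pattern (here `fetch` is a hypothetical stand-in for your API call):

import pandas as pd

def fetch(i):
    # Hypothetical stand-in for the real API call; returns one row as a tuple.
    return 'text%d' % i, i, i * 2.5

rows = []                  # plain Python list for the aggregation phase
for i in range(100):
    rows.append(fetch(i))
df = pd.DataFrame(rows)    # build the DataFrame once, at the end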

Ami Tavory
  • I thought the same. In R, for sure, this happens with data frames too: even if we provide different data types, it coerces the values to one type. – WoodChopper Nov 05 '15 at 05:38
  • @WoodChopper If I understand you correctly, that's what will happen here too. Heterogeneous-enough values will effectively make the dtype `object`, which is as generic as can be. – Ami Tavory Nov 05 '15 at 05:41

Example

>>> import pandas as pd
>>> df = pd.DataFrame([['txt1','txt2'], [12, 22]], columns=['c1', 'c2'])
>>> df
     c1    c2
0  txt1  txt2
1    12    22

Each row comes back as a Series with the generic `object` dtype:

>>> df.iloc[0]
c1    txt1
c2    txt2
Name: 0, dtype: object
>>> df.iloc[1]
c1    12
c2    22
Name: 1, dtype: object

And each individual cell keeps the type of whatever value you put in it:

>>> df.iloc[0]['c2']
'txt2'
>>> type(df.iloc[0]['c2'])
<type 'str'>

>>> df.iloc[1]['c2']
22
>>> type(df.iloc[1]['c2'])
<type 'int'>

If you wish to specify the dtype of a row, you can do something like the following, which converts row 1 to int:

>>> df.iloc[1].apply(int)
c1    12
c2    22
Name: 1, dtype: int64
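
Note that `apply` returns a new Series and leaves `df` itself unchanged; to persist the conversion, one option (a sketch) is to assign the result back:

>>> df.iloc[1] = df.iloc[1].apply(int)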
Aziz Alto

Given that you're collecting the results of API calls, it's quite likely that you should be storing the results as a list of tuples as an intermediate step, rather than appending to a DataFrame row by row. That should give you what you want.

import pandas

def api_call(x):
    # Stand-in for your real API call; note the mixed-type result.
    return 5.0, 'a', 42

args = range(10)  # hypothetical inputs to pass to the API
df = pandas.DataFrame(map(api_call, args))

Note, if you're using Python 2.x, use itertools.imap instead of map.

As a side note, the comment that it's more robust to add rows rather than columns is implausible: DataFrame.transpose() makes that distinction irrelevant.
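
A quick illustration, reusing the toy frame from the earlier answer:

>>> import pandas as pd
>>> df = pd.DataFrame([['txt1', 'txt2'], [12, 22]])
>>> df.T
      0   1
0  txt1  12
1  txt2  22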

Mike Selik