python pandas: list of integers as individual values of DataFrame

Question

Question: How to 'pd.read_csv' so that the values in a given column are of type list (a list in each row of a column)?

When creating a DataFrame (from a dict, see below), individual values are of type list. The problem: After writing the DataFrame to a file and reading from the file back to a DataFrame, I get a string instead of a list.

creating the DataFrame

import pandas as pd
dict2df = {"euNOG": ["ENOG410IF52", "KOG2956", "KOG1997"], 
           "neg": [[58], [1332, 753, 716, 782], [187]], 
           "pos": [[96], [659, 661, 705, 1228], [1414]]}
df = pd.DataFrame(dict2df)

value is a list

type(df.loc[0, 'neg']) == list # --> True
type(df.loc[0, 'neg']) == str # --> False
df.loc[1, 'neg'][-1] == 782 # --> True

write to file

df.to_csv('DataFrame.txt', sep='\t', header=True, index=False)

read from file

df = pd.read_csv('DataFrame.txt', sep='\t')

value is a string not a list

type(df.loc[0, 'neg']) == list # --> False
type(df.loc[0, 'neg']) == str # --> True
df.loc[1, 'neg'][-1] == 782 # --> False

Of course, it's possible to convert between the two data types, but it's computationally expensive and needs extra work (see below)

def convert_StringList2ListOfInt(string2convert):
    return [int(ele) for ele in string2convert[1:-1].split(',')]

def DataFrame_StringOfInts2ListOfInts(df, cols2convert_list):
    for column in cols2convert_list:
        column_temp = column + "_temp"
        df[column_temp] = df[column].apply(convert_StringList2ListOfInt, 1)
        df[column] = df[column_temp]
        df = df.drop(column_temp, axis=1)
    return df
df = DataFrame_StringOfInts2ListOfInts(df, ['neg', 'pos'])

What would be a better (more pythonic) solution? It would be very convenient to iterate over the Integers in the list without having to convert them back and forth. Thank you for your support!!

Anand S Kumar · Accepted Answer · 2015-10-28T19:43:05.620

You can use ast.literal_eval() to convert the strings to lists.

A simple example of ast.literal_eval() -

>>> import ast
>>> l = ast.literal_eval('[10,20,30]')
>>> type(l)
<class 'list'>

For your case, you can pass it to Series.apply , so that each element in the series is evaluated (safely). Example -

df = pd.read_csv('DataFrame.txt', sep='\t')
import ast
df['neg_list'] = df['neg'].apply(ast.literal_eval)
df = df.drop('neg',axis=1)
df['pos_list'] = df['pos'].apply(ast.literal_eval)
df = df.drop('pos',axis=1)

Demo -

In [15]: import pandas as pd

In [16]: dict2df = {"euNOG": ["ENOG410IF52", "KOG2956", "KOG1997"],
   ....:            "neg": [[58], [1332, 753, 716, 782], [187]],
   ....:            "pos": [[96], [659, 661, 705, 1228], [1414]]}

In [17]: df = pd.DataFrame(dict2df)

In [18]: df.to_csv('DataFrame.txt', sep='\t', header=True, index=False)

In [19]: newdf = pd.read_csv('DataFrame.txt', sep='\t')

In [20]: newdf['neg']
Out[20]:
0                     [58]
1    [1332, 753, 716, 782]
2                    [187]
Name: neg, dtype: object

In [21]: newdf['neg'][0]
Out[21]: '[58]'

In [22]: import ast

In [23]: newdf['neg_list'] = newdf['neg'].apply(ast.literal_eval)

In [24]: newdf = newdf.drop('neg',axis=1)

In [25]: newdf['pos_list'] = newdf['pos'].apply(ast.literal_eval)

In [26]: newdf = newdf.drop('pos',axis=1)

In [27]: newdf
Out[27]:
         euNOG               neg_list               pos_list
0  ENOG410IF52                   [58]                   [96]
1      KOG2956  [1332, 753, 716, 782]  [659, 661, 705, 1228]
2      KOG1997                  [187]                 [1414]

In [28]: newdf['neg_list'][0]
Out[28]: [58]

thank you! this is more pythonic in the sense that it uses a library instead of my own code. I was hoping to find a solution that would include this conversion when reading the data frame. — tryptofame, Nov 02 '15 at 11:14

python pandas: list of integers as individual values of DataFrame

value is a list

value is a string not a list

1 Answers1

Linked