Question: How can I use pd.read_csv so that the values in a given column are of type list (one list per row of that column)?
When I create a DataFrame from a dict (see below), the individual values are of type list. The problem: after writing the DataFrame to a file and reading it back into a DataFrame, I get a string instead of a list.
creating the DataFrame
import pandas as pd
dict2df = {"euNOG": ["ENOG410IF52", "KOG2956", "KOG1997"],
           "neg": [[58], [1332, 753, 716, 782], [187]],
           "pos": [[96], [659, 661, 705, 1228], [1414]]}
df = pd.DataFrame(dict2df)
value is a list
type(df.loc[0, 'neg']) == list # --> True
type(df.loc[0, 'neg']) == str # --> False
df.loc[1, 'neg'][-1] == 782 # --> True
write to file
df.to_csv('DataFrame.txt', sep='\t', header=True, index=False)
read from file
df = pd.read_csv('DataFrame.txt', sep='\t')
value is a string not a list
type(df.loc[0, 'neg']) == list # --> False
type(df.loc[0, 'neg']) == str # --> True
df.loc[1, 'neg'][-1] == 782 # --> False
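To make the cause visible, here is a minimal sketch that writes a smaller, illustrative frame to an in-memory buffer instead of a file. The names `df_lists`, `buffer`, `raw_text` and `df_back` are made up for this example. It suggests that the file holds only the text form of each list, so reading it back gives strings.

```python
import io
import pandas as pd

# Illustrative data of the same shape as above: each 'neg' cell holds a list.
df_lists = pd.DataFrame({"euNOG": ["ENOG410IF52", "KOG2956"],
                         "neg": [[58], [1332, 753, 716, 782]]})

# Write to an in-memory buffer so the raw text can be inspected directly.
buffer = io.StringIO()
df_lists.to_csv(buffer, sep='\t', header=True, index=False)
raw_text = buffer.getvalue()
print(raw_text)  # each list appears as plain text such as [58]

# Reading that text back yields strings, not lists.
buffer.seek(0)
df_back = pd.read_csv(buffer, sep='\t')
first_cell = df_back.loc[0, 'neg']      # a str holding the text of the list
last_char = df_back.loc[1, 'neg'][-1]   # the closing bracket, not 782
```

Indexing with `[-1]` therefore returns the last character of the string rather than the last integer.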
Of course, it's possible to convert between the two data types, but that is computationally expensive and needs extra work (see below):
def convert_StringList2ListOfInt(string2convert):
    # Strip the surrounding brackets, then parse each comma-separated integer.
    return [int(ele) for ele in string2convert[1:-1].split(',')]

def DataFrame_StringOfInts2ListOfInts(df, cols2convert_list):
    for column in cols2convert_list:
        column_temp = column + "_temp"
        df[column_temp] = df[column].apply(convert_StringList2ListOfInt)
        df[column] = df[column_temp]
        df = df.drop(column_temp, axis=1)
    return df
df = DataFrame_StringOfInts2ListOfInts(df, ['neg', 'pos'])
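The same conversion could probably be attached to the read itself. Below is a minimal sketch, assuming the file is trusted and every cell is a valid Python list literal. It uses the `converters` parameter of `pd.read_csv` together with `ast.literal_eval`. The names `file_text` and `df_parsed` are hypothetical, and the in-memory text only stands in for 'DataFrame.txt'.

```python
import ast
import io
import pandas as pd

# Stand-in for the tab-separated contents of 'DataFrame.txt'.
file_text = ("euNOG\tneg\tpos\n"
             "ENOG410IF52\t[58]\t[96]\n"
             "KOG2956\t[1332, 753, 716, 782]\t[659, 661, 705, 1228]\n")

# ast.literal_eval parses a Python literal such as "[58]" into a real list
# without executing arbitrary code; read_csv applies it to each cell of
# the columns named in 'converters'.
df_parsed = pd.read_csv(io.StringIO(file_text), sep='\t',
                        converters={'neg': ast.literal_eval,
                                    'pos': ast.literal_eval})

print(type(df_parsed.loc[0, 'neg']))
print(df_parsed.loc[1, 'neg'][-1])
```

This still parses every cell once, so it may not be cheaper than the functions above. It only removes the separate conversion step after reading.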
What would be a better (more Pythonic) solution? It would be very convenient to iterate over the integers in the list without having to convert them back and forth. Thank you for your support!