1

I have a excel spreadsheet that has a table like this:

Variable bins_edges_list
var1 [0.998, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 14.0, 90.0]
var2 [0.999, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 9.0, 11.0, 14.0, 90.0]
var3 [-0.001, 1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 83.0]

I read it into a pandas dataframe using the command

df_bin_edges = pd.read_excel('bin_edges_data_backup.xlsx')
df_bin_edges = df_bin_edges[['Variable','bins_edges_list']]

I tried to extract the bins corresponding to "Variable"

bins_list = df_bin_edges[df_bin_edges['Variable'] == 'var1']['bins_edges_list']
bins_list.replace("'","")

What I expect to see is

bins_list = [0.998, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 14.0, 90.0]

but I get the following instead

bins_list = '[0.998, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 14.0, 90.0]' 

I want to use this bins_list in pd.cut to divide variables into bins.

pd.cut(df[col], bins=bins_list,labels=None, retbins=False, duplicates='drop', precision=0).astype(str)

I get the following error

ValueError: could not convert string to float: '[0.998, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 14.0, 90.0]'

Please let me know what I am doing wrong

Zenvega
  • 1,974
  • 9
  • 28
  • 45
  • 1
    have a look https://stackoverflow.com/questions/50278300/convert-a-columns-of-string-to-list-in-pandas – Epsi95 Aug 22 '21 at 03:49
  • You may have to specify the dtype argument to read_excel, see https://stackoverflow.com/questions/32591466/python-pandas-how-to-specify-data-types-when-reading-an-excel-file – David Waterworth Aug 22 '21 at 03:56

3 Answers3

0

It's because reading it doesn't automatically make it a list, so try using pd.eval.

Add the following line after this line df_bin_edges = df_bin_edges[['Variable','bins_edges_list']]:

df['bins_edges_list'] = df['bins_edges_list'].apply(pd.eval)
U13-Forward
  • 69,221
  • 14
  • 89
  • 114
0

Try via literal_eval():

from ast import literal_eval:

df['bins_edges_list']=df['bins_edges_list'].map(literal_eval)

OR

via pd.eval()

df['bins_edges_list']=pd.eval(df['bins_edges_list'])
Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41
-1

I believe this would help.

The problem is that ['bins_edges_list'] is a column containing string, you need to split the characters and for each character create a new column for the Dataframe.