I have the following dataframe (from a large csv file using pd.read_csv):

sal_vcf_to_df = pd.read_csv(sal_filepath, delimiter='\t', header = 0, index_col = False,
                            low_memory=False, usecols=['listA', 'Amino_Acid_Change', 'Gene_Name'])

sal_df_wo_na = sal_vcf_to_df.dropna(axis = 0, how = 'any')

sal_df_wo_na['listA'] = sal_df_wo_na['listA'].apply(lambda x : ast.literal_eval(x))
sal_df_wo_na['listA'] = sal_df_wo_na['listA'].apply(lambda x: list(map(float, x)))

The dataframe I got:

            listA                Amino_Acid_Change        Gene_Name
0  "['133', '115', '3', '1']"        Q637K                 ATM                   
1  "['114', '115', '2', '3']"        I111                  PIK3R1
2  "['51', '59', '1', '1']"          T2491                 KMT2C

I'd like to convert the 'listA' column to list of floats. So far I've tried to do it in several steps:

sal_df_wo_na['listA'] = sal_df_wo_na['listA'].apply(lambda x : ast.literal_eval(x))

then:

sal_df_wo_na['listA'] = sal_df_wo_na['listA'].apply(lambda x: list(map(float, x)))

But I got the following warning after the first step:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

Does anyone know how to fix the warning or have a better solution?

Bella
  • The biggest issue here isn't the conversion. That's easily enough done with `df.listA.str.replace("'", '').apply(ast.literal_eval)`. I fear you have another problem generated by code that is not shown here. May I see all your code? – cs95 Dec 27 '17 at 10:05
  • Actually, you can do this conversion even faster, with `pd.eval`. `df.listA = pd.eval(df.listA.str.replace("['\"]", ''))` – cs95 Dec 27 '17 at 10:09
  • Got the following error: AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis' – Bella Dec 27 '17 at 10:19
  • The fallback would be `df.listA = df.listA.str.replace("'", '').apply(ast.literal_eval)`, because `pd.eval` only supports up to 100 rows. – cs95 Dec 27 '17 at 10:23

1 Answer

Option 1
pd.eval - Works for up to 100 rows
A really quick way of converting that horrendous-looking column is to strip all the quotes and then call pd.eval -

v = pd.eval(df.listA.str.replace("['\"]", '')).astype(float)

v
array([[ 133.,  115.,    3.,    1.],
       [ 114.,  115.,    2.,    3.],
       [  51.,   59.,    1.,    1.]])

Assign the result back -

df['listA'] = v
df

              listA Amino_Acid_Change Gene_Name
0  [133, 115, 3, 1]             Q637K       ATM
1  [114, 115, 2, 3]              I111    PIK3R1
2    [51, 59, 1, 1]             T2491     KMT2C

Option 2
ast.literal_eval - The reliable workhorse
Update: pd.eval only supports up to 100 rows, so the slower, more reliable fallback is ast.literal_eval -

from ast import literal_eval

df.listA = df.listA.str.replace("'", '').apply(literal_eval)
df 

              listA Amino_Acid_Change Gene_Name
0  [133, 115, 3, 1]             Q637K       ATM
1  [114, 115, 2, 3]              I111    PIK3R1
2    [51, 59, 1, 1]             T2491     KMT2C
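
Since the question asks for floats and literal_eval on the stripped string returns ints, here is a minimal sketch that parses and converts in one pass. The sample frame and the helper name parse_float_list are hypothetical, and it assumes each cell may carry literal outer double quotes, as the printed frame in the question suggests.

```python
import ast

import pandas as pd

# Hypothetical sample mirroring the question's data: every cell is a string
# that holds a list of quoted numbers, wrapped in literal double quotes.
df = pd.DataFrame({
    "listA": [
        "\"['133', '115', '3', '1']\"",
        "\"['114', '115', '2', '3']\"",
        "\"['51', '59', '1', '1']\"",
    ],
    "Amino_Acid_Change": ["Q637K", "I111", "T2491"],
    "Gene_Name": ["ATM", "PIK3R1", "KMT2C"],
})


def parse_float_list(cell):
    """Turn a string such as "['133', '115']" into a list of floats."""
    # Drop the outer double quotes (if any) so literal_eval sees a list literal.
    items = ast.literal_eval(cell.strip('"'))
    return [float(item) for item in items]


df["listA"] = df["listA"].apply(parse_float_list)
```

This keeps the parsing in plain Python, so it is not subject to the row limit mentioned for pd.eval, at the cost of being slower on very large frames.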

As for the SettingWithCopyWarning, the best source of reading is

In a nutshell, what you're doing is creating sal_df_wo_na by extracting a slice/view from a larger dataframe, something like this -

sal_df_wo_na = df[<some condition here>]

This could lead to chained indexing, which pandas warns against. Instead, you'd need to do something like

sal_df_wo_na = df[<some condition here>].copy()

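This creates an independent copy of the slice with the pd.DataFrame.copy method. deep=True is already the default, so passing it explicitly is optional; note that Python objects stored in a column (such as lists) are still shared by reference rather than copied recursively.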
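
As a rough illustration of the copy-based fix applied to the question's own variables, the sketch below builds a small hypothetical stand-in for the frame read from the CSV, drops the missing rows, and copies the result before assigning to a column. The sample values and the lower-casing step are only there to show an assignment.

```python
import pandas as pd

# Hypothetical stand-in for the frame loaded with pd.read_csv in the question.
sal_vcf_to_df = pd.DataFrame({
    "listA": ["['133', '115', '3', '1']", None, "['51', '59', '1', '1']"],
    "Gene_Name": ["ATM", "PIK3R1", "KMT2C"],
})

# .copy() makes sal_df_wo_na an independent frame, so later column
# assignments do not target something pandas may treat as a slice
# of sal_vcf_to_df.
sal_df_wo_na = sal_vcf_to_df.dropna(axis=0, how="any").copy()

# Assigning to a column of the copy leaves the original frame untouched.
sal_df_wo_na["Gene_Name"] = sal_df_wo_na["Gene_Name"].str.lower()
```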

cs95
  • Tried the first part of your answer and got the error: AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis' – Bella Dec 27 '17 at 10:25
  • oh I still got the error: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead – Bella Dec 27 '17 at 10:33
  • @Bella Thanks for asking! This led me to figure out the reason for `pd.eval` bugging out. I'll probably write a Q&A on it later. – cs95 Dec 27 '17 at 10:36
  • @Bella Actually, try `sal_df_wo_na = sal_vcf_to_df.dropna(axis = 0, how = 'any').copy(deep=True)` – cs95 Dec 27 '17 at 10:40
  • Do you know how I can select the rows where the last element of the list in the 'listA' column equals 0 or 1? – Bella Jan 04 '18 at 08:56
  • @Bella `df[df.listA.str[-1].isin([0, 1])]` – cs95 Jan 04 '18 at 08:57
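
The last comment's filter can be tried on a small frame. This is only a sketch: the data is hypothetical (the third row's last value is changed to 0 to exercise both cases), and it assumes listA already holds Python lists rather than strings.

```python
import pandas as pd

# Hypothetical frame whose listA column already contains lists of floats.
df = pd.DataFrame({
    "listA": [
        [133.0, 115.0, 3.0, 1.0],
        [114.0, 115.0, 2.0, 3.0],
        [51.0, 59.0, 1.0, 0.0],
    ],
    "Gene_Name": ["ATM", "PIK3R1", "KMT2C"],
})

# .str[-1] indexes into each list and returns its last element.
last_values = df["listA"].str[-1]

# Keep only the rows whose last element is 0 or 1.
selected = df[last_values.isin([0, 1])]
```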