Why can't I convert dataframe column to float

Question

I am trying to convert dataframe column to a list of floats without success.

My original code is as below and this is a Link to the excel workbook I am working on.

df = pd.read_csv('C:/Users/radoy/Programming/ML Chart Patterns/xx_combined.csv')
df['opens'].astype(float)

df.dtypes
periods           int64
opens            object
highs            object
lows             object
closes           object
volumes           int64
consolidating    object
dtype: object

I keep getting an error:

ValueError: could not convert string to float: '[[63.240001

I have gone through the suggestions on another post here but none of them seem to work either.

Converting strings to floats in a DataFrame

What am I doing wrong? Any guidance will be appreciated.

It looks like you have some bad data; that's all I can say without seeing it myself, and I'm not going to access a site I've never heard of (wetransfer.com) to do that. You need to provide the data *in the question itself*, not an external link. See [mre]. For specifics, see [How to make good reproducible pandas examples](/q/20109391/4518341). Please also provide the expected output for completeness. And it wouldn't hurt to include the [full error message with traceback](https://meta.stackoverflow.com/q/359146/4518341). — wjandrea, Jun 28 '22 at 16:09

score 1 · Accepted Answer · answered Jun 28 '22 at 16:20

1

This should select numbers (with dots) from the strings in your data.

import re

pattern = r"[\d.]+"
df['opens'] = df['opens'].apply(lambda x: re.search(pattern, x)[0]).astype(float)

answered Jun 28 '22 at 16:20

Ignatius Reilly

1,594
2
6
15

score 0 · Answer 2 · answered Jun 28 '22 at 16:24

Try this:


import numpy as np


df['opens'] = (
    df['opens']
    .str.strip('[')
    .str.strip(']')
    .str.strip(' ')
    .str.replace(', ', ',')
    .str.split(',')
    .apply(np.array, dtype=float)
)

The error you're getting (ValueError: could not convert string to float: '[[63.240001) is due to the fact that the values from the column opens are being read as strings, instead of lists of values.

For example, the first value of opens being read is:

"[63.2400016784668, 62.20000076293945, 61.91999816894531, 61.40000152587891, 60.65999984741211, 60.04000091552734, 61.27999877929688, 60.0, 59.11999893188477, 57.88000106811523, 57.7599983215332, 59.04000091552734, 58.18000030517578, 55.29999923706055, 54.13999938964844, 54.52000045776367, 54.13999938964844, 56.72000122070312, 57.0, 58.29999923706055, 58.34000015258789, 58.04000091552734, 58.5, 58.45999908447266, 58.34000015258789, 56.09999847412109, 56.72000122070312, 58.5, 59.13999938964844, 58.41999816894531, 58.65999984741211, 57.90000152587891, 56.43999862670898, 55.7599983215332, 56.27999877929688, 55.22000122070312, 56.5, 56.58000183105469, 56.72000122070312, 56.38000106811523, 55.72000122070312, 55.88000106811523, 56.7400016784668, 58.06000137329102, 58.79999923706055, 59.40000152587891, 59.56000137329102, 58.18000030517578, 58.11999893188477, 57.72000122070312, 57.79999923706055, 56.88000106811523, 57.31999969482422, 56.11999893188477, 56.59999847412109, 56.38000106811523, 57.15999984741211, 56.08000183105469, 56.93999862670898, 57.86000061035156, 57.88000106811523, 58.54000091552734, 58.70000076293945, 57.81999969482422, 58.68000030517578, 58.58000183105469]"

instead of something like:

['63.2400016784668',
 '62.20000076293945',
 '61.91999816894531',
 '61.40000152587891',
 '60.65999984741211',
 '60.04000091552734',
 '61.27999877929688',
 '60.0',
 '59.11999893188477',
 '57.88000106811523',
 '57.7599983215332',
 '59.04000091552734',
 '58.18000030517578',
 '55.29999923706055',
 '54.13999938964844',
 '54.52000045776367',
 '54.13999938964844',
 '56.72000122070312',
 '57.0',
 '58.29999923706055',
 '58.34000015258789',
 '58.04000091552734',
 '58.5',
 '58.45999908447266',
 '58.34000015258789',
 '56.09999847412109',
 '56.72000122070312',
 '58.5',
 '59.13999938964844',
 '58.41999816894531',
 '58.65999984741211',
 '57.90000152587891',
 '56.43999862670898',
 '55.7599983215332',
 '56.27999877929688',
 '55.22000122070312',
 '56.5',
 '56.58000183105469',
 '56.72000122070312',
 '56.38000106811523',
 '55.72000122070312',
 '55.88000106811523',
 '56.7400016784668',
 '58.06000137329102',
 '58.79999923706055',
 '59.40000152587891',
 '59.56000137329102',
 '58.18000030517578',
 '58.11999893188477',
 '57.72000122070312',
 '57.79999923706055',
 '56.88000106811523',
 '57.31999969482422',
 '56.11999893188477',
 '56.59999847412109',
 '56.38000106811523',
 '57.15999984741211',
 '56.08000183105469',
 '56.93999862670898',
 '57.86000061035156',
 '57.88000106811523',
 '58.54000091552734',
 '58.70000076293945',
 '57.81999969482422',
 '58.68000030517578',
 '58.58000183105469']

Naveed · Answer 3 · 2022-06-28T16:57:32.800

-1

there are some some non digit characters like space etc. you can try to replace them all with null prior to converting to float

df['open'].replace(r'(\"|\[|\s)',"", regex=True).astype('float')

0     63.240002
1     12.600000
2    213.149994
3     11.280000
4     76.820000
5     15.600000
6    210.179993
Name: opens, dtype: float64

Test Data from provided Excel

periods opens
65  [63.2400016784668"
29  [12.60000038146973
19  [213.1499938964844
44  [11.27999973297119
78  [76.81999969482422
34  [15.60000038146973
74  [210.1799926757812

edited Jun 28 '22 at 16:57

answered Jun 28 '22 at 16:12

Naveed

11,495
2
14
21

@Dodo, one line solution, that is vectorized. Does it work for you? – Naveed Jun 28 '22 at 16:58

Why can't I convert dataframe column to float

3 Answers3