0

I am trying to convert dataframe column to a list of floats without success.

My original code is as below and this is a Link to the excel workbook I am working on.

df = pd.read_csv('C:/Users/radoy/Programming/ML Chart Patterns/xx_combined.csv')
df['opens'].astype(float)
df.dtypes
periods           int64
opens            object
highs            object
lows             object
closes           object
volumes           int64
consolidating    object
dtype: object

I keep getting an error:

ValueError: could not convert string to float: '[[63.240001

I have gone through the suggestions on another post here but none of them seem to work either.

Converting strings to floats in a DataFrame

What am I doing wrong? Any guidance will be appreciated.

Dodo
  • 55
  • 10
  • You can't directly turn a list into a float – Omaro_IB Jun 28 '22 at 16:05
  • 1
    It looks like you have some bad data; that's all I can say without seeing it myself, and I'm not going to access a site I've never heard of (wetransfer.com) to do that. You need to provide the data *in the question itself*, not an external link. See [mre]. For specifics, see [How to make good reproducible pandas examples](/q/20109391/4518341). Please also provide the expected output for completeness. And it wouldn't hurt to include the [full error message with traceback](https://meta.stackoverflow.com/q/359146/4518341). – wjandrea Jun 28 '22 at 16:09

3 Answers3

1

This should select numbers (with dots) from the strings in your data.

import re

pattern = r"[\d.]+"
df['opens'] = df['opens'].apply(lambda x: re.search(pattern, x)[0]).astype(float)
Ignatius Reilly
  • 1,594
  • 2
  • 6
  • 15
0

Try this:


import numpy as np


df['opens'] = (
    df['opens']
    .str.strip('[')
    .str.strip(']')
    .str.strip(' ')
    .str.replace(', ', ',')
    .str.split(',')
    .apply(np.array, dtype=float)
)

The error you're getting (ValueError: could not convert string to float: '[[63.240001) is due to the fact that the values from the column opens are being read as strings, instead of lists of values.

For example, the first value of opens being read is:

"[63.2400016784668, 62.20000076293945, 61.91999816894531, 61.40000152587891, 60.65999984741211, 60.04000091552734, 61.27999877929688, 60.0, 59.11999893188477, 57.88000106811523, 57.7599983215332, 59.04000091552734, 58.18000030517578, 55.29999923706055, 54.13999938964844, 54.52000045776367, 54.13999938964844, 56.72000122070312, 57.0, 58.29999923706055, 58.34000015258789, 58.04000091552734, 58.5, 58.45999908447266, 58.34000015258789, 56.09999847412109, 56.72000122070312, 58.5, 59.13999938964844, 58.41999816894531, 58.65999984741211, 57.90000152587891, 56.43999862670898, 55.7599983215332, 56.27999877929688, 55.22000122070312, 56.5, 56.58000183105469, 56.72000122070312, 56.38000106811523, 55.72000122070312, 55.88000106811523, 56.7400016784668, 58.06000137329102, 58.79999923706055, 59.40000152587891, 59.56000137329102, 58.18000030517578, 58.11999893188477, 57.72000122070312, 57.79999923706055, 56.88000106811523, 57.31999969482422, 56.11999893188477, 56.59999847412109, 56.38000106811523, 57.15999984741211, 56.08000183105469, 56.93999862670898, 57.86000061035156, 57.88000106811523, 58.54000091552734, 58.70000076293945, 57.81999969482422, 58.68000030517578, 58.58000183105469]"

instead of something like:

['63.2400016784668',
 '62.20000076293945',
 '61.91999816894531',
 '61.40000152587891',
 '60.65999984741211',
 '60.04000091552734',
 '61.27999877929688',
 '60.0',
 '59.11999893188477',
 '57.88000106811523',
 '57.7599983215332',
 '59.04000091552734',
 '58.18000030517578',
 '55.29999923706055',
 '54.13999938964844',
 '54.52000045776367',
 '54.13999938964844',
 '56.72000122070312',
 '57.0',
 '58.29999923706055',
 '58.34000015258789',
 '58.04000091552734',
 '58.5',
 '58.45999908447266',
 '58.34000015258789',
 '56.09999847412109',
 '56.72000122070312',
 '58.5',
 '59.13999938964844',
 '58.41999816894531',
 '58.65999984741211',
 '57.90000152587891',
 '56.43999862670898',
 '55.7599983215332',
 '56.27999877929688',
 '55.22000122070312',
 '56.5',
 '56.58000183105469',
 '56.72000122070312',
 '56.38000106811523',
 '55.72000122070312',
 '55.88000106811523',
 '56.7400016784668',
 '58.06000137329102',
 '58.79999923706055',
 '59.40000152587891',
 '59.56000137329102',
 '58.18000030517578',
 '58.11999893188477',
 '57.72000122070312',
 '57.79999923706055',
 '56.88000106811523',
 '57.31999969482422',
 '56.11999893188477',
 '56.59999847412109',
 '56.38000106811523',
 '57.15999984741211',
 '56.08000183105469',
 '56.93999862670898',
 '57.86000061035156',
 '57.88000106811523',
 '58.54000091552734',
 '58.70000076293945',
 '57.81999969482422',
 '58.68000030517578',
 '58.58000183105469']
Ingwersen_erik
  • 1,701
  • 1
  • 2
  • 9
-1

there are some some non digit characters like space etc. you can try to replace them all with null prior to converting to float

df['open'].replace(r'(\"|\[|\s)',"", regex=True).astype('float')
0     63.240002
1     12.600000
2    213.149994
3     11.280000
4     76.820000
5     15.600000
6    210.179993
Name: opens, dtype: float64

Test Data from provided Excel

periods opens
65  [63.2400016784668"
29  [12.60000038146973
19  [213.1499938964844
44  [11.27999973297119
78  [76.81999969482422
34  [15.60000038146973
74  [210.1799926757812 
Naveed
  • 11,495
  • 2
  • 14
  • 21