
I have a pandas DataFrame (I can convert it to a NumPy array if that's better) that looks like this:

[image: X_train]

I would like to convert each value from a string to a number.

I have tried things like convert_objects, but they don't work at all. I think the problem is the square brackets, so the conversion would work if I could get rid of them.

Greetings and thanks in advance

Edit:

Here is where the data comes from:

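    import numpy as np
    import pandas as pd

    births = list(data["Births"])  # convert the column to a list once, not on every iteration
    X_ans = []
    Y_ans = []
    for i in range(len(births) - 2):
        X = births[i:i+3]  # window of three consecutive values
        Y = births[i+1]
        X_ans.append(X)
        Y_ans.append(Y)

    # Build the DataFrames once, after the loop. str(x) turns each list into
    # text such as '[1, 2, 3]', which is where the square brackets come from.
    in_ = pd.DataFrame([str(x) for x in X_ans], columns=['input'])
    out = pd.DataFrame([str(x) for x in Y_ans], columns=['output'])
    ans_1 = pd.concat([in_, out], axis=1)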

ans_1 looks like this:

[image: ans_1]

Then I split it into training and evaluation sets:

    msk = np.random.rand(len(ans_1)) < 0.8  # True for roughly 80% of rows
    traindf = ans_1[msk]
    evaldf = ans_1[~msk]
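The mask draws one random number per row, so the split is only approximately 80/20 and differs on every run. A minimal sketch of making it repeatable, assuming NumPy 1.17 or later (for np.random.default_rng) and a made-up stand-in for ans_1; the seed value 0 is arbitrary:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ans_1 from the question.
ans_1 = pd.DataFrame({
    "input": [str([i, i + 1, i + 2]) for i in range(10)],
    "output": [str(i + 1) for i in range(10)],
})

rng = np.random.default_rng(seed=0)  # fixed seed: the same rows land in each set every run
msk = rng.random(len(ans_1)) < 0.8   # True for roughly 80% of rows
traindf = ans_1[msk]
evaldf = ans_1[~msk]
```

Every row ends up in exactly one of the two sets; only the exact proportion is left to chance.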

and then split the comma-separated values so that each number gets its own column:

    X_train = traindf.iloc[:, 0]
    Y_train = traindf.iloc[:, 1]
    X_test = evaldf.iloc[:, 0]
    Y_test = evaldf.iloc[:, 1]
    # The brackets survive the split, e.g. '[1, 2, 3]' becomes '[1', ' 2', ' 3]'
    X_train = X_train.str.split(pat=',', expand=True)
    X_train = X_train.values
    X_test = X_test.str.split(pat=',', expand=True)
    X_test = X_test.values
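Splitting on ',' leaves the brackets attached to the first and last piece of every row. One way around that (an alternative to the answers below, not the question's own code) is to strip them with str.strip before splitting. A minimal sketch on a made-up two-row sample; X_num is a hypothetical name:

```python
import pandas as pd

# Hypothetical sample of the 'input' column: str() of a list, as in the question.
X_train = pd.Series(["[35, 32, 30]", "[32, 30, 31]"])

# Splitting alone keeps the brackets:
# [['[35', ' 32', ' 30]'], ['[32', ' 30', ' 31]']]
print(X_train.str.split(",", expand=True).values.tolist())

# Strip '[' and ']' from both ends first, then split and convert to float.
# float() tolerates the leading spaces left after each comma.
X_num = (
    X_train.str.strip("[]")
    .str.split(",", expand=True)
    .astype(float)
    .values
)
print(X_num)
```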

PS: I can also use .values to get the data as an array:

[image: the array returned by .values]

A. Esquivias
  • Possible duplicate of [Converting strings to floats in a DataFrame](https://stackoverflow.com/questions/16729483/converting-strings-to-floats-in-a-dataframe) – Sotos Nov 13 '18 at 11:00
  • Can you give us a line of code that instantiates that dataframe? It's easier to work with than a picture. – timgeb Nov 13 '18 at 11:01
  • Where are you getting the data from? I'd suggest getting rid of those brackets as early as possible or at least before generating the dataframe in the first place... – Sebastian Loehner Nov 13 '18 at 11:01
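Following the suggestion above to get rid of the brackets as early as possible, here is a minimal sketch that never calls str() on the windows, so the features stay numeric from the start. It assumes a small made-up births Series in place of data["Births"] and keeps the question's choice of target index (i+1); the column names x0, x1, x2 are hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for data["Births"]; the real data is not shown in the question.
births = pd.Series([35, 32, 30, 31, 44, 29])

window = 3
n_rows = len(births) - window + 1  # same as range(len(births) - 2) for window=3

# Each window stays a list of numbers: no str(), so no brackets appear.
rows = [births.iloc[i:i + window].tolist() for i in range(n_rows)]
X = pd.DataFrame(rows, columns=[f"x{k}" for k in range(window)])

# Target index i + 1, mirroring the question's code.
y = pd.Series([births.iloc[i + 1] for i in range(n_rows)], name="output")

print(X)
print(y)
```

X.values can then be used directly as a numeric array without any string cleaning.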

2 Answers


Use replace to strip the square brackets, then convert with astype:

    df = df.replace(r'\[|\]', '', regex=True).astype(float)

For a NumPy array, use:

    arr = df.values
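In the question, X_train has already been turned into a NumPy array by .values, which has no replace method, so it needs to be wrapped in a DataFrame first (or the .values call moved after the cleaning). A minimal sketch of the whole step on a made-up object array shaped like the split output:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_train after str.split(..., expand=True).values
X_train = np.array([["[35", " 32", " 30]"], ["[32", " 30", " 31]"]], dtype=object)

df = pd.DataFrame(X_train)
# Remove every '[' and ']' in every cell, then cast each column to float.
df = df.replace(r"\[|\]", "", regex=True).astype(float)
arr = df.values  # float64 NumPy array, ready for numeric work
```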
Space Impact
    import pandas as pd
    df = pd.DataFrame({0: ['[3242', '232', '243214]'], 1: ['[3242', '232', '243214]']})

df:

            0        1
    0    [3242    [3242
    1      232      232
    2  243214]  243214]

To generalize the logic and remove all non-digit characters (note that \D also removes minus signs and decimal points, so this only suits non-negative integers):

    df.replace(regex=r'\D', value='', inplace=True)
    df = df.apply(pd.to_numeric)

Output:

            0       1
    0    3242    3242
    1     232     232
    2  243214  243214
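The \D pattern works for the non-negative integers in this example, but it also deletes '-' and '.', silently changing other values. A minimal sketch comparing it with a pattern that removes only the brackets, on made-up data:

```python
import pandas as pd

df = pd.DataFrame({0: ["[-1.5", "2.25", "3]"]})

# \D strips every non-digit, including '-' and '.', which corrupts these values.
digits_only = df.replace(regex=r"\D", value="").apply(pd.to_numeric)

# Removing only '[' and ']' keeps signs and decimal points intact.
brackets_only = df.replace(regex=r"[\[\]]", value="").apply(pd.to_numeric)

print(digits_only)
print(brackets_only)
```

If the data can contain negative or fractional numbers, the narrower bracket-only pattern is the safer choice.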
Venkatachalam