
I am splitting a DataFrame into train and test DataFrames but have a problem with the output. I want to get separate DataFrames for train and test. The code is as follows.

train_prices=int(len(prices)*0.65)
test_prices=len(prices)-train_prices
train_data,test_data=prices.iloc[0:train_prices,:].values,prices.iloc[train_prices:len(prices),:1].values

However, I only get a single value for each rather than DataFrames. The output looks something like this:

training_size,test_size 

Output

11156746, 813724

I expect train and test DataFrames that will let me go on to build ML models.

Please assist.

Derrick Kuria
  • Your code looks reasonable. You should provide a [mcve] in order for us to more easily help you solve your problem. – piRSquared Jun 23 '22 at 05:15
  • @piRSquared, edited. Hope it's clearer now. – Derrick Kuria Jun 23 '22 at 05:25
  • A [mcve] includes an actual sample dataset such that I can run the sample myself. See, the problem is I have no idea what `prices` is, and that could be the problem. But I can't know that unless you show me exactly how `prices` was created. That said, you're probably better off reading these answers [https://stackoverflow.com/q/24147278/2336654] or [https://stackoverflow.com/q/38250710/2336654] – piRSquared Jun 23 '22 at 05:30

1 Answer


Since you didn't provide a reproducible example, I'll demonstrate on the iris dataset.

from sklearn.datasets import load_iris
data = load_iris(as_frame=True)
data = data["data"]
data.head()

    sepal length (cm)   sepal width (cm)    petal length (cm)   petal width (cm)
0   5.1                 3.5                 1.4                 0.2
1   4.9                 3.0                 1.4                 0.2
2   4.7                 3.2                 1.3                 0.2
3   4.6                 3.1                 1.5                 0.2
4   5.0                 3.6                 1.4                 0.2

First, remove .values from your code. .values returns the underlying data as a NumPy array, but you want the output to be a DataFrame.

Second, in test_data you took only the first column, because you used prices.iloc[train_prices:len(prices),:1] instead of prices.iloc[train_prices:len(prices),:] as you did in train_data.

So, to get two DataFrame outputs for train and test:

train_prices = int(len(data) * 0.65)
test_prices = len(data) - train_prices
train_data, test_data = data.iloc[0:train_prices, :], data.iloc[train_prices:len(data), :]
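To see the two points above in isolation, here is a minimal sketch on a small made-up frame. The `prices` frame and its column names `open` and `close` are assumptions for illustration only, since the original `prices` was never shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `prices`: 20 rows, 2 columns (names are made up).
prices = pd.DataFrame({"open": np.arange(20.0), "close": np.arange(20.0) + 0.5})

split = int(len(prices) * 0.65)  # number of rows that go to the training set

# .values drops the DataFrame wrapper and returns a NumPy array.
as_array = prices.iloc[:split, :].values

# A column slice of :1 keeps only the first column.
first_col_only = prices.iloc[split:, :1]

# Without .values and with a full column slice, both parts stay DataFrames.
train_data = prices.iloc[:split, :]
test_data = prices.iloc[split:, :]

print(type(as_array).__name__, type(train_data).__name__)
print(first_col_only.shape, train_data.shape, test_data.shape)
```

Checking `type(...)` and `.shape` like this on your own `prices` is a quick way to confirm what each slice actually returns.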

By the way, if you want to do some ML, check out scikit-learn's train_test_split function (in sklearn.model_selection).
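As a rough sketch of that route, the same 65/35 split could look like the following. The `prices` frame here is again a made-up stand-in, and `shuffle=False` is an assumption that the rows are time-ordered and should keep their order; by default `train_test_split` shuffles the rows before splitting.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for `prices` (column names are made up).
prices = pd.DataFrame({"open": range(20), "close": range(1, 21)})

# train_size=0.65 mirrors the 65% used in the question.
# shuffle=False keeps the original row order: first rows train, last rows test.
train_data, test_data = train_test_split(prices, train_size=0.65, shuffle=False)

print(type(train_data).__name__, train_data.shape)
print(type(test_data).__name__, test_data.shape)
```

Passing a DataFrame in gives DataFrames back, so there is no need for .values before the split.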

nogmos