
I have this dataset of agricultural raw material prices from 1990 to 2017, and I am trying to make some price predictions for the sake of learning:

[Image: preview of the dataset]

Here are all the columns:

[Image: list of the dataset's columns]

Now I want to split the dataset into a training set and a test set so I can apply some machine learning models to predict prices. However, it is not clear to me what my target variable y should be, considering that each column holds the prices of a different raw material and they are all independent of each other. How should I split this dataset if I want to make price predictions?

  • Please **re-read** [How to ask](https://stackoverflow.com/help/how-to-ask), as it would seem that you missed some crucial points the first time you read it, namely "***DO NOT post images of code, data, error messages, etc.** - copy or type the text into the question*" (emphasis in the original). See why [an image of your code is not helpful](http://idownvotedbecau.se/imageofcode). – desertnaut Jun 15 '20 at 13:49
  • There's also this [specific post](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) on how to ask questions involving pandas dataframes. – vlizana Jun 16 '20 at 04:20

1 Answer


As I can see from your data, there are several raw material prices available for prediction. Since these prices are independent of each other, you can build a separate dataset for each one: take a single price column as the dependent variable y (for example Copra_Price), keep the remaining non-price columns as the independent variables, and remove the other price-related columns. Once you have that dataset, you can easily split it into train and test sets with Copra_Price as the target. Repeat this for each of the price variables.
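A minimal pandas sketch of that procedure, assuming one row per month. Apart from Copra_Price, the column names (Year, Month, Cotton_Price, Rubber_Price) and the random values are hypothetical placeholders for the real dataset, and `make_dataset` and `random_split` are illustrative helpers, not library functions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: one row per month, 1990-2017.
# Only Copra_Price appears in the question; the other names are placeholders.
rng = np.random.default_rng(0)
dates = pd.date_range("1990-01-01", "2017-12-01", freq="MS")
df = pd.DataFrame(
    {
        "Year": dates.year,
        "Month": dates.month,
        "Copra_Price": rng.normal(500, 50, len(dates)),
        "Cotton_Price": rng.normal(1.5, 0.2, len(dates)),
        "Rubber_Price": rng.normal(2.0, 0.3, len(dates)),
    }
)
PRICE_COLS = ["Copra_Price", "Cotton_Price", "Rubber_Price"]


def make_dataset(frame, target):
    """Use `target` as y and drop every price column (target included) from X."""
    y = frame[target]
    X = frame.drop(columns=PRICE_COLS)
    return X, y


def random_split(X, y, test_frac=0.2, seed=42):
    """Hold out a random `test_frac` of the rows as the test set."""
    test_idx = X.sample(frac=test_frac, random_state=seed).index
    return (
        X.drop(index=test_idx),
        X.loc[test_idx],
        y.drop(index=test_idx),
        y.loc[test_idx],
    )


# Repeat the same procedure once per price variable.
splits = {}
for target in PRICE_COLS:
    X, y = make_dataset(df, target)
    splits[target] = random_split(X, y)

X_train, X_test, y_train, y_test = splits["Copra_Price"]
print(X_train.shape, X_test.shape)
```

Each entry in `splits` is a ready-to-use `(X_train, X_test, y_train, y_test)` tuple for one price, so any regression model can be fitted per target in a loop.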

One more consideration: if none of the price variables contains anomalies, you can use any one of them to split the data, because a random selection of rows made with one variable will, in all probability, be a random selection across the whole group. In other words, the same train/test rows can be reused for every price target.
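A minimal sketch of that shortcut, assuming the rows contain no anomalies, as the paragraph requires. The column names other than Copra_Price and the random values are hypothetical placeholders. The test rows are drawn once with `DataFrame.sample` and then reused for every price target.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 28 years x 12 months; names other than Copra_Price are placeholders.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "Year": np.repeat(np.arange(1990, 2018), 12),
        "Month": np.tile(np.arange(1, 13), 28),
        "Copra_Price": rng.normal(500, 50, 336),
        "Cotton_Price": rng.normal(1.5, 0.2, 336),
        "Rubber_Price": rng.normal(2.0, 0.3, 336),
    }
)
PRICE_COLS = ["Copra_Price", "Cotton_Price", "Rubber_Price"]

# Draw the test rows once (20% of the rows, chosen at random).
test_idx = df.sample(frac=0.2, random_state=42).index
train_df = df.drop(index=test_idx)
test_df = df.loc[test_idx]

# Features are the non-price columns; every target shares the same row split.
feature_cols = [c for c in df.columns if c not in PRICE_COLS]
datasets = {
    target: (
        train_df[feature_cols],
        test_df[feature_cols],
        train_df[target],
        test_df[target],
    )
    for target in PRICE_COLS
}
```

Fixing `random_state` keeps the split reproducible, so results for different raw materials are evaluated on exactly the same test rows.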