
I have 3 CSV files: train.csv (the training set), test.csv (the test set), and sampleSubmission.csv (a sample submission file in the correct format). I am new to R and don't know how to read them into R. This is the Drive link for the dataset:

https://drive.google.com/open?id=1YPw-MPlW7g2y19GT1ITy_fHbjrKBNc-M

  • Hi, this is a bit too broad a question just now. Can you try to break it down into smaller questions (i.e. how to read in data, how to prepare data for modelling, how to use a decision tree, how to output results) and show what you have tried? The [r tag info](https://stackoverflow.com/tags/r/info) provides a bunch of introductions that will help to get you started. – user20650 Sep 09 '19 at 12:13
  • OK, thanks. I will break it into parts. – Akmal Masud Sep 09 '19 at 12:17
  • okay, but please do look at the intro docs i.e. from the *r tag link ^^* [R Data Import/Export](https://cran.r-project.org/doc/manuals/r-devel/R-data.pdf) details how to read in spreadsheet like data. – user20650 Sep 09 '19 at 12:20
  • You can read it using `df <- read.csv('your_file_path.csv')` – Casper Sep 09 '19 at 12:23
  • I know how to read files with R and how to split data into test and train sets, but our teacher gave us these files already split into train and test data. There is also an extra file, and I don't know how to use it with the test data for validation. – Akmal Masud Sep 09 '19 at 12:26
  • It seems like you're asking a different question now. You can post another question on working with the train & test data, but please first research other SO posts that might help and make sure your question is [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – camille Sep 09 '19 at 15:13

1 Answer

Going by your comments, I think the extra file is meant to hold the results of the decision tree, i.e. its predictions for the test set. A short commented example is given below.

dTest  <- read.csv("test.csv")  #Read in the datasets
dTrain <- read.csv("train.csv")
dSub   <- read.csv("sampleSubmission.csv") #The sample submission file from the question

dTrain$y <- as.logical(dTrain$y) #Change type of y to logical

library(rpart)
dtree <- rpart(y ~ . - id, data=dTrain) #Fit the decision tree on all columns except id

all(dSub$id == dTest$id) #Check that the ids in dSub are in the same order as in dTest
#[1] TRUE

dSub$y  <- predict(dtree, newdata=dTest) #Predict for the test set; y is logical, so rpart fits a regression tree and returns values between 0 and 1
head(dSub)
#     id          y
#1 38062 0.05454481
#2 40079 0.05454481
#3 39238 0.21288164
#4 36069 0.05454481
#5 40531 0.05454481
#6 38164 0.21288164
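
If the submission needs class labels rather than scores, one possible variation is to fit a classification tree and write the result back to disk. The sketch below is only an illustration under assumptions: it runs on a made-up toy data set with a single predictor `x` instead of the real files, and the output path `outFile` is hypothetical, so adapt the names to your data.

```r
library(rpart)

# Toy stand-ins for train.csv, test.csv and sampleSubmission.csv (made-up data)
set.seed(1)
dTrain <- data.frame(id = 1:200, x = runif(200))
dTrain$y <- dTrain$x > 0.5
dTest <- data.frame(id = 201:220, x = runif(20))
dSub <- data.frame(id = dTest$id, y = NA)

# A factor response makes rpart fit a classification tree
dTrain$y <- factor(dTrain$y)
dtree <- rpart(y ~ . - id, data = dTrain, method = "class")

# The ids must line up before predictions are copied across
stopifnot(all(dSub$id == dTest$id))

# type = "class" returns the predicted label instead of a probability
dSub$y <- predict(dtree, newdata = dTest, type = "class")

# Write the filled-in submission file (hypothetical output path)
outFile <- file.path(tempdir(), "submission.csv")
write.csv(dSub, outFile, row.names = FALSE)
```

With the real data you would replace the toy data frames with the three `read.csv` calls above. Whether labels or probabilities are expected depends on the format of the sample submission file, so check that first.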
GKi