I see a lot of these [,something]
- filename.train <- train[indexes,]
- x <- dataset[,1:4]
The [,] syntax is used for indexing. Your dataset is a so-called data.frame, which is rectangular: it consists of rows and columns. You can select any values from your data frame by stating which row(s) and column(s) you want returned. This is done with the [,] syntax: [rows you want, columns you want]. If you want all rows to be returned, you simply leave the row position blank. For example,
dataset[,4]
returns the fourth column for all rows of your data frame (for a base data.frame, a single selected column comes back as a plain vector by default). You can also get multiple rows/columns by giving multiple indices in [,]. For example, you can use 1:4 to get the first 4 rows (1:4 is the syntax for a sequence from 1 to 4):
train[1:4,]
Note that this returns all columns since you did not specify any column indices.
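As a small sketch of the two cases above, here is what they look like on a concrete data frame. The built-in iris data is only a stand-in for your dataset (an assumption on my part, since I do not know what your data contains), and the variable names fourth_col and first_rows are made up for illustration:

```r
# iris ships with base R; it is used here only as a stand-in for `dataset`
dataset <- iris                 # 150 rows, 5 columns

# Row position left blank: all rows, column 4 only.
# For a base data.frame this comes back as a plain numeric vector.
fourth_col <- dataset[, 4]

# Column position left blank: rows 1 to 4, all columns (still a data.frame).
first_rows <- dataset[1:4, ]

# drop = FALSE keeps a single column as a one-column data.frame
fourth_col_df <- dataset[, 4, drop = FALSE]

length(fourth_col)    # 150
dim(first_rows)       # 4 5
class(fourth_col_df)  # "data.frame"
```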
You could also combine the indexing for rows and columns:
train[2:5, 7:9]
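will return rows 2-5 and columns 7-9. In general, what [,] does is called subsetting, because it produces a subset of the rows and columns of your data frame. Internally, [ is itself a function, and for data frames R dispatches to its data frame method, `[.data.frame`, which does the actual subsetting. There is also a separate convenience function called subset, which lets you select rows by a logical condition and columns by name.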
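To make the combined form and the "[ is a function" point concrete, here is a minimal sketch. It uses the built-in mtcars data purely as a placeholder, because it has enough columns for 7:9; the names block, same, indexes and df.train are hypothetical and only mirror the pattern from your examples:

```r
# mtcars ships with base R; a stand-in with 32 rows and 11 columns
df <- mtcars

# Rows 2 to 5 and columns 7 to 9 in one step
block <- df[2:5, 7:9]
dim(block)              # 4 3

# The bracket is an ordinary function, so this is the same call spelled out
same <- `[`(df, 2:5, 7:9)
identical(block, same)  # TRUE

# The pattern from the question: a vector of row numbers, all columns
indexes <- c(1, 3, 5)
df.train <- df[indexes, ]
nrow(df.train)          # 3
```

In practice indexes usually comes from something like sample(), so that the training rows are picked at random.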