How to calculate the average of random data in R

Question

I'm new to R. I have a large file with multiple columns and I've been asked to split the data into 2 parts. I had R split the data randomly, with 70% going into a group called nTrain and 30% into a group called nTest.
I was able to split the data randomly, but I now need to calculate the AVERAGE of a specific column in the 70% random data and do the same for the 30% random data. Can someone please explain how to do this?

Thanks.

If it helps understand my situation, this is what I have so far in R:

length(DataFile)
(nData = nrow(DataFile))
DataFile
set.seed(0)
(trainIdx <- sample(seq(1, nrow(DataFile)), floor(nrow(DataFile) * 0.70)))

> (nTrain = length(trainIdx))
[1] 15129

> (nTest = nData - nTrain)
[1] 6484

Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. — Ronak Shah, Nov 24 '18 at 08:00
Thanks for the advice Ronak. I will read the info on how to ask a good question and how to give a reproducible example. — R Newbie, Nov 25 '18 at 06:28

Roman · Answer 1 · 2018-11-24T12:52:56.717

Welcome to Stack Overflow!

By R convention, you should stick to the <- operator for most types of assignments (you can find more info here and here).
The code and output you posted are also incomplete (e.g., the output after the first line, length(DataFile), is missing).

Let's go through this step by step.

1. Create mock data

set.seed(1701)
DataFile <- sample(seq(0, 1, 0.01), 10000, replace = TRUE)

2. Create a dataset

# This randomizes the order
DataSet <- sample(DataFile)

3. Split Train and Test

split <- floor(length(DataSet) * 0.7)
# Use length() for one-dimensional objects and
# nrow() for matrices, data frames, etc.
# floor() keeps the split point a whole number.

DataTrain <- head(DataSet, split)
DataTest <- tail(DataSet, length(DataSet) - split)

# Sizing the test set as length(DataSet) - split means every element
# lands in exactly one part. As the dataset is already shuffled,
# we can simply take the first 70% and the last 30%.
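
If you would rather keep the index-based split from the question, the same idea can be sketched as below. This is only an illustration on a hypothetical mock vector x whose length (1003) is deliberately not a multiple of 10; the names x, xTrain and xTest are made up for the example.

```r
# Mock vector with a length that is not a multiple of 10 (hypothetical data)
set.seed(1701)
x <- sample(seq(0, 1, 0.01), 1003, replace = TRUE)

# Draw 70% of the positions at random; floor() keeps the count a whole number
trainIdx <- sample(seq_along(x), floor(length(x) * 0.7))

xTrain <- x[trainIdx]    # the sampled 70%
xTest  <- x[-trainIdx]   # everything that was not sampled

# TRUE when every element lands in exactly one of the two parts
length(xTrain) + length(xTest) == length(x)
```

Negative indexing (x[-trainIdx]) drops the sampled positions, so the test part is whatever the training part left behind and nothing is counted twice.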

4. Calculate average

> mean(DataTrain)
[1] 0.5029891
> mean(DataTest)
[1] 0.496056
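
For table-shaped data like the file described in the question, the same steps apply row-wise. Here is a minimal sketch, assuming a data frame with a numeric column; the mock DataFile below and the column name price are hypothetical stand-ins, so swap in your own file and column.

```r
# Hypothetical stand-in for the real file; only two columns shown
set.seed(0)
DataFile <- data.frame(
  id    = seq_len(1000),
  price = runif(1000, min = 100, max = 500)  # 'price' is an assumed column name
)

# Same index-based split as in the question
trainIdx <- sample(seq_len(nrow(DataFile)), floor(nrow(DataFile) * 0.70))

trainData <- DataFile[trainIdx, ]    # 70% of the rows, all columns
testData  <- DataFile[-trainIdx, ]   # the remaining 30% of the rows

# Average of one column in each part; na.rm = TRUE skips missing values
mean(trainData$price, na.rm = TRUE)
mean(testData$price, na.rm = TRUE)
```

Note the comma in DataFile[trainIdx, ]: it selects rows and keeps every column, so you can pick the column of interest afterwards with $.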

edited Nov 24 '18 at 12:52

answered Nov 24 '18 at 10:35

Roman


Thanks Roman for the step-by-step instructions. I will try your process and let you know if I can get it to work. – R Newbie Nov 25 '18 at 01:09
The file that I'm pulling the data from has 20 columns with headers. I need the average of only 1 of the columns, for only 70% of the data. Can you explain how I can do this? I appreciate your help! – R Newbie Nov 25 '18 at 01:38
Can you post a `dput(head(data))` into your original post and specify the column? As a general approach, @AdamB showed the right method if you work with table-shaped data. – Roman Nov 25 '18 at 01:42
