Randomly sampling rows from particular months in a data set

Question

I was given this task in R:

"Randomly select 10 trading days from each of the following months: January 2019 to June 2019 (6 months total)".

I have a CSV file of a company's stock trading history from the last 5 years (dates, opening price, closing price, changes, etc.) that I imported into R using this code (reading the file; setting date format; extracting all 6 relevant months):

SHAPIRENG5YEARS <- read.csv(file = "C:\\Users\\Ron\\OneDrive\\5year.csv", header = TRUE, sep = ",") # Choosing Shapir Engineering stock (last 5 years)
SHAPIRENG5YEARS$Date <- as.Date(as.character(SHAPIRENG5YEARS$Date), format = "%d/%m/%Y")
January19 <- SHAPIRENG5YEARS[SHAPIRENG5YEARS$Date > "2019-01-01" & SHAPIRENG5YEARS$Date < "2019-01-31", ]
February19 <- SHAPIRENG5YEARS[SHAPIRENG5YEARS$Date > "2019-02-03" & SHAPIRENG5YEARS$Date < "2019-02-28", ]
March19 <- SHAPIRENG5YEARS[SHAPIRENG5YEARS$Date > "2019-03-09" & SHAPIRENG5YEARS$Date < "2019-03-31", ]
April19 <- SHAPIRENG5YEARS[SHAPIRENG5YEARS$Date > "2019-04-01" & SHAPIRENG5YEARS$Date < "2019-04-30", ]
May19 <- SHAPIRENG5YEARS[SHAPIRENG5YEARS$Date > "2019-05-01" & SHAPIRENG5YEARS$Date < "2019-05-30", ]
June19 <- SHAPIRENG5YEARS[SHAPIRENG5YEARS$Date > "2019-06-02" & SHAPIRENG5YEARS$Date < "2019-06-30", ]

Now I don't know what I should do next. I can sample one month using

January19sample <-January19[sample(nrow(January19), 10), ]

but I want to avoid doing this six times (once for each month). Ideally I'd like to sample all 10*6=60 values from the original big data frame.

Edit: I'm still struggling. I tried the following, but it doesn't work: I get a list of 6 lists, each of length 18, instead of 10 random picks per month:

SamplesOfMonths=list(c(January19),c(February19),c(March19),c(April19),c(May19),c(June19))
TopSamples=c(1:10)
LowSamples=c(1:10)
for (i in 1:6)
{
    Changer=unlist(SamplesOfMonths[i])
    TopSamples[i]=sample(Changer, 10)[2]
    LowSamples[i]=sample(Changer, 10)[1]
    print(sample(Changer, 10))
}

What have you tried so far? Were you given any information by your instructor about how to approach this problem? Getting homework help is allowed on SO, but we want to help you solve specific problems, not do your homework for you ... — Ben Bolker, Jan 13 '20 at 13:23
I just need help. My instructor does not respond to messages from any of the students. I didn't understand the first answer, so I'm stuck right now. This is the first step before trying to do an analysis of variance. — Ron, Jan 13 '20 at 21:35
Your comment below @mrhellmann's answer shows that you've made a reasonable start. Can you please edit your question to include that bit of code and question? ("I tried [something], but [I have this specific problem]" is much better for getting answers from StackOverflow than "my instructor is unhelpful and I'm desperate" ...) For completeness, please also include the code that you used to define `January19`; also see [this question](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — Ben Bolker, Jan 13 '20 at 21:41
Haha, I get you. This is my first time here, I'm terrible at coding, and my English is not perfect. I edited my question and had a look at the link you added. — Ron, Jan 13 '20 at 21:59

mrhellmann · Answer 1 · 2020-01-13T13:41:23.083

You can use the sample() function together with bracket ([) subsetting.

mtcars[sample(1:nrow(mtcars), size = 10, replace = FALSE),]
#>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#> Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#> Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
#> Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#> Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4

Broken down, step by step:

rows_in_data <- nrow(mtcars)
rows_in_data
#> [1] 32

# Sample from 1 to number of rows, selecting some using `size = ` argument
index_of_random_rows <- sample(1:rows_in_data, size = 10, replace = FALSE)

# Use bracket subsetting: data[rows, columns]
mtcars[index_of_random_rows, ]
#>                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Ferrari Dino      19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
#> Dodge Challenger  15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
#> Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#> Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
#> Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#> Honda Civic       30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1

Created on 2020-01-13 by the reprex package (v0.3.0)
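
The same sample() and bracket-subsetting idea can be applied once per month without six copies of the code. The following is only a sketch in base R: the `prices` data frame and its `Close` column are made up to stand in for the real CSV, and it assumes every month has at least 10 rows. It labels each row with a year-month key, splits on that key, samples within each piece, and stacks the results.

```r
# Hypothetical stand-in for the real data: one row per calendar day
# from 2018-12-01 to 2019-07-31, with a made-up Close column.
set.seed(1)  # only so that this sketch is repeatable
prices <- data.frame(
  Date = seq(as.Date("2018-12-01"), as.Date("2019-07-31"), by = "day")
)
prices$Close <- round(runif(nrow(prices), 90, 110), 2)

# Keep January to June 2019 and label every row with its month.
first_half <- prices[prices$Date >= as.Date("2019-01-01") &
                       prices$Date <= as.Date("2019-06-30"), ]
first_half$Month <- format(first_half$Date, "%Y-%m")

# Split into one data frame per month and draw 10 rows from each.
by_month <- split(first_half, first_half$Month)
sampled <- lapply(by_month, function(d) d[sample(nrow(d), 10), ])

# Stack the six samples back into a single data frame.
all_samples <- do.call(rbind, sampled)
table(all_samples$Month)
```

Keying on `format(Date, "%Y-%m")` avoids typing start and end dates by hand for each month, so no trading day is dropped by a boundary that is slightly off.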

Using purrr::map() to draw row indices for several data frames at once:

#custom function
my_samples <- function(df = mtcars, num_rows = 10){
  sample(1:nrow(df), size = num_rows)
}

purrr::map(list(mtcars, iris), my_samples)
#> [[1]]
#>  [1] 16  1  4 22 30  2 21 14 23 10
#> 
#> [[2]]
#>  [1]  73  31 112   1  43  91  87  23  19  16

Created on 2020-01-13 by the reprex package (v0.3.0)
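
The function above returns row numbers, not rows. A possible variant that returns the sampled rows themselves is sketched here; `sample_rows`, `monthly` and `picked` are illustrative names, and the built-in `mtcars` and `iris` data sets stand in for the monthly data frames. Base R's `lapply()` is used so that the sketch has no package dependency.

```r
# Variant of my_samples() that returns the sampled rows themselves.
sample_rows <- function(df, num_rows = 10) {
  df[sample(nrow(df), size = num_rows), , drop = FALSE]
}

# Built-in data sets stand in for the monthly data frames here.
monthly <- list(cars = mtcars, flowers = iris)

# lapply() keeps the list names; purrr::map(monthly, sample_rows)
# should give the same structure.
picked <- lapply(monthly, sample_rows)

# Number of rows drawn from each data frame.
sapply(picked, nrow)
```

With the data from the question, the list would instead be something like `list(January19 = January19, February19 = February19, ...)`, giving one sampled data frame per month.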

This is how I actually do it: `January19sample <- January19[sample(nrow(January19), 10), ]`, but I want to avoid doing this 6 times (once per month). Can I do it in a loop? — Ron, Jan 13 '20 at 13:24
@Ron See edit for using `map()` to get a list of rows for each data.frame. — mrhellmann, Jan 13 '20 at 13:41
Sorry, but I didn't understand what you did there with `map()`. I edited my question to be more specific :) — Ron, Jan 13 '20 at 21:55
