
I am new to R. I want to split a data frame into pieces using a window size and a slide, and save each piece as its own data frame.

For example, my data frame has 20 rows, and I want to split those rows with window size = 5 and slide = 3. The first data frame should contain the first 5 rows. The second data frame should also contain 5 rows, but start after skipping the first 3 rows (since slide = 3). That is, the second data frame should contain the last 2 rows of the first data frame plus the next 3 rows of the original data frame.

My data frame and the desired output: [images not available]

sagar .rao
  • Welcome to StackOverflow. Please take a look at these tips on how to produce a [minimum, complete, and verifiable example](http://stackoverflow.com/help/mcve), as well as this post on [creating a great example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Perhaps the following tips on [asking a good question](http://stackoverflow.com/help/how-to-ask) may also be worth a read. – lmo May 09 '17 at 14:11
  • You will need to update this question with a minimal reproducible example. – Dan May 09 '17 at 14:15
  • I just updated the question with images of my data frame and the desired output. Help is appreciated, thanks in advance. – sagar .rao May 09 '17 at 14:37

1 Answer


You could do this with a list (shown further down). First, here is a version that creates separate data frames, each defined by the midpoint of the window being subset.

mydata <- data.frame(dataelement1 = 1:20, dataelement2 = letters[1:20]) ## create sample data with 20 rows
idx <- seq(from = 3, to = (nrow(mydata) - 3), by = 3) ## window midpoints for 20 rows (window size 5, slide 3)
for (i in idx){
    start = i-2; stop = i+2
    assign(paste0("newdata",i),mydata[start:stop,])
}

This yields the following five data frames, each named after the midpoint index of its window.

> newdata3
  dataelement1 dataelement2
1            1            a
2            2            b
3            3            c
4            4            d
5            5            e
> newdata6
  dataelement1 dataelement2
4            4            d
5            5            e
6            6            f
7            7            g
8            8            h
> newdata9
   dataelement1 dataelement2
7             7            g
8             8            h
9             9            i
10           10            j
11           11            k
> newdata12
   dataelement1 dataelement2
10           10            j
11           11            k
12           12            l
13           13            m
14           14            n
> newdata15
   dataelement1 dataelement2
13           13            m
14           14            n
15           15            o
16           16            p
17           17            q

List solution:

## Put the results into a single list instead of separate variables
mylist <- list()
j <- 1
for (i in idx){
  start = i-2; stop = i+2
  mylist[[j]] <- mydata[start:stop,]
  j <- j+1
}
## The list elements are numbered sequentially (1 to 5), not by midpoint
mylist[[1]]
mylist[[5]]

> mylist[[1]]
  dataelement1 dataelement2
1            1            a
2            2            b
3            3            c
4            4            d
5            5            e
> mylist[[5]]
   dataelement1 dataelement2
13           13            m
14           14            n
15           15            o
16           16            p
17           17            q
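
The same list can be built without a manual counter. The sketch below is one possible generalization and is not part of the original answer: the helper name `sliding_windows` and its arguments `size` and `slide` are made up for illustration. It works from window start rows rather than midpoints, so it also handles even window sizes, and it assumes that incomplete trailing windows should be dropped.

```r
# Hypothetical helper (not from the original answer): split a data frame
# into overlapping windows of `size` rows, advancing `slide` rows each time.
sliding_windows <- function(df, size = 5, slide = 3) {
  n <- nrow(df)
  if (n < size) return(list())  # no complete window fits
  starts <- seq(from = 1, to = n - size + 1, by = slide)
  # One data frame per start row; incomplete trailing windows are dropped.
  lapply(starts, function(s) df[s:(s + size - 1), , drop = FALSE])
}

mydata <- data.frame(dataelement1 = 1:20, dataelement2 = letters[1:20])
windows <- sliding_windows(mydata, size = 5, slide = 3)

length(windows)             # number of complete windows
windows[[2]]$dataelement1   # rows 4 to 8
```

Because the start rows are computed directly from the number of rows, the last complete window (rows 16 to 20 for the sample data) is included.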

EDIT: As requested, an explanation. seq() creates a sequence. Since the midpoint of your first window is row 3, we tell it to start at 3. Since you want the window to move 3 rows each time, we set by = 3. The end is defined by the number of rows in your data, using nrow(). We subtract 3 from that because we don't want a midpoint that lacks 2 more rows after it in mydata.

I should have used -2 because row 18 would still have 2 rows ahead of it.

So this creates a vector idx which equals c(3,6,9,12,15) that we will use in the loop.
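
A quick way to check the midpoints is to print the sequence for both upper bounds. This is only an illustrative check; the variable names `idx_minus3` and `idx_minus2` are hypothetical.

```r
n <- 20  # number of rows in the sample data

idx_minus3 <- seq(from = 3, to = n - 3, by = 3)  # bound used in the code above
idx_minus2 <- seq(from = 3, to = n - 2, by = 3)  # bound suggested in the correction

idx_minus3  # 3 6 9 12 15
idx_minus2  # 3 6 9 12 15 18
```

Each midpoint m then maps to rows m-2 through m+2, so the extra midpoint 18 would cover rows 16 to 20.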

for (i in idx){ 
    start = i-2; stop = i+2  
    assign(paste0("newdata",i),mydata[start:stop,])
}

for (i in idx) {

This says to loop over every value contained in idx.

start = i-2; stop = i+2

So for the first value in idx, i = 3, we get start = 1 and stop = 5: your window.

The last line uses assign() to create a data frame whose name is the prefix newdata followed by the value of idx we happen to be looping over. It subsets your data set based on the values of start and stop defined on the line prior.
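
To see what that last line does on its own, here is a small sketch using a toy data frame (`toy` and `var_name` are made-up names for illustration). `paste0()` builds the variable name as a string, `assign()` binds a value to that name, and `get()` looks the object up again by name.

```r
toy <- data.frame(x = 1:10)

i <- 3
var_name <- paste0("newdata", i)   # the string "newdata3"
assign(var_name, toy[(i - 2):(i + 2), , drop = FALSE])

newdata3$x        # 1 2 3 4 5
get(var_name)$x   # the same object, retrieved by its name
```

Variables created this way have to be looked up by constructed names later on, which is one reason the list version is usually easier to work with.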

So the first time through the loop, the last line resolves to newdata3 <- mydata[1:5,], which takes the 5 records of interest (with all columns).

There are various ways of sub-setting in R. This is a good reference.

mydata[1:5,1:2] would subset not only rows 1 through 5, but also columns 1 and 2.
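
A few common forms of data frame indexing, shown on the same sample data as a brief sketch (the result names on the left are hypothetical):

```r
mydata <- data.frame(dataelement1 = 1:20, dataelement2 = letters[1:20])

rows_only  <- mydata[1:5, ]                               # rows 1 to 5, all columns
rows_cols  <- mydata[1:5, 1:2]                            # rows 1 to 5, columns 1 and 2
by_name    <- mydata[1:5, "dataelement2"]                 # single column: returned as a vector
keep_frame <- mydata[1:5, "dataelement2", drop = FALSE]   # single column, kept as a data frame

dim(rows_cols)             # 5 2
is.data.frame(by_name)     # FALSE
is.data.frame(keep_frame)  # TRUE
```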

akaDrHouse
  • Yeah, it works. Great, thank you! Can you please explain clearly how the seq formula and the for loop work? :-) – sagar .rao May 10 '17 at 09:31
  • @sagar.rao Done. If this works and is useful, please upvote and check as best answer accordingly. It lets others find solutions to their problems faster and is how the community functions. – akaDrHouse May 10 '17 at 12:19