Group elements based on rownames and number these

Question

I hope I formulate my question clearly. I am trying to split a data frame (df) in R. It is fairly large; its (shortened) structure is shown below.

'data.frame': 36993 obs. of n variables:
$ klasse : num 1 1 1 1 1 1 1 1 1 1 ...
$ Start_time: chr "23:56:09.000" "23:56:09.000" "23:56:09.000" "23:56:09.000" ...
$ Start_date: Date, format: "2013-08-31" "2013-08-31" "2013-08-31" "2013-08-31" ...
$ Milk : num 23.5 23.5 23.5 23.5 23.5 23.5 23.5 23.5 23.5 23.5 ...
$ duur_visit: num 1048 1048 1048 1048 1048 ...

What I am trying to do is split the df into parts of 120 observations within each value of df$klasse, which contains 39 different values with different numbers of rows (e.g. 1 = 1048 obs., 2 = 239 obs., etc.). I would also like to number these groups, with the numbering restarting at 1 for every new value of df$klasse.
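For the two sizes quoted above, the number of 120-row groups per klasse value is the row count divided by 120, rounded up, and the last group holds whatever is left over. A quick check in R, using only the counts 1048 and 239 from the question:

```r
n_obs <- c(1048, 239)   # row counts for klasse 1 and 2, as quoted in the question
block <- 120            # maximum rows per group

n_groups  <- ceiling(n_obs / block)          # groups needed per klasse value
last_size <- n_obs - (n_groups - 1) * block  # rows in the final, partial group

n_groups   # [1] 9 2
last_size  # [1]  88 119
```

So klasse 1 would get groups 1 to 9 (the ninth holding 88 rows) and klasse 2 groups 1 and 2 (the second holding 119 rows).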

I am a noob, and the furthest I got was finding out that I might have to work with the stringr package (although I am not sure), or use the split function combined with lapply. There is so much information on splitting and grouping variables that I got lost. If someone could help me, or give me a nudge in the right direction, I would be very grateful.

EDIT

The code by @shadow seems to go in the right direction, but it cuts off my dataset at the point where it runs "out of" 120 observations. So I will try to give an example of what I would like to accomplish:

> within df
klasse  grp   Start_time
1        1     2013-08-31 02:54:35.000
1        1     2013-08-31 02:54:35.000
1        2     2013-08-31 02:54:35.000
1        2     2013-08-31 02:54:35.000
1        3     2013-08-31 02:54:35.000
2        1     2013-08-31 08:36:13.000
2        1     2013-08-31 08:36:13.000
2        2     2013-08-31 08:36:13.000
2        2     2013-08-31 08:36:13.000
2        3     2013-08-31 08:36:13.000
2        3     2013-08-31 08:36:13.000
2        4     2013-08-31 08:36:13.000
3        1     2013-09-01 15:01:40.000
3        1     2013-09-01 15:01:40.000
4        1     2013-09-01 23:51:54.000

Of course I shortened it to just 2 rows per group number, otherwise the example would become way too large; in the real data I would like each group number (1, 2, and so on) to appear at most 120 times in the grp column. I hope this clarifies it a bit.
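The grp column in this example can also be produced without splitting: number the rows within each klasse and integer-divide by the block size. A minimal sketch using base R's ave(), on a hypothetical toy data frame with the same klasse layout as the example and a block size of 2 (use 120 on the real data):

```r
# Hypothetical toy data: same klasse layout as the example table above
df <- data.frame(klasse = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4))

block <- 2  # rows per group; 120 for the real data

# Within each klasse, number the rows 1, 2, 3, ... and map them to blocks:
# rows 1..block -> 1, rows (block+1)..(2*block) -> 2, and so on
df$grp <- ave(seq_len(nrow(df)), df$klasse,
              FUN = function(i) (seq_along(i) - 1) %/% block + 1)

df$grp
# [1] 1 1 2 2 3 1 1 2 2 3 3 4 1 1 1
```

Because ave() returns its result in the original row order, the data frame does not have to be sorted by klasse first, and everything stays in one data frame.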

EDIT 2

Yes, I ran it again this morning and @shadow's solution works fine! I do not know what I did wrong yesterday.

Please check this [link](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). A good reproducible example will help others tackle your question a lot more easily. — CHP, Oct 28 '13 at 15:19

score 1 · Accepted Answer · answered Oct 28 '13 at 15:36

Here is the split/lapply way to do this. If df is your data.frame, you can use split to get a list of data.frames, one per value of klasse:

lst <- split(df, df$klasse)
lst
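To see what split() returns, here is a small sketch on a hypothetical toy data frame (the Milk values are made up): each list element is the subset of rows for one klasse value, and the elements are named after those values.

```r
# Hypothetical toy data frame; Milk values are made up
df <- data.frame(klasse = c(1, 1, 1, 2, 2, 3),
                 Milk   = c(23.5, 23.5, 23.5, 20, 20, 25))

lst <- split(df, df$klasse)  # one data frame per klasse value

names(lst)         # "1" "2" "3"
sapply(lst, nrow)  # rows per klasse: 3 2 1
```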

Then use lapply to split each of these further into chunks of the number of observations you want:

nobs <- 120 # number of observations
l2 <- unlist(lapply(lst, function(x) {
  x$grp <- ceiling(seq_len(nrow(x)) / nobs) # grouping: 1 for rows 1..nobs, 2 for the next nobs rows, ...
  split(x, x$grp) # splitting
}), recursive=FALSE)
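The result l2 is a flat list of data frames named "klasse.group" ("1.1", "1.2", ...). If you would rather keep a single data frame with a grp column, as in the question's EDIT, the pieces can be stacked back together with do.call(rbind, ...). A sketch on hypothetical toy data with nobs = 2, writing the grouping as ceiling(seq_len(nrow(x)) / nobs), which yields the same 1, 1, 2, 2, ... numbering:

```r
# Hypothetical toy data: klasse 1 has 5 rows, klasse 2 has 3 rows
df <- data.frame(klasse = c(1, 1, 1, 1, 1, 2, 2, 2))

nobs <- 2
lst <- split(df, df$klasse)
l2 <- unlist(lapply(lst, function(x) {
  x$grp <- ceiling(seq_len(nrow(x)) / nobs)  # 1, 1, 2, 2, 3, ... within this klasse
  split(x, x$grp)
}), recursive = FALSE)

names(l2)  # "1.1" "1.2" "1.3" "2.1" "2.2"

# Stack the pieces back into one data frame that keeps the grp column
res <- do.call(rbind, l2)
rownames(res) <- NULL  # drop the composite row names created by rbind
res$grp    # 1 1 2 2 3 1 1 2
```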

If I misunderstood your question, you may want to update it with a reproducible example, where you explicitly give some data and the expected output.

score 0 · Answer 2 · answered Oct 28 '13 at 15:37

The split function will allow you to create a list of dataframes separated on the basis of klasse.

myKlasse <- split(dfrm, dfrm$klasse)
myKlasse <- lapply(myKlasse, function(df) {
  df$seqnum <- seq_along(rownames(df))  # number the rows within each klasse
  df
})

seq_along will number the rows of each data frame. (The rows already carry rownames, since that is the default, but after split those keep the original row numbers of dfrm; seqnum gives a count that starts at 1 within each klasse and stores it as a data frame column.)
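Because seqnum restarts at 1 for every klasse, the 120-row group number the question asks for follows from it by integer division. A sketch on hypothetical toy data standing in for dfrm (block size 2 instead of 120 to keep it short), which also shows that the rownames after split() are the original row positions:

```r
# Hypothetical toy data standing in for dfrm
dfrm <- data.frame(klasse = c(1, 1, 1, 2, 2, 2, 2))

myKlasse <- split(dfrm, dfrm$klasse)
myKlasse <- lapply(myKlasse, function(df) {
  df$seqnum <- seq_along(rownames(df))  # 1, 2, 3, ... within this klasse
  df$grp <- (df$seqnum - 1) %/% 2 + 1   # blocks of 2 rows; use 120 on real data
  df
})

rownames(myKlasse[["2"]])  # "4" "5" "6" "7": original positions, not 1..4
myKlasse[["2"]]$seqnum     # 1 2 3 4
myKlasse[["2"]]$grp        # 1 1 2 2
```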
