
I have a data frame of the following type:

[image: screenshot of the data frame]

I need to create a separate column containing the last non-empty value from each row, starting with column V9, i.e. 15:32, 13:44, 16:37, 15:31, NULL, NULL, 16:10, 16:22, etc. If it is easier, I can live with removing the empty rows (in this case 5 and 6). I tried a combination of which.max, length and apply, but the output did not make sense, so I have no idea what to do next. Thanks for any help.

Akhil Nair
Vasile
  • Please don't use an image to show data. It is better to use `dput`. Do you have columns V4 to V8? – akrun Jul 14 '15 at 09:58
  • Please `dput` a piece of your data. This image is useless, and no user will manually type in the data by copying it from the picture. – SabDeM Jul 14 '15 at 09:59
  • I tried to use dput, but did not succeed. I looked on the forum, found something and followed it, but did not get any decent results. The structure was about half a page, if not more, and when I tried to rebuild it, it did not produce the initial data frame. So I decided to use an image, thinking that reproducible data was not as important for this question. I was actually going to ask for a step-by-step tutorial on how to use dput. – Vasile Jul 14 '15 at 10:01
  • This [link](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) might be useful – akrun Jul 14 '15 at 10:03
  • Would you not do the apply on a subset `df` for columns `V9` onwards first? (Now irrelevant, corrected below) – Akhil Nair Jul 14 '15 at 10:09
  • I think my earlier comment is not correct. It should be `apply(df1[paste0('V', 9:11)], 1, function(x) if(any(x!='')) tail(x[x!=''],1) else '')`, which gives `"15:32" "13:44" "16:37" "15:31" "" "" "16:10" "16:22" "16:21" "15:34" "16:26"` – akrun Jul 14 '15 at 10:11
  • @akrun This gives me the row.names. I followed the link you provided and used dput(df). But it gives me like 13 A4 pages. So something is wrong. – Vasile Jul 14 '15 at 10:13
  • @Vasile Try the code I just pasted on the comments. – akrun Jul 14 '15 at 10:14
  • @akrun, I tried it and it works, in the sense that it selects the last value in each row. Could you suggest how to arrange it as a column in the same data frame, or at least as a separate vector that I can merge with the existing df? I assigned it to an object, but when I try to view it, it gives me: Error in View : arguments imply differing number of rows: 1, 0 – Vasile Jul 14 '15 at 10:23
  • You can just `cbind` it with the original dataset. Suppose `v1 <- apply(df1[paste0...`; then `cbind(df1, v1)`. I updated the post. – akrun Jul 14 '15 at 10:27
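
On the `dput` difficulty raised in these comments: the output usually becomes manageable if only a few rows are passed to it. The following is a minimal sketch, assuming a small stand-in data frame (`df_demo` is hypothetical and is not the asker's data); it prints a pasteable definition and checks that evaluating that text rebuilds the same values.

```r
# Hypothetical stand-in for the asker's data; only the structure matters here.
df_demo <- data.frame(V9  = c("15:32", "", "12:07"),
                      V10 = c("", "13:44", "12:32"),
                      V11 = c("", "", "16:10"),
                      stringsAsFactors = FALSE)

# Print a compact, pasteable definition of the first two rows only.
dput(head(df_demo, 2))

# Round trip: capture the text dput() prints and evaluate it to rebuild the object.
txt <- paste(capture.output(dput(head(df_demo, 2))), collapse = "\n")
rebuilt <- eval(parse(text = txt))

# Compare with the original subset.
isTRUE(all.equal(rebuilt, head(df_demo, 2)))
```

Posting the printed `structure(...)` text lets others recreate the sample with a single assignment.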

2 Answers

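We could use `max.col`. Subset the columns 'V9' to 'V11', then call `max.col` on the logical matrix `dfN != ''` to get, for each row, the column index of a non-blank element. Because every non-blank cell is `TRUE`, a row with several non-blank cells is a tie; the optional `ties.method` argument of `max.col` accepts 'first', 'last' or 'random' (the default), and 'last' picks the last non-blank column. A row that is entirely blank is also a tie, so 'last' returns its final column, which is blank, and the result for that row is ''. Finally, we `cbind` the row sequence with these column indices to build a row/column index and extract the values from 'dfN'.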

dfN <- df1[paste0('V', 9:11)]
# Row/column index: each row paired with its last non-blank column
new <- dfN[cbind(seq_len(nrow(dfN)), max.col(dfN != '', ties.method = 'last'))]
new
#[1] "15:32" "13:44" "16:37" "15:31" ""      ""      "16:10" "16:22" "16:21"
#[10] "15:34" "16:26"

cbind(dfN, new)
#     V9   V10   V11   new
#1  15:32             15:32
#2        13:44       13:44
#3  16:37             16:37
#4  15:31             15:31
#5                         
#6                         
#7  12:07 12:32 16:10 16:10
#8  12:09 12:36 16:22 16:22
#9  12:06 12:35 16:21 16:21
#10 12:08 12:26 15:34 15:34
#11 12:35 13:00 16:26 16:26

Or we can use apply

apply(dfN, 1, function(x) if(any(x!='')) tail(x[x!=''],1) else '')
#[1] "15:32" "13:44" "16:37" "15:31" ""      ""      "16:10" "16:22" "16:21"
#[10] "15:34" "16:26"
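
For anyone who wants to run both approaches without the original file, here is a self-contained sketch. It assumes `df1` holds only the three columns printed in the table above, stored as character with blanks as empty strings; the names `last_idx`, `by_maxcol`, `by_apply` and the new column `last` are illustrative choices, not part of the original data.

```r
# Sample data copied from the printed table above; blanks are empty strings.
df1 <- data.frame(
  V9  = c("15:32", "", "16:37", "15:31", "", "",
          "12:07", "12:09", "12:06", "12:08", "12:35"),
  V10 = c("", "13:44", "", "", "", "",
          "12:32", "12:36", "12:35", "12:26", "13:00"),
  V11 = c("", "", "", "", "", "",
          "16:10", "16:22", "16:21", "15:34", "16:26"),
  stringsAsFactors = FALSE
)
dfN <- df1[paste0("V", 9:11)]

# Approach 1: matrix indexing with the last non-blank column of each row.
last_idx  <- max.col(dfN != "", ties.method = "last")
by_maxcol <- as.matrix(dfN)[cbind(seq_len(nrow(dfN)), last_idx)]

# Approach 2: row-wise apply, returning "" for rows that are entirely blank.
by_apply <- apply(dfN, 1, function(x) if (any(x != "")) tail(x[x != ""], 1) else "")

# The two approaches should agree; unname() drops any names apply() attaches.
stopifnot(identical(unname(by_maxcol), unname(by_apply)))

# Store the result as a new column in the same data frame.
df1$last <- unname(by_maxcol)
```

Assigning with `df1$last <- ...` keeps everything in one data frame, which avoids a separate `cbind` step.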
akrun
  • Thanks a lot. This worked perfectly. I was able to do what I needed. And thanks a lot for your explanations. – Vasile Jul 14 '15 at 10:28
  • @Vasile No problem. BTW, why did you select columns V9 to V11 specifically. Is it because blanks starts from V9? – akrun Jul 14 '15 at 10:29
  • @akrun These columns indicate the booking times. I want to calculate the difference between the first booking and the last one (which indicate the arrival and departure times). Since people can swipe out (for lunch etc.) and back in during the day, the last booking ends up in different columns. – Vasile Jul 14 '15 at 15:15

This is not elegant, but it should work:

output <- rep(NA, nrow(df))
# For each row: transpose, drop NAs, reverse, keep the first (i.e. last non-NA) value
for (i in seq_len(nrow(df))) output[i] <- rev(na.omit(t(df[i, ])))[1]
unlist(output)

For each row, you transpose it into a column, omit missing values, reverse it, and then return the first value.

I used this for test data:

a <- seq(7)
b <- c(1, NA, 1, NA, 2, NA, 2)
c <- c(2, 3, NA, NA, 4, NA, NA)
df <- data.frame(rbind(a, b, c))

And here is the output of that process:

> unlist(output)
[1] 7 2 4
ulfelder
  • Thanks ulfelder. I tried your solution, but something did not work: the output is a series of " " (one per row). I think I might have messed something up. But anyway, thanks a lot for your effort. – Vasile Jul 14 '15 at 15:19
  • I doubt you're doing anything wrong; it's that I used NA instead of " " for missing values in my play data. It would be possible to modify this approach to work for " ", but why bother when you've already got a solution? – ulfelder Jul 14 '15 at 15:47
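
For completeness, the modification ulfelder mentions could look roughly like the sketch below. It is an assumption about what such an adaptation might be, not code from either answer: the hypothetical helper `last_filled()` converts blank strings to `NA` first and then takes the last non-missing value per row, so it handles both the `NA` test data and blank-string data.

```r
# Hypothetical helper: last non-missing value per row, where any string in
# `blank` is treated as missing. Rows with no values at all yield NA.
last_filled <- function(d, blank = c("", " ")) {
  m <- as.matrix(d)
  m[m %in% blank] <- NA          # recode blanks as NA
  apply(m, 1, function(x) {
    x <- x[!is.na(x)]            # keep only filled cells
    if (length(x)) x[[length(x)]] else NA
  })
}

# The NA-based test data from the answer above.
df_na <- data.frame(rbind(a = seq(7),
                          b = c(1, NA, 1, NA, 2, NA, 2),
                          c = c(2, 3, NA, NA, 4, NA, NA)))
res_na <- last_filled(df_na)

# Blank-string data in the style of the question (values are illustrative).
df_blank <- data.frame(V9  = c("15:32", "", ""),
                       V10 = c("", "13:44", ""),
                       V11 = c("", "", ""),
                       stringsAsFactors = FALSE)
res_blank <- last_filled(df_blank)
```

Unlike the accepted answer, this returns `NA` rather than '' for rows that are entirely blank, which may be more convenient if the times are parsed afterwards.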