r - move value to new column based on next row

Question

I am trying to work with poorly formatted SAP data.

In this source data, when one of the variables ("Profile" in the example) has more entries, they are stacked. This creates an empty observation with only the same "ID" in the next row.

As such:

ID  Status  Product     Profile   Description
154 NOCO    3000        A1        failure           
215 ATCO    4000                  dfect     
164 NOCO    2000        A1        dfect
164                     A2
875 ATCO    3000                  failure
548 NOCO    2000        A1        dfect         
548                     A2
548                     A3
797 NOCO    3000                  failure       
444 ATCO    4000                  failure

What I would like to do is to move these stacked values and move them to the next column.

ID  Status  Product Profile  Profile2   Profile3    Description
154 NOCO    3000    A1                              failure
215 ATCO    4000                                    dfect
164 NOCO    2000    A1       A2                     dfect
875 ATCO    3000                                    failure
548 NOCO    2000    A1       A2         A3          dfect
797 NOCO    3000                                    failure
444 ATCO    4000                                    failure

How would I go about doing this?

Thanks!

Edit:

Added dput of the first table above:

structure(list(ID = c(154L, 215L, 164L, 164L, 875L, 548L, 548L, 
548L, 797L, 444L), Status = structure(c(3L, 2L, 3L, 1L, 2L, 3L, 
1L, 1L, 3L, 2L), .Label = c("", "ATCO", "NOCO"), class = "factor"), 
    Product = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA, 
    3000L, 4000L), Profile = structure(c(2L, 1L, 2L, 3L, 1L, 
    2L, 3L, 4L, 1L, 1L), .Label = c("", "A1", "A2", "A3"), class = "factor"), 
Description = structure(c(3L, 2L, 2L, 1L, 3L, 2L, 1L, 1L, 
3L, 3L), .Label = c("", "dfect", "failure"), class = "factor")), .Names = c("ID", 
"Status", "Product", "Profile", "Description"), class = "data.frame", row.names = c(NA, 
-10L))

Please provide the data in an easy-to-paste form and show your effort. — Roman Luštrik, Jul 11 '17 at 07:44
Possible duplicate of [Reshape data from long to wide format](http://stackoverflow.com/questions/5890584) ? — zx8754, Jul 11 '17 at 07:47
@zx8754 I think this question is different from the the one you linked, because it involves missing values, that are supposed to be incorporated as well — KenHBS, Jul 11 '17 at 07:50
@KenS. it says "possible", it needs a bit more work before it can be reshaped. — zx8754, Jul 11 '17 at 07:52
It looks like it might work if $Profile is reshaped based on grouping $ID? — Markus, Jul 11 '17 at 08:08

Andrew Gustar · Answer 1 · 2017-07-11T08:38:11.393

1

You can do this with tidyr...

require(tidyr)
df[df==""] <- NA #change your blanks to NAs
df2 <- df %>% fill(-ID) %>% #fill down missing values
              spread(key=Profile, value=Profile, sep="", fill="") #convert to wide format

df2
   ID Status Product Description ProfileA1 ProfileA2 ProfileA3
1 154   NOCO    3000     failure        A1                    
2 164   NOCO    2000       dfect        A1        A2          
3 215   ATCO    4000       dfect        A1                    
4 444   ATCO    4000     failure                            A3
5 548   NOCO    2000       dfect        A1        A2        A3
6 797   NOCO    3000     failure                            A3
7 875   ATCO    3000     failure                  A2

edited Jul 11 '17 at 08:38

answered Jul 11 '17 at 08:19

Andrew Gustar

17,295
1
22
32

1

We can use `tidyr::fill()`instead of `zoo::na.locf()` to stay within one package. – zx8754 Jul 11 '17 at 08:24
@zx8754 Thanks - I've updated the answer. Much neater! – Andrew Gustar Jul 11 '17 at 08:38
@AndrewGustar This is perfect! – Markus Jul 11 '17 at 09:03

Benjamin Schlegel · Answer 2 · 2017-07-12T16:07:41.653

A version which works without any packages. But the answers with zoo/tidyr are more elegant.

data = structure(list(ID = c(154L, 215L, 164L, 164L, 875L, 548L, 548L, 
                  548L, 797L, 444L), Status = structure(c(3L, 2L, 3L, 1L, 2L, 3L, 
                                                          1L, 1L, 3L, 2L), .Label = c("", "ATCO", "NOCO"), class = "factor"), 
           Product = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA, 
                       3000L, 4000L), Profile = structure(c(2L, 1L, 2L, 3L, 1L, 
                                                            2L, 3L, 4L, 1L, 1L), .Label = c("", "A1", "A2", "A3"), class = "factor"), 
           Description = structure(c(3L, 2L, 2L, 1L, 3L, 2L, 1L, 1L, 
                                     3L, 3L), .Label = c("", "dfect", "failure"), class = "factor")), .Names = c("ID", 
                                                                                                                 "Status", "Product", "Profile", "Description"), class = "data.frame", row.names = c(NA, 


new.data = data[,c("ID","Status","Product","Description")]
new.data = new.data[-which(new.data$Status==""),]
for(i in 1:3){
   new.data[[paste0("Profile",i)]] = NA
}
for(i in 1:3){
  for(id in new.data$ID){
    new.data[which(new.data$ID==id),paste0("Profile",i)] =
         ifelse(sum(data[which(data$ID==id),"Profile"]==
                paste0("A",i))>0,paste0("A",i),"")
  }
}

This produces the data.frame new.data:

    ID Status Product Description Profile1 Profile2 Profile3
1  154   NOCO    3000     failure       A1                  
2  215   ATCO    4000       dfect                           
3  164   NOCO    2000       dfect       A1       A2         
5  875   ATCO    3000     failure                           
6  548   NOCO    2000       dfect       A1       A2       A3
9  797   NOCO    3000     failure                           
10 444   ATCO    4000     failure

I like using the code without packages. However, this returns NA for all profile options.. — Markus, Jul 11 '17 at 09:46
I tried it out again, this time with your provided data structure. It works without any problems. Did you copy & paste the whole code? Because your output would be generated without the second loop-block. — Benjamin Schlegel, Jul 12 '17 at 16:09

r - move value to new column based on next row

2 Answers2