1

I am trying to work with poorly formatted SAP data.

In this source data, when one of the variables ("Profile" in the example) has more entries, they are stacked. This creates an empty observation with only the same "ID" in the next row.

As such:

ID  Status  Product     Profile   Description
154 NOCO    3000        A1        failure           
215 ATCO    4000                  dfect     
164 NOCO    2000        A1        dfect
164                     A2
875 ATCO    3000                  failure
548 NOCO    2000        A1        dfect         
548                     A2
548                     A3
797 NOCO    3000                  failure       
444 ATCO    4000                  failure       

What I would like to do is to move these stacked values and move them to the next column.

ID  Status  Product Profile  Profile2   Profile3    Description
154 NOCO    3000    A1                              failure
215 ATCO    4000                                    dfect
164 NOCO    2000    A1       A2                     dfect
875 ATCO    3000                                    failure
548 NOCO    2000    A1       A2         A3          dfect
797 NOCO    3000                                    failure
444 ATCO    4000                                    failure

How would I go about doing this?

Thanks!

Edit:

Added dput of the first table above:

structure(list(ID = c(154L, 215L, 164L, 164L, 875L, 548L, 548L, 
548L, 797L, 444L), Status = structure(c(3L, 2L, 3L, 1L, 2L, 3L, 
1L, 1L, 3L, 2L), .Label = c("", "ATCO", "NOCO"), class = "factor"), 
    Product = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA, 
    3000L, 4000L), Profile = structure(c(2L, 1L, 2L, 3L, 1L, 
    2L, 3L, 4L, 1L, 1L), .Label = c("", "A1", "A2", "A3"), class = "factor"), 
Description = structure(c(3L, 2L, 2L, 1L, 3L, 2L, 1L, 1L, 
3L, 3L), .Label = c("", "dfect", "failure"), class = "factor")), .Names = c("ID", 
"Status", "Product", "Profile", "Description"), class = "data.frame", row.names = c(NA, 
-10L))
Markus
  • 11
  • 3

2 Answers2

1

You can do this with tidyr...

require(tidyr)
df[df==""] <- NA #change your blanks to NAs
df2 <- df %>% fill(-ID) %>% #fill down missing values
              spread(key=Profile, value=Profile, sep="", fill="") #convert to wide format

df2
   ID Status Product Description ProfileA1 ProfileA2 ProfileA3
1 154   NOCO    3000     failure        A1                    
2 164   NOCO    2000       dfect        A1        A2          
3 215   ATCO    4000       dfect        A1                    
4 444   ATCO    4000     failure                            A3
5 548   NOCO    2000       dfect        A1        A2        A3
6 797   NOCO    3000     failure                            A3
7 875   ATCO    3000     failure                  A2          
Andrew Gustar
  • 17,295
  • 1
  • 22
  • 32
0

A version which works without any packages. But the answers with zoo/tidyr are more elegant.

data = structure(list(ID = c(154L, 215L, 164L, 164L, 875L, 548L, 548L, 
                  548L, 797L, 444L), Status = structure(c(3L, 2L, 3L, 1L, 2L, 3L, 
                                                          1L, 1L, 3L, 2L), .Label = c("", "ATCO", "NOCO"), class = "factor"), 
           Product = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA, 
                       3000L, 4000L), Profile = structure(c(2L, 1L, 2L, 3L, 1L, 
                                                            2L, 3L, 4L, 1L, 1L), .Label = c("", "A1", "A2", "A3"), class = "factor"), 
           Description = structure(c(3L, 2L, 2L, 1L, 3L, 2L, 1L, 1L, 
                                     3L, 3L), .Label = c("", "dfect", "failure"), class = "factor")), .Names = c("ID", 
                                                                                                                 "Status", "Product", "Profile", "Description"), class = "data.frame", row.names = c(NA, 


new.data = data[,c("ID","Status","Product","Description")]
new.data = new.data[-which(new.data$Status==""),]
for(i in 1:3){
   new.data[[paste0("Profile",i)]] = NA
}
for(i in 1:3){
  for(id in new.data$ID){
    new.data[which(new.data$ID==id),paste0("Profile",i)] =
         ifelse(sum(data[which(data$ID==id),"Profile"]==
                paste0("A",i))>0,paste0("A",i),"")
  }
}

This produces the data.frame new.data:

    ID Status Product Description Profile1 Profile2 Profile3
1  154   NOCO    3000     failure       A1                  
2  215   ATCO    4000       dfect                           
3  164   NOCO    2000       dfect       A1       A2         
5  875   ATCO    3000     failure                           
6  548   NOCO    2000       dfect       A1       A2       A3
9  797   NOCO    3000     failure                           
10 444   ATCO    4000     failure   
  • I like using the code without packages. However, this returns NA for all profile options.. – Markus Jul 11 '17 at 09:46
  • I tried it out again, this time with your provided data structure. It works without any problems. Did you copy & paste the whole code? Because your output would be generated without the second loop-block. – Benjamin Schlegel Jul 12 '17 at 16:09