0

After spread function I would like to copy non-NA values to new column. Is there any way to let data that is not NA be copied to new columns?

Data

Serial_ID   Repair_type    Col1        Col2         Coln+1
ID_1            Warranty    NA         02.02.2013   NA
ID_1            Normal      NA         15.10.2011   12.01.2012
ID_2            Warranty    01-01-2013 NA           NA
ID_2            Normal      NA         NA           18.12.2014
ID_n            Normal      NA         23.01.2014   NA

Desired result

Serial_ID   Repair_type    ColX (new)  ColX2 (new)   Col1      Col2         
ID_1            Warranty   02.02.2013 
ID_1            Normal     15.10.2011  12.01.2012
ID_2            Warranty   01-01-2013 
ID_2            Normal     18.12.2014
ID_n            Normal     23.01.2014   

Please see the data and result on image below:

enter image description here

Hope that makes it clearer. Thank you in advance.


Long data before spread

Data:

COMM_VIN    Si_DocDate  COMM_Kind   Cost
V1  2017-01-01  Normal  100
V1  2017-03-02  Warranty    200
V2  2015-04-04  Warranty    50
V2  2017-05-22  Warranty    100
V3  2004-05-22  Normal  150
V3  2016-06-01  Normal  250

I would like the dates of visits to the site to be moved to the column for the COMM_VIN variable depending on COMM_Kind

Results:

COMM_VIN    COMM_Kind   Col_ne1 Col_nen Cost(sum)
V1  Normal  2017-01-01      100
V1  Warramty    2015-04-04  2017-03-02  250
V2  Normal  2004-05-22  2016-06-01  400
V2  Warranty    2017-05-22      50

Sorry, I don't know how to add the table. Please see the attached picture:

enter image description here

Taazar
  • 1,545
  • 18
  • 27
Peter
  • 3
  • 3
  • 1
    Hey @Peter, if you post your original data can give an a go at providing an answer. You can use `dput(head(df, 20))` to provide a sample of it. In general using pictures of your data isn't the best at it makes it hard to reproduce then. See here for some info on asking questions https://stackoverflow.com/help/how-to-ask – NColl Dec 28 '18 at 00:29
  • 1
    To add to NColl's comment, there is a FAQ just about r. https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – shea Dec 28 '18 at 00:32
  • This is probably more easily dealt with *before* the `spread`... Can you share the long data? – iod Dec 28 '18 at 00:39
  • Hi @iod, the post has been edited with previous data set (before spread). – Peter Dec 28 '18 at 17:08
  • Ummm.. There's no key column. How did you spread it? – iod Dec 28 '18 at 22:19
  • @iod did you see the edited post. Maybe there is a better way to spread those data set? Do you have any idea? I've tried to spread by additional index column [1..n]. That is the reason of a lot of NA columns and rows. – Peter Dec 29 '18 at 11:39
  • I revised my solution to fit your long data. Also, to add tables in a formatted manner, add four spaces in the beginning of every line, or, more easily, select the table after you pasted it and click on the curly-brackets icon in the icons bar on top of the text box. – iod Dec 29 '18 at 15:36
  • Dear @iod. I've tried to use your code, but there is an error Error in rank(x, ties.method = "first", na.last = "keep") : argument "x" is missing, with no default – Peter Dec 29 '18 at 17:58
  • Do you have dplyr and tidyr loaded? It sounds like it's using the wrong function, probably because you loaded plyr after dplyr. Try the code as it is now. – iod Dec 29 '18 at 19:34

2 Answers2

0

I think you want the coalesce() function from the dplyr package. I couldn't read in your data, but here's an example with dummy data:

library(dplyr)
df <- data_frame(
  c1 = c(NA, "hey", NA),
  c2 = c(NA, NA, "ho"),
  c3 = c("go", NA, NA)
)

df %>% mutate(colx = coalesce(c1, c2, c3))

Produces:

# A tibble: 3 x 4
  c1    c2    c3    colx 
  <chr> <chr> <chr> <chr>
1 NA    NA    go    go   
2 hey   NA    NA    hey  
3 NA    ho    NA    ho  
Nick DiQuattro
  • 729
  • 4
  • 7
  • Some of OP's rows have more than one non-NA columns, so this won't do -- see the desired output. – iod Dec 28 '18 at 01:01
  • @Nick DiQuattro what if I do not know how many loudspeakers are to be created? (depends on the date set) – Peter Dec 28 '18 at 19:02
0

This is actually easier to do from the long data, before you spread it:

dd %>% gather("key","value",-Serial_ID, -Repair_type) %>% 
 filter(!is.na(value)) %>% # reverse engineer original data (if the original had NAs, you'll need this row to remove them)
group_by(Serial_ID, Repair_type) %>% 
mutate(key=paste0("colx",row_number())) %>% # replace key with minimal number of keys
spread(key,value) # spread again

Result:

# A tibble: 5 x 4
# Groups:   Serial_ID, Repair_type [5]
  Serial_ID Repair_type colx1       colx2      
  <chr>     <chr>       <chr>      <chr>     
1 ID_1      Normal      15.10.2011 12.01.2012
2 ID_1      Warranty    02.02.2013 NA        
3 ID_2      Normal      18.12.2014 NA        
4 ID_2      Warranty    01-01-2013 NA        
5 ID_n      Normal      23.01.2014 NA      

If you would REALLY want to avoid all NAs, even if at the end of a row, you'll need to replace the NAs with empty strings. But I would advise against that.

Here's the same solution applied to the long data you provided:

dd %>% group_by(COMM_VIN,COMM_Kind) %>% 
    dplyr::mutate(Cost=sum(Cost),key=paste0("colx",row_number())) %>% 
    spread(key,Si_DocDate)

You'll note that I create the new cost sum column before the spread, to avoid creating multiple rows with the same COMM_VIN/Comm_Kind combination.

Result:

# A tibble: 4 x 5
# Groups:   COMM_VIN, COMM_Kind [4]
  COMM_VIN COMM_Kind  Cost colx1      colx2     
  <fct>    <fct>     <int> <fct>      <fct>     
1 V1       Normal      100 2017-01-01 NA        
2 V1       Warranty    200 2017-03-02 NA        
3 V2       Warranty    150 2015-04-04 2017-05-22
4 V3       Normal      400 2004-05-22 2016-06-01
iod
  • 7,412
  • 2
  • 17
  • 36