1

I have a dataframe:

ID   Value    Name    Score  Card_type  Card_number
1      NA     John     242      X           23
1     124     John      NA      X           23
1     124     John     242      Y           25
1     124     NA       242      Y           NA
2      55     Mike      NA      X           11
2      55     NA       431      X           11
2      55     Mike     431      Y           14
2      NA     Mike     431      Y           14

As you see, there are IDs and each of them has two groups (Card_type) for column Card_number. Also as you see, some rows with same ID and Card_type have same missing values in some columns. What I want to get is, to make each ID be one row with filled columns. And column Card_number must be split into two columns Card_number_type_X and Card_number_type_X and column Card_type must be removed.

So the desired result must look like this:

ID   Value    Name    Score   Card_number_type_X  Card_number_type_Y
1     124     John     242             23                   25
2      55     Mike     431             11                   14                 

How could I do that?

henhesu
  • 756
  • 4
  • 9
  • They are multiple ways to do this, see `tidyr::pivot_wider()` and `stats::reshape()` you may also have a look at `reshape2` package. Here the problem is how do you want to deal with NA? – Paul Sep 21 '20 at 07:52
  • Maybe [this](https://stackoverflow.com/a/45639057/10264278) has some tips that could help you. – Paul Sep 21 '20 at 08:23

1 Answers1

0

One way would be to fill the missing values in each ID and then get data in wide format keeping only the unique values.

library(dplyr)
library(tidyr)

df %>%
  group_by(ID) %>%
  fill(everything(), .direction = 'updown') %>%
  pivot_wider(names_from = Card_type, values_from = Card_number, 
              values_fn = unique, names_prefix = 'Card_number_type_')

#     ID Value Name  Score Card_number_type_X Card_number_type_Y
#  <int> <int> <chr> <int>              <int>              <int>
#1     1   124 John    242                 23                 25
#2     2    55 Mike    431                 11                 14

It seems original data is not the same as shared data in which case we can try :

df %>%
    group_by(ID) %>%
    fill(everything(), .direction = 'updown') %>%
    distinct() %>%
    group_by(ID, Value, Name, Score) %>%
    mutate(row = row_number()) %>%
    pivot_wider(names_from = Card_type, values_from = Card_number, 
                names_prefix = 'Card_number_type_')

data

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Value = c(NA, 
124L, 124L, 124L, 55L, 55L, 55L, NA), Name = c("John", "John", 
"John", NA, "Mike", NA, "Mike", "Mike"), Score = c(242L, NA, 
242L, 242L, NA, 431L, 431L, 431L), Card_type = c("X", "X", "Y", 
"Y", "X", "X", "Y", "Y"), Card_number = c(23L, 23L, 25L, NA, 
11L, 11L, 14L, 14L)), class = "data.frame", row.names = c(NA, 
-8L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • thanks, with my original data i get this error: Error in `$<-.data.frame`(`*tmp*`, "val", value = c("24181", : replacement has 1304 rows, data has 1077 . Problem appears on pivot_wider part –  Sep 21 '20 at 07:46
  • Can you try the update answer and see if it works on your original data? – Ronak Shah Sep 21 '20 at 07:50
  • with new code column Card_number_type_NA appears and also unnecessary column row –  Sep 21 '20 at 07:55
  • You can remove the column that you don't want by doing `ungroup %>% select(-row, -Card_number_type_NA)`. – Ronak Shah Sep 21 '20 at 08:11