1

I am trying to rename values within a dataframe in R based on their location within the dataframe itself. Here is the dataframe that I am currently working with:-

df<-structure(list(UserID = c("hdyyu-192", "yeui-1893", "dnnd-1882", "nopr-738", "ieka-1728"), 
                   `05/06/2021` = c("Activity B", "Activity A", "Activity B", "Activity C", "Activity D"), 
                   `06/06/2021` = c("Activity B", "Activity A", "Activity A", "Activity C", "Activity D"), 
                   `07/06/2021` = c("Activity D","Activity A", NA, "Activity C", "Activity D"), 
                   `08/06/2021` = c("Activity A","Activity B", NA, "Activity B", "Activity A"), 
                   `09/06/2021` = c("Activity A","Activity B", NA, "Activity B", "Activity C"), 
                   `10/06/2021` = c("Activity C",NA, NA, NA, NA), 
                   `11/06/2021` = c("Activity B", NA, NA, NA, NA)), 
              class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,-5L))

head(df)

# A tibble: 5 x 8
#  UserID    `05/06/2021` `06/06/2021` `07/06/2021` `08/06/2021` `09/06/2021` `10/06/2021` `11/06/2021`
#  <chr>     <chr>        <chr>        <chr>        <chr>        <chr>        <chr>        <chr>       
#1 hdyyu-192 Activity B   Activity B   Activity D   Activity A   Activity A   Activity C   Activity B  
#2 yeui-1893 Activity A   Activity A   Activity A   Activity B   Activity B   NA           NA          
#3 dnnd-1882 Activity B   Activity A   NA           NA           NA           NA           NA          
#4 nopr-738  Activity C   Activity C   Activity C   Activity B   Activity B   NA           NA          
#5 ieka-1728 Activity D   Activity D   Activity D   Activity A   Activity C   NA           NA  

And here is the result that I would like to achieve:-

# A tibble: 5 x 8
#  UserID    `05/06/2021`       `06/06/2021`       `07/06/2021`       `08/06/2021`       `09/06/2021`       `10/06/2021`       `11/06/2021`      
#  <chr>     <chr>              <chr>              <chr>              <chr>              <chr>              <chr>              <chr>             
#1 hdyyu-192 Activity B event_1 Activity B event_2 Activity D event_3 Activity A event_4 Activity A event_5 Activity C event_6 Activity B event_7
#2 yeui-1893 Activity A event_1 Activity A event_2 Activity A event_3 Activity B event_4 Activity B event_5 NA                 NA                
#3 dnnd-1882 Activity B event_1 Activity A event_2 NA                 NA                 NA                 NA                 NA                
#4 nopr-738  Activity C event_1 Activity C event_2 Activity C event_3 Activity B event_4 Activity B event_5 NA                 NA                
#5 ieka-1728 Activity D event_1 Activity D event_2 Activity D event_3 Activity A event_4 Activity C event_5 NA                 NA      

As you can see within the desired outcome, the term "event_" followed by the column number (skipping the first column) is pasted after the value in each corresponding cell. Is there a solution that would allow me to achieve this with my original dataframe? Many thanks :)

metaltoaster
  • 380
  • 2
  • 15

1 Answers1

1

The part that match the event number to the index of column is borrowed from this question: Get column index from label in a data frame.

I will be using a dplyr approach.

Use mutate() in everything column except the UserID. If the value in the column is NA, do nothing. If it's not NA, paste() the value in the column with "event_indexOfColumn", which is returned by using grep() and cur_column().

Note we have to subtract the "column index" by 1, since column 1 is not used to generate the column index.

df %>% mutate(across(-UserID, ~ ifelse(
  is.na(.x),
  .x, paste0(.x, " event_", as.numeric(grep(
    cur_column(), colnames(df)
  ) - 1))
),
.names = "{.col}"))

Output

You will see NA in the results as in your desired output (I'm using clipr::write_clip() to copy the output here, which removes the NA).

Or anyone could recommend how to include output with lots of columns?

UserID  05/06/2021  06/06/2021  07/06/2021  08/06/2021  09/06/2021  10/06/2021  11/06/2021
hdyyu-192   Activity B event_1  Activity B event_2  Activity D event_3  Activity A event_4  Activity A event_5  Activity C event_6  Activity B event_7
yeui-1893   Activity A event_1  Activity A event_2  Activity A event_3  Activity B event_4  Activity B event_5      
dnnd-1882   Activity B event_1  Activity A event_2                  
nopr-738    Activity C event_1  Activity C event_2  Activity C event_3  Activity B event_4  Activity B event_5      
ieka-1728   Activity D event_1  Activity D event_2  Activity D event_3  Activity A event_4  Activity C event_5      

dput() result FYI

structure(list(UserID = c("hdyyu-192", "yeui-1893", "dnnd-1882", 
"nopr-738", "ieka-1728"), `05/06/2021` = c("Activity B event_1", 
"Activity A event_1", "Activity B event_1", "Activity C event_1", 
"Activity D event_1"), `06/06/2021` = c("Activity B event_2", 
"Activity A event_2", "Activity A event_2", "Activity C event_2", 
"Activity D event_2"), `07/06/2021` = c("Activity D event_3", 
"Activity A event_3", NA, "Activity C event_3", "Activity D event_3"
), `08/06/2021` = c("Activity A event_4", "Activity B event_4", 
NA, "Activity B event_4", "Activity A event_4"), `09/06/2021` = c("Activity A event_5", 
"Activity B event_5", NA, "Activity B event_5", "Activity C event_5"
), `10/06/2021` = c("Activity C event_6", NA, NA, NA, NA), `11/06/2021` = c("Activity B event_7", 
NA, NA, NA, NA)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-5L))
benson23
  • 16,369
  • 9
  • 19
  • 38