1

I have this dataframe:

     id    a1    a2    b1    b2    c1    c2
  <int> <int> <int> <int> <int> <int> <int>
1     1    83    33    55    33    85    86
2     2    37     0    60    98    51     0
3     3    97    71    85     8    44    40
4     4    51     6    43    15    55    57
5     5    28    53    62    73    70     9
df <- structure(list(id = 1:5, a1 = c(83L, 37L, 97L, 51L, 28L), a2 = c(33L, 
0L, 71L, 6L, 53L), b1 = c(55L, 60L, 85L, 43L, 62L), b2 = c(33L, 
98L, 8L, 15L, 73L), c1 = c(85L, 51L, 44L, 55L, 70L), c2 = c(86L, 
0L, 40L, 57L, 9L)), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

I want to: Combine columns with same starting character to one column by shifting each row of the second column by 1 down and naming the new column with the character of the two columns.

My desired output:

      id     a     b     c
   <dbl> <dbl> <dbl> <dbl>
 1     1    83    55    85
 2     1    33    33    86
 3     2    37    60    51
 4     2     0    98     0
 5     3    97    85    44
 6     3    71     8    40
 7     4    51    43    55
 8     4     6    15    57
 9     5    28    62    70
10     5    53    73     9

I have tried using lagfunction but I don`t know how to combine and shift columns at the same time!

To clarify a picture: enter image description here

Anoushiravan R
  • 21,622
  • 3
  • 18
  • 41
TarJae
  • 72,363
  • 6
  • 19
  • 66

4 Answers4

5

You can use the following solution. I also have modified your data set an added an id column:

library(tidyr)

df %>%
  pivot_longer(!id, names_to = c(".value", NA), names_pattern = "([[:alpha:]])(\\d)")

# A tibble: 10 x 4
      id     a     b     c
   <int> <int> <int> <int>
 1     1    83    55    85
 2     1    33    33    86
 3     2    37    60    51
 4     2     0    98     0
 5     3    97    85    44
 6     3    71     8    40
 7     4    51    43    55
 8     4     6    15    57
 9     5    28    62    70
10     5    53    73     9
Anoushiravan R
  • 21,622
  • 3
  • 18
  • 41
2

We can pivot_longer, remove the digits from name, then pivot_wider and unnest

library(stringr)
library(dplyr)
library(tidyr)

df %>% pivot_longer(cols = -id)%>%
        mutate(name=str_remove(name, '[0-9]'))%>%
        pivot_wider(names_from = name)%>%
        unnest(everything())

# A tibble: 10 x 4
      id     a     b     c
   <int> <int> <int> <int>
 1     1    83    55    85
 2     1    33    33    86
 3     2    37    60    51
 4     2     0    98     0
 5     3    97    85    44
 6     3    71     8    40
 7     4    51    43    55
 8     4     6    15    57
 9     5    28    62    70
10     5    53    73     9
GuedesBF
  • 8,409
  • 5
  • 19
  • 37
  • 1
    Bravo my friend. – Anoushiravan R Jul 28 '21 at 16:22
  • 1
    thank you anoush. I still struggle quite a lot with pivoting issues (so many parameters to pivot_* ). Yours is definitely the neat way to go. – GuedesBF Jul 28 '21 at 16:27
  • 1
    It's ok. I once asked a similar question a couple months ago. Today I used it. I mean it's all about experience in my opinion and from now on you could use all that is mentioned here for future questions. However, your solution is very good too. – Anoushiravan R Jul 28 '21 at 16:30
2

Doing it as a pivot_longer(), then pivot_wider() is easier to read, but @Anoushiravan R's answer to more direct

library(tidyverse)

df %>% 
  rownames_to_column(var = "id") %>% # Add the id column
  pivot_longer(-id) %>% # Make long
  mutate(order = str_sub(name, -1), name = str_sub(name, 1, 1)) %>% # Breakout the name column
  pivot_wider(names_from = name) %>% # Make wide again
  select(-order) # Drop the ordering column
Tech Commodities
  • 1,884
  • 6
  • 13
2

I think ANoushiravan's solution is the tidiest way to do it. We could also use {dplyover} (disclaimer) for this:

library(dplyr)
library(dplyover) # https://github.com/TimTeaFan/dplyover

df %>% 
  group_by(id) %>% 
  summarise(across2(ends_with("1"),
                    ends_with("2"),
                    ~ c(.x,.y),
                    .names = "{pre}"),
            )
#> `summarise()` has grouped output by 'id'. You can override using the `.groups` argument.

#> # A tibble: 10 x 4
#> # Groups:   id [5]
#>       id     a     b     c
#>    <int> <int> <int> <int>
#>  1     1    83    55    85
#>  2     1    33    33    86
#>  3     2    37    60    51
#>  4     2     0    98     0
#>  5     3    97    85    44
#>  6     3    71     8    40
#>  7     4    51    43    55
#>  8     4     6    15    57
#>  9     5    28    62    70
#> 10     5    53    73     9

Created on 2021-07-28 by the reprex package (v0.3.0)

TimTeaFan
  • 17,549
  • 4
  • 18
  • 39