0

I have the following variables as part of a bigger data set. And each 3 successive variables measured the same (e.g. the first 3 variables c_0064, c_0065, c_0066 measured 3 brands the participant knows, the second 3 variables v_159_1, v_159_2, v_159_3 measured the attitude the participant has on each of the before mentioned brands, and so on. I displayed just the first and last columns in the data set. After the column v_159_3 it continues actually with v_160_1, v_160_2, v_160_3, v_161_1... until it reaches the columns v_182_1, v_182_2, v_182_3.

structure(list(lfdn = c(4, 6, 7, 8, 9, 11, 12, 19), c_0064 = c("x", 
"t", "x", "x", "t", "x", "z", "z"), c_0065 = c("z", "z", "z", 
"f", "f", "f", "t", "t"), c_0066 = c("x", "x", "x", "a", "f", 
"t", "z", "b"), v_159_1 = c(1, 1, 3, 2, 2, 5, 4, 3), v_159_2 = c(3, 
3, 3, 3, 3, 2, 5, 1), v_159_3 = c(5, 5, 1, 4, 4, 1, 2, 2), v_182_1 = c(1, 
1, 5, 5, 4, 4, 4, 4), v_182_2 = c(4, 2, 2, 2, 2, 3, 1, 5), v_182_3 = c(5, 
4, 5, 1, 2, 5, 2, 2)), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))


> df
# A tibble: 8 x 10
   lfdn c_0064 c_0065 c_0066 v_159_1 v_159_2 v_159_3 v_182_1 v_182_2 v_182_3
  <dbl> <chr>  <chr>  <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1     4 x      z      x            1       3       5       1       4       5
2     6 t      z      x            1       3       5       1       2       4
3     7 x      z      x            3       3       1       5       2       5
4     8 x      f      a            2       3       4       5       2       1
5     9 t      f      f            2       3       4       4       2       2
6    11 x      f      t            5       2       1       4       3       5
7    12 z      t      z            4       5       2       4       1       2
8    19 z      t      b            3       1       2       4       5       2

And the goal is to restructure/stack always the 3 similar columns like the following:

structure(list(lfdn = c(4, 6, 7, 8, 9, 11, 12, 19, 4, 6, 7, 8, 
9, 11, 12, 19, 4, 6, 7, 8, 9, 11, 12, 19), c_0064_65_66 = c("x", 
"t", "x", "x", "t", "x", "z", "z", "z", "z", "z", "f", "f", "f", 
"t", "t", "x", "x", "x", "a", "f", "t", "z", "b"), v_159_1_2_3 = c(1, 
1, 3, 2, 2, 5, 4, 3, 3, 3, 3, 3, 3, 2, 5, 1, 5, 5, 1, 4, 4, 1, 
2, 2), v_181_1_2_3 = c(1, 1, 5, 5, 4, 4, 4, 4, 4, 2, 2, 2, 2, 
3, 1, 5, 5, 4, 5, 1, 2, 5, 2, 2)), row.names = c(NA, -24L), class = c("tbl_df", 
"tbl", "data.frame"))

> dflong
# A tibble: 24 x 4
    lfdn c_0064_65_66 v_159_1_2_3 v_181_1_2_3
   <dbl> <chr>              <dbl>       <dbl>
 1     4 x                      1           1
 2     6 t                      1           1
 3     7 x                      3           5
 4     8 x                      2           5
 5     9 t                      2           4
 6    11 x                      5           4
 7    12 z                      4           4
 8    19 z                      3           4
 9     4 z                      3           4
10     6 z                      3           2
# ... with 14 more rows

I failed already to melt the data, so the only procedure that came to my mind was to use stack command and stack every following 3 variables like stack(df, select=c("c_0064", "c_0065", "c_0066")) and then put these stacked variables together in the end. But I hope there is a much more economical way to do it since I have much more "repeating" variables in the data set beside the displayed ones.

Nadine M.
  • 95
  • 2
  • 10

1 Answers1

1

You can use pivot_longer with names_pattern. Based on the column names in your data use the pattern to accurately capture the column names.

tidyr::pivot_longer(df, cols = -lfdn, 
                    names_to = '.value', names_pattern = '(c|[a-z]_\\d+)')

#    lfdn c     v_159 v_182
#   <dbl> <chr> <dbl> <dbl>
# 1     4 x         1     1
# 2     4 z         3     4
# 3     4 x         5     5
# 4     6 t         1     1
# 5     6 z         3     2
# 6     6 x         5     4
# 7     7 x         3     5
# 8     7 z         3     2
# 9     7 x         1     5
#10     8 x         2     5
# … with 14 more rows
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thank you very much @Ronak Shah - for the reproducable example, it worked well. Could you nevertheless please let me know how the formula does change when the words "brand" is not included in each of the variables? In my real data set, the variable names are: lfdn, c_0064, c_0065, c_066, v_159_1, v_159_2, v_159_3, ....., v_174_1, v_174_2, v_174_3. Excuses for the inconvenience. – Nadine M. Aug 26 '21 at 07:14
  • So you should have different columns for `v` and `c` ? Try `tidyr::pivot_longer(df, cols = -lfdn, names_to = c('.value', 'col'), names_pattern = '(\\w+)_(.*)')` – Ronak Shah Aug 26 '21 at 07:16
  • Thank you @Ronak Shah, it almost works. The only thing is that it gives me 2418 instead of 1209 rows because it produces a column "c" that shows "0064", "0065", "0066", "1", "2", "3", "0064", "0065", "0066", "1", "2", "3", "0064", "0065", "0066", "1", "2", "3", and so on. So the rows with "1", "2", "3" should actually not appear. – Nadine M. Aug 26 '21 at 07:24
  • Umm...Try this - `tidyr::pivot_longer(df, cols = -lfdn, names_to = c('.value', 'col'), names_pattern = '([a-z]+)_(\\d+)')` If this doesn't work then you may need to update your post with your actual example. – Ronak Shah Aug 26 '21 at 07:27
  • Thank you again @Ronak Shah, your first example worked better than the second one, but I now revised my post to facilitate finding the right solution – Nadine M. Aug 26 '21 at 07:57
  • 1
    Do you need the column names as shown exactly? It's really about finding the right regex based on your column names. Would this work ? `tidyr::pivot_longer(df, cols = -lfdn, names_to = '.value', names_pattern = '(c|[a-z]_\\d+)')` – Ronak Shah Aug 26 '21 at 08:27
  • Yeah thanks so much for all the effort, this is exactly what I was looking for! – Nadine M. Aug 26 '21 at 10:06