1

I have this simplified data situation:

df <- data.frame(Year = 1:3,
                 A.x = 4:6,
                 A.y = 7:9,
                 A.z = 10:12,
                 B.x = 4:6,
                 B.y = 7:9,
                 B.z = 10:12,
                 C.x = 4:6,
                 C.y = 7:9,
                 C.z = 10:12)

My goal is this:

image.png

df_goal <- data.frame(Year = rep(1:3, each = 3),
                 x = rep(4:6, each = 3),
                 y = rep(7:9, each = 3),
                 z = rep(10:12, each = 3))

Please note that these values are arbitrary, only the column names, pattern, and the order of the target Year matter.

I tried to create the target dataframe "manually", e.g.: column i contains values from positions x to z of the original data set. But it looks really clumsy since the data I'm dealing with is huge. Are there better ways to reshape the data more systematically?

Thank you so much!

jared_mamrot
  • 22,354
  • 4
  • 21
  • 46
Theodore
  • 13
  • 4

1 Answers1

1

We may use pivot_longer

library(dplyr)
library(tidyr)
 df %>% 
   pivot_longer(cols = -Year, names_to = ".value", 
      names_pattern = "^[^.]+\\.(.*)")

# A tibble: 9 × 4
   Year     x     y     z
  <int> <int> <int> <int>
1     1     4     7    10
2     1     4     7    10
3     1     4     7    10
4     2     5     8    11
5     2     5     8    11
6     2     5     8    11
7     3     6     9    12
8     3     6     9    12
9     3     6     9    12
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thank you so much, that's what I need. Could you please also show me what should be the regular expression if the column names go like this: aB_1_1, aB_1_2,...aB_1_6, aB_2_1, aB_2_2..., aB_12_6? – Theodore Oct 06 '21 at 23:28
  • @Theodore what would be the column you need as output. If you want to get the lower case 'a' as column `names_pattern = "^([a-z])[A-Z]_.*"`. if it is both aB, then do you need to capture `"^([a-z][A-Z])_.*"` – akrun Oct 06 '21 at 23:29
  • I need to capture them both, also with the digits that come after, just like in the example. Thank you a lot! – Theodore Oct 06 '21 at 23:37
  • @Theodore do you want `aB_1`, `aB_2` as column names?? then do `names_pattern = "^([a-z][A-Z]_\\d+)_\\d+"` – akrun Oct 06 '21 at 23:41
  • 1
    Yes, you're amazing! I can finally go to sleep now. Many thanks ^^ – Theodore Oct 06 '21 at 23:46
  • Just one more question out of curiosity, Sir. What if I want to capture `aB_..._1`, `aB_..._2`, `aB_..._3`, `aB_..._4`, `aB_..._5`, `aB_..._6` (only the last digit)? – Theodore Oct 07 '21 at 00:04
  • Do you want aB1, aB2, ... Or if you want only the last digit/s `".*_(\\d+)$"` – akrun Oct 07 '21 at 00:05
  • aB_..._1, aB_..._2, aB_..._3, aB_..._4, aB_..._5, aB_..._6 – Theodore Oct 07 '21 at 00:06
  • Sorry. Let me type again – Theodore Oct 07 '21 at 00:07
  • Sorry your question is not clear. Your were inputting the same input as before – akrun Oct 07 '21 at 00:07
  • can you show what columns you expect as output – akrun Oct 07 '21 at 00:11
  • Column name (either is fine, I can change latter. I only care about the values): aB_..._1, aB_..._2, aB_..._3, aB_..._4, aB_..._5, aB_..._6 or aB_1, aB_2, aB_3, aB_4, aB_5, aB_6 Value: aB_1_1, aB_2_2, aB_1_3, aB_1_4, aB_1_5, aB_1_6 aB_2_1, aB_2_2, aB_2_3, aB_2_4, aB_2_5, aB_2_6 aB_3_1, aB_3_2, aB_3_3, aB_3_4, aB_3_5, aB_3_6 .... aB_12_1, aB_12_2, aB_12_3, aB_12_4, aB_12_5, aB_12_6 – Theodore Oct 07 '21 at 00:11
  • In the output, what would be column names i.e. if your input column is `aB_3_6` do you expect `aB6` as column name – akrun Oct 07 '21 at 00:12
  • Values: `aB_1_1, aB_2_2, aB_1_3, aB_1_4, aB_1_5, aB_1_6// aB_2_1, aB_2_2, aB_2_3, aB_2_4, aB_2_5, aB_2_6// aB_3_1, aB_3_2, aB_3_3, aB_3_4, aB_3_5, aB_3_6// ....// aB_12_1, aB_12_2, aB_12_3, aB_12_4, aB_12_5, aB_12_6` Sorry, it's not easy to type properly in this mini chat. I want to have 6 columns corresponding to the last digit (from 1 to 6). The positions of values for each column are shown above – Theodore Oct 07 '21 at 00:17
  • Try something like `df %>% rename_with(~ str_replace(., "^([^_]+)(_\\d+)_(\\d+)$", "\\1\\3\\2"), -Year) %>% pivot_longer(cols = -Year, names_to = ".value", names_pattern = "^([^_]+)_.*")` – akrun Oct 07 '21 at 00:18
  • Yes, you're right – Theodore Oct 07 '21 at 00:18
  • i.e. base don the same data you showed, if i change the column names to `names(df)[-1] <- c("aB_1_1", "aB_2_1", "aB_1_2", "aB_2_2", "aB_3_2", "aB_4_2", "aB_1_3", "aB_2_3", "aB_3_3")` – akrun Oct 07 '21 at 00:18
  • i have to leave now. thanks – akrun Oct 07 '21 at 00:21
  • 1
    That's perfect. You're the best! Thank you for being patient with a newbie like me :D – Theodore Oct 07 '21 at 00:23