0

How can I condense many different columns into one single column based on an identifier which itself is given in the first column and which should remain a separate column (growing along), whereas the other column items are appended below each other into one column with multiple new rows, resulting in two very long columns instead of many columns (as below)?

From this:

ID y1 y2 y3
1  4  7  100
2  5  8  11
3  6  9  120

To this:

ID y1
1  4
1  7
1  100
2  5
2  8
2  11
3  6
3  9
3  120

To illustrate this within an example, I intend to go from the df_start to the df_final data frame. df_start is given and df_final is the result I intend to achieve.

To provide some context, I furthermore have to deal with 30,000 observations (rows) of 29 variables (columns). Therein, I have up to 9000 different identifiers (=IDs) within these rows, so within the final solution I cannot account for these manually. The respective column names can be lost when condensing them into a single column with many rows, except for the identifying column which "grows along".

df_start = data.frame(
  ID = c(100, 100, 101, 101, 101, 105),
  X1 = c(1, 2, 3, 4, 5, 6),
  X2 = c(7, 8, 9, 1, 4, 6),
  X3 = c(3, 4, 1, 8, 3, 5)
)
df_final = data.frame(
  ID = c(100, 100, 101, 101, 101, 105, 100, 100, 101, 101, 101, 105, 100, 100, 101, 101, 101, 105),
  X1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 4, 6, 3, 4, 1, 8, 3, 5)
)

Thank you in advance for your help and time, it is highly appreciated, as I have no clue how to tackle this.

Structure of data frame within R (gvkey = ID "column" with 9000 different IDs): enter image description here

LB.
  • 43
  • 5
  • It seems like this would work `df_start %>% pivot_longer(-ID) %>% select(ID, X1=value, name=NULL)` (using `dplyr` and `tidyr`) – MrFlick Oct 26 '22 at 20:42

0 Answers0