I received a dataset from a survey, and it has a rather strange structure. Every row has 90 variables, with the first 15 being general questions and the other 75 being 5 sets of 15 variables, that are actually observations. The concrete case is an employer being asked questions about employees. It holds the following structure:
set.seed(42)
n <- 100
data <- data.frame(
EmployerVar1=1:n,,
EmployerVar2=rnorm(n),
EmployerVar3=rnorm(n),
EmployerVar4=rnorm(n),
EmployerVar5=rnorm(n),
EmployerVar6=rnorm(n),
EmployerVar7=rnorm(n),
EmployerVar8=rnorm(n),
EmployerVar9=rnorm(n),
EmployerVar10=rnorm(n),
EmployerVar11=rnorm(n),
EmployerVar12=rnorm(n),
EmployerVar13=rnorm(n),
EmployerVar14=rnorm(n),
EmployerVar15=rnorm(n),
Employee1Var1=rnorm(n),
Employee1Var2=rnorm(n),
Employee1Var3=rnorm(n),
Employee1Var4=rnorm(n),
Employee1Var5=rnorm(n),
Employee1Var6=rnorm(n),
Employee1Var7=rnorm(n),
Employee1Var8=rnorm(n),
Employee1Var9=rnorm(n),
Employee1Var10=rnorm(n),
Employee1Var11=rnorm(n),
Employee1Var12=rnorm(n),
Employee1Var13=rnorm(n),
Employee1Var14=rnorm(n),
Employee1Var15=rnorm(n),
Employee2Var1=rnorm(n),
Employee2Var2=rnorm(n),
Employee2Var3=rnorm(n),
Employee2Var4=rnorm(n),
Employee2Var5=rnorm(n),
Employee2Var6=rnorm(n),
Employee2Var7=rnorm(n),
Employee2Var8=rnorm(n),
Employee2Var9=rnorm(n),
Employee2Var10=rnorm(n),
Employee2Var11=rnorm(n),
Employee2Var12=rnorm(n),
Employee2Var13=rnorm(n),
Employee2Var14=rnorm(n),
Employee2Var15=rnorm(n),
Employee3Var1=rnorm(n),
Employee3Var2=rnorm(n),
Employee3Var3=rnorm(n),
Employee3Var4=rnorm(n),
Employee3Var5=rnorm(n),
Employee3Var6=rnorm(n),
Employee3Var7=rnorm(n),
Employee3Var8=rnorm(n),
Employee3Var9=rnorm(n),
Employee3Var10=rnorm(n),
Employee3Var11=rnorm(n),
Employee3Var12=rnorm(n),
Employee3Var13=rnorm(n),
Employee3Var14=rnorm(n),
Employee3Var15=rnorm(n),
Employee4Var1=rnorm(n),
Employee4Var2=rnorm(n),
Employee4Var3=rnorm(n),
Employee4Var4=rnorm(n),
Employee4Var5=rnorm(n),
Employee4Var6=rnorm(n),
Employee4Var7=rnorm(n),
Employee4Var8=rnorm(n),
Employee4Var9=rnorm(n),
Employee4Var10=rnorm(n),
Employee4Var11=rnorm(n),
Employee4Var12=rnorm(n),
Employee4Var13=rnorm(n),
Employee4Var14=rnorm(n),
Employee4Var15=rnorm(n),
Employee5Var1=rnorm(n),
Employee5Var2=rnorm(n),
Employee5Var3=rnorm(n),
Employee5Var4=rnorm(n),
Employee5Var5=rnorm(n),
Employee5Var6=rnorm(n),
Employee5Var7=rnorm(n),
Employee5Var8=rnorm(n),
Employee5Var9=rnorm(n),
Employee5Var10=rnorm(n),
Employee5Var11=rnorm(n),
Employee5Var12=rnorm(n),
Employee5Var13=rnorm(n),
Employee5Var14=rnorm(n),
Employee5Var15=rnorm(n))
For analysis the dataset needs to have every observation in a new row, being the employer characteristics with the 15 variables linked to one employee. So the number of rows needs to increase fivefold.
I asked this question once before and while I thought this was the solution, it eventually didn't work. The suggestion was to:
library(tidyr)
X_wide <- data.frame(id = 1:3, P1 = 4:6, P2 = 7:9, P3 = 10:12)
X_long <- pivot_longer(X_wide, cols = P1:P3, names_to = "person", values_to = "score")
X_long <- as.data.frame(X_long)
This does not work as it would collapse all observation-columns in one new variable. All the data needs to remain the same, it only needs to land in a new spot. To clarify, this is the structure that I would need for analysis.
Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
---|---|---|---|---|---|
EmployerVar1 | EmployerVar2 | EmployerVar3 | Employee1Var1 | Employee1Var2 | Employee1Var3 |
EmployerVar1 | EmployerVar2 | EmployerVar3 | Employee2Var1 | Employee2Var2 | Employee2Var3 |
EmployerVar1 | EmployerVar2 | EmployerVar3 | Employee3Var1 | Employee3Var2 | Employee3Var3 |