Looking for help on automating a task that I need to do across multiple data sets of multiple types.
Here's some example data:
#create example data
x1_a<-rnorm(3)
x1_b<-rnorm(3)
x2_a<-rnorm(3)
x2_b<-rnorm(3)
x3_a<-rnorm(3)
x3_b<-rnorm(3)
df<-data.frame(x1_a, x1_b, x2_a, x2_b, x3_a, x3_b)
> df
x1_a x1_b x2_a x2_b x3_a x3_b
1 0.2912428 1.2737063 -0.9997475 1.0400489 -0.5323241 -0.2229865
2 2.0147965 -1.0180336 0.1080972 1.1411229 0.4791280 0.6230867
3 0.4189106 -0.6987785 -0.3890773 -0.3292366 -1.6186305 1.3913514
Essentially, this data is in wide format and I need it in long. But I'm not sure how to automate this transformation.
In this data, the numeric value in the column indicates the observation the column belongs to (e.g., x1_a goes with x1_b). The character values indicate what the columns actually are. So, columns matching in characters (e.g., x1_a matches x2_a, they're both "x_a" just from different observations) are the same thing, just repeated observations of that thing.
The long format data would have just the character string variable (here, "x_a" or "x_b"), and would contain the repeated observations of that thing. I'm agnostic as to whether the original numeric value is preserved.
What I'm looking for is an automated way to do the following code across variable length datasets and varying column names (that all match the basic pattern of "x1_a".
#laborious/embarassing, not good way to accomplish goal
df1<-df[,c("x1_a", "x1_b")]
names(df1)<-gsub("[[:digit:]]+", "", names(df1))
df2<-df[,c("x2_a", "x2_b")]
names(df2)<-gsub("[[:digit:]]+", "", names(df1))
df3<-df[,c("x2_a", "x2_b")]
names(df3)<-gsub("[[:digit:]]+", "", names(df1))
#desired data output
df<-do.call(rbind, list(df1, df2, df3))
> df
x_a x_b
1 0.2912428 1.2737063
2 2.0147965 -1.0180336
3 0.4189106 -0.6987785
4 -0.9997475 1.0400489
5 0.1080972 1.1411229
6 -0.3890773 -0.3292366
7 -0.9997475 1.0400489
8 0.1080972 1.1411229
9 -0.3890773 -0.3292366
I'm thinking that something automated like this might work: 1) using "grep" to pull out the column names that match in numeric value (e.g., x1_a goes with x1_b), 2) repeating this for as many unique numeric values/groups there are, 3) storing each of those "split" by column name variables in a list, 4) batch renaming the columns in those list elements, and 5) binding them back into a data frame.
Links to previous posts if I've overlooked something are welcome. Thank you in advance for the help!