I have a huge data set where there are multiple copies of the same group of column names (but with _1, _2 added by R when I use "Read_csv" to import the data - "Duplicated column names deduplicated"). For example, the following code would create a very small version of what I am working with*:
practiceData <- tibble(sheepID= (1:3),
orig= c('224/235/54', '356/876/58', '345/765/67'),
dest= c('444/634/87', '578/765/23', '365/987/11'),
date= c('2009-03-02', '2009-03-02', '2009-03-02'),
sheepID_1= (4:6),
orig_1= c('232/153/76', '456/812/48', '135/665/77'),
dest_1= c('424/434/37', '538/745/23', '345/932/51'),
date_1= c('2009-03-02', '2009-03-02', '2009-03-02'),
sheepID_2= (7:9),
orig_2= c('212/135/24', '336/826/48', '332/743/37'),
dest_2= c('434/654/47', '575/725/13', '325/937/41'),
date_2= c('2009-03-02', '2009-03-02', '2009-03-02'))
As you can see, there would be 12 columns in total with 3 copies of 4 repeated column names (with _1, _2 added at the end of the repeats).
Instead of having the duplicated column names deduplicated, I would like to have all the column names that are the same type in one column, and then an extra column with identifiers for the original column that it was in. The following code creates a tibble that is structured in this way:
practiceData2 <- tibble(sheepID= (1:9),
orig= c('224/235/54', '356/876/58', '345/765/67', '232/153/76', '456/812/48', '135/665/77', '212/135/24', '336/826/48', '332/743/37'),
dest= c('444/634/87', '578/765/23', '365/987/11', '424/434/37', '538/745/23', '345/932/51', '434/654/47', '575/725/13', '325/937/41' ),
date= c('2009-03-02', '2009-03-02', '2009-03-02', '2009-03-02', '2009-03-02', '2009-03-02', '2009-03-02', '2009-03-02', '2009-03-02'),
col_inc = c('1', '1', '1', '2', '2', '2', '3', '3', '3'))
I was thinking that I could perhaps use the gather() function as part of tidyverse. However, I'm not sure how to do this for multiple repeated column names. In my full data set, there are tens of thousands of copies of the same 40 column names!
*None of this is real data