I have an Excel spreadsheet that I would like to change from wide to long format in R. However, the original layout of the excelsheet makes this problematic.
Spreadsheet:
Date | Unit | Day | Treatment | Value | Unit | Day | Treatment | Value |
---|---|---|---|---|---|---|---|---|
01-03-2023 | 1 | 1 | treatment1 | 8.5 | 2 | 1 | treatment2 | 9.4 |
01-03-2023 | 1 | 2 | treatment1 | 10.2 | 2 | 2 | treatment2 | 10.5 |
This goes on for more than 200 different units, i.e. it is a very wide spreadsheet and there are more columns than added in this example
# I would like to change the format to the following:
Date | Unit | Day | Treatment | Value |
---|---|---|---|---|
01-03-2023 | 1 | 1 | treatment1 | 8.5 |
01-03-2023 | 2 | 1 | treatment2 | 9.4 |
02-03-2023 | 1 | 2 | treatment1 | 10.2 |
02-03-2023 | 2 | 2 | treatment2 | 10.5 |
The first problem arises when trying to import the sheet, because R creates unique names for each column by adding numbers sequentially to the names of each column, e.g.
Date | Unit...2 | day...3 | treatment...4 | value...5 | Unit...6 | day...7 | treatment...8 | value...9 |
---|
The second problem is how to obtain the desired long-format table using pivot_longer (or other alternatives), with grouped/repeated data like this?
The original layout of the spreadsheet cannot be modified since many persons use it and add data to it everyday.
I hope to find a solution using the pivot_longer function, since I'm familiar with this - but I will accept all suggestions with graditude
I found some inspiration from other posts, but these all have the "unit identifiers" added to each column name, whereas I have random unique identifiers for each column generated by R, e.g. https://stackoverflow.com/questions/59891956/converting-data-from-wide-to-long-format-when-id-variables-are-encoded-in-column
or