I extensively searched and found multiple questions and answers regarding similar situations but not what I'm facing here, so i'm relying on making this question.
I have a messy dataframe in R that have this structure, I obtained it just by running a query inside R after connecting to the database that have all the data stored.
id location year category subCategory numericValue Unit
1 place_1 1960 cat_1 subcat_1 34.567 kg
2 place_1 1961 cat_1 subcat_1 564.566 kg
3 place_1 1962 cat_1 subcat_1 3.4356 kg
4 place_1 1963 cat_1 subcat_1 654.44 kg
5 place_1 1960 cat_1 subcat_2 409.65 cm
6 place_1 1961 cat_1 subcat_2 985545 cm
7 place_1 1962 cat_1 subcat_2 5436.4 cm
8 place_1 1963 cat_1 subcat_2 324.23 cm
...
...
...
341 place_1 1960 cat_2 subcat_1 487.65 cm³
342 place_1 1961 cat_2 subcat_1 434.65 cm³
343 place_1 1962 cat_2 subcat_1 421.65 cm³
344 place_1 1963 cat_2 subcat_1 87.65 cm³
345 place_1 1960 cat_2 subcat_2 0.33 ton
346 place_1 1961 cat_2 subcat_2 1.65 ton
347 place_1 1962 cat_2 subcat_2 89.2 ton
348 place_1 1963 cat_2 subcat_2 1345 ton
...
...
12334 place_2 1960 cat_1 subcat_1 1111 kg
12334 place_2 1961 cat_1 subcat_1 2222 kg
12334 place_2 1962 cat_1 subcat_1 3333 kg
12334 place_2 1960 cat_1 subcat_2 4444 cm
12334 place_2 1961 cat_1 subcat_2 5555 cm
12334 place_2 1962 cat_1 subcat_2 6666 cm
...
...
...
99999 place_67 1982 cat_5 subcat_9 3455 Watt
To try and explain what's going on: I have a numeric value, the actual data, measured and categorized into a category and subcategory, it was obtained in a location on a given year. There are blocks of years for the combination of category+subcategory for a given location and it goes on until all locations were observed within all years and every category and it's subcategories.
What I want to obtain is a dataframe that have each category+subcategory combined and have the data measured under it's value, effectively making a very long dataframe into a very wide one.
id location year cat_1.subcat_1 cat_1.subcat_2 ... cat_5.subcat_9
1 place_1 1960 34.567 409.65 NA
2 place_1 1961 564.566 985545 NA
3 place_1 1962 3.4356 5436.4 444
4 place_1 1963 654.44 324.23 0.5
5 place_2 1960 1111 4444 3.22e-04
...
350 place_67 1982 NA NA 3455
I can sort of achieve this with a procedure I made on the database that have this all stored but it's very very costly to run it. I have no idea of what to do with the Unity column, I don't think I can incorporate it anywhere, cause It's unique to the measurement itself and not a combination like the category/subcategory but it's not critical if it ends up thrown away.
I can answer any doubts if things are not very clear.
Thanks!