1

Say I have a excel table like this

You can download the example data from this link

I need to melt 2020-09, 2020-10, 2020-11 as date and extract adj_price for each pair of id and name.

How could I convert it to the dataframe as follows in R? Many thanks at advance.

enter image description here

ah bon
  • 9,293
  • 12
  • 65
  • 148
  • do you need an automated process to melt `2020-09`, `2020-10`, etc. or do you only need to work with these 3 dates? – Paul Nov 30 '20 at 07:48
  • 1
    There are others `dates` in real data, `2020-08`, `2020-07`, etc. – ah bon Nov 30 '20 at 07:50

1 Answers1

2

I came out with this solution, it might be optimized but the output is what you want and it should work with any number of columns.

library(tidyverse)

df1 <- readxl::read_xlsx(path = "path to test_data.xlsx")

# get all dates from the column names
cols <- colnames(df1)[3:ncol(df1)]
dates <- cols[grep("^[0-9][0-9][0-9][0-9]-[0-9][0-9]$", cols)]

# make a vector that will be used to make column names
colnames(df1)[3:ncol(df1)] <- rep(dates, rep(3, length(dates)))


# make a table with id, name and dates

finaldf <- df1[-1,] %>% pivot_longer(cols = 3:last_col(), names_to = "dates", values_to = "values")

indicators <- df1[-1,]
colnames(indicators) <- c("id", "name", df1[1, 3:ncol(df1)])
indicators <- indicators %>% pivot_longer(cols = 3:last_col(), names_to = "indicator", values_to = "values")

# final join and formatting
finaldf <- cbind(finaldf, indicators[, "indicator"]) %>% 
  filter(indicator == "adj_price") %>% 
  select(-indicator) %>% 
  rename("adj_price" = values) %>% 
  mutate(adj_price = as.numeric(adj_price))

The output:

> finaldf
  id             name   dates adj_price
1  1     Stracke-Huel 2020-09      3.80
2  1     Stracke-Huel 2020-10      3.72
3  1     Stracke-Huel 2020-11      3.70
4  2     Gleason-Mann 2020-09      7.25
5  2     Gleason-Mann 2020-10      7.50
6  2     Gleason-Mann 2020-11      7.50
7  3 Bauch-Cartwright 2020-09        NA
8  3 Bauch-Cartwright 2020-10     13.03
9  3 Bauch-Cartwright 2020-11     12.38
Paul
  • 2,850
  • 1
  • 12
  • 37
  • Many thanks, but why `dates` column doesn't contain `2020-09` and `2020-10`? – ah bon Nov 30 '20 at 08:43
  • 1
    It is the `as.Date.character()` that screw this up, I am not very familiar with Dates conversions – Paul Nov 30 '20 at 08:49
  • 1
    Known problem, see [this SO question](https://stackoverflow.com/questions/6242955/converting-year-and-month-yyyy-mm-format-to-a-date). The answer depends on what packages you want to use. Using only base functions, you will need to add a day to the date, see [this answer](https://stackoverflow.com/a/6242980/10264278) – Paul Nov 30 '20 at 08:56
  • I print `colnames(df1)[3:ncol(df1)]` and find there are too many duplicates dates, any ideas? Please check the updates in question. – ah bon Nov 30 '20 at 09:09
  • I have tested with example data, it seems also have some duplicated dates. – ah bon Nov 30 '20 at 09:14
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/225309/discussion-between-paul-and-ahbon). – Paul Nov 30 '20 at 09:19