-2

Sample Data :

df <- data.frame(ProdCode = c("C1","C2"), ProdName = c("Product 1", "Product 2"), Category = c("Categ 1", "Categ 2"), "Jan-16" = c(3,2), "Apr-16" = c(3,""), "Jul-16" = c(5,2), "Oct-16" = c(5,2))

The value corresponding each month and product cell is rating for that product. That need to be in Rating column in output dataframe:

> df
  ProdCode  ProdName Category Jan.16 Apr.16 Jul.16 Oct.16
1       C1 Product 1  Categ 1      3      3      5      5
2       C2 Product 2  Categ 2      2             2      2

I've above data which is required in below format:

ProdCode  Product.Name  Category    Rating.Date   Rating
C1          Product 1   Categ 1     Jan-16         3
C1          Product 1   Categ 1     Apr-16         3
C1          Product 1   Categ 1     Jul-16         5
C1          Product 1   Categ 1     Oct-16         5
C2          Product 2   Categ 2     Jan-16         2
C2          Product 2   Categ 2     Jul-16         2
C2          Product 2   Categ 2     Oct-16         2

The month column are dynamic and it will increase such as in future as per product it will come as Jan-2017 etc. I could do it by using for loop but that won't worth of using R.

user_az
  • 363
  • 2
  • 3
  • 17

2 Answers2

0

We can use gather to reshape it to 'long', filter out the blanks in 'Rating' and arrange by 'ProdCode'

library(tidyr)
library(dplyr)
gather(df, Rating.Date, Rating, Jan.16:Oct.16) %>%
            filter(Rating !="") %>%
            arrange(ProdCode) 
#  ProdCode  ProdName Category Rating.Date Rating
#1       C1 Product 1  Categ 1      Jan.16      3
#2       C1 Product 1  Categ 1      Apr.16      3
#3       C1 Product 1  Categ 1      Jul.16      5
#4       C1 Product 1  Categ 1      Oct.16      5
#5       C2 Product 2  Categ 2      Jan.16      2
#6       C2 Product 2  Categ 2      Jul.16      2
#7       C2 Product 2  Categ 2      Oct.16      2
akrun
  • 874,273
  • 37
  • 540
  • 662
0

These are the different other options we have :

Note : you have the liberty to name the last 2 columns, I kept it to the default(variable and value)

using melt() from data.frame

library(reshape2)
df1 = melt(df, id.vars = c("ProdCode" , "ProdName" ,"Category"), measure.vars = 4:7)
df1 = df1[df1$value != "",]

using melt from data.table)

library(data.table)
setDT(df) 
melt.data.table(df, 1:3,4:7)[value!=""]  # '1:3, 4:7' are the column indexes. read more from ?melt

#  ProdCode  ProdName Category variable value
#1       C1 Product 1  Categ 1   Jan.16     3
#2       C2 Product 2  Categ 2   Jan.16     2
#3       C1 Product 1  Categ 1   Apr.16     3
#5       C1 Product 1  Categ 1   Jul.16     5
#6       C2 Product 2  Categ 2   Jul.16     2
#7       C1 Product 1  Categ 1   Oct.16     5
#8       C2 Product 2  Categ 2   Oct.16     2

# if you want your specific column names :

melt.data.table(df, 1:3,4:7, variable.name = "Rating.Date", value.name = "Rating")[Rating!=""]
#   ProdCode  ProdName Category Rating.Date Rating
#1:       C1 Product 1  Categ 1      Jan.16      3
#2:       C2 Product 2  Categ 2      Jan.16      2
#3:       C1 Product 1  Categ 1      Apr.16      3
#4:       C1 Product 1  Categ 1      Jul.16      5
#5:       C2 Product 2  Categ 2      Jul.16      2
#6:       C1 Product 1  Categ 1      Oct.16      5
#7:       C2 Product 2  Categ 2      Oct.16      2
joel.wilson
  • 8,243
  • 5
  • 28
  • 48