0

I have a df:

  product   store   store1  review   review1 
  book      A       B                 
  shirt     A       B                 
  pen       A       B       
  cd        A       B        0         2
  dress     A       B        2         1
  magazine  A       B        3         1

I want the values in the store columns to become column names and I want to insert the review values in that column, so the output looks like this:

  product   A       B 
  book      0       2
  shirt     2       1
  pen       3       1

There are two issues with this problem. First of all, the store names will change a lot in the future, so I can't use code like this:

 names(newdf)[names(newdf) == 'store'] <- 'a'

Secondly, I need the values from the review and review1 column to start from the first row in the A and B column, so to say. For example, column a = book = 0, shirt = 2, magazine = 3.

I'm really stuck on this, any help would be much appreciated!

Reproducible code:

df <- data.frame(product = c("book","shirt", "pen", "cd", "dress", "magazine"), store=c("A", "A", "A", "A", "A", "A"),
                 store1=c("B", "B", "B", "B", "B", "B"), review=c("", "", "", 0, 2, 3), review1 =c("", "", "", 2, 1, 1))
Veraaa
  • 301
  • 2
  • 6
  • 15
  • how are you getting `2` for column `A` under product `shirt` ? – Aramis7d Oct 22 '17 at 22:25
  • These are the values from the review columns. The 0 from the review column belongs to book, the 2 belongs to shirt, the 3 to pen, etc. Thanks for asking, I'll edit the question. – Veraaa Oct 22 '17 at 22:29
  • 1
    Possible duplicate of [How to reshape data from long to wide format?](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) – Aramis7d Oct 22 '17 at 22:32
  • I guess you're probably unaware of the terms that would make life easier in this situation... try looking into `long format` and `wide format` for data. also, you can check out the `spread` function from `tidyverse` libraries :) – Aramis7d Oct 22 '17 at 22:34

1 Answers1

1

It is difficult to develop a universal method for this because it is unclear whether the number of columns will be the same, or can they be matched one-to-one all the time (say, for every storeN there is reviewerN) etc. Here's the code that does what you wanted, but I am not sure if it fits the purpose. You should have explained your problem more thoroughly.

df <- data.frame(product = c("book","shirt", "pen", "cd", "dress", "magazine"), store=c("A", "A", "A", "A", "A", "A"),
                 store1=c("B", "B", "B", "B", "B", "B"), review=c("", "", "", 0, 2, 3), review1 =c("", "", "", 2, 1, 1))


# Convert factors to character
df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)

# Blanks to NAs
df[df==""] <- NA

# Indicate which columns contain values to rename other columns
cols_with_values <- c(2,3)

# save first row in these columns
new_columns_names <- as.character(df[1,cols_with_values])

# Kill them!
df[,cols_with_values] <- NULL

# Rename columns
names(df) <- c(names(df[1]), new_columns_names)

# Show rows without NAs
df[complete.cases(df), ]
   product A B
4       cd 0 2
5    dress 2 1
6 magazine 3 1

As for your second problem, it is a mad task because it renders your data as very, very messy. I mean, how do you know that book = 3, not shirt?

Alex Knorre
  • 620
  • 4
  • 15