4

I have data.frame df1 (see code below). I would like to convert it to what df2 looks like (see code below).

Maybe this can be done with reshape cast or reverse melt? But I do not understand these functions. Can anyone help, please?

 df1 <- data.frame(
   stringsAsFactors = FALSE,
                    sample = c("a","a","a",
                               "a","b","c","c","c","c","c","c","c","c",
                               "d","d","e","e","e","g","g"),
                    LETTER = c("P","R","V",
                               "Y","Q","Q","R","S","T","U","W","X","Z",
                               "Q","X","Q","V","X","Q","T")
        )

 df2 <- data.frame(
   stringsAsFactors = FALSE,
             sample = c("a", "b", "c", "d", "e", "f", "g"),
                  P = c(1L, 0L, 0L, 0L, 0L, 0L, 0L),
                  Q = c(0L, 1L, 1L, 1L, 1L, 0L, 1L),
                  R = c(1L, 0L, 1L, 0L, 0L, 0L, 0L),
                  S = c(0L, 0L, 1L, 0L, 0L, 0L, 0L),
                  T = c(0L, 0L, 1L, 0L, 0L, 0L, 1L),
                  U = c(0L, 0L, 1L, 0L, 0L, 0L, 0L),
                  V = c(1L, 0L, 0L, 0L, 1L, 0L, 0L),
                  W = c(0L, 0L, 1L, 0L, 0L, 0L, 0L),
                  X = c(0L, 0L, 1L, 1L, 1L, 0L, 0L),
                  Y = c(1L, 0L, 0L, 0L, 0L, 0L, 0L),
                  Z = c(0L, 0L, 1L, 0L, 0L, 0L, 0L)
        )

Edit

It was suggested that I look at this post: How to reshape data from long to wide format. Unfortunately, that does not answer my question. The equivalent code would be as follows and throws the following error.

 df2 <- reshape(df, idvar = "sample", timevar = "LETTER", direction = "wide")
 Error in data[, timevar] : object of type 'closure' is not subsettable

First adding a third variable using df1$value <- 1L also does not solve it.

Please note that in my data, there is no exact match between length and width of the data, unlike in said post. Any help is still appreciated, please.

Sylvia Rodriguez
  • 1,203
  • 2
  • 11
  • 30
  • Does this answer your question? [How to reshape data from long to wide format](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) – ekoam Nov 02 '20 at 11:24
  • Thank you for suggesting. Great post, but unfortunately, that does not work. Please see my edit above. Thank you. – Sylvia Rodriguez Nov 02 '20 at 11:44
  • You need __3__ variables at least if you want to reshape. Please first create another variable in your dataframe `df1$value <- 1L` – ekoam Nov 02 '20 at 11:53
  • Thank you for suggesting that. I figured that too and tried it already, but it unfortunately still does not solve the problem. – Sylvia Rodriguez Nov 02 '20 at 11:56
  • So you also want to impute missing values like that "f" in-between "e" and "g"? – ekoam Nov 02 '20 at 12:06
  • 1
    I interpreted your desired result (df2) to be a frequency table (see my answer). But I just realized that df2 in your code has only zeroes for column "W". In contrast to this, df1 does contain a "W"-value (for sample = c). Is W = 0 for all samples really the expected result? Or is this a typo (if so, maybe edit this). – AndreasM Nov 02 '20 at 12:30
  • Great catch! That was an error. You are right, in df2 column "W" row "c", it should have said "1". I edited the question to correct it as you suggested. Thank you! - Syl. – Sylvia Rodriguez Nov 02 '20 at 12:34

2 Answers2

3

You can create a frequency table with table() and transform the result into a data.frame.

x <- table(df1$sample, df1$LETTER)
df2 <- cbind(data.frame(sample = rownames(x)), as.data.frame.matrix(x))

sample P Q R S T U V W X Y Z
a      a 1 0 1 0 0 0 1 0 0 1 0
b      b 0 1 0 0 0 0 0 0 0 0 0
c      c 0 1 1 1 1 1 0 1 1 0 1
d      d 0 1 0 0 0 0 0 0 1 0 0
e      e 0 1 0 0 0 0 1 0 1 0 0
g      g 0 1 0 0 1 0 0 0 0 0 0

If you want to include sample = f (not present in df1) in the output you could add the missing value as a factor level to df$sample before calling table():

df1$sample <- factor(df1$sample, levels = letters[1:7])
x <- table(df1$sample2, df1$LETTER)
cbind(data.frame(sample = rownames(x)), as.data.frame.matrix(x))

  sample P Q R S T U V W X Y Z
a      a 1 0 1 0 0 0 1 0 0 1 0
b      b 0 1 0 0 0 0 0 0 0 0 0
c      c 0 1 1 1 1 1 0 1 1 0 1
d      d 0 1 0 0 0 0 0 0 1 0 0
e      e 0 1 0 0 0 0 1 0 1 0 0
f      f 0 0 0 0 0 0 0 0 0 0 0
g      g 0 1 0 0 1 0 0 0 0 0 0

AndreasM
  • 902
  • 5
  • 10
1

You can create a dummy column and get data in wide format :

library(dplyr)

df1 %>%
  mutate(n = 1) %>%
  tidyr::pivot_wider(names_from = LETTER, values_from = n, values_fill = 0)

#  sample     P     R     V     Y     Q     S     T     U     W     X     Z
#  <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 a          1     1     1     1     0     0     0     0     0     0     0
#2 b          0     0     0     0     1     0     0     0     0     0     0
#3 c          0     1     0     0     1     1     1     1     1     1     1
#4 d          0     0     0     0     1     0     0     0     0     1     0
#5 e          0     0     1     0     1     0     0     0     0     1     0
#6 g          0     0     0     0     1     0     1     0     0     0     0

Or in data.table :

library(data.table)
setDT(df1)[, n := 1]
dcast(df1, sample~LETTER, value.var = 'n', fill = 0)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213