1

I have a dataset like this:

df <- data.frame(origin = c('A', 'B', 'C'), freq = c(100,3000,200))

  origin freq
1      A  100
2      B 3000
3      C  200

I want to convert this to a structure like this:

A   B    C
100 3000 200

So that I can cbind it to an existing dataframe


  1. tidyr::spread()
    I tried using tidyr::spread however this would require the rows to have unique IDs. I can do this however this won't give me the desired result:

    > df %>% mutate(id = row_number()) %>% spread(origin, freq) id A B C 1 1 100 NA NA 2 2 NA 3000 NA 3 3 NA NA 200

  2. t()
    I can transpose the dataframe, however then it will convert origin to a row instead of column names. Of course I can edit it by setting the column names manually but this seems way to cumbersome.

    as.data.frame(t(df)) V1 V2 V3 origin A B C freq 100 3000 200

CodeNoob
  • 1,988
  • 1
  • 11
  • 33
  • I am not getting the same thing, when I run rownames( data.frame( t ( df )) ) origin and freq are rownames – MatthewR Sep 29 '19 at 13:03

3 Answers3

2

We could add a temporary column before spread.

library(dplyr)
library(tidyr)

df %>%
  mutate(n = 1) %>%
  spread(origin, freq) %>%
  select(-n)

#   A    B   C
#1 100 3000 200

If you have newer version of tidyr, we can use pivot_wider instead of spread

df %>%
  mutate(n = 1) %>%
  pivot_wider(names_from = origin, values_from = freq) %>%
  select(-n)

We could also reconstruct the dataframe

data.frame(matrix(df$freq, ncol = nrow(df), dimnames = list(NULL, df$origin)))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
2

Here are some alternatives.

Note that the origin column is sorted in the example but if it were not then the setNames, dcast and dplyr alternatives preserve the ordering while the other alternatives sort them and you may want one or the other behavior.

1) xtabs We can use xtabs to convert df to an object of class c("xtabs", "table") and from that to "data.frame". Omit as.data.frame.list if table output is ok. No packages are used.

as.data.frame.list(xtabs(freq ~ origin, df))
##     A    B   C
## 1 100 3000 200

1a) tapply We could use tapply in much the same way:

as.data.frame.list(tapply(df$freq, df$origin, c))
##     A    B   C
## 1 100 3000 200

1b) setNames Also setNames can be used in much the same way.

as.data.frame.list(setNames(df$freq, df$origin))
##     A    B   C
## 1 100 3000 200

2) split Another approach is to split the freq by origin giving a list and then convert that list to a data.frame. Again, no packages are used.

as.data.frame(split(df$freq, df$origin))
##     A    B   C
## 1 100 3000 200

3) reshape We could use reshape like this. The setNames line could be omitted if we are not fussy about the form of the names.

wide <- reshape(transform(df, id = 1), dir = "wide", timevar = "origin")[-1]
setNames(wide, df$origin)
##     A    B   C
## 1 100 3000 200

4) dcast This solution uses the data.table package -- there is also a dcast in the reshape2 package that works similarly.

library(data.table)

dcast(df, . ~ origin, value.var = "freq")
##     A    B   C
## 1 100 3000 200

5) dplyr/tibble. Using the indicated packages we remove the rownames (only needed if the data frame has row names but doesn't hurt if not), convert the origin column to row names, transpose what is left and convert that to a tibble:

library(dplyr)
library(tibble)

df %>%
  remove_rownames %>%
  column_to_rownames("origin") %>%
  t %>%
  as.tibble

## # A tibble: 1 x 3
##       A     B     C
##   <dbl> <dbl> <dbl>
## 1   100  3000   200
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
0

We could use for row names the first column and then transpose without it.

t(`rownames<-`(df,df[,1])[-1])
#        A    B   C
# freq 100 3000 200
jay.sf
  • 60,139
  • 8
  • 53
  • 110