Splitting dataframe column into multiple columns

Question

Say I have a dataframe that looks like this:

gene    drug    log2FC
Ubb    Naloxone    0.6375514
Tuba1a   Naloxone    0.5827224
Scd1    Naloxone    -0.7249997
Ubb    Aspirin    0.8000
Tuba1a    Aspirin  0.73324
Scd1    Aspirin    0.2497
Ubb    Haldol    0.0375
Tuba1a    Haldol    0.25824
Scd1    Haldol    -0.0249997

Would there be an easy way to create columns for each unique drug, so I'm left with something like this:

gene    Naloxone_log2FC    Aspirin_log2FC    Haldol_log2FC
Ubb     0.6375514    0.8000    0.0375
Tuba1a  ...
Scd1    ...

Thanks!

Please provide a [reproducible minimal example](https://stackoverflow.com/q/5963269/8107362). Especially, provide some sample data, e.g. with `dput()`. — mnist, Nov 11 '19 at 22:06
Have a look at the `pivot_wider` function from `tidyr` https://tidyr.tidyverse.org/reference/pivot_wider.html — fmarm, Nov 11 '19 at 22:10

score 5 · Accepted Answer · answered Nov 11 '19 at 22:11

You could use tidyr::spread() or the newer tidyr::pivot_wider():

library(tidyr)

data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "gene    drug    log2FC
Ubb    Naloxone    0.6375514
Tuba1a   Naloxone    0.5827224
Scd1    Naloxone    -0.7249997
Ubb    Aspirin    0.8000
Tuba1a    Aspirin  0.73324
Scd1    Aspirin    0.2497
Ubb    Haldol    0.0375
Tuba1a    Haldol    0.25824
Scd1    Haldol    -0.0249997")

data %>% spread(drug, log2FC)
#>     gene Aspirin     Haldol   Naloxone
#> 1   Scd1 0.24970 -0.0249997 -0.7249997
#> 2 Tuba1a 0.73324  0.2582400  0.5827224
#> 3    Ubb 0.80000  0.0375000  0.6375514

data %>% pivot_wider(names_from = drug, values_from = log2FC)
#> # A tibble: 3 x 4
#>   gene   Naloxone Aspirin  Haldol
#>   <chr>     <dbl>   <dbl>   <dbl>
#> 1 Ubb       0.638   0.8    0.0375
#> 2 Tuba1a    0.583   0.733  0.258 
#> 3 Scd1     -0.725   0.250 -0.0250

Created on 2019-11-11 by the reprex package (v0.3.0)

Hmm is there any way to modify the column names into what OP's expected output looks like? (i.e., `Naloxone_log2FC` and so on.) I took a quick glance at the `pivot()` vignette, and while I saw that adding prefixes is possible, there doesn't seem to be anything about being able to add suffixes. — Dunois, Nov 11 '19 at 22:19
indeed it's not possible by default, but just use `%>% rename_at(-1, paste0, "_log2FC")` at the end of the pipe chain and you'll get your suffixes :) — moodymudskipper, Nov 12 '19 at 00:12
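
On the suffix question from the comments: tidyr releases from 1.1.0 onward (newer than the version used in this answer) give pivot_wider() a names_glue argument, which should build the suffixed names in one step. A minimal sketch, assuming tidyr >= 1.1.0 is installed and reusing the question's sample data; the object name wide is only illustrative:

```r
library(tidyr)

# Sample data from the question.
data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "gene drug log2FC
Ubb    Naloxone  0.6375514
Tuba1a Naloxone  0.5827224
Scd1   Naloxone -0.7249997
Ubb    Aspirin   0.8000
Tuba1a Aspirin   0.73324
Scd1   Aspirin   0.2497
Ubb    Haldol    0.0375
Tuba1a Haldol    0.25824
Scd1   Haldol   -0.0249997")

# names_glue is a glue template; {drug} is filled in with each
# value of the names_from column (assumes tidyr >= 1.1.0).
wide <- pivot_wider(
  data,
  names_from  = drug,
  values_from = log2FC,
  names_glue  = "{drug}_log2FC"
)

colnames(wide)
```

The new columns come out in order of first appearance in drug, so Naloxone_log2FC first, then Aspirin_log2FC and Haldol_log2FC.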

score 2 · Answer 2 · Stanislas Morbieu · 2019-11-11T22:35:59.417

You can use the spread function of the tidyr package (I am assuming your dataframe is df):

library("tidyr")

df <- spread(df, drug, log2FC)

To append "_log2FC" to the column names you can use:

# Append the suffix to every column except the first (gene); no loop needed
colnames(df)[-1] <- paste0(colnames(df)[-1], "_log2FC")
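
If dplyr is already loaded, the same renaming can probably be done with rename_with(), which replaced the rename_at() call suggested in the comments on the other answer. A small sketch, assuming dplyr >= 1.0.0; the wide data frame is typed in by hand here so the snippet runs on its own, and the names wide and renamed are only illustrative:

```r
library(dplyr)

# Wide table as produced by spread() on the question's data.
wide <- data.frame(
  gene     = c("Scd1", "Tuba1a", "Ubb"),
  Aspirin  = c(0.2497, 0.73324, 0.8),
  Haldol   = c(-0.0249997, 0.25824, 0.0375),
  Naloxone = c(-0.7249997, 0.5827224, 0.6375514)
)

# Apply paste0() to the name of every column except gene.
renamed <- rename_with(wide, ~ paste0(.x, "_log2FC"), .cols = -gene)

colnames(renamed)
```

Selecting columns by name (-gene) rather than by position keeps the call working if the column order changes.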

edited Nov 11 '19 at 22:35

answered Nov 11 '19 at 22:10
