I have a data frame that looks like this:

path:hsa00010   cpd:C00022
path:hsa00010   cpd:C00024
path:hsa00010   cpd:C00031
path:hsa00010   cpd:C00033
path:hsa00010   cpd:C00036
path:hsa00010   cpd:C00068
path:hsa00010   cpd:C00074
path:hsa00010   cpd:C00084
path:hsa00010   cpd:C00103
path:hsa00010   cpd:C00111
path:hsa00020   cpd:C00022
path:hsa00020   cpd:C00024
path:hsa00020   cpd:C00031
path:hsa00020   cpd:C00033
path:hsa00020   cpd:C00036
path:hsa00020   cpd:C00068
path:hsa00020   cpd:C00074
path:hsa00020   cpd:C00084
path:hsa00020   cpd:C00103
path:hsa00020   cpd:C00111

I would like to use the second column as the row names and obtain a data frame like this:

cpd:C00022 path:hsa00010 path:hsa00020
cpd:C00024 path:hsa00010 path:hsa00020
...

Does anybody have any ideas? Thanks!

RRRRRRRR
  • Did you try `rownames(mydata) <- mydata[,2]`? If that doesn't work, please provide a reproducible example (actual code and data) – scrameri Dec 21 '21 at 15:18
  • I just see now that you want to assign duplicate rownames (such as "cpd:C00022"). This is not possible in R data.frames. If you'd like to subset your data for "cpd:C00022" and "cpd:C00024", you could try `mydata[mydata[,2] %in% c("cpd:C00022", "cpd:C00024"),]`, or use e.g. `mydata[grep("C0002\\d", mydata[,2]),]`. – scrameri Dec 21 '21 at 15:22
  • @scrameri Thanks. I can subset the data now. But how can I convert this from column to row? – RRRRRRRR Dec 21 '21 at 15:35
  • If you start using `dplyr` verbs, though, many of them ignore or intentionally remove row names. While base R functions tend to do fine with row names (and not intentionally remove them), they can also *change them* without really notifying you, often with the premise of ensuring they are unique (e.g., adding `.1` or similar to uniquify the names). As such, it is commonly recommended to have your row-based index/***indices*** as column(s), not as row names, so (1) functions don't silently change them, and (2) you can have as many "indices" (columns) as you like. – r2evans Dec 21 '21 at 15:44
  • @RRRRRRRR it's difficult to understand what exactly you want to convert. Please make a minimal reproducible example using real code and data (not copy-pasted data). 3-4 lines of data should be enough to show us what you'd like to do on a much larger data.frame, and what you've tried that didn't work. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610 – scrameri Dec 21 '21 at 16:43
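
To illustrate the point made in the comments about duplicate row names, here is a minimal base R sketch. It uses an abbreviated copy of the question's data, and the column names `path` and `cpd` are assumptions, since the question does not show any. It only demonstrates that the assignment fails and that a plain column still works as a lookup key.

```r
# Abbreviated copy of the question's data; the column names are assumed.
mydata <- data.frame(
  path = c("path:hsa00010", "path:hsa00010", "path:hsa00020", "path:hsa00020"),
  cpd  = c("cpd:C00022", "cpd:C00024", "cpd:C00022", "cpd:C00024"),
  stringsAsFactors = FALSE
)

# Assigning duplicated values as row names raises an error.
# The error message is captured; the accompanying warning is silenced.
msg <- tryCatch(
  suppressWarnings({
    rownames(mydata) <- mydata$cpd
    "no error"
  }),
  error = function(e) conditionMessage(e)
)
print(msg)

# Keeping cpd as an ordinary column still allows lookups by compound.
hits <- mydata[mydata$cpd == "cpd:C00022", ]
print(hits)
```

Because the assignment fails, `mydata` is left unchanged, which is why the lookup afterwards still sees all four rows.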

2 Answers

Do you want something like this, using dplyr::group_by() and dplyr::summarize()? Once you have this, you can of course turn the cpd... column into rownames if you really need it as rownames.

library(dplyr)
library(tidyr)

df <- tibble::tribble(
  ~x,              ~y,
  "path:hsa00010", "cpd:C00022",
  "path:hsa00010", "cpd:C00024",
  "path:hsa00010", "cpd:C00031",
  "path:hsa00010", "cpd:C00033",
  "path:hsa00010", "cpd:C00036",
  "path:hsa00010", "cpd:C00068",
  "path:hsa00010", "cpd:C00074",
  "path:hsa00010", "cpd:C00084",
  "path:hsa00010", "cpd:C00103",
  "path:hsa00010", "cpd:C00111",
  "path:hsa00020", "cpd:C00022",
  "path:hsa00020", "cpd:C00024",
  "path:hsa00020", "cpd:C00031",
  "path:hsa00020", "cpd:C00033",
  "path:hsa00020", "cpd:C00036",
  "path:hsa00020", "cpd:C00068",
  "path:hsa00020", "cpd:C00074",
  "path:hsa00020", "cpd:C00084",
  "path:hsa00020", "cpd:C00103",
  "path:hsa00020", "cpd:C00111"
)

df %>% 
  group_by(y) %>% 
  summarise(x = list(x)) %>% 
  ungroup() %>% 
  unnest_wider(x, names_sep = "_")
#> # A tibble: 10 x 3
#>    y          x_1           x_2          
#>    <chr>      <chr>         <chr>        
#>  1 cpd:C00022 path:hsa00010 path:hsa00020
#>  2 cpd:C00024 path:hsa00010 path:hsa00020
#>  3 cpd:C00031 path:hsa00010 path:hsa00020
#>  4 cpd:C00033 path:hsa00010 path:hsa00020
#>  5 cpd:C00036 path:hsa00010 path:hsa00020
#>  6 cpd:C00068 path:hsa00010 path:hsa00020
#>  7 cpd:C00074 path:hsa00010 path:hsa00020
#>  8 cpd:C00084 path:hsa00010 path:hsa00020
#>  9 cpd:C00103 path:hsa00010 path:hsa00020
#> 10 cpd:C00111 path:hsa00010 path:hsa00020

Created on 2021-12-21 by the reprex package (v2.0.0)
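
As a possible follow-up to the row-name remark above, this is a small sketch of the conversion using `tibble::column_to_rownames()`. The `wide` object is a hand-typed, abbreviated stand-in for the pipeline's result (only two compounds), so treat it as an illustration rather than the exact output.

```r
library(tibble)

# Hand-typed, abbreviated stand-in for the wide result shown above.
wide <- tribble(
  ~y,           ~x_1,            ~x_2,
  "cpd:C00022", "path:hsa00010", "path:hsa00020",
  "cpd:C00024", "path:hsa00010", "path:hsa00020"
)

# Each compound now occurs once, so y can safely become the row names.
# column_to_rownames() returns a plain data.frame, not a tibble.
with_rownames <- column_to_rownames(wide, var = "y")

print(rownames(with_rownames))
print(with_rownames["cpd:C00022", "x_2"])
```

This only works after the reshaping step, because at that point the compound identifiers are unique.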

We are probably looking for pivot_wider here.

library(tidyr)
library(dplyr)
library(stringr)

# Assumes df has two columns named path and cpd.
# The lambda shorthand \(x) needs R >= 4.1.
df %>% pivot_wider(values_from = path,
                   values_fn = \(x) str_remove_all(x, 'path:'),
                   names_from = path,
                   names_glue = 'path_{1:length(unique(path))}'
                   ) %>%
    mutate(cpd = str_remove_all(cpd, "^cpd:"))

# A tibble: 10 × 3
   cpd    path_1   path_2  
   <chr>  <chr>    <chr>   
 1 C00022 hsa00010 hsa00020
 2 C00024 hsa00010 hsa00020
 3 C00031 hsa00010 hsa00020
 4 C00033 hsa00010 hsa00020
 5 C00036 hsa00010 hsa00020
 6 C00068 hsa00010 hsa00020
 7 C00074 hsa00010 hsa00020
 8 C00084 hsa00010 hsa00020
 9 C00103 hsa00010 hsa00020
10 C00111 hsa00010 hsa00020
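
A variation on the same idea, in case it reads more clearly: number the pathways within each compound first, then pivot on that counter. This is only a sketch on an abbreviated copy of the data; the helper column `idx` and the column names `path` and `cpd` are assumptions made for the example.

```r
library(dplyr)
library(tidyr)

# Abbreviated copy of the data; column names are assumed.
df <- tibble::tribble(
  ~path,           ~cpd,
  "path:hsa00010", "cpd:C00022",
  "path:hsa00010", "cpd:C00024",
  "path:hsa00020", "cpd:C00022",
  "path:hsa00020", "cpd:C00024"
)

wide <- df %>%
  group_by(cpd) %>%
  mutate(idx = row_number()) %>%   # 1, 2, ... within each compound
  ungroup() %>%
  pivot_wider(names_from = idx,
              values_from = path,
              names_prefix = "path_")

print(wide)
```

With an explicit counter, a compound that appears in fewer pathways than the others simply gets `NA` in the trailing columns.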
GuedesBF