I want to reshape a table. Input:

[image of the input table]

For the output, I want the GTEx sample IDs as rows and the ENSG IDs as columns, like this:

SAMPLEID        ENSGO ENSGO ENSGO
GTEX-1117F-022~     0   187     0
GTEX-1117F-042~     0   109     0

I tried this command, but it doesn't give me the correct output:

GTEx_Analysis <- as.data.frame(t(GTEx_Analysis_2017_06_05_v8_RNASeQCv1_1_9_gene_reads))
Rhea Bedi

1 Answer

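With base R:

# Drop Name and Description, then transpose so that the samples become rows
df_new <- data.frame(t(df[, -c(1:2)]))
# Use the gene IDs as column names
colnames(df_new) <- df$Name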

Or with tidyverse:

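library(tidyverse)

df_new <- df %>%
  select(-Description) %>%
  rowid_to_column() %>%  # the row id keeps duplicated Name values distinct
  pivot_longer(-c(Name, rowid)) %>%
  pivot_wider(names_from = c("Name", "rowid"), values_from = "value")

# Strip the "_<rowid>" suffix again; "+$" also handles multi-digit row ids
colnames(df_new) <- str_replace_all(colnames(df_new), "_[0-9]+$", "")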

Output

                ENSG0~ ENSG0~ ENSG0~
GTEX-1117F-022~      0    187      0
GTEX-1117F-042~      0    109      0

Another option, if you need to keep the Description, is to combine the Name and Description in the column heading:

library(tidyverse)
df %>%
  pivot_longer(-c(Name, Description)) %>%
  pivot_wider(names_from = c("Name", "Description"), values_from = "value")

  name            `ENSG0~_DDX11L1` `ENSG0~_WASH7p` `ENSG0~_MIR6859-1`
  <chr>                      <dbl>           <dbl>              <dbl>
1 GTEX-1117F-022~                0             187                  0
2 GTEX-1117F-042~                0             109                  0
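
If the tidyverse functions are not available, the same combined headings can probably be built in base R as well. The following is only a minimal sketch on the small example data; the object name df_desc and the "_" separator are my own choices for illustration, not part of the original data.

```r
# Small example data (same values as in the Data section of this answer)
df <- data.frame(
  Name = c("ENSG0~", "ENSG0~", "ENSG0~"),
  Description = c("DDX11L1", "WASH7p", "MIR6859-1"),
  `GTEX-1117F-022~` = c(0, 187, 0),
  `GTEX-1117F-042~` = c(0, 109, 0),
  check.names = FALSE  # keep the sample IDs as column names unchanged
)

# Transpose only the count columns, so that the samples become rows
df_desc <- as.data.frame(t(df[, -c(1:2)]))

# Build column names of the form Name_Description (df_desc is a hypothetical name)
colnames(df_desc) <- paste(df$Name, df$Description, sep = "_")

print(df_desc)
```

Only the numeric columns are transposed here, so the counts stay numeric instead of being converted to character along with Name and Description.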

Data

df <- structure(list(Name = c("ENSG0~", "ENSG0~", "ENSG0~"), Description = c("DDX11L1", 
"WASH7p", "MIR6859-1"), `GTEX-1117F-022~` = c(0, 187, 0), `GTEX-1117F-042~` = c(0, 
109, 0)), class = "data.frame", row.names = c(NA, -3L))
AndrewGB
  • I used the third command that you gave. It works, and the output looks like: ` Name Description GTEX-1117F-022~ GTEX-1117F-042~ 1 ENSG0~ DDX11L1 0 0 2 ENSG0~ WASH7p 187 109 3 ENSG0~ MIR6859-1 0 0 ` But I want to do it for the entire file, which has 2604 rows. – Rhea Bedi Feb 20 '22 at 17:59
  • I tried the tidyverse one, but it gave an error: could not find the function `pivot_longer`. I have installed and loaded the tidyverse package. – Rhea Bedi Feb 20 '22 at 18:03
  • @RheaBedi Both the base R function and the `tidyverse` functions will work for your whole dataframe; just replace `df` with `GTEx_Analysis_2017_06_05_v8_RNASeQCv1_1_9_gene_reads`. Generally, on SO, you want to give a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), which you can do with `dput()`. So, I just made a small example here. In your expected output, you dropped the `Description` column so I did as well. One option would be to paste both the `Name` and `Description` together as the column name. – AndrewGB Feb 20 '22 at 18:43
  • @RheaBedi Did you get an error when you ran `library(tidyverse)`? `pivot_longer` is part of `tidyr`, so you could try loading `tidyr` explicitly and see whether that gives an error. I just double-checked on my end and it runs fine. I'm running R version 4.1.2 and RStudio version 2021.09.2. – AndrewGB Feb 20 '22 at 18:46
  • Yes, I was able to do it using the base R function. With tidyverse it still gave me the error that `pivot_longer` was not found, even though I installed and loaded tidyr as well as the tidyverse package. – Rhea Bedi Feb 21 '22 at 01:06
  • @RheaBedi Hmmm, not sure then. You could try being explicit with the function, i.e., `tidyr::pivot_longer`. If that doesn't work, then it might have something to do with the version. What version of R are you running? – AndrewGB Feb 21 '22 at 02:22
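
One possible cause of the error discussed in the comments, though it was not confirmed in this thread, is an old tidyr installation: `pivot_longer` was added in tidyr 1.0.0. The sketch below only checks what is installed; the variable names tidyr_version and has_pivot_longer are made up for illustration.

```r
# Diagnostic sketch: does the installed tidyr export pivot_longer()?
# This tests an assumed cause; it is not a confirmed fix for the error above.
if (requireNamespace("tidyr", quietly = TRUE)) {
  tidyr_version <- packageVersion("tidyr")
  has_pivot_longer <- "pivot_longer" %in% getNamespaceExports("tidyr")
  message("tidyr version: ", as.character(tidyr_version))
  message("pivot_longer exported: ", has_pivot_longer)
  if (!has_pivot_longer) {
    message("pivot_longer needs tidyr >= 1.0.0; try install.packages(\"tidyr\")")
  }
} else {
  tidyr_version <- NA
  has_pivot_longer <- FALSE
  message("tidyr is not installed in this library path")
}
```

If the function is exported but still not found, calling it as `tidyr::pivot_longer`, as suggested in the last comment, would be the next thing to try.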