0

I'm looking to find a way to decrease the dimensionality of a dataset by creating a new column.

My original dataset is like this

Gene  ID1   ID2   ID3 ... ID1000
 
ACE    1     1     2 ...    0

GBA    0     1     2 ...    1

...

However, as you can tell, the dimensionality of the dataset is too high that I can't import that table into SQL database.

That's why I want to transform the dataset into the following form

Gene   ID    Genotype

ACE     1       1

ACE     2       1

ACE     3       2

...

ACE    1000     0

GBA.    0       0

GBA.    1       1

.....

Basically, I'm trying to create a new column called ID so that I can decrease the number of columns even though this will increase the number of rows...

What's the best way of doing this in R?

Sorry about the format, I don't know how to insert a table properly here....

Phil
  • 7,287
  • 3
  • 36
  • 66
XXWANGL
  • 43
  • 4

2 Answers2

1

We can use

library(dplyr)
library(tidyr)
df1 %>%
   pivot_longer(cols = -Gene, names_to = 'ID', values_to = 'Genotype')
akrun
  • 874,273
  • 37
  • 540
  • 662
1

A base R option using reshape may help

reshape(
  setNames(df,gsub("ID","ID.",names(df))),
  direction = "long",
  idvar = "Gene",
  varying = -1
)
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81