1

I haven't been able to find a solution to this so far... This one came the closest: 1

Here is a small subset of my dataframe, df:

ANIMAL(chr)    MARKER(int)    GENOTYPE(int)
"1012828"      1550978        0
"1012828"      1550982        2
"1012828"      1550985        1
"1012830"      1550982        0
"1012830"      1550985        2
"1012830"      1550989        2

And what I want is this...

ANIMAL        MARKER_1550978    MARKER_1550982    MARKER_1550985    MARKER_1550989
"1012828"     0                 2                 1                 NA
"1012830"     NA                0                 2                 2

My thought, initially was to create columns for each marker according to the referenced question

markers <- unique(df$MARKER)
df[,markers] <- NA

since I can't have integers for column names in R. I added "MARKER_" to each new column so it would work:

df$MARKER <- paste("MARKER_",df$MARKER) markers <- unique(df$MARKER) df[,markers] <- NA

Now I have all my new columns, but with the same number of rows. I'll have no problem getting rid of unnecessary rows and columns, but how would I correctly populate my new columns with their correct GENOTYPE by MARKER and ANIMAL? Am guessing one-or-more of these: indexing, match, %in%... but don't know where to start. Searching for these in stackoverflow did not yield anything that seemed pertinent to my challenge.

Community
  • 1
  • 1

1 Answers1

1

What you're asking is a very common dataframe operation, commonly called "spreading", or "widening". The inverse of this operation is "gathering". Check out this handy cheatsheet, specifically the part on reshaping data.


library(tidyr)

df %>% spread(MARKER, GENOTYPE)
#>    ANIMAL 1550978 1550982 1550985 1550989
#> 1 1012828       0       2       1      NA
#> 2 1012830      NA       0       2       2
yeedle
  • 4,918
  • 1
  • 22
  • 22