How to classify a data frame based on columns in R?

Question

I have a data frame with columns like this:

gene    col1    col2    type
------------------------------
gene_1   a       b        1
gene_2   aa      bb       2
gene_3   a       b        1
gene_4   aa      bb       2

I want to derive the column "type" from the columns "col1" and "col2", so I need a classification based on "col1" and "col2". How should I do this in R?

thanks a lot

akrun · Accepted Answer · 2019-09-05T17:51:11.433

3

Based on the expected output, one option is to create group indices from the columns 'col1' and 'col2':

library(dplyr)
df1 %>%
   mutate(type = group_indices(., col1, col2))
#    gene col1 col2 type
#1 gene_1    a    b    1
#2 gene_2   aa   bb    2
#3 gene_3    a    b    1
#4 gene_4   aa   bb    2
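
As a hedged aside: in dplyr 1.0.0 and later, passing grouping columns through the `...` argument of group_indices() is deprecated, so the call above may emit a warning. A minimal sketch of an equivalent, assuming dplyr >= 1.0.0 is installed, groups first and then uses cur_group_id(). The name `result` is only illustrative, and the sample data is rebuilt here without the 'type' column.

```r
library(dplyr)

# Sample data from the question, without the target column 'type'
df1 <- data.frame(
  gene = c("gene_1", "gene_2", "gene_3", "gene_4"),
  col1 = c("a", "aa", "a", "aa"),
  col2 = c("b", "bb", "b", "bb")
)

# Group by the key columns, then number each group
# (assumes dplyr >= 1.0.0, where cur_group_id() is available)
result <- df1 %>%
  group_by(col1, col2) %>%
  mutate(type = cur_group_id()) %>%
  ungroup()

print(result)
```

Rows that share the same (col1, col2) combination receive the same integer in 'type'.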

If there are many column names, one option is to convert the column-name strings to symbols and then splice them into the call with !!!

df1 %>%
    mutate(type = group_indices(., !!! rlang::syms(names(.)[2:3])))
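
The same idea can be sketched without rlang by selecting the key columns by name with across() and all_of(). This assumes dplyr >= 1.0.0. `key_cols` and `result_many` are illustrative names, and the index 2:3 stands in for whatever range the key columns actually occupy.

```r
library(dplyr)

# Sample data from the question, without the target column 'type'
df1 <- data.frame(
  gene = c("gene_1", "gene_2", "gene_3", "gene_4"),
  col1 = c("a", "aa", "a", "aa"),
  col2 = c("b", "bb", "b", "bb")
)

# Pick the key columns by position; widen the range for more columns
key_cols <- names(df1)[2:3]

# Group by every column named in key_cols, then number the groups
result_many <- df1 %>%
  group_by(across(all_of(key_cols))) %>%
  mutate(type = cur_group_id()) %>%
  ungroup()

print(result_many)
```

Because the columns come from a character vector, none of them has to be typed out by hand.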

Or in data.table

library(data.table)
setDT(df1)[, type := .GRP, .(col1, col2)]
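
For many key columns, data.table's `by` also accepts a character vector of column names, so the columns need not be listed individually. The following is a sketch under that assumption. `dt` and `key_cols` are illustrative names, and the index 2:3 stands in for the real column range.

```r
library(data.table)

# Sample data from the question, without the target column 'type'
dt <- data.table(
  gene = c("gene_1", "gene_2", "gene_3", "gene_4"),
  col1 = c("a", "aa", "a", "aa"),
  col2 = c("b", "bb", "b", "bb")
)

# Key columns chosen by position; adjust the range as needed
key_cols <- names(dt)[2:3]

# .GRP is the group counter; groups are numbered in order of first appearance
dt[, type := .GRP, by = key_cols]

print(dt)
```

Note that .GRP numbers groups by first appearance rather than by sorted key, so the labels can differ from the dplyr result on other data even though the grouping is the same.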

data

df1 <- structure(list(gene = c("gene_1", "gene_2", "gene_3", "gene_4"
), col1 = c("a", "aa", "a", "aa"), col2 = c("b", "bb", "b", "bb"
), type = c(1L, 2L, 1L, 2L)), class = "data.frame", row.names = c(NA, 
-4L))

edited Sep 05 '19 at 17:51

answered Sep 05 '19 at 14:48


thanks a lot, it works for 2 columns, what if I have 400 columns? how can I write the code without naming all columns? – MMRA Sep 05 '19 at 17:49
@Mahshid. If your 400 columns are in positions 2 to 401, use `rlang::syms(names(.)[2:401])`. Change the index accordingly. – akrun Sep 05 '19 at 17:59
