
I have a data frame like this:

    > head(bedfile)
                            chr start   end                        fam    class_subclass Jval_at_closing step strand    code            family
    1 1.1000_SRR13070663.317319   582   665      LINE__rnd-6_family-16     LINE/I-Jockey               2    3      R    LINE   rnd-6_family-16
    2 1.1000_SRR13070663.317319 20701 20804    LINE__rnd-5_family-6279           LINE/L2               3    2      F    LINE rnd-5_family-6279
    3 1.1001_SRR13070663.317930  1023  1117     DNA__rnd-5_family-1403 DNA/TcMar-Mariner               2    3      R     DNA rnd-5_family-1403
    4 1.1001_SRR13070663.317930  1139  1196 Unknown__rnd-5_family-4199           Unknown               3    3      F Unknown rnd-5_family-4199
    5 1.1001_SRR13070663.317930  1199  1282 Unknown__rnd-5_family-6039           Unknown               4    3      R Unknown rnd-5_family-6039
    6 1.1001_SRR13070663.317930  1317  1384 Unknown__rnd-6_family-2340           Unknown               5    3      F Unknown rnd-6_family-2340

and another one:

    > head(CODE)
          V1 V2 V3  V4 V5
    1   rDNA  F  S 500  3
    2   rDNA  R  s 500  3
    3 CL0015  F  O 300  3
    4 CL0015  R  o 300  3
    5 CL0076  F  P 300  3
    6 CL0076  R  p 300  3

I would like to create a new column in the first data frame and assign values to selected rows based on criteria, using a nested for loop that generates the criteria from the second data frame:

    for (i in unique(CODE$V1)) {
        for (j in unique(CODE$V2)) {
            bedfile[bedfile$code == i & bedfile$strand == j, ]$codification <- CODE[CODE$V1 == i & CODE$V2 == j, ]$V3
        }
    }
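A minimal sketch of a loop that does work, assuming each `code`/`strand` pair appears at most once in `CODE`. The small data frames are hypothetical stand-ins for the real ones. Two things change relative to the loop above: the column is created before the loop, and values are assigned into the column vector (`bedfile$codification[rows] <- ...`) instead of into a row subset of the data frame. Looping over the rows of `CODE` instead of over every `V1` × `V2` combination also skips combinations that have no entry in `CODE`.

```r
# Hypothetical toy versions of the two data frames from the question
bedfile <- data.frame(
  code   = c("rDNA", "rDNA", "CL0015", "LINE"),
  strand = c("F", "R", "R", "F"),
  stringsAsFactors = FALSE
)
CODE <- data.frame(
  V1 = c("rDNA", "rDNA", "CL0015", "CL0015"),
  V2 = c("F", "R", "F", "R"),
  V3 = c("S", "s", "O", "o"),
  stringsAsFactors = FALSE
)

# Create the new column up front; rows without a match stay NA
bedfile$codification <- NA_character_

# Loop over the rows of CODE, so every iteration is a pair that exists in CODE
for (k in seq_len(nrow(CODE))) {
  rows <- bedfile$code == CODE$V1[k] & bedfile$strand == CODE$V2[k]
  # Assigning into the column vector is a no-op when no rows match
  bedfile$codification[rows] <- CODE$V3[k]
}

bedfile
```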

The idea is that, for each combination of criteria, the matching value from `CODE` is assigned to all rows of `bedfile` that meet those criteria, but the code does not work. The problem is in the assignment line, since the following statement does not work either:

    bedfile[bedfile$code == "LTR" & bedfile$strand == "+", ]$codification <- "some_string"
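A likely explanation, sketched on hypothetical toy data: `bedfile[rows, ]$codification <- value` first extracts `bedfile[rows, ]` and then calls `$<-` on that subset. In the question's data the `strand` column holds `"F"`/`"R"`, so `strand == "+"` matches nothing, the subset has zero rows, and `$<-` refuses to put a length-1 value into a zero-row data frame. The same happens inside the loop for every pair that does not occur in `bedfile`. Indexing the column vector directly has no such restriction.

```r
# Hypothetical toy data: strand uses "F"/"R", so no row has strand "+"
bedfile <- data.frame(
  code   = c("LTR", "LINE", "DNA"),
  strand = c("F", "R", "F"),
  stringsAsFactors = FALSE
)

rows <- bedfile$code == "LTR" & bedfile$strand == "+"
sum(rows)  # number of matching rows (none here)

# Subset-then-$ assignment errors on a zero-row subset
res <- tryCatch({
  bedfile[rows, ]$codification <- "some_string"
  "ok"
}, error = function(e) conditionMessage(e))
res

# Assigning into the column vector works for any number of matches
bedfile$codification <- NA_character_
bedfile$codification[rows] <- "some_string"  # zero matches: nothing changes
bedfile$codification[bedfile$code == "LTR" & bedfile$strand == "F"] <- "some_string"
bedfile
```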

Can anyone help me? I suspect there is also a much more efficient way to do this than a nested for loop, but I don't know how.
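One vectorised option in base R is a lookup with `match()`: paste the two key columns together in each data frame and let `match()` find, for every row of `bedfile`, the corresponding row of `CODE`. This is a sketch on hypothetical toy data; the names `bed_key` and `code_key` are illustrative, and the tab separator assumes neither key column ever contains a tab.

```r
# Hypothetical toy versions of the two data frames
bedfile <- data.frame(
  code   = c("rDNA", "CL0015", "rDNA", "LINE"),
  strand = c("R", "F", "F", "F"),
  stringsAsFactors = FALSE
)
CODE <- data.frame(
  V1 = c("rDNA", "rDNA", "CL0015", "CL0015"),
  V2 = c("F", "R", "F", "R"),
  V3 = c("S", "s", "O", "o"),
  stringsAsFactors = FALSE
)

# One combined key per row (hypothetical names; tab assumed absent from data)
bed_key  <- paste(bedfile$code, bedfile$strand, sep = "\t")
code_key <- paste(CODE$V1, CODE$V2, sep = "\t")

# match() gives, for each bedfile row, the index of the matching CODE row
# (NA when there is none); indexing V3 with it fills the column in one step
bedfile$codification <- CODE$V3[match(bed_key, code_key)]
bedfile
```

Unlike `merge()`, this keeps the row order of `bedfile` and adds only the one column. If `CODE` contained the same `V1`/`V2` pair twice, `match()` would silently use the first occurrence.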

Thank you

em07
  • This is a "merge" or "join" operation. In base R, `merge(bedfile, CODE[c("V1", "V2", "V3")], by.x = c("code", "strand"), by.y = c("V1", "V2"), all.x = TRUE)`. Or with `library(dplyr)` you can do `left_join(bedfile, select(CODE, V1, V2, codification = V3), by = c("code" = "V1", "strand" = "V2"))` – Gregor Thomas May 28 '21 at 15:50
  • That's awesome, thank you @GregorThomas ! – em07 May 28 '21 at 16:09
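A runnable version of the base R `merge()` suggestion from the comments, on hypothetical toy data. Two details are worth knowing: `merge()` sorts the result by the key columns, so the sketch records the original order in a helper column (`row_id`, a hypothetical name) and restores it afterwards; and renaming `V3` to `codification` in a small lookup table (`lookup`, also hypothetical) before the join gives the new column the intended name.

```r
# Hypothetical toy versions of the two data frames
bedfile <- data.frame(
  code   = c("rDNA", "CL0015", "rDNA", "LINE"),
  strand = c("R", "F", "F", "F"),
  stringsAsFactors = FALSE
)
CODE <- data.frame(
  V1 = c("rDNA", "rDNA", "CL0015", "CL0015"),
  V2 = c("F", "R", "F", "R"),
  V3 = c("S", "s", "O", "o"),
  V4 = c(500, 500, 300, 300),
  stringsAsFactors = FALSE
)

# Keep only the key columns plus V3, renamed to the desired column name
lookup <- setNames(CODE[c("V1", "V2", "V3")], c("V1", "V2", "codification"))

# Remember the original row order, since merge() sorts by the key columns
bedfile$row_id <- seq_len(nrow(bedfile))

# Left join: all.x = TRUE keeps bedfile rows with no match in CODE (as NA)
out <- merge(bedfile, lookup,
             by.x = c("code", "strand"), by.y = c("V1", "V2"),
             all.x = TRUE)

# Restore the original order and drop the helper column
out <- out[order(out$row_id), setdiff(names(out), "row_id")]
rownames(out) <- NULL
out
```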
