This is probably a newbie question, but so far I have not found a concise and solid answer. Coming from Python, I probably misunderstand the philosophy of R, but I'm stuck on the following:

From a list of data, I want to:

  • iterate over it
  • use each element as an argument to a function.

The aim: I have dataframes coming from packages that I need to combine, essentially with a left join. To make a left join I should have the same columns in both dataframes, which is why the list holds the missing columns to add to the receiving (left) dataframe before the join.

This may be important: these dataframes are GRanges objects from the GenomicRanges package. Nevertheless, it is a problem I have run into before.

Here is my list:

> ll
[1] "gc.name"   "test3" 

Here are my dataframes:

> dft
DataFrame with 30 rows and 5 columns
      pvalue      qvalue meth.diff           gc.X  gc.score
   <numeric>   <numeric> <numeric>      <GRanges> <numeric>
1   2.898639e-04 0.007018699 0.2231039     MT:706-710        80
2   6.043240e-05 0.003882324 0.2243177   MT:1146-1150        80
3   9.170025e-05 0.005355496 0.1447536   MT:1986-1990        80
4   1.904443e-04 0.006558452 0.2158183   MT:2001-2005        80
5   1.899050e-04 0.006558452 0.1475142   MT:3091-3095        80
...          ...         ...       ...            ...       ...
26  0.0001936141 0.006558452 0.1865440 MT:14801-14805        40
27  0.0002909048 0.007018699 0.1306336 MT:14941-14945        40
28  0.0002731153 0.007018699 0.1362367 MT:15696-15700        60
29  0.0002383786 0.006960917 0.2309187 MT:16081-16085        80
30  0.0003304606 0.007269440 0.1783131 MT:16091-16095        20

> dfs
DataFrame with 1 row and 6 columns
    pvalue      qvalue meth.diff      gc.X     gc.name  gc.score
 <numeric>   <numeric> <numeric> <GRanges> <character> <numeric>
1 0.0002898639 0.007018699 0.2231039  MT:708:+  rs28412942         0

My function is the following:

> ff <- function(x){dft[1,x]=dfs[1,x]}

I would like x to take the values ll[1], ll[2], etc.

I tried at least two different approaches: lapply() and %>%

  • dplyr

    ll %>% ff()
    Error: subscript contains invalid names
    Called from: .subscript_error("subscript contains invalid ", what)

and with lapply:

lapply(ll,function(g){
dft[1,g]=dfs[1,g]
})

I tried some things with deparse(), for instance, but I'm still stuck with the issue that ll[1] is not read as a string.

Can you help me, and also tell me why using a loop in R seems so complicated to me? :)

best,

Bratten
  • Whoa. Where to start. First, it's not a good idea to name a list() as "list". You're using the name of a reserved function as an object name. Second, you haven't told us what subjectHits(m) or what mcols(s) is. Either subjectHits or m or mcols or s are unknown to us. – Adam Sampson Aug 07 '18 at 13:49
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Since we don't know what all your variables actually point to, it's difficult to see what has to happen here. – MrFlick Aug 07 '18 at 13:50
  • This may help you though... `myDataframe[row,col]` is the syntax for subsetting a dataframe or matrix. `row` and `col` are usually integer indexes of rows and columns, although names also work when they exist in the object. So you might say I want row 4. In your case you are asking for column "gc.name", which `dft` does not have. There are a lot of ways to subset differently in R using packages. But one way in base R would be to use `myDataframe[row,which(names(myDataframe)==x)]` – Adam Sampson Aug 07 '18 at 13:55
  • Okay, thank you for your feedback, I'll give you a reproducible example – Bratten Aug 07 '18 at 14:10

1 Answer

Issues I see:

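1) You are trying to subset a dataframe by a column name it does not contain. Column names are valid subscripts, but only when they exist in the object: dft has no "gc.name" column and neither dataframe has "test3", which is most likely what "subscript contains invalid names" is reporting. You can check this by looking up the integer position of the column you want; an empty result means the column is missing.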

dft[,which(names(dft)==g)]
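
A minimal sketch of that lookup, using small ordinary data frames as hypothetical stand-ins for dft and dfs (the values are made up, and the real S4 DataFrame objects may differ in details):

```r
# Hypothetical stand-ins for dft and dfs (plain data frames, made-up values)
dft <- data.frame(pvalue = c(0.01, 0.02), gc.score = c(80, 40))
dfs <- data.frame(pvalue = 0.01, gc.name = "rs1", gc.score = 0,
                  stringsAsFactors = FALSE)
ll  <- c("gc.name", "test3")

# Which of the requested names are actually columns of each object?
ll %in% names(dft)   # FALSE FALSE
ll %in% names(dfs)   # TRUE FALSE

# Integer position of a column, found by name; empty when the column is absent
which(names(dfs) == "gc.name")   # 2
which(names(dft) == "gc.name")   # integer(0)
```

Checking `ll %in% names(...)` before subsetting shows which names would trigger an error.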

2) In dft[1,g]=dfs[1,g] you are setting a value in dft equal to the matching value in dfs, and the function then returns that value. This is the same as simply returning dfs[1,g], and it only works if both sides have the same length. Because the assignment happens inside a function, it changes only a local copy of dft; the dft in your workspace is left untouched. It also does not merge anything: = puts data into objects, it does not join them.
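
To illustrate the local-copy behaviour, here is a small sketch on hypothetical plain data frames (made-up values, not the real GenomicRanges objects). It compares the function from the question with a plain for loop, which is one simple way to make the change stick:

```r
# Hypothetical stand-ins (plain data frames, made-up values)
dft <- data.frame(pvalue = c(0.01, 0.02), gc.name = NA_character_,
                  stringsAsFactors = FALSE)
dfs <- data.frame(pvalue = 0.01, gc.name = "rs1", stringsAsFactors = FALSE)

# Assigning inside a function changes only the function's local copy of dft
ff <- function(x) { dft[1, x] <- dfs[1, x] }
ff("gc.name")
dft[1, "gc.name"]   # still NA

# A for loop runs in the calling environment, so the change persists
for (col in intersect(names(dfs), names(dft))) {
  dft[1, col] <- dfs[1, col]
}
dft[1, "gc.name"]   # "rs1"
```

If you prefer a function, have it return the modified dataframe and assign the result back, e.g. `dft <- some_function(dft)`.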

3) A join assumes that there will be different columns in two datasets. A join merges on one or more unique key values. You don't need to fill anything in for a join. Let's say you want to join in "gc.name" column. You can use the dplyr package join functions.

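library(dplyr)

# dplyr joins work on ordinary data frames; S4 DataFrame objects
# may need as.data.frame() first
new_dataframe <- left_join(dft,dfs)
# which in the example is the same as joining on every shared column:
new_dataframe <- left_join(dft,dfs,
     by = c("pvalue" = "pvalue",
            "qvalue" = "qvalue",
            "meth.diff" = "meth.diff",
            "gc.X" = "gc.X",
            "gc.score" = "gc.score"
           )
     )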
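
For comparison, a base R sketch of a left join with merge(..., all.x = TRUE). This is a swapped-in alternative to left_join, shown on hypothetical plain data frames; the `id` key column and all values are made up for illustration:

```r
# Hypothetical stand-ins: 'id' is a made-up key column
left  <- data.frame(id = c("a", "b", "c"), gc.score = c(80, 40, 60),
                    stringsAsFactors = FALSE)
right <- data.frame(id = "a", gc.name = "rs1", stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of 'left'; rows without a match get NA in gc.name
joined <- merge(left, right, by = "id", all.x = TRUE)
joined
```

Note that the left dataframe does not need a gc.name column beforehand; the join brings it in.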

4) A bind attaches two dataframes together without merging them, either on top of one another (rbind) or beside one another (cbind). A bind usually requires matching columns (for rbind) or matching row counts (for cbind). However, dplyr has a function bind_rows which will bind two dataframes on top of one another and fill in any missing columns with NA for you.
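
As a rough base R sketch of that fill-in step, the loop below adds each column that one dataframe lacks before stacking with rbind. It uses hypothetical plain data frames with made-up values, and it approximates what bind_rows automates rather than reproducing its implementation:

```r
# Hypothetical stand-ins (plain data frames, made-up values)
dft <- data.frame(pvalue = c(0.01, 0.02), gc.score = c(80, 40))
dfs <- data.frame(pvalue = 0.03, gc.name = "rs1", gc.score = 0,
                  stringsAsFactors = FALSE)

# Add each column that dfs has and dft lacks, filled with NA
for (col in setdiff(names(dfs), names(dft))) {
  dft[[col]] <- NA
}

# rbind() matches data frame columns by name, so the column order may differ
stacked <- rbind(dft, dfs)
stacked
```

The setdiff() call computes the list of missing columns, so it does not have to be maintained by hand.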

Adam Sampson