In R I have a dataframe with index values as rownames (called index) that looks something like this:

index
  [1] "F0000014" "F0000015" "F0000024" "F0000036" ...hundreds more

And I have a second dataframe (called table) whose rownames look something like this:

table
  [1] "F0038001" "F0259700;F0259699.3" "F0259699;F0259700.4" "F0259247.4"                                                 
  ...thousands more

Many of the 'index' dataframe rownames will match values of the 'table' rownames, and these are the rows I would like to copy out of 'table' into a new 'match' dataframe.

The complication is that some rowname values of the table dataframe have sub-numbers (denoted with .#), and some are multiple index values concatenated together with semicolons. So the only requirement for a match is that the queried index value appears somewhere within the table rowname; any such row should be copied to the 'match' dataframe.

I have tried something like this in R based on this previous Stacks post:

for(var in rownames(index))
{
  match <- table[grep(var, rownames(table)), ]
}

But I get an "incorrect number of dimensions" error.

I believe the problem has something to do with mis-specifying the rownames of the table dataframe in this command, as that is what differs between my request and those in the linked post. Any further modification I try seems to be met with another error.
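
One reading of the error, offered as a sketch rather than a diagnosis: "incorrect number of dimensions" is what R reports when two-dimensional indexing (`x[i, ]`) is applied to an object without rows and columns, such as a plain character vector. Separately, the loop overwrites `match` on every pass, so only the last ID's result would survive. The example below assumes both objects really are data frames. `index_df` and `table_df` are hypothetical stand-ins (renamed so they do not mask base R's `table()` and `match()`), and the toy values are borrowed from the question.

```r
# Hypothetical toy data standing in for the real data frames
index_df <- data.frame(
  info = c("a", "b", "c"),
  row.names = c("F0000014", "F0259700", "F0038001")
)
table_df <- data.frame(
  value = 1:4,
  row.names = c("F0038001", "F0259700;F0259699.3",
                "F0259699;F0259700.4", "F0259247.4")
)

# Start with "no row matched" and switch rows on as IDs are found,
# instead of overwriting the result on each pass of the loop
keep <- rep(FALSE, nrow(table_df))
for (id in rownames(index_df)) {
  # fixed = TRUE treats the ID as a literal substring, not a regex
  keep <- keep | grepl(id, rownames(table_df), fixed = TRUE)
}

# drop = FALSE keeps a data frame even when there is only one column
matched <- table_df[keep, , drop = FALSE]
print(matched)
```

With real data this only works if `rownames(index_df)` actually holds the IDs; if they live in a column or a plain vector, the loop should iterate over that instead.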

Any help would be very much appreciated as I cannot seem to find a syntax that works!

Thanks so much.

Andrew

amrezans
  • Are the index rownames all the same number of characters? – markhogue Mar 23 '20 at 16:54
  • Can you put together a few examples from index and table that are supposed to match? Otherwise it's impossible to figure it out. Or do you have different transcript annotations? – StupidWolf Mar 23 '20 at 16:55
  • Right now there are two problems: first, your for loop is not storing anything, and second, if index is a vector, rownames(index) is probably not index? – StupidWolf Mar 23 '20 at 16:56
  • I can show you how to join the two tables. Would that help? – markhogue Mar 23 '20 at 17:19
  • Thank you for your quick replies. Yes, index row names will all have an F followed by seven digits. So for example, if F0259700 is present in the 'index' dataframe, it would need to match the third value in the 'table' dataframe if working properly. Does this for loop not iterate through every value of rownames in 'index'? I believe my test with a previous test grep proved this for me, but let me double check. I should note that 'index' is a dataframe with one other column of irrelevant info, not just a vector, as far as I understand. – amrezans Mar 23 '20 at 17:24
  • One problem is that, in your loop, `match` has not been defined previously and is not indexed, so I think the value of `match` cannot be coerced to the dimensions of `table`, and it would be replaced anyway each time the loop runs. – markhogue Mar 23 '20 at 17:28
  • For the record, I achieved what I was after using a Linux command-line grep as follows: `grep -Ff index_newlines.txt rowstomatch_data.txt > matched_rows.txt` Thanks again for your help, despite the fact that I abandoned the attempt in R! – amrezans Mar 24 '20 at 15:50
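
Substring matching, whether with `grep -F` or in R, could in principle also accept an ID that happens to occur inside a longer string. Since the comments establish that every index ID is an F followed by seven digits, an alternative is to parse each table row name into its component IDs and compare them exactly. The following is a hedged illustration in R, not a tested solution for the real data: `extract_ids` is a hypothetical helper, the object names are stand-ins, and it assumes the `.#` sub-number only ever trails an ID.

```r
# Hypothetical stand-ins for the real row names
index_ids  <- c("F0000014", "F0259700", "F0038001")
table_rows <- c("F0038001", "F0259700;F0259699.3",
                "F0259699;F0259700.4", "F0259247.4")

# Split a row name on ";" and strip a trailing ".<digits>" sub-number
extract_ids <- function(row_name) {
  parts <- strsplit(row_name, ";", fixed = TRUE)[[1]]
  sub("\\.[0-9]+$", "", parts)
}

# TRUE where at least one parsed ID is present in the index
keep <- vapply(
  table_rows,
  function(rn) any(extract_ids(rn) %in% index_ids),
  logical(1),
  USE.NAMES = FALSE
)

print(table_rows[keep])
```

If `table_rows` were taken from `rownames(table_df)`, the same `keep` vector could select the rows with `table_df[keep, , drop = FALSE]`.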
