
I am looking to integrate data from one data frame (A) selectively into another (B). The conditions are as follows: the data frames share two columns (miRNA and Gene), and data frame A also contains a column with a value for each pair.

I want to create a new column in data frame B, taken from the Value column in A, that holds A's value whenever a pair (the same miRNA and Gene as a row in A) is also present in B. If a pair from A is not present in B, I want to add a new row to B with that value.

Pseudocode

#Initialize column in B that will house A value if first two columns match
B$A_Values <- 0

If A[,1:2] == B[,1:2]:
     Change initialized B$A_Values to A$Value of the matching row of A

If A[,1:2] is not in B[,1:2]: 
     Add a row to B with the pair from A[,1:2]
     Change initialized B$A_Values to A$Value of that row of A
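For reference, a literal translation of the pseudocode into base R might look like the sketch below. The sample data are made up, the score column in A is assumed to be named `Value`, and each (miRNA, Gene) pair is compared via a pasted, tab-separated key, which assumes neither column contains tab characters.

```r
# Made-up example data; column names follow the question (miRNA, Gene, Value)
A <- data.frame(miRNA = c("miR-1", "miR-1", "miR-2"),
                Gene  = c("TP53", "MYC", "EGFR"),
                Value = c(0.8, 0.5, 0.3),
                stringsAsFactors = FALSE)
B <- data.frame(miRNA = c("miR-1", "miR-3"),
                Gene  = c("TP53", "KRAS"),
                stringsAsFactors = FALSE)

# Initialize the new column in B, as in the pseudocode
B$A_Values <- 0

# One key per (miRNA, Gene) pair so both columns are compared together
key_A <- paste(A$miRNA, A$Gene, sep = "\t")
key_B <- paste(B$miRNA, B$Gene, sep = "\t")

# Pairs present in both: copy A's Value into the matching rows of B
idx <- match(key_B, key_A)
B$A_Values[!is.na(idx)] <- A$Value[idx[!is.na(idx)]]

# Pairs only in A: append them to B as new rows
new_rows <- A[!(key_A %in% key_B), c("miRNA", "Gene", "Value")]
names(new_rows)[3] <- "A_Values"
B <- rbind(B, new_rows)   # rbind.data.frame matches columns by name
rownames(B) <- NULL
```

Rows of B with no partner in A keep the initialized 0, which is the default the question expects.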

The data frames are not the same length, and some pairs in B will not appear in A; I assume my initialization will give those rows a default value of 0. Any help will be appreciated.

Cheers

Frank
Cody Glickman
  • `merge(A, B, by=c("miRNA", "Gene"), all.x=TRUE)`? – Khashaa May 04 '15 at 05:09
  • Thank you Khashaa, This is spot on – Cody Glickman May 04 '15 at 05:30
  • Already a deep explanation given here: http://stackoverflow.com/questions/1299871/how-to-join-data-frames-in-r-inner-outer-left-right – Agaz Wani May 04 '15 at 05:54
  • Thanks Aaghaz!! I didn't find that question in my research on merging. My research focused on selectively merging items based on the content of multiple columns, I did not even think about SQL-like statements. This will be a great resource for future data table integration. Cheers, Cody – Cody Glickman May 04 '15 at 21:51
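In Khashaa's suggestion, `all.x = TRUE` refers to the first argument (A), so `merge()` behaves like a left join: every pair in A is kept, and pairs that appear only in B are dropped. A small sketch with made-up data; `Expr` is a hypothetical extra column standing in for whatever else B holds.

```r
A <- data.frame(miRNA = c("miR-1", "miR-2"),
                Gene  = c("TP53", "EGFR"),
                Value = c(0.8, 0.3),
                stringsAsFactors = FALSE)
# Expr is a hypothetical extra column in B
B <- data.frame(miRNA = c("miR-1", "miR-3"),
                Gene  = c("TP53", "KRAS"),
                Expr  = c(5.1, 2.4),
                stringsAsFactors = FALSE)

# Left join on the (miRNA, Gene) pair: all rows of A, B's columns where a pair matches
AB_left <- merge(A, B, by = c("miRNA", "Gene"), all.x = TRUE)
print(AB_left)
```

Here miR-3/KRAS, which exists only in B, is absent from `AB_left`, and miR-2/EGFR gets `NA` for `Expr`. Keeping B's unmatched pairs as well requires `all = TRUE`.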

1 Answer


This is what the `merge` function does. With `all = TRUE`, it keeps every (miRNA, Gene) pair from both A and B:

AB <- merge(A, B, by = c("miRNA", "Gene"), all = TRUE)
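With `all = TRUE`, `merge()` performs a full outer join: pairs found in both data frames get values from both, and unmatched pairs from either side are kept with `NA` in the columns they lack. A minimal sketch with made-up data (`Expr` is a hypothetical column in B):

```r
A <- data.frame(miRNA = c("miR-1", "miR-2"),
                Gene  = c("TP53", "EGFR"),
                Value = c(0.8, 0.3),
                stringsAsFactors = FALSE)
# Expr is a hypothetical extra column in B
B <- data.frame(miRNA = c("miR-1", "miR-3"),
                Gene  = c("TP53", "KRAS"),
                Expr  = c(5.1, 2.4),
                stringsAsFactors = FALSE)

# Full outer join on both key columns
AB <- merge(A, B, by = c("miRNA", "Gene"), all = TRUE)
print(AB)
```

In the result, miR-2/EGFR (only in A) has `Expr = NA`, and miR-3/KRAS (only in B) has `Value = NA`.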

Or, if A contains pairs that aren't in B and you want to drop them, use

AB <- merge(A, B, by = c("miRNA", "Gene"), all.y = TRUE)
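Since B is the second argument, `all.y = TRUE` acts as a right join: every pair in B is kept, and pairs that appear only in A are discarded. A sketch with made-up data:

```r
A <- data.frame(miRNA = c("miR-1", "miR-2"),
                Gene  = c("TP53", "EGFR"),
                Value = c(0.8, 0.3),
                stringsAsFactors = FALSE)
B <- data.frame(miRNA = c("miR-1", "miR-3"),
                Gene  = c("TP53", "KRAS"),
                stringsAsFactors = FALSE)

# Right join: all rows of B, A's Value where the pair matches
AB <- merge(A, B, by = c("miRNA", "Gene"), all.y = TRUE)
print(AB)
```

The A-only pair miR-2/EGFR is dropped, and miR-3/KRAS is kept with `Value = NA`.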
shadowtalker
  • Spot on, this solved the issue and answered a follow-up question as well. It seems initializing a column prior will have no effect. It will simply create another column with additions as NAs. Thank you!! – Cody Glickman May 04 '15 at 05:30
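The observation about the pre-initialized column can be checked directly: `merge()` keeps B's `A_Values` column unchanged and adds A's `Value` column beside it, so pairs that exist only in A get `NA` in `A_Values`. If a default of 0 is wanted, one option (a sketch, not part of the original answer) is to skip the initialization and replace the `NA`s in `Value` after merging. The data are made up.

```r
A <- data.frame(miRNA = c("miR-1", "miR-2"),
                Gene  = c("TP53", "EGFR"),
                Value = c(0.8, 0.3),
                stringsAsFactors = FALSE)
B <- data.frame(miRNA = c("miR-1", "miR-3"),
                Gene  = c("TP53", "KRAS"),
                stringsAsFactors = FALSE)

# Pre-initializing, as in the question's pseudocode
B_init <- B
B_init$A_Values <- 0
AB_init <- merge(A, B_init, by = c("miRNA", "Gene"), all = TRUE)
# AB_init has both Value (from A) and A_Values (from B);
# the A-only pair miR-2/EGFR has A_Values = NA

# Alternative: merge without initializing, then default unmatched pairs to 0
AB <- merge(A, B, by = c("miRNA", "Gene"), all = TRUE)
AB$Value[is.na(AB$Value)] <- 0
```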