-4

I have 4 data sets that aren't the same size. The datasets of some overlapping data that I want to merge into the same column but each dataset also has some unique data that I was to keep as well. Maybe it will make a bit more sense with an example.

Glucose Fructose Ox_Phos 
CACNA1I PIK3CA FYN
PLCB2 FGFR1 ITGA2B
CACNG1 PIK3R1 PIK3CA
CACNA2D2 PIK3C2G PIK3R1
MAP3K11 PIK3R5 PIK3R5

TCA Ox_Phos Sucrose ATP 
GYG1 FYN MAP3k11 CACNA1I
NA ITGA2B CACNA2D2 ITGA2B
NA FGFR1 PIK3R5 NA
NA NA CACNG1 NA

Fructose Galactose 
PIK3CA CACNG1
FGFR1 NA
PIK3R1 NA
PIK3C2G NA
PIK3R5 NA

ADP 
PIK3CA
CACNG1
PIK3C2G
NA
NA

So as I said before, I am trying to merge these 4 data sets into one set of data. I want to merge columns with similar colnames but then also have the unique columns become a new column.. if that makes sense? Here is what I hope the data will look like.

Glucose Fructose Ox_Phos ADP TCA Sucrose ATP Galactose
CACNA1I PIK3CA FYN PIK3CA GYG1 MAP3k11 CACNA1I CACNG2
PLCB2 FGFR1 ITGA2B CACNG1 FYN CACNA2D2 ITGA2B NA
CACNG1 PIK3R1 PIK3CA PIK3C2G NA PIK3R5 NA NA
CACNA2D2 PIK3C2G PIK3R1 NA NA CACNG1 NA NA
MAP3K11 PIK3R5 PIK3R5 NA NA NA NA NA
NA NA MAP3k11 NA NA NA NA NA
NA NA CACNA1I NA NA NA NA NA
NA NA ITGA2B NA NA NA NA NA
NA NA FGFR1 NA NA NA NA NA
NA NA NA NA NA NA NA NA
NA PIK4CA NA NA NA NA NA NA
NA FGFR7 NA NA NA NA NA NA
NA PIK4R2 NA NA NA NA NA NA
NA PIK5C3G NA NA NA NA NA NA
NA PIK4R6 NA NA NA NA NA NA

I think this can be done easily with dplyr but I am just not sure how to keep the unique column. Thanks in advance. Any help would be amazing

neuron
  • 1,949
  • 1
  • 15
  • 30
  • Sounds like you're just describing the usual join behavior. Have you looked at https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right? – IceCreamToucan Jul 06 '18 at 12:20
  • @Ryan I did look at that but my dataset is hundreds of columns. This was just a small example. It seemed like they were just selecting one column. I wasn't sure how to scale that they were doing. I could be wrong though – neuron Jul 06 '18 at 12:24
  • 1
    The last dataset has a unique column `ADP` that is not present in any of the other datasets. What are the merging rules in that case? – Maurits Evers Jul 06 '18 at 12:27
  • I want it to become its own unique column. There are more unique columns like `Glucose ` `TCA` `Sucrose` `ATP` `Galactose` `ADP`. The only columns that are represented twice are `Fructose` and `Ox_Phose` – neuron Jul 06 '18 at 12:32
  • 2
    @Brian *"I want it to become its own unique column."* That doesn't make sense. You can't merge if you don't have a common key/column by which you merge entries from two datasets. – Maurits Evers Jul 06 '18 at 12:34
  • Then just combine that dataset then? – neuron Jul 06 '18 at 12:37
  • 1
    @Brian Combine *how*? That's exactly my question. What are the rules? – Maurits Evers Jul 06 '18 at 12:58
  • @MauritsEvers If two or more columns have the same header, then combine the columns. If there is a unique column name then keep it unique. I just want to combine columns like `Ox_Phos` and `Fructose` that are represented more that once in multiple data sets in one columns and then have `Glucose` `TCA` `Sucrose` `ATP` `Galactose` `ADP` to be their own column because they aren't present in any other dataset but the one they are in – neuron Jul 06 '18 at 13:04
  • 2
    The point that @MauritsEvers is making is that there isn't a clear way to add the `ADP` column to the rest of the data. The rows in other sets can be matched on common columns, but your example relies on an implied assumption that the rows in `ADP` correspond to the rows in your first data set. (Further, there are also values of `Fructose` present in your desired output that aren't included in any of your data sets.) – Mikko Marttila Jul 06 '18 at 13:09
  • I don't think you are quite clear on the concepts of "merging"; from what I understand, you want to perform a full outer join based on multiple datasets; I'm giving an example using the first three datasets below; please take a look. – Maurits Evers Jul 06 '18 at 13:09
  • 2
    Perhaps also take a look at [this post](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) to better understand the different types of joins (merges). – Maurits Evers Jul 06 '18 at 13:11

1 Answers1

2

You don't explain how you plan on merging the last dataset that has no common key(s) with any of the other datasets, so I will omit the fourth dataset in the following example.

To merge the three datasets, we can use dplyr::full_join which performs a full outer join and automatically picks up and merges entries by common keys.

Reduce(dplyr::full_join, list(df1, df2, df2))
#   Glucose Fructose Ox_Phos  TCA  Sucrose     ATP
#1  CACNA1I   PIK3CA     FYN GYG1  MAP3k11 CACNA1I
#2    PLCB2    FGFR1  ITGA2B <NA> CACNA2D2  ITGA2B
#3   CACNG1   PIK3R1  PIK3CA <NA>     <NA>    <NA>
#4 CACNA2D2  PIK3C2G  PIK3R1 <NA>     <NA>    <NA>
#5  MAP3K11   PIK3R5  PIK3R5 <NA>     <NA>    <NA>
#6     <NA>     <NA>   FGFR1 <NA>   PIK3R5    <NA>
#7     <NA>     <NA>    <NA> <NA>   CACNG1    <NA>

Sample data

df1 <- read.table(text =
    "Glucose Fructose Ox_Phos
CACNA1I PIK3CA FYN
PLCB2 FGFR1 ITGA2B
CACNG1 PIK3R1 PIK3CA
CACNA2D2 PIK3C2G PIK3R1
MAP3K11 PIK3R5 PIK3R5", header = T, stringsAsFactors = F)

df2 <- read.table(text =
    "TCA Ox_Phos Sucrose ATP
GYG1 FYN MAP3k11 CACNA1I
NA ITGA2B CACNA2D2 ITGA2B
NA FGFR1 PIK3R5 NA
NA NA CACNG1 NA", header = T, stringsAsFactors = F)

df3 <- read.table(text =
    "Fructose Galactose
PIK3CA CACNG1
FGFR1 NA
PIK3R1 NA
PIK3C2G NA
PIK3R5 NA", header = T, stringsAsFactors = F)
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • Hey, I really appreciate the help. I am just getting into R and I do struggle a bit with the terminology. What you wrote worked out perfectly. I really appreciate the help. I wouldn't have been able to figure this without your help, so I am truly thankful. – neuron Jul 06 '18 at 13:15
  • 1
    No worries @Brian; yep, terminology can definitely be confusing; been there as well;-) take a look at the link on the different joins I gave above; lots of useful information in there. Good luck with your work! – Maurits Evers Jul 06 '18 at 13:18
  • I have them pulled up right now. Thanks so so so much. I really appreciate you sticking in there for me – neuron Jul 06 '18 at 13:20
  • 2
    Again you're very welcome @Brian; don't worry about the down-votes and do stick around! – Maurits Evers Jul 06 '18 at 13:22