I have table 1 (subset):

A10L    2048.33333333334    537.666666666665    17  7   0.00035473  0.00056334
A11R    706 200 6   5   0.00037119  0.00110825
A12L    209.666666666667    57.3333333333332    3   1   0.00067166  0.00104651
A13L    14  3.99999999999999    0   0   0.00000000  0.00000000
A14L    154.333333333333    40.6666666666666    0   0   0.00000000  0.00000000
A15L    205 55.9999999999999    2   2   0.00039427  0.00144330
A16L    724.333333333333    184.666666666667    8   4   0.00044806  0.00087536
A17L    477 126 7   1   0.00067518  0.00032073
A18R    1000.66666666667    277.333333333333    10  5   0.00042343  0.00079922
A19L    167.333333333333    45.6666666666666    4   1   0.00119768  0.00088494

And table 2 (subset):

A10L    119355
A11R    121185
A12L    121954
A13L    122373
A14L    122723
A15L    123169
A16L    123863
A17L    124740
A18R    125801
A19L    126639

How can I plot column 2 of table 2 on the x-axis against column 6 of table 1 on the y-axis? Table 2 gives the midpoint coordinate of each gene, and table 1 gives some diversity values for each gene.

In my real data, the two tables list the same genes in different orders. Every gene in table 1 is present in table 2, but not vice versa: table 2 may have 2-3 extra genes that were not analyzed.

I suppose I could sort both files with `sort -k1,1` in bash and then merge them, but that would require manually checking for missing genes. Is there anything else I could do?

Thanks, Adrian

AdrianP.
  • 1
    Firstly, you need to create a data set with matched columns. Look [here](http://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) for an excellent SO answer on how to do SQL style joins in R. Then you'll have no problem doing your plot with something like `ggplot` – Shawn Mehan Oct 16 '15 at 15:47
  • 2
    try `merge(table1[c(1,6)], table2)` – Pierre L Oct 16 '15 at 15:48
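
The `merge` suggestion in the comment above can be sketched on a small example. This is only an illustration under assumptions: the column names `gene`, `nsyn`, and `midpoint` and the extra gene `A99X` are made up here, and the other values are copied from the sample tables. It shows that row order does not matter and that genes missing from table 1 can be listed without manual inspection.

```r
# Hypothetical stand-ins for the two tables. Values come from the question's
# sample; A99X is an invented extra gene that appears only in table 2.
table1 <- data.frame(
  gene = c("A12L", "A10L", "A11R"),          # deliberately out of order
  nsyn = c(0.00067166, 0.00035473, 0.00037119)
)
table2 <- data.frame(
  gene     = c("A10L", "A11R", "A12L", "A99X"),
  midpoint = c(119355, 121185, 121954, 130000)
)

# merge() joins on the shared column name ("gene") regardless of row order.
# The default is an inner join, so genes found only in table2 are dropped.
merged <- merge(table1, table2)

# Genes in table2 with no match in table1, found without manual inspection.
unmatched <- setdiff(table2$gene, table1$gene)
```

With the default inner join, `merged` keeps only the genes present in both tables, while `unmatched` records which genes were left out.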

1 Answer

Thank you all for your answers. Here is what I found to work perfectly (although the code may not be the most efficient):

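    # coord: gene midpoint coordinates (table 2); pi: per-gene diversity values (table 1)
    coord <- read.csv("coordinates.csv", header = FALSE)
    pi <- read.table("pi_output")  # note: this name masks R's built-in constant pi
    coord2 <- data.frame(gene = coord[, 1], crd = coord[, 2])
    pi2 <- data.frame(gene = pi[, 1], nsyn = pi[, 6], syn = pi[, 7])
    # all = TRUE keeps genes found in only one table; their missing values become NA
    pi3 <- merge(coord2, pi2, by = "gene", all = TRUE)

    # sort the merged table by midpoint coordinate
    final_merged <- pi3[order(pi3$crd), ]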

Here `coord` corresponds to table 2, and `pi` holds the values from table 1 shown in the sample above.
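
The code above stops at the merged table, so here is a minimal sketch of the plot itself using base R graphics. It assumes a data frame shaped like `final_merged` (columns `gene`, `crd`, `nsyn`, `syn`). The rows are copied from the question's sample, and `A99X` is an invented gene standing in for one that was not analyzed.

```r
# Hypothetical data in the same shape as final_merged. A99X is a made-up
# unanalyzed gene, so its diversity values are NA after merge(all = TRUE).
final_merged <- data.frame(
  gene = c("A10L", "A11R", "A12L", "A99X"),
  crd  = c(119355, 121185, 121954, 130000),
  nsyn = c(0.00035473, 0.00037119, 0.00067166, NA),
  syn  = c(0.00056334, 0.00110825, 0.00104651, NA)
)

# Drop genes that were not analyzed (no diversity value to plot).
plot_data <- final_merged[!is.na(final_merged$nsyn), ]

# Midpoint coordinate on the x-axis, column 6 of table 1 (nsyn) on the y-axis.
plot(plot_data$crd, plot_data$nsyn, type = "b",
     xlab = "Gene midpoint coordinate",
     ylab = "Diversity (table 1, column 6)")
```

Filtering out the `NA` rows first keeps the connecting line from breaking at unanalyzed genes. Using `all.x = FALSE, all.y = TRUE` style options in `merge` (or the default inner join) would be another way to avoid those rows in the first place.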

AdrianP.