I've used full_join to combine two tables:

fsts = full_join(fstvarcal, fst, by = "SNP")

This grouped the rows: first those with values in both datasets, then those with values in the 1st dataset only (and NAs for the 2nd), then those with values in the 2nd dataset only (and NAs for the 1st).

I'm now trying to sort the rows in natural order.

I'm looking for the equivalent of sort -V -k1 in bash.

I've tried:

library(naturalsort);

fstordered = fsts[naturalorder(fsts$SNP),]

which works, but it's very slow.

Is there a faster way of doing this? Or a way of merging the two datasets without losing the natural order?

I have:

SNP fst
scaffold_0   0.186473
scaffold_9   0.186475
scaffold_10  0.186472
scaffold_11  0.186470
scaffold_99  0.186420
scaffold_100 0.186440

and

SNP fstvarcal
scaffold_0    0.186472
scaffold_8    0.186475
scaffold_20   0.186477
scaffold_21   0.186440
scaffold_999  0.186450
scaffold_1000 0.186420

and want to combine them into

SNP           fstvarcal fst
scaffold_0    0.186472  0.186473
scaffold_8    0.186475  NA
scaffold_9    NA        0.186475
scaffold_10   NA        0.186472
scaffold_11   NA        0.186470
scaffold_20   0.186477  NA
scaffold_21   0.186440  NA
scaffold_99   NA        0.186420
scaffold_100  NA        0.186440
scaffold_999  0.186450  NA
scaffold_1000 0.186420  NA
  • Please make this a *reproducible question*, refs: https://stackoverflow.com/questions/5963269/, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. Starters: list all non-base packages being used (`dplyr`?); include small consumable data, and not as an image, I will not transcribe from an image, suggest `dput(head(x))`; include code for other functions relevant to the question, such as `naturalorder`. – r2evans Sep 18 '18 at 23:51
  • Can you please provide *representative* sample data for `fsts`. From what I can tell `fsts` will have the same alphabetical and natural ordering. – Maurits Evers Sep 19 '18 at 00:00
  • k just updated the info. The table goes from scaffold_0_14500 to scaffold_3015_5000 (for which there's data for both BP.x and BP.y) then jumps back into scaffold_0_1000 and goes up to scaffold_3015_5500 (for which there's data only for BP.y). – Madza Farias-Virgens Sep 19 '18 at 00:52

1 Answer

Perhaps you can do the following:

I generate some representative sample data first.

set.seed(2018)
df <- data.frame(
    SNP = sprintf("scaffold_%i", 1:1000),
    val = rnorm(1000))
df <- df[order(df$SNP), ]  # alphabetical order, so the rows are no longer in natural order

We now use tidyr::separate to split SNP into "id" and "no", then arrange the rows by "id" and "no" to get natural ordering (convert = TRUE automatically converts "no" to an integer column, so it sorts numerically rather than as text).

library(tidyverse)
df %>%
    separate(SNP, into = c("id", "no"), remove = FALSE, convert = TRUE) %>%
    arrange(id, no) %>%
    select(-id, -no)
#               SNP           val
#1       scaffold_1 -0.4229839834
#2       scaffold_2 -1.5498781617
#3       scaffold_3 -0.0644293189
#4       scaffold_4  0.2708813526
#5       scaffold_5  1.7352836655
#6       scaffold_6 -0.2647112113
#7       scaffold_7  2.0994707023
#8       scaffold_8  0.8633512196
#9       scaffold_9 -0.6105871453
#10     scaffold_10  0.6370556066
#11     scaffold_11 -0.6430346953
#...
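
For comparison, the same ordering can be sketched in base R without tidyr. This is only an illustration, not part of the approach above: it assumes every SNP has the form "<prefix>_<integer>", and the helper names id, no and df_sorted are made up for the example.

```r
set.seed(2018)
df <- data.frame(
    SNP = sprintf("scaffold_%i", 1:1000),
    val = rnorm(1000),
    stringsAsFactors = FALSE)

# Alphabetical order, so e.g. scaffold_10 comes before scaffold_2
df <- df[order(df$SNP), ]

# Assumption: the label ends in "_<integer>"; split it into a text prefix
# and a numeric suffix with vectorised regex substitutions
id <- sub("_[0-9]+$", "", df$SNP)
no <- as.integer(sub("^.*_", "", df$SNP))

# Order by prefix first, then numerically by the suffix
df_sorted <- df[order(id, no), ]
head(df_sorted$SNP)
```

Because sub and order are vectorised, this may be quicker than a general-purpose natural sort on large tables, but that is worth timing on the real data rather than taking for granted.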
Maurits Evers
  • Hey Thanks! Not sure this is what I'm looking for. I have rephrased the question to make it easier to understand. – Madza Farias-Virgens Sep 19 '18 at 01:59
  • @MadzaYasodaraFariasVirgens I'm sorry but at least for me, your edit doesn't do much. Just combine the two `data.frame`s, and then sort by `SNP` as I show in my answer. That will give you natural ordering of `SNP` entries in your `data.frame`. – Maurits Evers Sep 19 '18 at 03:15
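
The suggestion in the last comment (combine the two data.frames, then sort by SNP) could look roughly like this with the two tables from the question. It is a sketch under the assumption that every SNP has the form "<prefix>_<integer>"; the temporary columns id and no are hypothetical helper names.

```r
library(dplyr)

fst <- data.frame(
    SNP = c("scaffold_0", "scaffold_9", "scaffold_10",
            "scaffold_11", "scaffold_99", "scaffold_100"),
    fst = c(0.186473, 0.186475, 0.186472, 0.186470, 0.186420, 0.186440),
    stringsAsFactors = FALSE)

fstvarcal <- data.frame(
    SNP = c("scaffold_0", "scaffold_8", "scaffold_20",
            "scaffold_21", "scaffold_999", "scaffold_1000"),
    fstvarcal = c(0.186472, 0.186475, 0.186477, 0.186440, 0.186450, 0.186420),
    stringsAsFactors = FALSE)

# The join keeps all SNPs from both tables but not their natural order
fsts <- full_join(fstvarcal, fst, by = "SNP")

# Assumption: SNP is "<prefix>_<integer>"; sort by prefix, then by the number
fstordered <- fsts %>%
    mutate(id = sub("_[0-9]+$", "", SNP),
           no = as.integer(sub("^.*_", "", SNP))) %>%
    arrange(id, no) %>%
    select(-id, -no)

fstordered
```

Labels with more than one numeric part, such as the scaffold_0_14500 form mentioned in the comments, would need one extra sort key per part.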