I have two dataframes, each with two columns: the chromosome name and a count for that chromosome. I want to plot them in chromosome order (chr1, chr2, ..., chr10, ...), but I run into problems when some chromosomes aren't counted at all. Below is a small sample of my data:

df1$chrom
chr1 chr10 chr3 chr4 chr5
df1$count
1 2 1 4 5

and

df2$chrom
chr1 chr10 chr3 chr5
df2$count
1 4 3 1

To put them in chromosome order I'm using factor:

# Desired chromosome order: chr1..chr22, then chrX and chrY
chrOrder <- c(paste0("chr", 1:22), "chrX", "chrY")
df1$chrom <- factor(df1$chrom, levels = chrOrder, ordered = TRUE)
# Sort by chromosome first, then by count
df1 <- df1[order(df1$chrom, df1$count), ]

which for df1 gives me

df1$chrom
chr1 chr3 chr4 chr5 chr10
df1$count
1 1 4 5 2

And it also works for the second dataframe.

But in order to plot them effectively I need the second dataframe to contain a 0 for chromosome 4 which hasn't been counted in this data.

df2$chrom
chr1 chr3 chr4 chr5 chr10
df2$count
1 3 0 1 4

I've tried adding NA as a level with addNA when building the factor and then replacing it with 0, but it doesn't work. Could anyone help me? Thank you.

My question is similar to "sort by chromosome name", but I'm not sure how to solve the specific part of my problem.

Zophai
RAHenriksen

1 Answer

You could use tidyr::complete to add the chrom values that are present in df1 but missing from df2, filling their count with 0:

df3 <- tidyr::complete(df2, chrom = factor(chrom, levels = levels(df1$chrom)), 
                fill = list(count = 0))

# chrom count
#  <chr> <dbl>
#1 chr1      1
#2 chr10     4
#3 chr3      3
#4 chr4      0
#5 chr5      1

To sort them we could use gtools::mixedorder:

df3[gtools::mixedorder(df3$chrom), ]

# chrom count
#  <chr> <dbl>
#1 chr1      1
#2 chr3      3
#3 chr4      0
#4 chr5      1
#5 chr10     4

Or write a custom ordering that sorts on the numeric part of the name:

df3[order(as.integer(gsub("[^0-9]", "", df3$chrom))), ]
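
If extra packages are not an option, the same result can be sketched in base R. This is only an illustration under some assumptions: the toy data frames below are rebuilt from the sample in the question, and chrOrder is the ordering vector the question defines. Ordering by position in chrOrder also gives chrX and chrY a defined place, whereas stripping non-digits leaves them with no number to sort by.

```r
# Toy data rebuilt from the sample in the question (assumed, for illustration)
df1 <- data.frame(chrom = c("chr1", "chr10", "chr3", "chr4", "chr5"),
                  count = c(1, 2, 1, 4, 5),
                  stringsAsFactors = FALSE)
df2 <- data.frame(chrom = c("chr1", "chr10", "chr3", "chr5"),
                  count = c(1, 4, 3, 1),
                  stringsAsFactors = FALSE)

# Ordering vector from the question
chrOrder <- c(paste0("chr", 1:22), "chrX", "chrY")

# Left join: keep every chromosome seen in df1; those absent from df2 get NA
df3 <- merge(data.frame(chrom = df1$chrom, stringsAsFactors = FALSE),
             df2, by = "chrom", all.x = TRUE)

# Replace the NA counts of uncounted chromosomes with 0
df3$count[is.na(df3$count)] <- 0

# Sort by each chromosome's position in chrOrder
df3 <- df3[order(match(df3$chrom, chrOrder)), ]
rownames(df3) <- NULL
df3
```

To fill in every chromosome rather than only those seen in df1, data.frame(chrom = chrOrder) could be used as the left side of the merge instead.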
Ronak Shah