
I have a file (File1) with the following columns:

```
seqname start SNP        gene
ch01    21900 ch01_21900 H024400
```

I also have a GFF file with the following columns:

```
V1   V2    V3  V4    V5    V6 V7 V8 V9
ch01 MAKER cds 21882 22007 .  -  0  ID=H0224400
```

I want the difference in base pairs between the `start` column of File1 and the `V4` column of the gff file. I used:

```r
distance <- File1$start - gff$V4
```

It did not output anything. File1 has 20 rows and the gff file has 10000 rows. I want to calculate the difference only for matching genes, i.e. where the `gene` column in File1 matches the `V9` column in the gff file. Thank you!
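A likely reason the subtraction seems to do nothing: `<-` assigns without printing, and `File1$start - gff$V4` pairs row *i* of one table with row *i* of the other. It recycles the 20-element vector across the 10000 rows, and because 10000 is a multiple of 20, R raises no warning. Pairing by gene needs a join first and a subtraction second. The sketch below makes several assumptions. The data frames are hypothetical toy data shaped like the samples above. The gene IDs are made identical on purpose, because the posted samples differ by one digit (`H024400` vs `H0224400`), and such a row would be dropped by the join. The new `gene` column added to `gff` is also illustrative.

```r
# Hypothetical toy data shaped like the samples in the question.
File1 <- data.frame(
  seqname = c("ch01", "ch01"),
  start   = c(21900, 30500),
  SNP     = c("ch01_21900", "ch01_30500"),
  gene    = c("H0224400", "H0224500")
)
gff <- data.frame(
  V1 = "ch01", V2 = "MAKER", V3 = "cds",
  V4 = c(21882, 30400, 45000),
  V5 = c(22007, 30600, 45200),
  V6 = ".", V7 = c("-", "+", "+"), V8 = 0,
  V9 = c("ID=H0224400", "ID=H0224500;Name=x", "ID=H0224600")
)

# Extract the value of the ID= tag from the attribute column
# (tag=value pairs separated by ';') so it can be compared with File1$gene.
gff$gene <- sub(".*ID=([^;]+).*", "\\1", gff$V9)

# Join first: keep only genes present in both tables.
joined <- merge(File1, gff[, c("V1", "V4", "gene")], by = "gene")

# Then subtract within each matched row.
joined$distance <- joined$start - joined$V4
print(joined[, c("gene", "SNP", "start", "V4", "distance")])
```

If a gene has several `cds` rows in the gff file, `merge()` returns one row per feature. You then get one distance per CDS, which may or may not be what you want.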

  • [See here](https://stackoverflow.com/q/5963269/5325862) on making a reproducible example that is easier for folks to help with. Right now it's unclear what you're looking at or what's going wrong – camille Jun 01 '21 at 19:45
  • When you say *"difference for same genes"*, that sounds to me like a merge/join-operation *first*, and then differencing *second*. The concept is not difficult, and it might be as simple as `merge(df1, df2, by.x="seqname", by.y="V1", all=TRUE)`, but it does rely on *equality*. With your sample data here, we see both `"CH01"` and `"ch01"`, which are not the same. I suggest you look at https://stackoverflow.com/q/1299871/3358272 and https://stackoverflow.com/a/6188334/3358272 for merge mechanics and try on your code. (Also improve your sample-data quality here, please.) Thanks! – r2evans Jun 01 '21 at 20:06
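The comment above notes that joins rely on exact equality, so `"CH01"` and `"ch01"` will not match. Below is a minimal sketch of normalising case before the join. The data frames and the helper column `chrom` are hypothetical. Joining on chromosome *and* gene is a deliberate choice. A join on chromosome alone would pair every SNP with every feature on that chromosome, which is not a per-gene difference.

```r
# Hypothetical rows whose chromosome names differ only in case.
File1 <- data.frame(seqname = c("CH01", "ch02"), start = c(21900, 500),
                    gene = c("H0224400", "H0300100"))
gff <- data.frame(V1 = c("ch01", "CH02"), V4 = c(21882, 450),
                  gene = c("H0224400", "H0300100"))

# Lower-case both chromosome columns so "CH01" and "ch01" compare equal.
File1$chrom <- tolower(File1$seqname)
gff$chrom   <- tolower(gff$V1)

# Join on both keys, then take the difference per matched row.
joined <- merge(File1, gff, by = c("chrom", "gene"))
joined$distance <- joined$start - joined$V4
print(joined)
```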
