
I'm having some trouble using Velveth to assemble reads downloaded from the NCBI SRA.

The command I used was:

velveth velvet 27 -fastq -shortPaired -interleaved /home/bilalm/H_glaber_quality_filtering/AfterQC/good_reads/SRR530529.good.fq

where `velvet` is the output directory, `27` the hash length, `-fastq` the file format, `-shortPaired` the read type, and `-interleaved` indicates that the paired reads are interleaved in a single file.

But the assembly process has stopped prematurely with the error message: Killed

The tail of the output is:

Inputting sequence 180000000 / 193637763
Inputting sequence 181000000 / 193637763
Inputting sequence 182000000 / 193637763
Inputting sequence 183000000 / 193637763
Inputting sequence 184000000 / 193637763
Inputting sequence 185000000 / 193637763
Inputting sequence 186000000 / 193637763
Inputting sequence 187000000 / 193637763
Inputting sequence 188000000 / 193637763
Inputting sequence 189000000 / 193637763
Inputting sequence 190000000 / 193637763
Killed

The Velvet version is 1.2.09 and the FASTQ file is 52 GB.

What has happened? Why has the whole process been killed? 'Log', 'Roadmaps' and 'Sequences' files have been created, but no .config file.

Cheers, Billy.
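A bare `Killed` message with no other output usually means the kernel's out-of-memory (OOM) killer terminated the process. A minimal sketch for confirming this from the kernel log (assumptions: `velveth` is the process name to look for, and a systemd-based distribution for the `journalctl` variant):

```shell
# Search the kernel ring buffer for OOM-killer activity
# (sudo may be needed if kernel.dmesg_restrict=1):
sudo dmesg -T | grep -iE 'out of memory|oom|killed process'

# On systemd distributions the journal keeps kernel messages as well:
sudo journalctl -k --since "2 hours ago" | grep -iE 'oom|killed process'

# Check how much RAM and swap the machine actually has:
free -h
```

A confirmed OOM kill typically logs a line of the form `Out of memory: Killed process <pid> (velveth)`.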

    My guess is that it's using a lot of memory and the out of memory killer killed it. What does `dmesg` say after the process is stopped? – shay Jul 30 '20 at 14:33
  • **Many lines of** [UFW BLOCK] IN=enp0s31f6 OUT= MAC=01:00:5e:00:00:01:00:18:71:b2:04:00:08:00:00 SRC=143.53.223.100 DST=224.0.0.1 LEN=32 TOS=0x00 TTL=1 ID=45821 PROTO=2 @shay – Billy Jul 30 '20 at 15:16
  • Those are all firewall related messages. Maybe try something like `dmesg | grep oom -A 150` and see if anything shows up? Also, which linux distribution are you running? – shay Jul 30 '20 at 16:08
  • **The code above yields this message again** [15032959.993886] [UFW BLOCK] IN=enp0s31f6 OUT= MAC=01:00:5e:00:00:01:00:18:71:b2:04:00:08:00 SRC=143.53.223.100 DST=224.0.0.1 LEN=32 TOS=0x00 PREC=0x00 TTL=1 ID=45595 PROTO=2 @shay – Billy Jul 30 '20 at 17:42
  • **The Linux distribution is** DISTRIB_ID=Ubuntu DISTRIB_RELEASE=18.04 DISTRIB_CODENAME=bionic DISTRIB_DESCRIPTION="Ubuntu 18.04.3 LTS" @shay – Billy Jul 30 '20 at 17:47
  • It looks like your run almost completed. Do you get the same error if you try a larger hash size (e.g. 31) ? – jared_mamrot Jul 31 '20 at 12:02
  • It's running; I'll let you know when it's done (using k=31). @jared_mamrot – Billy Aug 01 '20 at 22:19
  • In the meantime, try using https://github.com/tseemann/VelvetOptimiser to estimate memory usage for different Velvet options. If the estimate is close to your machine's RAM capacity you will need to look at a bigger server, filter your reads more stringently, or normalize/subsample your reads. There are so many hurdles with assembly that you might find it a better use of your time to use other people's resources, e.g. http://www.naked-mole-rat.org/downloads/ (BLAST database, ABySS-assembled genome, genes/proteins/scaffolds files, etc.), but obviously it depends on your goals / intended usage – jared_mamrot Aug 02 '20 at 00:50
  • Hi @jared_mamrot I re-ran velveth with k=31 and I got the same "Killed" message: Inputting sequence 158000000 / 193637763 Inputting sequence 159000000 / 193637763 Inputting sequence 160000000 / 193637763 Inputting sequence 161000000 / 193637763 Inputting sequence 162000000 / 193637763 Inputting sequence 163000000 / 193637763 Inputting sequence 164000000 / 193637763 Inputting sequence 165000000 / 193637763 Inputting sequence 166000000 / 193637763 Inputting sequence 167000000 / 193637763 Inputting sequence 168000000 / 193637763 Killed – Billy Aug 02 '20 at 14:56
  • My thesis title: "Working with published Illumina/PacBio sequencing data from the NCBI: is it possible to make an improved de novo assembly from the reads of one or both NMR genomes?" Thanks for the direction @jared_mamrot. I would need sudo privileges to install VelvetOptimiser, so I've requested that it be installed on the server – Billy Aug 02 '20 at 15:06
  • Based on the stage at which this job was killed (168000000 / 193637763 at k=31 vs 190000000 / 193637763 at k=27), you need more RAM to assemble the genome using those options (as @Shay said). You could try filtering your reads further to reduce the size of the dataset, compiling Velvet with the flag 'BIGASSEMBLY=0' instead of 'BIGASSEMBLY=1', or using smaller hash sizes (e.g. k=23) to overcome the issue, but the best 'fix' is to move to a larger server or use different software. – jared_mamrot Aug 03 '20 at 03:34
  • Hi @jared_mamrot I tried re-running with a hash value of 23; the command used: `velveth velvet_3 23 -fastq -shortPaired -interleaved /home/bilalm/H_glaber_quality_filtering/AfterQC/good_reads/SRR530529.good.fq` This gave the result: === sequences loaded in 47348.688849 s === Done inputting sequences destroying splay table splay table destroyed. The assembly has stopped; is this another issue with RAM? – Billy Aug 05 '20 at 12:44
  • If there are no further error messages, check whether you have Log, Roadmaps, and Sequences files. If you do, the run finished successfully (see: http://seqanswers.com/forums/showthread.php?t=42786) and you can move on to `velvetg`. – jared_mamrot Aug 05 '20 at 23:26
  • @jared_mamrot great, thank you. Currently velvetg is reading the Roadmaps file. – Billy Aug 06 '20 at 09:59
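The RAM ceiling discussed in the comments can be sanity-checked with the rule-of-thumb formula quoted in the VelvetOptimiser documentation: RAM in KB ≈ -109635 + 18977·ReadSize + 86326·GenomeSize + 233353·NumReads - 51092·K, with GenomeSize in Mb and NumReads in millions. A sketch, assuming 100 bp reads and a ~2.6 Gb naked mole-rat genome (both assumed values; only the read count and k come from the thread):

```shell
# Rule-of-thumb velvetg RAM estimate (result in KB, converted to GB).
# readlen and genome_mb are assumptions; reads_m and k are from the run above.
awk -v readlen=100 -v genome_mb=2600 -v reads_m=193.6 -v k=27 'BEGIN {
  kb = -109635 + 18977*readlen + 86326*genome_mb + 233353*reads_m - 51092*k
  printf "estimated RAM: %.0f GB\n", kb/1024/1024
}'
```

With these inputs the estimate comes out near 258 GB, far beyond a typical workstation, which is consistent with the repeated OOM kills and with the advice to subsample, filter harder, or move to a larger server.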

0 Answers