1

I read in Goodrich's Data Structures and Algorithms book that it is possible to use multiple iterators for read only operation on a linked-list, but I quite don't understand how to implement it.

I'm trying to search if a String DNAFragment exists in a List of WholeDNA and add only those DNAs to separate results list. And I want to speed up the searching. I'm open to any suggestions, Thank you in advance.

Jim Mischel
  • 131,090
  • 20
  • 188
  • 351
chin2
  • 11
  • 1
  • 6
    What they are saying is basically that you can simply call `iterator()` multiple times and each of them iterates independently, that's it. Nothing fancy or fast or special about this. – luk2302 Dec 20 '22 at 20:22
  • Please provide enough code so others can better understand or reproduce the problem. – Community Dec 20 '22 at 20:45
  • 1
    You should not use `LinkedList` in Java (and maybe not an equivalent in any other language). `ArrayList` is fine for your purposes. The same holds true, though: you can have multiple iterators iterating it as long as no modifications are made to the list. – Ole V.V. Dec 20 '22 at 21:27
  • 1
    I do not understand how 2 iterators can help you with speed up. Iterator is used to iterate all elements, looks like you want to iterate from different points. Maybe do you want to use multithreading? – Rustam Dec 20 '22 at 21:30
  • "You should not use LinkedList in Java" is just wrong. E.g. if you frequently iterate the list and remove some elements or insert elements then ArrayList is a stupid choice. As the name suggests ArrayList works best if you want to work with the index and no insert/remove, only append data at the end. – maraca Dec 20 '22 at 22:05
  • It could also mean the `descendingIterator()` and the standard `iterator()` which would give you a descending and an ascending iterator. – Janos Vinceller Dec 20 '22 at 23:03
  • @maraca I know that theory, of course. In my 25+ years as a professional programmer I have never seen a real-world example were it applied. Have you? – Ole V.V. Dec 21 '22 at 05:42
  • @OleV.V. yes I have. Often ArrayList is better, especially for a certain amount of data, but if you only have a few, let's say 1 to 100 elements and are always iterating in order, then LinkedList is better afaik and this case is happening every now and then. Also addFirst and removeFirst can be useful, but they are not on the List interface (although in most cases a Queue could be the better choice then). – maraca Dec 21 '22 at 10:36
  • I was just wondering if there's a way to iterate through half of the list in parallel, if that is a possibility? – chin2 Dec 21 '22 at 21:26
  • Does this answer your question? [Java 8: Parallel FOR loop](https://stackoverflow.com/questions/27558605/java-8-parallel-for-loop). Search for more. – Ole V.V. Dec 22 '22 at 08:09
  • 1
    @maraca why should a `LinkedList` be better than `ArrayList` for 1 to 100 elements? That’s within the range were even inserting and removing at arbitrary indices is faster with `ArrayList`. – Holger Jan 04 '23 at 08:56
  • I didn't want to start a discussion. There are many benchmarks out there comparing both. My point is that it always depends on the use-case and the theory is not just theory. Sorry if I have given a bad example, but we had cases where LinkedList was faster in practice. Just can't recall, but you can more or less take anything with add/remove not at the end. Anyway I think it's a minor issue, e.g. a more common mistake is that someone uses list where hash set should have been used. – maraca Jan 05 '23 at 03:19

1 Answers1

2

You don't really need lists, except maybe to read the data from somewhere. Afterwards you can put the data in an array. This makes it very easy to split the work (runnables) onto several workers (threads in threadpool) by partitioning the indexes. You can use any of the concurrent data structures to store the matching indexes.

Also because you are working with DNA you only need 2 bits to store one element (4 acids). Instead of strings you could have int arrays (values 0 - 3) or BitSets. Like this you could easily calculate a rolling hash. This could improve the search speed of a fragment in whole DNA a lot, especially if the fragments are long.

maraca
  • 8,468
  • 3
  • 23
  • 45
  • 1
    Constructive (+1). How many bits are required for the OP’s data depends. They could for example be [IUPAC Codes](https://www.bioinformatics.org/sms/iupac.html). The OP knows better. – Ole V.V. Dec 21 '22 at 05:47
  • 1
    @OleV.V. Interesting. In this case rolling hash wouldn't work because some letters can represent others. Needs special matching. – maraca Dec 21 '22 at 12:02
  • 1
    …and this could be combined with upcoming Vector API, so you might be able to process 256 or 512 bits at once with your CPU. – Holger Jan 04 '23 at 08:58