2

I want to implement algorithms like quicksort, mergesort, etc., while working with a big data files, like 100 million elements. I can't use std::vector, std::sort, or anything like that, since this is a school assignment to get to know these specific algorithms. I can only use things that I will only write on my own.

Should I implement sorting algorithms using linked lists or arrays? Which from these two is more efficient in term of working with big data? What are the advantages of using one of them?

John Kugelman
  • 349,597
  • 67
  • 533
  • 578
77jt777
  • 69
  • 1
  • 8
  • *I can't use STL* -- You mean `std::sort`? If not, why not? Also, if you want space efficiency, a linked list is not a good option, as each item requires a previous and next pointers. In addition to that, a linked list is not cache friendly. Other than that, please [see this](https://stackoverflow.com/questions/24650626/how-to-implement-classic-sorting-algorithms-in-modern-c), as that shows you various implementations of most sorting algorithms using modern C++. – PaulMcKenzie May 02 '21 at 10:36
  • @PaulMcKenzie I can't use even std::vector, or anything like that, I can't tell you why, it's my proffesor's idea. What I mean by that is that I can use things that I will only write on my own. The main problem is just what is faster, implementing algorithm on an array or a list? – 77jt777 May 02 '21 at 10:39
  • I didn't ask about vector. I asked about `std::sort`. That is an algorithm function, not a container. The STL is more than containers. As far as faster, I already mentioned that linked lists are not cache friendly. – PaulMcKenzie May 02 '21 at 10:40
  • @PaulMcKenzie I can't use it because the main idea is to implement some algorithms, I got these specified, like mergesort, quicksort, introspective sort etc... I just can't sort it using a given algorithm because the main task is to get to know these specific algorithms. – 77jt777 May 02 '21 at 10:43
  • @77jt777 At the link in my previous comment, there is no usage of `std::sort` at all, and all of those sorting methods have been implemented. So you really need to be specific as to what you can or cannot use. – PaulMcKenzie May 02 '21 at 10:44
  • @PaulMcKenzie It's just straightforward question: to implement quicksort,mergesort should I use array or list? – 77jt777 May 02 '21 at 10:47
  • I guess you didn't read my comment. By not acknowledging what I already mentioned, it looks more like you want code. – PaulMcKenzie May 02 '21 at 10:48
  • @PaulMcKenzie you can trust me, I don't want any code, maybe I just don't understand things you are saying to me, I don't know how to connect question ,,which of these two is more effecient" with std::sort and a website that you gave me. I have to pick between list and an array, there are only these 2 methods that I can use, and pick which one is more worth it. – 77jt777 May 02 '21 at 10:54
  • 1
    Do you know what cache-friendly means? Do you know that for each data item, a linked list requires more memory? So given that, which one would you choose to sort 100 million elements? I thought my previous comments would make the choice obvious. – PaulMcKenzie May 02 '21 at 10:56
  • @PaulMcKenzie I just still wasn't sure... Thank you a lot and sorry for my stupidity. – 77jt777 May 02 '21 at 10:59

1 Answers1

1

If the number of elements is large, the better option would be an array (or any type that has contiguous memory storage, i.e. a std::vector, allocating with new [], etc.). A linked list usually does not store its nodes in contiguous memory. The contiguous memory aspect leads to better cache friendliness.

In addition to this, a linked list (assuming a doubly-linked list), would need to store a next and previous pointers to the next and previous elements for each data item, thus requiring more memory per data item. Even for a singly-linked list, a next pointer has to exist, so even though less overhead than a doubly-linked list, it is still more overhead than an array.

Another reason that isn't related to efficiency why you want to use an array is ease of implementation of the sorting algorithm. It is more difficult to implement a sorting algorithm for a linked list than it is for an array, and especially an algorithm that works with non-adjacent elements.

Also, please note that std::sort is an algorithm, it is not a container. Thus it can work with regular arrays, std::array, std::vector, std::deque, etc. So comparing std::sort to an array is not a correct comparison.

PaulMcKenzie
  • 34,698
  • 4
  • 24
  • 45