-3

i want to make an efficient program in c that reads elements in a file and put it into an array. My idea was to have a function to look for each element if exists in the array, but it takes too long and each file has like millions of elements(duplicated). Do you have an idea of how insert these elements in the array without duplicate?

Note: these elements are numbers with 10 digits, a matrix containing the occupied places it's not an option

  • Is there something wrong with using a hashmap to keep track of each element already in the array? – Daniel Jan 31 '19 at 22:18
  • 1) i'm using c 2) Using a hashmap has the same problem, you look one by one searching if the element its repeated, so it has the same O() – Agustin Ferrante Jan 31 '19 at 22:22
  • 1
    @AgustinFerrante -- *Using a hashmap has the same problem* -- No it doesn't. if you have a map of integer and the count associated with the integer, how is that the same complexity? – PaulMcKenzie Jan 31 '19 at 22:24
  • @PaulMcKenzie How do you associate a key to a value of 10 digits, i mean how do you search it? – Agustin Ferrante Jan 31 '19 at 22:29
  • 1
    I know this is `C`, but how does a `std::map` or `std::unordered_map` work in C++? It works by using key/value pairs. Your issue has more to do with data structures and understanding them. Making the claim that a hasmap or hash table is slow to look up a value, where you go through the keys one-by-one is totally incorrect. Why do hash tables exist? Not for reasons for curiousness but for real reasons, i.e. fast lookup times. – PaulMcKenzie Jan 31 '19 at 22:32
  • 1
    @AgustinFerrante How much about hashmaps do you know? To search for it, you using a hashing algorithm to convert the input into an output in O(1) time (ignoring collisions). – Daniel Jan 31 '19 at 22:33
  • Thats the point, how do you avoid collisions. What would be a good algorithm for extremely high numbers? – Agustin Ferrante Jan 31 '19 at 22:36
  • You can do what you want in C, but it has the potential to be a lot of work. Are you open to using a different language that has tools to do this easily built in? – user4581301 Jan 31 '19 at 22:37
  • @AgustinFerrante The way it's usually done is to use modulus. There is a lot of information on the web that explains these aspects. The issue is that you are using a bare-bones language (C) that doesn't have hash table available. – PaulMcKenzie Jan 31 '19 at 22:37
  • It's for an university task, i cant use another language – Agustin Ferrante Jan 31 '19 at 22:38
  • @AgustinFerrante frankly it sounds like you don't understand hashmaps. It's starting to seem more like a "help I don't know how to implement a hashmap". [That's already been answered on stack overflow](https://stackoverflow.com/questions/838404/implementing-a-hashmap) – Daniel Jan 31 '19 at 22:39
  • Hashmaps have O(1) average case lookups in well implemented versions. You don't need to believe me, you [can check wikipedia](https://en.wikipedia.org/wiki/Hash_table). Avoiding collisions is implementation specific, and there's many good implementations. – Daniel Jan 31 '19 at 22:41
  • You could make a 10 billion element array of booleans. Set the whole array to false. Take the number from the file and use it as the index in the array. If the element is false, set it to true and store the number. If it's true, you've got a duplicate. Discard. – user4581301 Jan 31 '19 at 22:42
  • 1
    @AgustinFerrante It also looks like the university expected you to do independent research into hashing and hash maps. – PaulMcKenzie Jan 31 '19 at 22:44
  • They could also be wanting the asker to figure out resizable arrays and binary search. Get the task down to a nice, respectable O(log(N)) with some O(N) pain whenever they need to ordered-insert. – user4581301 Jan 31 '19 at 22:50
  • 1
    Yes, I'm believing that we're doing the OP's homework. The question seems to be designed for the student to do the research on how to cut down the time, and not get ready-made answers here. – PaulMcKenzie Jan 31 '19 at 23:01

1 Answers1

0

If you want to store all the element just once you can use STL set in c++(a example of doing this in c is also given below). It store a value no more than one time. You can declare a set like this:

set<int>sett;

To insert a value in set

sett.insert(6);

To print all the value in set:

for(auto a: sett){
    cout<<a<<endl;
}

You can go to geekforgeek and cplusplus website for more information.

If you want to track a number is in array or not, you can do it efficiently like this:

Take a Boolean Array or Map and make it's all initial value false. After scanning from the file, if the number is N than make the Nth index of Boolean Array or Map true.

Now if you want to check number M is in the array or not. You can easily get this by checking if the Mth index of Boolean array or Map is true or not in just O(1) time.

See the example bellow, this is done using Boolean array. Remember to declear the boolean array globally if the size of boolean array is to large. You can do this by using STL Map too in c++. For learning more about SLT MAP visit cplusplus , cppreference, geeksforgeeks.

#include <stdio.h>
#include <stdbool.h>

bool mapp[999999999] = {false};

int main()
{
    int i, n, t, m, ara[1000];
    printf("Input the size of array: ");
    scanf("%d", &n);
    for(i = 0; i < n; i++){
        scanf("%d", &ara[i]);
        mapp[ara[i]] = true;
    }
    printf("Input number of time you want to check a number in array or not: ");
    scanf("%d", &t);
    while(t--){
        scanf("%d", &m);
        if(mapp[m] == true)
            printf("Number Exists\n");
        else
            printf("Number Not Exists\n");

    }
}

You can also use this trick to don't include duplicate value in array. To do that, if the inputed number is 'a'. Check if mapp[a] is already marked as true or not. Include 'a' in array only and only is if mapp[a] != ture. This little change will exclude all duplicate value from your array.

scanf("%d", &n);
j = 0;
for(i = 0; i < n; i++){
    scanf("%d", &a);
    if(mapp[a] != true){
        mapp[a] = true;
        ara[j++] = a;
    }
}
Niloy Rashid
  • 687
  • 4
  • 16
  • when I commented there are both C and C++ tag in the post. I try to help accordingly – Niloy Rashid Feb 01 '19 at 00:12
  • If only you read the first few words "i want to make an efficient program *in c*" you would have been able to realize that it was in C before expending such effort – Daniel Feb 01 '19 at 00:13
  • sometimes people refer c, c++, c++11, c++14, c++17 all together as c. And the two tag of the the post convinced me it will be useful if i write solution in any variant of C. Thanks for you comment i will edit the second part of my and Change it in C. Thanks again – Niloy Rashid Feb 01 '19 at 00:34