0

I have a function that calculates the next unique value on the fly. These values are 100% unique except one. Each value is 256 bit, total values are 2128. The amount of space needed to store this many values is obviously too large: (2128 x 256 bits) How can i find these 2 duplicates indexes from starting point? I tried to XOR each value with next, but when function returns duplicate i cant figure out that this is duplicate because XOR excludes it from crc. simple example:

a=[755,167,986,343,566,996,245,343]
crc=a[0]
for i in range(1,len(a)):
    crc=crc ^ a[i]

this code makes crc like crc([755,167,986,566,996,245]) right answer for this example is 343 or #4 (3 if counting from 0) best code that i can produce is double for from ending element till start, and from current element of main for till end:

for i in range(maxX,1,-1):
    crc=crc^x1
    for j in range(i+1,maxX):
        if(x1==x2):
            break
        x2 = f(x2)#iterate x2 back like "f(x2,+1)"
     x1=f(x1) #iterate x1 forward like "f(x1,-1)"

Are there any subexponential algorithms to solve this problem?

function produces values like : f(x1)=13082069741720843297365566874415150979823939205374148607908286415326657186361

in fact there is no difference for me coding python/c/javascript/php/asm, so the code for answer accepted in any language Any fastest algorithm for finding 2 duplicates for this?

Win Pei
  • 1
  • 1
  • I might have not fully understood your question. Your function generates one 256 bit value at a time and you want to check if that value is already inside a list of generated values? – John Moody Jun 16 '22 at 20:46
  • yes. at least i want to know, is new value present in the list of already genereted values before it. of course using O(1) storage/memory – Win Pei Jun 17 '22 at 04:30
  • 1
    At 1000 GHz and one computed value per clock, you will need 10^20 years to compute all of them. The age of our Universe is 1.38*10^10 years. – n. m. could be an AI Jun 17 '22 at 07:45
  • I don't think XOR would work here : 10 ^ 01 ^ 11 = 00, but all three values are different. How do you use XOR to know when there as been a duplicate exactly ? And you say the function calculate the next value on the fly, but do you need to pass the precedent value as a parameter ? – Alois Christen Jun 17 '22 at 08:13
  • >How do you use XOR to know when there as been a duplicate exactly ? im trying to do this. i try invert last crc value , and compare it with current, but always get removed duplicates. function needs previous value for this, but i can pass arbitrary index to f(1234567) and get its value – Win Pei Jun 17 '22 at 09:01
  • >At 1000 GHz and one computed value per >clock, you will need 10^20 years to compute >all of them. The age of our Universe is >1.38*10^10 years. i know that, thats why i need to make computations faster than subexponential time. the main idea was to get next value, remove duplicates using xor, invert the result and xor? and? or? it to check if the crc changed, so determine that i have found dup. but i dont find out how to do this:( – Win Pei Jun 17 '22 at 09:09
  • Forget xor, or, and, and fancy words like subexponential. Here are 100 numbers: 92 51 11 69 24 35 17 36 26 98 67 39 83 2 75 56 59 18 32 40 91 86 57 12 14 42 27 62 63 58 30 96 13 68 3 87 71 64 9 22 66 80 20 79 89 81 73 0 94 41 88 28 52 43 16 60 49 7 84 72 29 61 6 45 53 76 8 33 37 15 25 55 70 31 82 47 48 10 90 34 23 74 77 19 85 44 93 54 97 1 38 95 4 21 42 5 78 65 50 46 There is one duplicate. Can you find it without looking at all 100? – n. m. could be an AI Jun 17 '22 at 12:16
  • the 10^20 years to compute is linear time : that's the time you need to call all f(xi) possible. You need to compute all numbers to find the duplicate, thus you need at least 10^20 years, no matter how you want to find the duplicate. ((expect if you have more information on the function f, then MAYBE you can go sublinear time) – Alois Christen Jun 17 '22 at 12:17
  • ok, i see. for 100 numbers the bad case is duplicate number will be "the last". but i want to find it in 1 pass. if i have a unlimited memory it will be trivial, saving each value to uniq hashtable, and "on error adding new value - thats it" . so i thought that hashing with xor helps to remove duplicates and invert it to find missing 1... but no luck. 10^20 years if i iterate all values, but i find duplicate faster , but not much, but there is a probability at least... function f(x) cant be edited so there is no case lookin at it – Win Pei Jun 17 '22 at 12:38
  • anyway thx you all for trying to help. i will try to find another way to find it. – Win Pei Jun 17 '22 at 12:40
  • Maybe we can find another way to help you, if we have more information about what you're trying to do. What is the concrete application of the function, and why do you need the find the duplicate ? – Alois Christen Jun 17 '22 at 13:16
  • One approach could be to avoid adding duplicate items in the first place. You can remove duplicates after adding each new item to the list, for example: If ```a``` is your list and ```i``` your new item. ```a.apprend(i)``` adds ```i``` to the list ```a```, and ```vlist(dict.fromkeys(a))``` returns ```a``` without duplicate items. You can use the same approach with *pandas* as well, probably more efficiently. The information here might also help you: https://stackoverflow.com/questions/9835762/how-do-i-find-the-duplicates-in-a-list-and-create-another-list-with-them – John Moody Jun 17 '22 at 19:57

0 Answers0