I have a university assignment that I still can't solve.
The input is a number N <= 1000000 (how many strings follow), then the N strings themselves. Each string is shorter than 1000 characters.
I need to print a single number to stdout: the count of unique strings. How can I do that? The main restrictions are that I can use only the numpy library, with a 5 second time limit and a 5 MB RAM limit (!). The task also says an answer is accepted if it differs from the true answer by no more than 5%.
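To make the 5% rule concrete, here is a minimal sketch of how such a check could look. The function name `within_tolerance` is hypothetical, and reading the rule as "relative to the true answer" is an assumption, since the task text does not say how the judge computes the difference.

```python
def within_tolerance(estimate, truth, rel_tol=0.05):
    """Return True if `estimate` is within rel_tol (default 5%) of `truth`.

    Assumption: the tolerance is relative to the true answer.
    """
    return abs(estimate - truth) <= rel_tol * truth
```

Under that reading, with 1000 unique strings any printed value from 950 to 1050 would be accepted, so an approximate count may be enough.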
I tried this code:
import numpy as np
N = int(input())
a = np.array([])
for i in range(N):
    x = input()
    if not np.any(a == x):
        a = np.append(a, x)
print(len(a))
but it took 12 MB and 97 ms. Another attempt:
import numpy as np
N = int(input())
results = np.empty(N, dtype=object)
for i in range(N):
    results[i] = input()
print(len(np.unique(results)))
It took 10 MB.
Any ideas how to get under the limit? :)
UPDATE: I don't understand what is going on, but I have now tried this code:
import numpy as np
N = int(input())
a = np.array([])
cc = 0
for i in range(N):
    x = input()
    cc += 1
    if cc < 500:
        if not np.any(a == x):
            a = np.append(a, x)
print(len(a))
and it reported 81 ms and 8.7 MB. How is that possible if I put fewer than 500 elements into the array?
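One way to narrow this down is to measure how much memory a few hundred short strings need on their own. The sketch below uses the standard-library `tracemalloc` module; the helper name `bytes_for_short_strings` and the 20-character length are hypothetical choices for illustration, and it only sees Python-level allocations, not whatever the judge measures.

```python
import tracemalloc


def bytes_for_short_strings(count, length=20):
    """Return the bytes tracemalloc attributes to a list of `count` distinct strings."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    # Build `count` distinct strings, each padded to `length` characters.
    data = [str(i).zfill(length) for i in range(count)]
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    assert len(data) == count  # keeps `data` alive until after the measurement
    return after - before
```

If `bytes_for_short_strings(500)` comes out at tens of kilobytes, the stored data alone would not explain a reading of 8.7 MB, which suggests (as an assumption, not something verified on the judge) that much of the figure is the interpreter plus the numpy import. Note also that a numpy string array is fixed-width at 4 bytes per character, padded to the longest element, so long inputs would cost far more than short ones.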
TEST 3:
This took 98 ms and 6.36 MB (almost 5!)
N = int(input())
s = set()
for i in range(N):
    x = input()
    s.add(x)
print(len(s))
TEST 4:
This took 98 ms and 5.41 MB.
import hashlib
N = int(input())
s = set()
for i in range(N):
    x = input()
    s.add(hashlib.md5(x.encode()).hexdigest())
print(len(s))
TEST 5:
5.32 MB
import hashlib
N = int(input())
s = set()
s_add = s.add
for i in range(N):
    s_add(hashlib.md5(input().encode()).hexdigest()[:-3])
print(len(s))
TEST 6:
98 ms and 5.63 MB
import hashlib
import itertools
N = int(input())
s = set()
s_add = s.add
for _ in itertools.repeat(None, N):
    s_add(str(abs(hash(input())))[:-3])
print(len(s))
TEST 7:
179 ms and 6.92 MB
import itertools
N = int(input())
s = set()
s_add = s.add
for _ in itertools.repeat(None, N):
    s_add(abs(hash(input())))
print(len(s))
TEST 8:
286 ms and 5.15 MB
N = int(input())
s = set()
s_add = s.add
for i in range(N):
    s_add(abs(hash(input())))
print(len(s))
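All the set-based tests above still store one entry per distinct string, so memory grows with the number of unique inputs. Because the task tolerates a 5% error, one direction worth trying is a fixed-size bitmap with a linear-counting estimate: hash each string to one bit, then estimate the count as -m * ln(z / m), where m is the number of bits and z the number still zero. The sketch below is not a verified solution. The name `estimate_distinct`, the bitmap size of 2**23 bits (1 MiB), and the use of `hashlib` (assumed to be allowed because the tests above already use it) are all assumptions.

```python
import hashlib
import math


def estimate_distinct(lines, bits=1 << 23):
    """Linear-counting estimate of the number of distinct strings in `lines`.

    Memory for the bitmap is fixed at bits / 8 bytes, whatever the input size.
    """
    bitmap = bytearray(bits // 8)
    ones = 0  # number of bits set so far
    for line in lines:
        digest = hashlib.md5(line.encode()).digest()
        pos = int.from_bytes(digest[:8], "little") % bits
        byte_index, mask = pos >> 3, 1 << (pos & 7)
        if not bitmap[byte_index] & mask:
            bitmap[byte_index] |= mask
            ones += 1
    zeros = bits - ones
    if zeros == 0:
        raise ValueError("bitmap is full; use more bits")
    # Linear counting: n is approximately -m * ln(fraction of zero bits).
    return -bits * math.log(zeros / bits)
```

It could be driven with something like `print(round(estimate_distinct(input() for _ in range(N))))`. With at most 1000000 strings spread over about 8.4 million bits, the estimation error should be far below 5%, but whether the total fits in 5 MB depends on the interpreter baseline, which the tests above suggest is already close to the limit.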