Right tool for the job: put your records into a database. Unless you already have a Postgres or MySQL installation handy, I'd take SQLite.
$ sqlite3 uniqueness.sqlite
create table chk (
    ident text,    -- holds the first n characters of each line
    lineno integer -- for convenience
);
^D
Then I'd insert the unique identifier and line number into that table, possibly using a Python script like this:
import sqlite3  # part of the standard library; nothing extra to install

n = ...  # how many chars are in the key part

conn = sqlite3.connect("uniqueness.sqlite")
cur = conn.cursor()
with open("giant-file") as infile:
    for lineno, line in enumerate(infile, start=1):
        cur.execute("insert into chk(ident, lineno) values (?, ?)",
                    (line[:n], lineno))
conn.commit()  # without this, the inserts are lost on close
cur.close()
conn.close()
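If the file is genuinely giant, one `execute` call per row adds up; batching the rows through `executemany` cuts the per-call overhead noticeably. Here's a sketch of the same idea — the filenames are stand-ins (I generate a small sample file and use an in-memory database just so the snippet runs on its own), and the key width `n = 4` is picked arbitrarily for illustration:

```python
import itertools
import os
import sqlite3
import tempfile

n = 4         # assumed key width, for illustration only
BATCH = 1000  # rows per executemany call

# Build a tiny stand-in for "giant-file"; keys repeat past line 3000,
# so some duplicates are guaranteed.
sample = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
for i in range(5000):
    sample.write(f"{i % 3000:04d} some payload\n")
sample.close()

conn = sqlite3.connect(":memory:")  # stand-in for uniqueness.sqlite
cur = conn.cursor()
cur.execute("create table chk (ident text, lineno integer)")

with open(sample.name) as infile:
    rows = ((line[:n], lineno)
            for lineno, line in enumerate(infile, start=1))
    while True:
        batch = list(itertools.islice(rows, BATCH))
        if not batch:
            break
        cur.executemany("insert into chk(ident, lineno) values (?, ?)",
                        batch)
conn.commit()

count = cur.execute("select count(*) from chk").fetchone()[0]
print(count)  # 5000 -- one row per input line
os.unlink(sample.name)
```

Since the generator feeds `islice` lazily, memory use stays flat no matter how large the input file is.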
After this, you can index the table and use SQL:
$ sqlite3 uniqueness.sqlite
create index x_ident on chk(ident); -- may take a bit of time
-- quickly find duplicates, if any
select ident, count(ident) as how_many
from chk
group by ident
having count(ident) > 1;
-- find the line numbers of a specific duplicate, if needed
select lineno
from chk
where ident = ...; -- fill in a duplicated ident here
Yes, I tried most of this code; it should work :)
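You can also drive the whole check from Python instead of the sqlite3 shell. A minimal end-to-end sketch — in-memory database and made-up rows standing in for the real file, where "abc" appears twice and should be flagged:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table chk (ident text, lineno integer)")

# Made-up data: "abc" occurs on lines 1 and 3.
rows = [("abc", 1), ("def", 2), ("abc", 3), ("ghi", 4)]
cur.executemany("insert into chk(ident, lineno) values (?, ?)", rows)
cur.execute("create index x_ident on chk(ident)")

# Same duplicate query as above, run through the Python API.
dupes = cur.execute(
    "select ident, count(ident) from chk "
    "group by ident having count(ident) > 1").fetchall()
print(dupes)  # [('abc', 2)]

# And the line numbers for one specific offender.
lines = [r[0] for r in cur.execute(
    "select lineno from chk where ident = ?", ("abc",))]
print(sorted(lines))  # [1, 3]
conn.close()
```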