I am trying to get the unique lines in a file with multiple columns.
My file, "file.txt", contains the sample records below:
20230830,52678,004,Apple,21
20230830,52678,004,Apple,20
20230830,52678,004,Apple,19
20230831,47689,001,Orange,15
20230901,47620,002,Grape,29
My desired output is only the lines whose columns 1 to 4 are unique, regardless of the value in column 5:
20230831,47689,001,Orange,15
20230901,47620,002,Grape,29
I tried using sed to replace the 4th comma with a separator (|) between columns 1-4 and column 5,
and then an awk command to keep only the lines whose columns 1-4 are unique:
sed 's/,/|/4' file.txt | awk -F"|" '{arr[$1]++} END{for(i in arr) if(arr[i]==1) print $0}'
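For reference, here is a minimal sketch of the same sed | awk pipeline with one change. Inside the END block, $0 is not tied to the loop key i; in common implementations such as gawk and mawk it still holds the last record read, so every key that passes the test prints that same line. The version below stores the line for each key while counting. The heredoc that creates file.txt and the names count, line and unique_lines are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical setup: write the sample records so the sketch runs on its own.
cat > file.txt <<'EOF'
20230830,52678,004,Apple,21
20230830,52678,004,Apple,20
20230830,52678,004,Apple,19
20230831,47689,001,Orange,15
20230901,47620,002,Grape,29
EOF

# sed turns the 4th comma into "|", so with -F'|' $1 is columns 1-4 and $2 is column 5.
# While counting each key, also remember its line (rebuilt with the comma restored),
# because $0 inside END is not the line for the current key.
# Note: "for (key in count)" visits keys in an unspecified order.
unique_lines=$(sed 's/,/|/4' file.txt | awk -F'|' '
  { count[$1]++; line[$1] = $1 "," $2 }
  END {
    for (key in count)
      if (count[key] == 1)
        print line[key]
  }')
printf '%s\n' "$unique_lines"
```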
This seems to work on a small data set, but when I run it on a file with 1000 lines, I get:
20230831,47689,001,Orange,15
20230831,47689,001,Orange,15
20230831,47689,001,Orange,15
20230831,47689,001,Orange,15
...
The same line keeps repeating. It seems I am only getting one unique line, printed over and over.
Can you help me find what is wrong with my code?
I expect the output to contain only the unique lines, like this:
20230831,47689,001,Orange,15
20230901,47620,002,Grape,29
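Another possible approach, sketched under the assumption that file.txt is a regular file that can be read twice: awk alone can split on commas, build a key from columns 1-4, count the keys on the first pass (NR == FNR), and print the original line on the second pass when its key occurred exactly once. The heredoc setup and the names key, count and unique_lines are illustrative.

```shell
#!/bin/sh
# Hypothetical setup: write the sample records so the sketch runs on its own.
cat > file.txt <<'EOF'
20230830,52678,004,Apple,21
20230830,52678,004,Apple,20
20230830,52678,004,Apple,19
20230831,47689,001,Orange,15
20230901,47620,002,Grape,29
EOF

# file.txt is passed twice:
#   1st pass (NR == FNR): count how often each key (columns 1-4) occurs.
#   2nd pass: the bare pattern prints the whole line when its key occurred once.
unique_lines=$(awk -F',' '
  { key = $1 FS $2 FS $3 FS $4 }
  NR == FNR { count[key]++; next }
  count[key] == 1
' file.txt file.txt)
printf '%s\n' "$unique_lines"
```

Because lines are printed while the second pass reads them, the output follows the input order; a for (key in array) loop in an END block gives no such guarantee.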