
Is there an awk script to remove duplicate entries from a file? My file is a txt file, but it is very large: it has 7 million rows, each containing entries separated by tabs. I read it with awk. If the combination of two fields, say $1 and $15, is the same in two rows, I want to keep only the row with the first occurrence. Is there a way to do this with an awk script?

calor
  • you said it has 7m lines, can you tell how big the file is? the size. – Kent Jun 11 '14 at 14:22
  • @Kent , it's size is 1.4 Gb – calor Jun 11 '14 at 14:23
  • @fedorqui, uhr... I thought in a wrong way.. you are absolutely right..... forgive me, time for coffee... – Kent Jun 11 '14 at 14:24
  • @TapanBohra try the cmd in fedorqui's comment. – Kent Jun 11 '14 at 14:25
  • `awk -F"\t" '!a[$1 FS $15]++' file` this should work for you... – Kent Jun 11 '14 at 14:27
  • @Kent, how do I write this command in an awk file? Sorry, I am very new to awk. I want to write this in an awk file because I also have to extract some other columns from this file. This is my awk file currently: `BEGIN{FS="\t"; OFS="\t"; } { if($16=="256") print $1, $5, $10, $13, $17; }` – calor Jun 11 '14 at 14:37
  • Since this question is closed you should ask a new one. – Jotne Jun 12 '14 at 06:00
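
To tie the comments together: Kent's `!a[$1 FS $15]++` idiom and the asker's existing column filter can live in one awk program. A sketch (the file name `input.txt` is a placeholder; it assumes tab-separated fields, with fields `$1` and `$15` as the dedup key, as in the comments):

```shell
# Keep only the first row for each ($1, $15) pair, then apply the
# asker's existing filter ($16 == "256") and column extraction.
# seen[key]++ is 0 the first time a key appears, so !seen[key]++
# is true exactly once per key.
awk 'BEGIN { FS = OFS = "\t" }
     !seen[$1 FS $15]++ && $16 == "256" { print $1, $5, $10, $13, $17 }
' input.txt
```

Note the ordering matters: written this way, every row counts toward the dedup key, and the first occurrence is printed only if its `$16` is `256`. If instead duplicates should be considered only among the rows that pass the filter, swap the two conditions (`$16 == "256" && !seen[$1 FS $15]++`). For a 1.4 GB file, this stays a single pass, but the `seen` array grows with the number of distinct key pairs, so memory use depends on how many unique combinations exist.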

0 Answers