
Is there an awk script to remove duplicate entries from a file? My file is a txt file, but it is very large: it has 7 million rows, each containing entries separated by tabs. I read it with awk. If the combination of two fields, say $1 and $15, is the same in two rows, I want to keep only the row with the first occurrence. Is there a way to do this with an awk script?

calor
  • you said it has 7m lines, can you tell how big the file is? the size. – Kent Jun 11 '14 at 14:22
  • @Kent , it's size is 1.4 Gb – calor Jun 11 '14 at 14:23
  • @fedorqui, uhr... I thought in a wrong way.. you are absolutely right..... forgive me, time for coffee... – Kent Jun 11 '14 at 14:24
  • @TapanBohra try the cmd in fedorqui's comment. – Kent Jun 11 '14 at 14:25
  • `awk -F"\t" '!a[$1 FS $15]++' file` this should work for you... – Kent Jun 11 '14 at 14:27
  • @Kent, how do I write this command in an awk file? Sorry, I am very new to awk. I want to write this in an awk file because I also have to extract some other columns from this file. This is my awk file currently: `BEGIN{FS="\t"; OFS="\t"; } { if($16=="256") print $1, $5, $10, $13, $17; }` – calor Jun 11 '14 at 14:37
  • Since this question is closed you should ask a new one. – Jotne Jun 12 '14 at 06:00
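
To tie the comments together: Kent's `!a[$1 FS $15]++` idiom and the asker's existing column filter can live in one awk program. A sketch (the file name `input.txt` is a placeholder; it assumes tab-separated fields, with fields `$1` and `$15` as the dedup key, as in the comments):

```shell
# Keep only the first row for each ($1, $15) pair, then apply the
# asker's existing filter ($16 == "256") and column extraction.
# seen[key]++ is 0 the first time a key appears, so !seen[key]++
# is true exactly once per key.
awk 'BEGIN { FS = OFS = "\t" }
     !seen[$1 FS $15]++ && $16 == "256" { print $1, $5, $10, $13, $17 }
' input.txt
```

Note the ordering matters: written this way, every row counts toward the dedup key, and the first occurrence is printed only if its `$16` is `256`. If instead duplicates should be considered only among the rows that pass the filter, swap the two conditions (`$16 == "256" && !seen[$1 FS $15]++`). For a 1.4 GB file, this stays a single pass, but the `seen` array grows with the number of distinct key pairs, so memory use depends on how many unique combinations exist.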

0 Answers