I have an array of URLs and their corresponding domains.
Array elements are separated by \n.
Within each element, the domain and the URL are separated by a comma:

site1.com,www.site1.com/blahA-blahB-blahC   
site2.com,site2.com/blahD-blahE-blahF   
site2.com,site2.com/blahG-blahH-blahI   
site3.com,site3.com/blahJ-blahK-blahL

I would like to filter this array and remove every line whose domain has already appeared (the first occurrence stays). The required output is:

site1.com,www.site1.com/blahA-blahB-blahC   
site2.com,site2.com/blahD-blahE-blahF   
site3.com,site3.com/blahJ-blahK-blahL

Please advise.

Avinash Raj
user1192422
  • This thread might help you out: http://stackoverflow.com/questions/7099887/is-there-a-set-data-structure-in-bash – Bryan May 23 '14 at 07:26
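
The set-based approach from the linked thread can be sketched in plain bash, assuming bash 4 or later for associative arrays. The names `input`, `seen`, and `result` are illustrative, and the sample data is copied from the question:

```shell
#!/usr/bin/env bash
# Sample input copied from the question (one "domain,url" pair per line).
input='site1.com,www.site1.com/blahA-blahB-blahC
site2.com,site2.com/blahD-blahE-blahF
site2.com,site2.com/blahG-blahH-blahI
site3.com,site3.com/blahJ-blahK-blahL'

declare -A seen   # associative array used as a set of domains (bash 4+)
result=()         # lines kept, in their original order

while IFS= read -r line; do
    domain=${line%%,*}                  # everything before the first comma
    if [[ -z ${seen[$domain]+x} ]]; then
        seen[$domain]=1                 # remember the domain
        result+=("$line")               # keep its first occurrence
    fi
done <<< "$input"

printf '%s\n' "${result[@]}"
```

This keeps the filtered lines in a bash array, which may be convenient if the data already lives in one.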

1 Answer

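Try this awk command. It splits each line on the comma and prints a line only the first time its first field (the domain) is seen:

awk -F, '!x[$1]++' file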

Output:

site1.com,www.site1.com/blahA-blahB-blahC
site2.com,site2.com/blahD-blahE-blahF
site3.com,site3.com/blahJ-blahK-blahL
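
As a sketch of how the one-liner works, the version below keys on the first comma-separated field, i.e. the domain column itself. The names `input`, `seen`, and `deduped` are illustrative, and the sample data is copied from the question. Splitting on `/` instead would key on everything before the first slash, so `site2.com,www.site2.com/...` and `site2.com,site2.com/...` would count as different keys.

```shell
# Sample input copied from the question (one "domain,url" pair per line).
input='site1.com,www.site1.com/blahA-blahB-blahC
site2.com,site2.com/blahD-blahE-blahF
site2.com,site2.com/blahG-blahH-blahI
site3.com,site3.com/blahJ-blahK-blahL'

# -F, makes the comma the field separator, so $1 is the domain.
# seen[$1]++ yields 0 the first time a domain appears and a positive
# count afterwards; negating it makes the pattern true only for the
# first occurrence, and awk prints the line when the pattern is true.
deduped=$(printf '%s\n' "$input" | awk -F, '!seen[$1]++')

printf '%s\n' "$deduped"
```

Because awk processes the input in a single pass, the first occurrence of each domain is kept and the original line order is preserved.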
Avinash Raj