I'm starting out with regular expressions and grep and I want to find out how to do this. I have this list:

1. 12493 6530
2. 12475 5462
3. 12441 5450
4. 12413 5258
5. 12478 4454
6. 12416 3859
7. 12480 3761
8. 12390 3746
9. 12487 3741
10. 12476 3557
...

And I want to get the contents of the middle column only (so NF==2 in awk?). The delimiter here is a space.

I then want to find which numbers appear more than once (duplicates). How would I go about doing that? Thank you, I'm a beginner.

jvitasek
    This is more of a programming task than a regex exercise, as regex won't help you at all here. – Qtax Nov 11 '14 at 22:22
    What would your expected output be given that input file? If the answer is "nothing" then edit your question to provide an input file that WOULD produce output from the tool you want and the associated expected output. – Ed Morton Nov 11 '14 at 23:04

3 Answers

Using awk:

awk '{count[$2]++}END{for (a in count) {if (count[a] > 1 ) {print a}}}' file

But you don't have duplicate numbers in the 2nd column.

  • the second column in awk is $2
  • count[$2]++ increments an array entry that uses the current number as its key
  • the END block runs after the last line has been read; it checks every array entry and prints the keys whose count is greater than 1

A more concise version (credit to jthill):

awk '++count[$2]==2{print $2}' file
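
To see both commands actually print something, here is a small sketch run against a made-up sample in which 12475 appears twice and 12441 three times. The sample data and the function names dupes_end and dupes_inline are illustrative assumptions, not part of the question. The END-block version prints in whatever order awk walks the array, so it is piped through sort here; the short version prints each duplicate once, at its second occurrence, in input order.

```shell
# Hypothetical sample: 12475 appears twice and 12441 three times in column 2.
sample='1. 12493 6530
2. 12475 5462
3. 12441 5450
4. 12475 5258
5. 12441 4454
6. 12441 3859'

# Count every value in column 2, then report those seen more than once.
# The order of "for (a in count)" is unspecified, so sort for stable output.
dupes_end() {
  awk '{count[$2]++} END {for (a in count) if (count[a] > 1) print a}' | sort
}

# Print a value the moment its count reaches 2: once per duplicate, in input order.
dupes_inline() {
  awk '++count[$2]==2 {print $2}'
}

printf '%s\n' "$sample" | dupes_end
printf '%s\n' "$sample" | dupes_inline
```

Both functions read standard input, so they can also be fed a file with dupes_inline < file.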
Gilles Quénot

Using perl:

perl -anE '$h{$F[1]}++; END{ say for grep $h{$_} > 1, keys %h }'

This iterates over the lines and builds a hash (%h) that counts (++) each second-column value ($F[1]; the -a switch autosplits every line into @F). After the last line, the END{ ... } block prints (say) every hash key whose count ($h{$_}) is greater than 1.

Qtax

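With the data stored in a file named test,

Using a combination of the awk, sort, uniq and sed commands

 awk '{print $2}' test | sort | uniq -c | sed '/^ *1 /d' | awk '{print $2}'

Explanation:

awk '{print $2}'

selects the 2nd column

sort

puts identical values on adjacent lines, which uniq needs because it only compares neighbouring lines

uniq -c

prefixes each distinct number with the count of its appearances

sed '/^ *1 /d'

deletes all the entries with only one appearance (uniq -c pads the count with leading spaces, so the pattern has to allow for them)

awk '{print $2}'

removes the count again, leaving only the duplicated numbers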
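
A shorter pipeline may also work: uniq -d prints one copy of each line that is repeated in sorted input, which would stand in for the counting and filtering steps. A minimal sketch, assuming the same whitespace-separated layout; the function name dupes_uniq and the three sample lines are made up for illustration:

```shell
# Hypothetical helper: list the values that occur more than once in column 2 of stdin.
# awk keeps only the 2nd column, sort makes equal values adjacent,
# and uniq -d prints one copy of each repeated line.
dupes_uniq() {
  awk '{print $2}' | sort | uniq -d
}

printf '%s\n' '1. 12493 6530' '2. 12475 5462' '3. 12475 5450' | dupes_uniq
```

With a file on disk this would be called as dupes_uniq < test.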

vincentleest