
I have multiple files in the same directory. Each file represents a user and contains the IPs used to log into that account, one per line.

I want to create a script that checks whether the same IP occurs in multiple files and prints the duplicates.

I've tried using awk, but with no luck. Any help appreciated!

Flawlesss
  • Edit your question to show concise, testable sample input and expected output plus what you've tried so far (i.e. a minimal, complete, and verifiable example) so we can start trying to help you. – Ed Morton Nov 11 '16 at 00:51
  • You mention matching same values in different files and duplicates. Could you clarify if you only want to find matching values in different files or also duplicate entries in the same files? Those would be two different results. – artdanil Nov 11 '16 at 18:47
  • Where's your try? – Deanie May 20 '17 at 21:47
  • Related: Find duplicates in two files: https://stackoverflow.com/q/15470260/873282 – koppor Feb 08 '18 at 00:59

4 Answers


Assuming that there are no repeated IP addresses in the same file, this should work for IPv4 addresses in many Bash versions:

#!/bin/bash
#For IP addresses v4, assuming no repeated IP addresses in the same file; result is stored in the file /tmp/repeated-ips
mkdir -p /tmp
# -r: search the folder recursively, -h: omit file names, -o: print only the matched IPs
grep -rhEo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /home/user/folder > /tmp/ipaddresses-holder
# uniq -d keeps one copy of every line that occurs more than once
sort /tmp/ipaddresses-holder | uniq -d > /tmp/repeated-ips
exit 0

The script below is a little more complex, but it works whether or not an IP address is repeated within a single file:

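#!/bin/bash
#For IP addresses v4, result is stored in the file /tmp/repeated-ips
mkdir -p /tmp
# Without -h, grep -r prefixes each match with its file name ("file:ip")
grep -rEo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /home/user/folder > /tmp/ipaddresses-holder
# Drop repeated file:ip pairs, i.e. repeats of an IP within the same file
sort -u /tmp/ipaddresses-holder > /tmp/ipaddresses-holder2
# Strip the file names again, keeping only the IPs
grep -hEo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /tmp/ipaddresses-holder2 > /tmp/ipaddresses-holder3
# IPs still listed more than once come from more than one file
sort /tmp/ipaddresses-holder3 | uniq -d > /tmp/repeated-ips
exit 0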

In both cases, the result is stored in the file /tmp/repeated-ips.
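The idea behind the second script (remove repeats within each file first, then look for IPs that are left more than once) can also be sketched without temporary files. This is a sketch assuming every line holds exactly one IP and nothing else; the function name shared_ips is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: print every IP that appears in more than one of the given files.
# Assumes exactly one IP per line; repeats inside a single file are ignored.
# shared_ips is a hypothetical name, not part of any tool.
shared_ips() {
    local f
    for f in "$@"; do
        sort -u "$f"          # distinct IPs of one user
    done | sort | uniq -d     # an IP still present twice came from two files
}
```

For example, shared_ips /home/user/folder/* would print each IP found in at least two of the users' files. Running sort -u on each file separately is what keeps an IP listed twice by one user from being reported as a duplicate.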

Jamil Said

Use the following awk command:

awk '$0 in a {print FILENAME, "IP:", $0, "also in:", a[$0]; next} {a[$0] = FILENAME}' /tmp/user*

Assuming that each file contains just the IP, like this:

[tmp]$cat /tmp/user1
1.1.1.1
[tmp]$cat /tmp/user2
2.2.2.2
[tmp]$cat /tmp/user3
1.1.1.1

Output

[tmp]$awk '$0 in a {print FILENAME, "IP:", $0, "also in:", a[$0]; next} {a[$0] = FILENAME}' /tmp/user*
/tmp/user3 IP: 1.1.1.1 also in: /tmp/user1

Explanation

awk '
  $0 in a {                          # if the IP already exists in array a,
    print FILENAME, "IP:", $0,       # print the current file and IP,
          "also in:", a[$0]          # plus the file where it was first seen
    next                             # get the next record without further processing
  }
  { a[$0] = FILENAME }               # first time this IP is seen: store its file name
' /tmp/user*
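Regarding chw21's comment below: if an IP repeated within the same file should not be reported, a small change to the same approach skips such repeats. This is a sketch assuming one IP per line; the function name shared_ips_awk is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: report an IP only when it appears in a different file,
# ignoring repeats within the same file. Assumes one IP per line.
# shared_ips_awk is a hypothetical name.
shared_ips_awk() {
    awk '
        seen[FILENAME, $0]++ { next }     # repeat of this IP within the same file: skip it
        $0 in first {                     # IP already seen in an earlier file
            print FILENAME, "IP:", $0, "also in:", first[$0]
            next
        }
        { first[$0] = FILENAME }          # first file that contains this IP
    ' "$@"
}
```

On the /tmp/user1, /tmp/user2, /tmp/user3 sample above, this prints the same single line as the original command; the output only differs when one file lists the same IP more than once.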
Jay Rajput
  • My understanding is that there is only a single IP in the file. It is tricky to answer the question without knowing the format of the file storing the IPs for each user – Jay Rajput Nov 11 '16 at 01:21
  • You've reverted your change, so I'm reposting my comment: If the same IP is listed in the same file multiple times, your script will write about that, but the OP only wants information about the same IP appearing in different files. – chw21 Nov 11 '16 at 01:44
  • Yeah, I thought about that. Without knowing the requirements, handling it would have been unnecessary clutter in the code. I will let the OP comment and tell us the requirements before I change it. There are lots of open questions, like what happens if an IP is written in expanded form in one place and in compressed form in another. Should those be matched? – Jay Rajput Nov 11 '16 at 01:57

Not sure I understand your question correctly, so here's what I think you want to do:

You have several files. Each file refers to a specific user and logs every IP address that that user has used to log in from. Example:

$ cat alice.txt
192.168.1.1
192.168.1.5
192.168.1.1
192.168.1.1
$ cat bob.txt
192.168.0.1
192.168.1.3
192.168.1.2
192.168.1.3
$ cat eve.txt
192.168.1.7
192.168.1.5
192.168.1.7
192.168.0.7

You want to find out whether the same IP address appears in multiple files.

Here's what I came up with.

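#!/usr/bin/env bash
# Pass the per-user files as arguments, e.g. alice.txt bob.txt eve.txt
for source_file in "$@"
do
    # Each distinct IP in this user's file
    for search_term in $(sort -u "${source_file}")
    do
        # -F: fixed string; -x: whole-line match, so 192.168.1.1 does not also match 192.168.1.10
        # --exclude (GNU grep) skips the file the IP came from
        found=$(grep -Fx --exclude="${source_file}" -- "${search_term}" "$@")
        if [[ -n "${found}" ]]; then
            echo "Found ${search_term} from ${source_file} also here:"
            echo "${found}"
        fi
    done
done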

It's probably not the best solution.
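For comparison, a single awk pass can group the files by IP instead of running grep once per IP. This is a sketch assuming one IP per line; the function name ip_owners is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: for every IP found in more than one file, list the files containing it.
# Assumes one IP per line; ip_owners is a hypothetical name.
ip_owners() {
    awk '
        !seen[FILENAME, $0]++ {            # first time this IP shows up in this file
            files[$0] = files[$0] " " FILENAME
            count[$0]++                    # number of distinct files containing the IP
        }
        END {
            for (ip in count)              # note: for-in order is unspecified
                if (count[ip] > 1) print ip ":" files[ip]
        }
    ' "$@"
}
```

Run as ip_owners alice.txt bob.txt eve.txt in the directory holding the sample above, it should print 192.168.1.5: alice.txt eve.txt. With several shared IPs the output order is unspecified, because awk's for (ip in count) loop has no defined order; pipe the result through sort if that matters.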

chw21

How about something like:

diff -u <(cat * | sort) <(cat * | sort | uniq)

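In other words, diff all the files concatenated and sorted against the same concatenation with the duplicates removed. Every IP that occurs more than once shows up as a removed line (prefixed with "-") in the diff output. Note that this also reports an IP repeated within a single file.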

EvansWinner