I have two files. I am trying to remove every line in file2 that matches a value found in file1. The first file is a listing like this:

File1

ZNI008
ZNI009
ZNI010
ZNI011
ZNI012

... over 19463 lines

The second file includes lines that match the items listed in the first:

File2

copy /Y \\server\foldername\version\20050001_ZNI008_162635.xml \\server\foldername\version\folder\
copy /Y \\server\foldername\version\20050001_ZNI010_162635.xml \\server\foldername\version\folder\
copy /Y \\server\foldername\version\20050001_ZNI012_162635.xml \\server\foldername\version\folder\
copy /Y \\server\foldername\version\20050001_ZNI009_162635.xml \\server\foldername\version\folder\

... continues listing until line 51360

What I've tried so far:

grep -v -i -f file1.txt file2.txt > f3.txt

This does not write any output to f3.txt or remove any lines. I verified by running

wc -l file2.txt

and the result is

51360 file2.txt

I believe the reason is that there are no exact matches. When I run the following it shows nothing

comm -1 -2 file1.txt file2.txt

Running

( tr '\0' '\n' < file1.txt; tr '\0' '\n' < file2.txt ) | sort | uniq -c | egrep -v '^ +1'

shows only one match, even though I can clearly see there is more than one match.

Alternatively putting all the data into one file and running the following:

grep -Ev "$(cat file1.txt)" 1>LinesRemoved.log

says argument has too many lines to process.

I need to remove lines matching the items in file1 from file2.

I am also trying this in Python:

#!/usr/bin/python
s = set()

# load each line of file1 into memory as elements of a set, 's'
f1 = open("file1.txt", "r")
for line in f1:
    s.add(line.strip())
f1.close()

# open file2 and split each line on "_" separator,
# second field contains the value ZNIxxx
f2 = open("file2.txt", "r")
for line in f2:
    if line[0:4] == "copy":
        fields = line.split("_")
        # check if the field exists in the set 's'
        if fields[1] not in s:
            match = line
        else:
            match = 0
    else:
        if match:
            print match, line,


It is not working well, as I'm getting:

Traceback (most recent call last):
  File "./test.py", line 14, in ?
    if fields[1] not in s:
IndexError: list index out of range

useBSD
  • In file1, are there any line breaks? If so, how frequently? You say there are 19463 lines but in the example there is only one. – byrondrossos Apr 18 '12 at 13:15
  • i fixed that, should be clearer now – c00kiemon5ter Apr 18 '12 at 13:16
  • thanks for fixing that. there are line breaks in each file. – useBSD Apr 18 '12 at 13:17
  • If I run the test data you provided using the grep command as you input, I also get no output, but isn't that to be expected? The first file matches with everything in the second and the `-v` says show only non-matches of which there are none. – potong Apr 18 '12 at 13:45
  • ah, i want to remove any matches from file2 which exist in file1 – useBSD Apr 18 '12 at 13:58
  • Does this answer your question? [Deleting lines from one file which are in another file](https://stackoverflow.com/questions/4780203/deleting-lines-from-one-file-which-are-in-another-file) – Pound Hash Oct 17 '22 at 23:43

4 Answers

11

What about:

grep -F -v -f file1 file2 > file3
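
For readers who would rather stay in Python, the same fixed-string, inverted filter can be sketched roughly as below. The function name `drop_matching` is hypothetical, and the sample lines are shortened versions of the ones in the question. Skipping blank entries and trailing carriage returns is an assumption about what file1 might contain, not something confirmed in the question; it is included because an empty pattern line given to `grep -f` matches every input line, in which case `-v` prints nothing.

```python
def drop_matching(patterns, lines):
    """Return the lines that contain none of the fixed strings in patterns."""
    # Strip surrounding whitespace (including a stray '\r' left by Windows
    # line endings) and ignore blank entries, which would match every line.
    wanted = [p.strip() for p in patterns if p.strip()]
    return [line for line in lines if not any(p in line for p in wanted)]


if __name__ == "__main__":
    # Shortened sample data; the real files are read with open() instead.
    file1 = ["ZNI008\r\n", "\n", "ZNI010\n"]
    file2 = [
        "copy /Y 20050001_ZNI008_162635.xml\n",
        "copy /Y 20050001_ZNI009_162635.xml\n",
        "copy /Y 20050001_ZNI010_162635.xml\n",
    ]
    for line in drop_matching(file1, file2):
        print(line, end="")
```

This scans every pattern for every line, so it is only a sketch; with roughly 19463 patterns and 51360 lines it may be noticeably slower than grep.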
c00kiemon5ter
byrondrossos
  • thanks, tried and i got no results removed. grep -F -v -f file1.txt file2.txt

    wc -l file2.txt 51361 file2.txt

    wc -l file1.txt 19463 file1.txt
    – useBSD Apr 18 '12 at 13:24
  • @useBSD It is supposed to print on screen. Editing to reflect that. – byrondrossos Apr 18 '12 at 13:28
  • I tried again.. no lines in file3. grep -F -v -f file1.txt file2.txt > file3.txt # wc -l file3.txt 0 file3.txt – useBSD Apr 18 '12 at 13:49
  • If all lines in file1 match all lines of file2, then there is nothing to display or save to file3. – c00kiemon5ter Apr 18 '12 at 13:55
  • Why? I tried with your example input and it works fine. Could you please recheck your problem specification just in case you got any detail wrong? – byrondrossos Apr 18 '12 at 13:55
  • yes, the grep is working as expected but not as needed. i need file3 to include only lines which do not match. – useBSD Apr 18 '12 at 14:02
  • @useBSD: That's exactly what file3 will include. Running it against the sample data you provided will result in an empty output file because all lines in file1 match with file2. – sorpigal Apr 18 '12 at 14:10
1

I like the grep solution from byrondrossos better, but here's another option:

sed $(awk '{printf("-e /%s/d ", $1)}' file1) file2 > file3
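
The same idea, one delete rule per entry in file1, could be sketched in Python by folding all entries into a single alternation. The function names here are hypothetical. Unlike the sed version, which treats each entry as a regular expression, this sketch escapes the entries so they are matched literally; that is an assumption about what is wanted. Like `$1` in the awk call, only the first whitespace-separated field of each entry is used.

```python
import re


def build_delete_regex(patterns):
    """Combine the entries into one regex, like a list of sed /pattern/d rules."""
    firsts = []
    for p in patterns:
        fields = p.split()
        if fields:  # skip blank lines, which would otherwise match everything
            firsts.append(re.escape(fields[0]))
    if not firsts:
        return None
    return re.compile("|".join(firsts))


def delete_matching(patterns, lines):
    """Return the lines that match none of the entries."""
    regex = build_delete_regex(patterns)
    if regex is None:
        return list(lines)
    return [line for line in lines if not regex.search(line)]
```

Building one pattern in memory also avoids passing thousands of `-e` arguments on a command line.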
Nick Atoms
  • Great, I just utilized this method to extract all running and network-active Docker containers: `docker ps -aq -f status=running | sed "$(docker ps -aq -f status=running -f network=none | awk '{printf("-e /^%s$/d ", $1)}')"` – KaiserKatze Jun 17 '19 at 07:12
0

This uses Bash and GNU sed, because of the -i switch:

cp file2 file3
while read -r; do
    sed -i "/$REPLY/d" file3
done < file1

there is surely a better way but here's a hack around -i :D

cp file2 file3
while read -r; do
    (rm file3; sed "/$REPLY/d" > file3) < file3
done < file1

this exploits shell evaluation order


alright, I guess the correct way with this idea is using ed. This should be POSIX too.

cp file2 file3
while read -r line; do
    ed file3 <<EOF
/$line/d
wq
EOF
done < file1

In any case, grep seems to be the right tool for the job.
@byrondrossos answer should work for you well ;)

c00kiemon5ter
  • is file 3 here an empty output file? or file1 the output? – useBSD Apr 18 '12 at 13:28
  • file1 is the file with the lines to match (entries like `ZNI008` `ZNI009` etc). file2 is the file from which the matching entries will be removed. file3 is the results. – c00kiemon5ter Apr 18 '12 at 13:31
0

This is admittedly ugly, but it does work. However, the path must be the same on every line (except, of course, for the ZNI### portion). Everything but the ZNI### part of each line is stripped so that grep -vf can run correctly on the sorted files.

First, convert "testfile2" to "testfileconverted" so that it shows only the ZNI### values:

cat /testfile2 | sed 's:^.*_ZNI:ZNI:g' | sed 's:_.*::g' > /testfileconverted

Second, run an inverse grep of the converted file against "testfile1" and write the reformatted output to "testfile3":

bash -c 'grep -vf <(sort /testfileconverted) <(sort /testfile1)' | sed "s:^:\copy /Y \\\|server\\\foldername\\\version\\\20050001_:g" | sed "s:$:_162635\.xml \\\|server\\\foldername\\\version\\\folder\\\:g" | sed "s:|:\\\:g" > /testfile3
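
The extract-the-key-then-compare idea can also be sketched in Python without rebuilding the lines afterwards. This is only an illustration under stated assumptions: the name `filter_by_key` and the pattern `_(ZNI\d+)_` are hypothetical, chosen to fit the sample lines in the question. Lines with no recognizable key are kept rather than indexed, so a line without an underscore does not raise an IndexError.

```python
import re

# Assumed key shape: "_ZNI", some digits, then another "_" (as in the samples).
KEY_RE = re.compile(r"_(ZNI\d+)_")


def filter_by_key(keys, lines):
    """Keep lines whose ZNI### key is not in keys, plus lines with no key."""
    # A set gives constant-time membership tests; blank entries are dropped.
    known = {k.strip() for k in keys if k.strip()}
    kept = []
    for line in lines:
        m = KEY_RE.search(line)
        if m is None or m.group(1) not in known:
            kept.append(line)
    return kept


if __name__ == "__main__":
    with open("file1.txt") as f1, open("file2.txt") as f2:
        remaining = filter_by_key(f1, f2)
    with open("file3.txt", "w") as f3:
        f3.writelines(remaining)
```

Because each line is looked up once in a set, the run time grows with the combined size of the two files rather than with their product.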
E1Suave