I am looking for the easiest way to filter one file against another.

INPUT: txt files
File 1 (bigger one):

abc111
abc112
abc113
abc114
abc115
...
zbc999

File 2 (smaller one):

abc111
abc112
abc113

OUTPUT: I want a new file containing only the unique entries: those lines from the larger file 1 that do not occur in the smaller file 2.

BTW:
How can I do this easily when the file names are long and tedious to type at the console every time?

Mark
MarekW
  • While this is doable with only standard functionality, the best way I can think of is still extremely painful. So what you want is simply to find a program that can do this easily, which makes the question off topic. – Jon May 30 '14 at 10:58

1 Answer

"extremely painful"?

@echo off
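REM step 1: remove duplicates from file1
REM start with a truly empty temp file ("echo." would write a blank line)
type nul >file1.tmp
REM "delims=" keeps the whole line, including any spaces
for /f "usebackq delims=" %%i in ("file1.txt") do (
 findstr /x /L /c:"%%i" file1.tmp >nul || >>file1.tmp echo(%%i
)

REM step 2: extract lines that don't exist in file2
findstr /v /x /L /g:file2.txt file1.tmp >output.txt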

type output.txt
Stephan
  • Is there any way to choose file1.txt and file2.txt from a popup window? – MarekW May 30 '14 at 11:30
  • Well, **that** would be the "extremely painful" part. But did you know that you can use the `TAB` key to complete a filename at the command prompt? Enter the first two or three characters and press `TAB` several times; it will cycle through all matching filenames. – Stephan May 30 '14 at 11:35
  • Your basic logic is sound, but this has bugs, one of which cannot be solved. FIND is a poor choice because it will report that `234` matches `1234`. FINDSTR is better because you can specify the `/X` option to match the entire line exactly. It also needs the `/L` option to prevent regex interpretation. But `\\` and `\"` literals within search strings require the leading `\` to be escaped as `\\`. More troubling is a horrific FINDSTR bug that can cause searches with multiple strings of different lengths to sometimes miss matches. See http://stackoverflow.com/q/8844868/1012053. – dbenham May 30 '14 at 11:42
  • @dbenham's last point can be resolved with another `for` that checks one string after another. Ugly and slow, but reliable (?). I didn't get the "But \` and \"` literals..." part - maybe some characters are not shown as intended. (I read the link, but parts of it are (still) a bit beyond my horizon.) I edited my answer to include the first parts of dbenham's comment. – Stephan May 30 '14 at 12:33
  • @Stephan: So... do we agree on "extremely painful" now? ;-) Frankly, batch files are about the most masochistic way to do this. PowerShell at the very least. – Jon May 30 '14 at 20:36
  • @jon: well - uhm - ok, point for you. But it's not the logic that's painful, it's working around some bugs. – Stephan May 31 '14 at 06:06
  • @Stephan: Agreed. That's just how batch files turn out a lot of the time: the logic is simple, but the execution is another issue entirely. – Jon May 31 '14 at 22:24
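
The "another `for`" idea from the comments could look roughly like the sketch below. It checks each line of file 1 on its own, so FINDSTR never receives more than one literal search string at a time, which should sidestep the multiple-search-string bug dbenham mentions. The script name `filter.bat` and the three-argument layout are assumptions made for illustration; they are not part of the original answer. Passing the files as arguments also means long names only have to be typed (or `TAB`-completed) once. The sketch does not handle lines containing `"` or `\`, and `for /f` skips empty lines and lines starting with `;`.

```bat
@echo off
setlocal
REM Hypothetical usage: filter.bat file1.txt file2.txt output.txt
REM %~1 = larger file, %~2 = smaller file, %~3 = output file
if "%~3"=="" (
  echo Usage: %~nx0 file1 file2 output
  exit /b 1
)

REM Start with an empty output file.
type nul >"%~3"

REM Look up one line at a time. A line is appended only if it is found
REM neither in file 2 nor in the output written so far (no duplicates).
for /f "usebackq delims=" %%a in ("%~1") do (
  findstr /x /L /c:"%%a" "%~2" "%~3" >nul || >>"%~3" echo(%%a
)

type "%~3"
```

This runs FINDSTR once per line of file 1, so it is slow on large inputs, as the comment above already anticipates.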