4

file 1

A
B
C

file 2

B
C
D

file1 + file2 =

A
B
C
D

Is it possible to do this using cmd.exe?

Mark
Victor
  • 3
    Are both input files sorted? Do you want the output in the same order? – Aacini Nov 06 '13 at 16:20
  • 1
    It's not simple, but doing it in any programming language would be. Why not use PowerShell or something else? – mojo Nov 06 '13 at 19:53

6 Answers

9

If you can afford to use a case-insensitive comparison, and if you know that none of the lines are longer than 511 bytes (127 for XP), then you can use the following:

@echo off
copy file1.txt merge.txt >nul
findstr /lvxig:file1.txt file2.txt >>merge.txt
type merge.txt

For an explanation of the restrictions, see What are the undocumented features and limitations of the Windows FINDSTR command?.
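
To make the logic of the `findstr` line concrete, here is a minimal Python sketch of the same idea: keep file1 as it is, then append only the lines of file2 that do not match any line of file1 exactly, ignoring case. Python is used purely for illustration. The function name `merge_new_lines` is hypothetical, `str.lower()` only approximates the `/i` switch, and the sketch does not reproduce the FINDSTR line-length limits.

```python
def merge_new_lines(lines1, lines2):
    """Return lines1 followed by the lines of lines2 that match no line of lines1.

    Matching is whole-line and case-insensitive, loosely mirroring the /x and /i
    switches of the findstr call. Like findstr /v, it does not remove duplicates
    that occur only within lines2.
    """
    known = {line.lower() for line in lines1}
    return list(lines1) + [line for line in lines2 if line.lower() not in known]


if __name__ == "__main__":
    # Sample data from the question.
    for line in merge_new_lines(["A", "B", "C"], ["B", "C", "D"]):
        print(line)
```

With the sample data from the question this prints A, B, C and D, one per line: all of file1 first, then the single new line from file2.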

Community
dbenham
  • +1, but the unique lines of file2.txt will appear after file1.txt. This is not the original order nor sorted order... – Aacini Nov 06 '13 at 19:37
  • @Aacini - What is the "original order" of the merged data? The merged data never existed before in any order, so how can it have an original order? – dbenham Nov 06 '13 at 19:41
  • I mean, your method with the files in `file2 file1` order will produce: `B C D A`, that is, just the order of the first file is preserved... – Aacini Nov 06 '13 at 19:53
  • @Aacini - Actually, it produces 'A B C D' because it first copies file1 in its entirety, then appends new records from file2. But I am uncomfortable stating that this, or any other order, is the "original" order. – dbenham Nov 06 '13 at 19:58
  • Perhaps I didn't express myself correctly. My first solution produces a result in the same order as both sorted input files. All other solutions produce a sorted result because `sort` is required in order for the method to work. The `findstr` method does _not_ produce a sorted result. – Aacini Nov 07 '13 at 00:29
8

Using PowerShell:

Get-Content file?.txt | Sort-Object | Get-Unique > result.txt

For cmd.exe:

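@echo off
rem Start with empty temp and result files.
type nul > temp.txt
type nul > result.txt
rem Concatenate in binary mode so that copy does not append a Ctrl-Z (EOF) character.
copy /b file1.txt+file2.txt temp.txt >nul
rem Append each line to result.txt only if no identical whole line is already there.
for /f "delims=" %%I in (temp.txt) do findstr /X /C:"%%I" result.txt >NUL ||(echo;%%I)>>result.txt
del temp.txt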
sschuberth
BLUEPIXY
  • 2
    Or even a bit shorter (for PowerShell): `Get-Content file?.txt | Sort-Object -Unique > result.txt`. – sschuberth Mar 08 '17 at 19:51
  • @BLUEPIXY When I use this powershell command it removes the most duplicate occurrence but the size of files almost doubles(input file 14KB > output file 27KB) which mustn't happen at all. Any way we can force prevent the size block ambiguity ? – Vicky Dev Aug 02 '22 at 00:51
4

First part (merging two text files) is possible. (See Documentation of copy command)

copy file1.txt+file2.txt file1and2.txt

For part 2, you can use the sort and uniq utilities from CoreUtils for Windows. These are Windows ports of the Linux utilities.

sort file1and2.txt > filesorted.txt
uniq filesorted.txt fileunique.txt

The limitation of this approach is that the original line order is lost.

Update 1

Windows also ships with a native SORT.EXE.

Update 2

Here is a very simple UNIQ in CMD script

Litmus
  • 1
    Question does not state it is unsorted files. The examples included imply files are sorted. Anyway, adding a qualifier – Litmus Nov 06 '13 at 16:35
  • file1and2.txt will have `A B C B C D`. That is the output of step 1. Read my answer again. `file1and2.txt` is just the concatenation of the two files in cmd.exe. – Litmus Nov 06 '13 at 16:42
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/40649/discussion-between-erbureth-and-eternal-learner) – Erbureth Nov 06 '13 at 16:45
  • 1
    Summary: even if `file1.txt` and `file2.txt` are sorted, their concatenation (`file1and2.txt`) is not, therefore `uniq` cannot be performed before `sort`-ing that file. – Erbureth Nov 06 '13 at 16:48
  • There is no such `cp` command! The Windows command to copy is `copy`. – Aacini Nov 06 '13 at 17:19
  • Note that Windows SORT is ***not*** case sensitive. So it may not give the correct result if case is important. – dbenham Nov 06 '13 at 18:46
3

You can also apply the Unix or PowerShell approach in pure Batch by writing a simple uniq.bat filter program:

@echo off
setlocal EnableDelayedExpansion
set "prevLine="
for /F "delims=" %%a in ('findstr "^"') do (
   if "%%a" neq "!prevLine!" (
      echo %%a
      set "prevLine=%%a"
   )
)

EDIT: The program below is a Batch-JScript hybrid version of the uniq program, which is more reliable and faster; copy it into a file called uniq.bat:

@if (@CodeSection == @Batch) @then

@CScript //nologo //E:JScript "%~F0" & goto :EOF

@end

var line, prevLine = "";
while ( ! WScript.Stdin.AtEndOfStream ) {
   line = WScript.Stdin.ReadLine();
   if ( line != prevLine ) {
      WScript.Stdout.WriteLine(line);
      prevLine = line;
   }
}

This way, you may use this solution:

(type file1.txt & type file2.txt) | sort | uniq > result.txt

However, in this case the result loses the original order.
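
As a rough illustration of why the pipeline above sorts before filtering, here is a minimal Python sketch of the same two steps. A `uniq`-style filter only drops a line that repeats the line immediately before it, so duplicates must be made adjacent first. The names `uniq` and `sort_uniq` are hypothetical helpers for this sketch. Python's default sort is case-sensitive, whereas the comments note that Windows SORT is not, so the ordering may differ on mixed-case data.

```python
def uniq(lines):
    """Drop every line that equals the line immediately before it."""
    result = []
    for line in lines:
        if not result or line != result[-1]:
            result.append(line)
    return result


def sort_uniq(lines1, lines2):
    """Concatenate, sort, then collapse adjacent duplicates."""
    return uniq(sorted(list(lines1) + list(lines2)))


if __name__ == "__main__":
    file1, file2 = ["A", "B", "C"], ["B", "C", "D"]
    # Without sorting, no duplicates are adjacent, so nothing is removed.
    print(uniq(file1 + file2))
    # After sorting, duplicates sit next to each other and are collapsed.
    print(sort_uniq(file1, file2))
```

The first call leaves the six concatenated lines untouched, while the second returns the four distinct lines in sorted order.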

Aacini
  • 1
    How about eliminating intermediate file using `(type file1.txt & type file2.txt)|sort|uniq>result.txt`. A JScript or VBS (or hybrid JScript/batch) implementation of uniq would perform better and be more reliable. The tag says batch-file, but the question just speaks of cmd.exe. – dbenham Nov 06 '13 at 17:58
  • @dbenham: I like it, Dave! I just modified my answer including it :-) – Aacini Nov 06 '13 at 18:22
  • Windows SORT is ***not*** case sensitive, so this may not give the correct result if case is important. – dbenham Nov 06 '13 at 18:51
  • Yes, the funny thing is that we have elaborated so much on this question, but the OP has not answered yet! I added the Batch-JScript version of uniq. – Aacini Nov 06 '13 at 19:00
0

The solution below assumes that both input files are sorted in ascending order, using the same ordering as the IF command's comparison operators, and that they do not contain empty lines.

@echo off
setlocal EnableDelayedExpansion

set "lastLine=ÿ"
for /L %%i in (1,1,10) do set "lastLine=!lastLine!!lastLine!"

< file1.txt (
   for /F "delims=" %%a in (file2.txt) do (
      set "line2=%%a"
      if not defined line1 set /P line1=
      if "!line1!" lss "!line2!" call :advanceLine1
      if "!line1!" equ "!line2!" (
         echo !line1!
         set "line1="
      ) else (
         echo !line2!
      )
   )
)
if "!line1!" neq "%lastLine%" echo !line1!
goto :EOF


:advanceLine1
echo !line1!
set "line1="
set /P line1=
if not defined line1 set "line1=%lastLine%"
if "!line1!" lss "!line2!" goto advanceLine1
exit /B
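
The idea behind the script above is a classic merge of two sorted sequences: always emit the smaller of the two current lines, and when both are equal emit the line once and advance in both files. Below is a minimal Python sketch of that idea, for illustration only. The function name `merge_sorted_unique` is hypothetical, and Python compares strings by code point, which is an assumption that may not match the ordering used by the IF command.

```python
def merge_sorted_unique(lines1, lines2):
    """Merge two ascending-sorted lists, emitting lines common to both only once."""
    merged = []
    i = j = 0
    while i < len(lines1) and j < len(lines2):
        if lines1[i] < lines2[j]:
            merged.append(lines1[i])
            i += 1
        elif lines1[i] > lines2[j]:
            merged.append(lines2[j])
            j += 1
        else:
            # Same line in both inputs: output it once, advance both.
            merged.append(lines1[i])
            i += 1
            j += 1
    # One input is exhausted; the rest of the other is already in order.
    merged.extend(lines1[i:])
    merged.extend(lines2[j:])
    return merged


if __name__ == "__main__":
    print(merge_sorted_unique(["A", "B", "C"], ["B", "C", "D"]))
```

Because each input is read once from front to back, the output stays in the shared sorted order and no separate sort step is needed.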
Aacini
0

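This joins and sorts the files and avoids the inflated output size seen with the PowerShell command above. In Windows PowerShell, the `>` redirection writes UTF-16, which roughly doubles the size of ASCII text, whereas Set-Content with an explicit encoding does not: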

Get-Content file?.txt | Sort-Object | Get-Unique | Set-Content -Encoding UTF8 result.txt