9

Say I have a file like:

apple
pear
lemon
lemon
pear
orange
lemon

How do I make it so that I only keep the unique lines, so I get:

apple
pear
lemon
orange

I can either modify the original file or create a new one.

I'm thinking there's a way to scan the original file a line at a time, check whether or not the line exists in the new file, and then append if it doesn't. I'm not dealing with really large files here.
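The line-by-line idea described above can be sketched in Python for illustration. This is not part of the original question, and the function name `unique_lines_in_order` is hypothetical. It remembers lines already seen in a set rather than re-reading the output file, and it keeps the order of first appearance.

```python
def unique_lines_in_order(lines):
    """Return lines with later duplicates dropped, keeping first-seen order."""
    seen = set()      # lines already written to the result
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


fruits = ["apple", "pear", "lemon", "lemon", "pear", "orange", "lemon"]
print(unique_lines_in_order(fruits))
# prints ['apple', 'pear', 'lemon', 'orange']
```

For a real file, the list could come from `open(path).read().splitlines()`, assuming the file fits in memory, which matches the "not really large files" case here.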

phuclv
Kache

8 Answers

12
@echo off
setlocal disabledelayedexpansion
set "prev="
for /f "delims=" %%F in ('sort uniqinput.txt') do (
  set "curr=%%F"
  setlocal enabledelayedexpansion
  if "!prev!" neq "!curr!" echo !curr!
  endlocal
  set "prev=%%F"
)

What it does: it sorts the input first, then goes through it sequentially and outputs a line only if it differs from the previous one. It could have been even simpler if not for the need to handle special characters (that is what the setlocal/endlocal pairs are for).
It just echoes the lines to stdout. If you want to write to a file (assuming you named your batch file myUniq.bat), run myUniq >>output.txt
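The same sort-then-compare-with-previous technique can be sketched in Python, for readers who want to see the logic without the batch quoting concerns. The function name `unique_sorted` is hypothetical. Note that Python's `sorted` orders strings by code point, which is an assumption that differs from the ordering used by the Windows `sort` command.

```python
def unique_sorted(lines):
    """Sort lines, then keep a line only if it differs from the previous one."""
    result = []
    prev = None  # no previous line before the first iteration
    for curr in sorted(lines):
        if curr != prev:
            result.append(curr)
        prev = curr
    return result


fruits = ["apple", "pear", "lemon", "lemon", "pear", "orange", "lemon"]
print(unique_sorted(fruits))
# prints ['apple', 'lemon', 'orange', 'pear']
```

Because duplicates are adjacent after sorting, only one previous line has to be remembered, at the cost of losing the original line order.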

wmz
  • Awesome, thanks! I ended up writing an `echo ... >> myUniq.bat` for every line, ran `myUniq`, then deleted it, all at once. – Kache Oct 11 '12 at 21:42
  • @Kache Glad I could help. If you're open to `powershell`, you could also use simple one-liners: (unsorted) `gc uniqinput.txt |select -unique` or (sorted) `gc uniqinput.txt |sort|unique` – wmz Oct 11 '12 at 22:16
  • Thanks, but this does not work for a file such as 0000\n1111\n2222\n (where \n is a real CRLF). It just prints 1111 and 2222. Maybe 0000 means something special to batch. Anyway, please take my vote :) – user1503944 Nov 29 '16 at 11:28
  • @user1503944 Good catch, there is something funny going on when comparing zeros with nothing (as the value is not defined at that point). This is probably because cmd tries a numeric comparison. I changed the comparison to be strictly on strings (by adding quotes, which should be done anyway), so that should fix it. – wmz Nov 29 '16 at 12:35
2

There's no easy way to do that from the command line without an additional program.

uniq will do what you want.

Or you can download CoreUtils for Windows to get GNU tools. Then you can just use sort -u to get what you want.

Either one of those should be callable from a batch file.

Personally though, if you need to do a lot of text manipulation like that, I think you'd be better off getting Cygwin. Then you'd have easy access to sort, sed, awk, vim, etc.

Martin Prikryl
embedded.kyle
2

Run PowerShell from the command prompt.

Assuming the items are in a file called fruits.txt, the following will put the unique lines in uniques.txt:

type fruits.txt | Sort-Object -unique | Out-File uniques.txt
Martin Prikryl
2

In Windows 10, sort.exe has a hidden flag called /unique that you can use:

C:\Users>sort fruits.txt
apple
lemon
lemon
lemon
orange
pear
pear

C:\Users>sort /unique fruits.txt
apple
lemon
orange
pear
Martin Prikryl
phuclv
0

The SORT command in Windows 10 does have an undocumented switch to remove duplicate lines.

SORT /UNIQ File.txt /O Fileout.TXT

But for a more bulletproof option, you could use the following pure batch file.

@echo off
setlocal disableDelayedExpansion
set "file=MyFileName.txt"
set "sorted=%file%.sorted"
set "deduped=%file%.deduped"
::Define a variable containing a linefeed character
set LF=^


::The 2 blank lines above are critical, do not remove
sort "%file%" >"%sorted%"
>"%deduped%" (
  set "prev="
  for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%sorted%") do (
    set "ln=%%A"
    setlocal enableDelayedExpansion
    if /i "!ln!" neq "!prev!" (
      endlocal
      (echo %%A)
      set "prev=%%A"
    ) else endlocal
  )
)
>nul move /y "%deduped%" "%file%"
del "%sorted%"
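For comparison, here is a rough Python sketch of the same overall flow: sort, drop lines that equal the previous one when case is ignored (as `if /i` does above), write to a side file, then move it over the original. It is not a line-for-line port of the batch script. The function name `dedupe_file_in_place`, the UTF-8 encoding, and the use of `str.casefold` for case-insensitive comparison are assumptions made for illustration.

```python
import os


def dedupe_file_in_place(path):
    """Sort a text file and drop lines equal to the previous one, ignoring case."""
    with open(path, encoding="utf-8") as src:
        lines = src.read().splitlines()

    # Case-insensitive sort so lines differing only in case end up adjacent.
    lines.sort(key=str.casefold)

    # Write to a side file first, like the .deduped file in the batch script.
    deduped_path = path + ".deduped"
    with open(deduped_path, "w", encoding="utf-8") as dst:
        prev = None
        for line in lines:
            if prev is None or line.casefold() != prev.casefold():
                dst.write(line + "\n")
            prev = line

    # Replace the original only after the new file is fully written.
    os.replace(deduped_path, path)
```

It would be called as `dedupe_file_in_place("MyFileName.txt")`. Writing to a side file and replacing at the end means the original is left untouched if something fails midway.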
Squashman
-2

I also used PowerShell from the command prompt, in the directory where my text file is located. I used the cat command, the sort command, and the Get-Unique cmdlet, as mentioned at http://blogs.technet.com/b/heyscriptingguy/archive/2012/01/15/use-powershell-to-choose-unique-objects-from-a-sorted-list.aspx.

It looked like this:

PS C:\Users\username\Documents\VDI> cat .\cde-smb-incxxxxxxxx.txt | sort | Get-Unique > .\cde-smb-incxxxxxxx-sorted.txt
Luc M
-2

Use the GNU sort utility:

sort -u file.txt

If you're on Windows and using Git, then sort and many more useful utilities are already here: C:\Program Files\Git\usr\bin\

Just add this path to your %PATH% environment variable.

-5

You can use the SORT command, e.g.:

SORT test.txt > Sorted.txt