
I have a basic Windows batch script that compares the hashes listed in two text files and outputs the hashes that don't exist in both files.

First it generates a clean copy of each file without headers, ignoring miscellaneous files like Thumbs.db or desktop.ini, and stores the results in "#_file1_clean.txt" and "#_file2_clean.txt". This works great.

Then I use each clean file to generate a log file containing ONLY the hashes. This is where the issue lies: the ECHO %%b>>logfile.txt statement occasionally throws a "The process cannot access the file because it is being used by another process." error.

Then I use the findstr command to output lines that don't match. This works fine.

Here's the code:

@ECHO OFF
SET "batchpath=%~dp0"
CD /D "%batchpath%"

ECHO Cleaning up temp log files
del #_*.txt 2>NUL
timeout 2

REM *** ENTER TWO HASHLOGS TO COMPARE ***
set "file1=LOGS\hashlog_syno_archive.txt"
set "file2=LOGS\hashlog_Win_archive.txt"

CALL :SETSRC1 "%file1%"
CALL :SETSRC2 "%file2%"

findstr /G:"exclude.txt" /V "%file1%" > #_%fname1%_clean.txt
findstr /G:"exclude.txt" /V "%file2%" > #_%fname2%_clean.txt

CLS
FOR /F %%a in ('Find "" /v /c ^< "#_%fname1%_clean.txt"') DO (SET /a "line1=%%a")
ECHO Number of files to process in %file1%: %line1%

FOR /F %%a in ('Find "" /v /c ^< "#_%fname2%_clean.txt"') DO (SET /a "line2=%%a")
ECHO Number of files to process in %file2%: %line2%

TIMEOUT 3

ECHO,
ECHO Extracting %line1% Hashes from '%file1%'
FOR /F "usebackq tokens=1,2,3* delims=," %%a in ("#_%fname1%_clean.txt") do (ECHO %%b>>"#_hash1.txt")

ECHO,
ECHO Extracting %line2% Hashes from '%file2%'
FOR /F "usebackq tokens=1,2,3* delims=," %%a in ("#_%fname2%_clean.txt") do (ECHO %%b>>"#_hash2.txt")


ECHO,
ECHO Extracting NON-MATCHING Hashes
findstr /G:"#_hash1.txt" /V /I /L "#_%fname2%_clean.txt" > #_HASH_IN_%fname2%_NOT_IN_%fname1%.txt
findstr /G:"#_hash2.txt" /V /I /L "#_%fname1%_clean.txt" > #_HASH_IN_%fname1%_NOT_IN_%fname2%.txt

ECHO,
ECHO **COMPLETE**

GOTO :END

:SETSRC1
SET "fname1=%~n1"
GOTO :EOF

:SETSRC2
SET "fname2=%~n1"
GOTO :EOF

:END
PAUSE

Input files compared have filesize as number, hash value, filename like this (sample from log):

228825,91eaf030a59ee15f3846b25454350f29,Documents/Computer Review/P150SM-A/titanfall max settings no AA gpuz.jpg
14795,8c0c1533f1ee0ae0bf67235f8439d552,Documents/Computer Review/P150SM-A/charts/cpu cinebench.jpg
30590,673bd509c401b4b405243dc7a2fda73f,Documents/Computer Review/P150SM-A/charts/bf4 - fps.jpg
14026,be371bc60dbe70cc5e4667e11914ffbc,Documents/Computer Review/P150SM-A/charts/cpu fritz.jpg
13522,8dae26001302effaa3dacd93372d805a,Documents/Computer Review/P150SM-A/charts/cpu wprime.jpg
15666,f45893ec97e3e1177aa563cdd4f4f714,Documents/Computer Review/P150SM-A/charts/cpu 7zip.jpg
8463,351834a1d43c6181864d8647892864d9,Documents/Computer Review/P150SM-A/charts/game coh2.jpg
14711,cdc011f776b48148f51acc40e6c769eb,Documents/Computer Review/P150SM-A/charts/cpu x264.jpg

So it's just extracting the md5 hash value as %%b.

Problem is sometimes I get an error "The process cannot access the file because it is being used by another process." and I've narrowed it down to ECHO %%b>>"#_hash1.txt" (or hash2.txt). This results in missed lines output to the log.

This is the only batch file running, only process that would be touching those files. I've tried running it on another PC with the same result. The issue is it's sporadic. It's not all the time. Sometimes one line, sometimes multiple, and not always same line(s).

This seems like it should be a straightforward process, but echoing to the log file seems to be causing issues and I cannot figure out why.

Thanks for any assistance.

HTWingNut

1 Answer

The following two command lines are the problem, as already correctly analyzed:

FOR /F "usebackq tokens=1,2,3* delims=," %%a in ("#_%fname1%_clean.txt") do (ECHO %%b>>"#_hash1.txt")
FOR /F "usebackq tokens=1,2,3* delims=," %%a in ("#_%fname2%_clean.txt") do (ECHO %%b>>"#_hash2.txt")

The reason is that the Windows command processor opens the output file, appends the line, and closes the file again each time a line is appended. This makes it possible for another process running in the background, such as an anti-virus application, to open the file for reading after cmd.exe has closed it and before cmd.exe opens it again for the next write. While the other process holds the file open, cmd.exe cannot open it to append the next line, so that line is lost.

The solution is the usage of the following two command lines:

(FOR /F "usebackq tokens=2 delims=," %%I in ("#_%fname1%_clean.txt") do ECHO %%I)>"#_hash1.txt"
(FOR /F "usebackq tokens=2 delims=," %%I in ("#_%fname2%_clean.txt") do ECHO %%I)>"#_hash2.txt"

In this case the Windows command processor creates the output text file before running the FOR command and keeps it open while the FOR loop processes the input text file and writes the lines to the output file. That is not only much more efficient, it also leaves no gap between writes in which another process could open the file and block cmd.exe from reopening it.
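
The single-redirection approach can be tried in isolation with a minimal sketch like the one below. The file names under %TEMP% and the shortened sample paths are hypothetical and exist only for this demonstration; the two hash values are taken from the sample log in the question.

```batch
@ECHO OFF
SETLOCAL
REM Hypothetical demo files, not part of the original script.
SET "clean=%TEMP%\#_demo_clean.txt"
SET "hashes=%TEMP%\#_demo_hash.txt"

REM Write two sample lines in the format size,hash,filename.
(
    ECHO 228825,91eaf030a59ee15f3846b25454350f29,Documents/a.jpg
    ECHO 14795,8c0c1533f1ee0ae0bf67235f8439d552,Documents/b.jpg
)>"%clean%"

REM One redirection for the whole loop: the output file is opened once,
REM receives the output of every ECHO, and is closed when the loop ends.
(FOR /F "usebackq tokens=2 delims=," %%I IN ("%clean%") DO ECHO %%I)>"%hashes%"

REM Show the extracted hashes, then remove the demo files.
TYPE "%hashes%"
DEL "%clean%" "%hashes%"
ENDLOCAL
```

Because the redirection uses > instead of >>, the output file is overwritten on every run, so the hash files no longer depend on the earlier del #_*.txt having succeeded.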

Some more hints for improving the batch file code:

  1. timeout 2 in the sixth line is completely useless. It looks like it was added, like TIMEOUT 3, for troubleshooting, and the two lines were not removed before the code was posted.

  2. CALL :SETSRC1 "%file1%" and the entire subroutine SETSRC1 can be replaced by the single line:

    FOR %%I IN ("%file1%") DO SET "fname1=%%~nI"
    

    That is more efficient because cmd.exe has to read just this single line from the batch file to define the environment variable fname1 with the string hashlog_syno_archive. The line CALL :SETSRC2 "%file2%" and the subroutine SETSRC2 can be replaced by:

    FOR %%I IN ("%file2%") DO SET "fname2=%%~nI"
    
  3. SET /a "line1=%%a" can be simplified to SET "line1=%%a", and likewise SET /a "line2=%%a" to SET "line2=%%a". There is no need for an arithmetic expression, which converts the number string assigned to the loop variable a to a 32-bit signed integer and then converts the integer back to the same number string to define the environment variable line1 or line2.

  4. ECHO, should be replaced by ECHO/ or ECHO(. For the reasons, read the DosTips forum topic: ECHO. FAILS to give text or blank line - Instead use ECHO/

  5. The FOR /F option tokens=1,2,3* can be shortened to tokens=1-3*, although it is even better to use just tokens=2, because only the second comma-delimited substring (token) is of interest.

  6. Finally, read "Issue 7: Usage of letters ADFNPSTXZadfnpstxz as loop variable".
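
Taken together, the hints above could be applied roughly as in the following sketch. It is one possible rewrite and not a tested drop-in replacement: SETLOCAL/ENDLOCAL and the shortened CD /D "%~dp0" are additions not present in the question, and the cosmetic CLS and status message at the start are left out.

```batch
@ECHO OFF
REM SETLOCAL is an addition; it keeps the variables local to this script.
SETLOCAL
CD /D "%~dp0"

REM Remove temporary log files from a previous run (hint 1: no timeout).
DEL "#_*.txt" 2>NUL

REM *** ENTER TWO HASHLOGS TO COMPARE ***
SET "file1=LOGS\hashlog_syno_archive.txt"
SET "file2=LOGS\hashlog_Win_archive.txt"

REM Hint 2: get the file names without path and extension, no subroutine.
FOR %%I IN ("%file1%") DO SET "fname1=%%~nI"
FOR %%I IN ("%file2%") DO SET "fname2=%%~nI"

findstr /G:"exclude.txt" /V "%file1%" >"#_%fname1%_clean.txt"
findstr /G:"exclude.txt" /V "%file2%" >"#_%fname2%_clean.txt"

REM Hint 3: plain SET instead of SET /A. Hint 6: loop variable I.
FOR /F %%I IN ('find "" /v /c ^< "#_%fname1%_clean.txt"') DO SET "line1=%%I"
FOR /F %%I IN ('find "" /v /c ^< "#_%fname2%_clean.txt"') DO SET "line2=%%I"
ECHO Number of files to process in %file1%: %line1%
ECHO Number of files to process in %file2%: %line2%

REM Hint 4: ECHO/ prints the empty line. Hint 5: tokens=2 only.
ECHO/
ECHO Extracting %line1% hashes from '%file1%'
(FOR /F "usebackq tokens=2 delims=," %%I IN ("#_%fname1%_clean.txt") DO ECHO %%I)>"#_hash1.txt"

ECHO/
ECHO Extracting %line2% hashes from '%file2%'
(FOR /F "usebackq tokens=2 delims=," %%I IN ("#_%fname2%_clean.txt") DO ECHO %%I)>"#_hash2.txt"

ECHO/
ECHO Extracting NON-MATCHING hashes
findstr /G:"#_hash1.txt" /V /I /L "#_%fname2%_clean.txt" >"#_HASH_IN_%fname2%_NOT_IN_%fname1%.txt"
findstr /G:"#_hash2.txt" /V /I /L "#_%fname1%_clean.txt" >"#_HASH_IN_%fname1%_NOT_IN_%fname2%.txt"

ECHO/
ECHO **COMPLETE**
ENDLOCAL
PAUSE
```

The script ends right after the last command, so neither GOTO :END nor the :END label is needed once the two subroutines are gone.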

Mofi