I have two text files: one is an existing word list, and the other is extracted from a JSON file. I need a third file that contains the content of the JSON plus the content of the other text file.

database1.txt

word1
word2
word3
word8

Database2.txt (from JSON)

word1
word5
word7
word8

Database3.txt (Database1+Database2)

Word1
Word2
Word3
Word5
Word7
Word8

Here is my code:

@ECHO OFF
setlocal enabledelayedexpansion
IF EXIST "%LOCALAPPDATA%\xxx\xxx\database.json". (

for /f "delims=" %%a in ('type "%LOCALAPPDATA%\xxx\xxx\database.json"') do for %%b in (%%a) do (
ECHO %%b >>json.tmp
)

for /f "tokens=* skip=1 delims= " %%a in (json.tmp) do (
call :sub1 %%a

>> Json_cl.txt echo.!S!
)

set row=
for /F "delims=" %%j in (Json_cl.txt) do (
  if  defined row echo.!row!>>Password_JD.txt
  set row=%%j
)


findstr /V /g:"Password_list.txt" "Password_JD.txt">1.out
        type Password_list.txt 1.out>Updated_PW.txt

del Json_cl.txt
del json.tmp
del Password_JD.txt
del 1.out
goto :eof

:sub1
set S=%*
set S=!S:"=!

goto :eof
)

The code mostly works, but sometimes it looks as if FINDSTR doesn't find a missing word.

Can someone help me fix it, or suggest a better way to compare the files?

Thank you

René Girardi

2 Answers

This script uses a robust tool called Uniq.bat, by aacini:

@echo off
copy database1.txt + database2.txt tmp.txt >nul
type tmp.txt | sort |uniq >database3.txt
del tmp.txt

UNIQ.BAT

@if (@CodeSection == @Batch) @then

@CScript //nologo //E:JScript "%~F0" & goto :EOF & Rem aacini 2013

@end

var line, prevLine = "";
while ( ! WScript.Stdin.AtEndOfStream ) {
   line = WScript.Stdin.ReadLine();
   if ( line != prevLine ) {
      WScript.Stdout.WriteLine(line);
      prevLine = line;
   }
}
foxidrive
  • This will not work reliably because SORT is not case sensitive. SORT could yield the following sequence: `A a A`, and UNIQ.BAT would preserve all three values, whereas you only want `A a`. My [JSORT.BAT](http://www.dostips.com/forum/viewtopic.php?f=3&t=5595) could be used to solve the case sensitivity issue, but I think the OP may still have an issue with extra spaces on some of the lines. – dbenham Jun 07 '14 at 13:10
  • @dbenham It does solve the question as posed. `Lousy question in` often leads to `answer that may not work`. – foxidrive Jun 07 '14 at 13:17
  • If you are going to be that literal about the question, then this certainly does not solve the problem, as the text inputs are all lower case, and the final output is mixed case. But I don't think that was the OP's intent. Also, the fact that the code refers to passwords implies that case is probably important since most passwords are case sensitive. But agreed, the question could be better written. – dbenham Jun 07 '14 at 13:24

Problem 1: Potential for duplicate words in final output

FINDSTR has a nasty bug: when matching against multiple literal search strings of varying length, it may miss some matches. You do not explicitly specify a regular expression or literal search, so FINDSTR will do a regular expression search if the first line in the /G file contains a regex meta-character; otherwise it will do a literal search. See "What are the undocumented features and limitations of the Windows FINDSTR command?" for more info about both the bug and the literal vs. regex issue.

If you are going to use FINDSTR against multiple search strings, then you should explicitly force a literal search using the /L option, and force a case insensitive search with the /I option. Of course this is not acceptable if your passwords are case sensitive.
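As a rough sketch of that suggestion, the comparison step from the question could be written with both options spelled out. The file names are the ones used in the question's script, and this assumes case does not matter for your words:

```batch
@echo off
rem Sketch only: /L forces a literal search, /I makes it case insensitive.
rem /V keeps the lines of Password_JD.txt that match none of the search strings.
rem File names are taken from the question's script.
findstr /V /L /I /G:"Password_list.txt" "Password_JD.txt" >1.out
type Password_list.txt 1.out >Updated_PW.txt
```

The only change from the original command is the added /L /I, so the rest of the script can stay as it is.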

If the passwords are case sensitive, then you can do a regex search instead using the /R option, but then you must ensure that either none of the search strings contain a regex meta-character, or else all the meta-characters must be escaped with a leading backslash \.

The regex meta-characters that would have to be escaped are: . * ^ $ [ \. But it is extremely difficult to do search and replace involving * in batch. However, this must not be an issue or else your JSON parser would fail since the simple FOR loop would corrupt words containing * or ?.

Problem 2: Words missing from final output

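By default, FINDSTR looks for a search string anywhere within a target line, so a search string like WIN would match the line TWINS. In your script the search strings come from Password_list.txt and the target lines from Password_JD.txt, so a new word TWINS from your JSON file would fail to show up in the output if your password file already contains WIN.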

The solution would be to do an exact match with the FINDSTR /X option. But then you have a potential space problem, because your JSON parser is appending a space to the end of each word. That can be fixed by changing ECHO %%b >>json.tmp to ECHO %%b>>json.tmp.
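Put together, the two changes might look like the sketch below. It reuses the file names from the question's script; combining /X with /L and /I is an assumption about how you would want to merge both fixes, and /I should be dropped (see Problem 1) if case matters:

```batch
@echo off
rem Sketch only, reusing the file names from the question's script.
rem 1) In the JSON parsing loop, write each word without a trailing space:
rem        ECHO %%b>>json.tmp
rem 2) /X treats a line as a match only when the WHOLE line equals a search
rem    string, so the search string WIN no longer matches the line TWINS.
findstr /V /L /I /X /G:"Password_list.txt" "Password_JD.txt" >1.out
type Password_list.txt 1.out >Updated_PW.txt
```

Note that /X compares entire lines, so stray trailing spaces in Password_list.txt would also prevent a match; that file needs to be clean as well.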

dbenham