
I'm working with very large FIX message log files. Each message represents a set of tags separated by SOH characters.

Unlike MQ messages, individual FIX tags (and messages as a whole) have no fixed length or position. A log may contain messages of different types, each with a different number and sequence of tags.

Sample (of one of many types of messages):

07:00:32 -SEND:8=FIX.4.0(SOH)9=55(SOH)35=0(SOH)34=2(SOH)43=N(SOH)52=20120719-11:00:32(SOH)49=ABC(SOH)56=XYZ(SOH)10=075

So the only certainties are: (1) the tag number followed by an equals sign uniquely identifies the tag, and (2) tags are delimited by SOH characters.

For specific tags (just a few of them at a time, not all of them), I need to get a list of their distinct values - something like this:

49=ABC 49=DEF 49=GHI...

Format of the output doesn't really matter.

I would greatly appreciate any suggestions and recommendations.

Kind regards, Victor O.

  • Can you give another example where the SOH characters are substituted with text? e.g.: replace each with `{SOH}`. Note: Don't put them here. Edit your post instead. – Jay Jul 20 '12 at 06:31
  • Jay, thank you for your suggestion. I added (SOH) between the tags to emulate a separator. – Victor O. Jul 20 '12 at 13:45
  • May I assume you want to do it using a batch file from a Windows command prompt? – Jay Jul 20 '12 at 14:05
  • Jay, yes - I'd prefer batch file. – Victor O. Jul 20 '12 at 14:48
  • Jay, thank you so very much, that's great! I'll try a.s.a.p. and let you know. – Victor O. Jul 20 '12 at 19:34
  • @Jay, Did you receive any of my recent comments/feedback? – Victor O. Jul 20 '12 at 21:13
  • I was away. I noticed two new comments from you in this post. – Jay Jul 20 '12 at 21:38
  • @Jay, That means my previous comments were lost. Batch is EXCELLENT; the only thing is huge output. In reality, distinct values for specified tags only - one or just a few - are needed for analysis. Would it be possible to pass them as a parameter? – Victor O. Jul 20 '12 at 21:53
  • I changed the code, but I simply split the generated files by tag ID number since it's much easier. Check the *Changes* section. Don't forget to fix the *SOH* character in the code. – Jay Jul 21 '12 at 05:00
  • @Jay, I'm extremely grateful and appreciative of your effort (and I liked the idea of individual logs!). But my description was apparently not good enough, so I edited it. Tags have no fixed length or position, and their number in a message could be different (not necessarily 9). The leading segment of the message could also have a different length (not necessarily 15 characters). With this in mind, passing a specific tag number as a parameter is not just a convenience, but could be the only approach IMHO. Would it be too much trouble if I respectfully ask you to take another look? – Victor O. Jul 21 '12 at 16:39
  • @dbenham, OPTION 1 IS JUST FANTASTIC - THANK YOU SO VERY MUCH!! – Victor O. Jul 22 '12 at 13:38
  • @VictorO. - Thanks for the compliment, but generally comments about an answer should be added to the answer, and question comments are reserved for, well... the question :-) Kudos for taking the time to accept an answer. Be careful in the future about asking how to do something without also showing what you have tried or exactly what has you stumped. – dbenham Jul 22 '12 at 16:30

2 Answers

Option 1

The batch script below has decent performance. It has the following limitations:

  • It ignores case when checking for duplicates.
  • It may not properly preserve all values that contain = in the value.

EDIT - My original code did not support = in the value at all. I lessened that limitation by adding an extra SOH character in the variable name and changing the delims used to parse the value. Now values can contain =, as long as the distinct values already differ before the first =. If two values differ only after the =, only one of them will be preserved.
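To make the two limitations concrete, here is a rough Python model of them. It is not a re-implementation of the batch script or of cmd.exe; it only assumes, as described above, that values are told apart by the text before the first = and without regard to case. The names `collision_key` and `surviving_groups` are made up for this illustration.

```python
def collision_key(value: str) -> str:
    """Key under which Option 1 would store a value (modelled assumption).

    Only the text before the first '=' distinguishes values, and the
    comparison ignores case.
    """
    return value.split("=", 1)[0].upper()


def surviving_groups(values):
    """Group values that would collapse into a single stored entry."""
    groups = {}
    for value in values:
        groups.setdefault(collision_key(value), []).append(value)
    return groups


if __name__ == "__main__":
    sample = ["ABC", "abc", "x=1", "x=2", "y=1"]
    for key, members in surviving_groups(sample).items():
        # Each printed group would yield just one value in the output.
        print(key, members)
```

Each group with more than one member marks values that Option 1 would report as a single value. If that matters for your data, Option 2 is the safer choice.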

Be sure to fix the definition of the SOH variable near the top.

The name of the log file is passed as the 1st parameter, and the list of requested tags is passed as the 2nd parameter (enclosed in quotes).

@echo off
setlocal disableDelayedExpansion

:: Fix the definition of SOH before running this script
set "SOH=<SOH>"
set LF=^


:: The above 2 blank lines are necessary to define LF, do not remove.

:: Make sure there are no existing tag_ variables
for /f "delims==" %%A in ('2^>nul set tag_') do set "%%A="

:: Read each line and replace SOH with LF to allow iteration and parsing
:: of each tag/value pair. If the tag matches one of the target tags, then
:: define a tag variable where the tag and value are incorporated in the name.
:: The value assigned to the variable does not matter. Any given variable
:: can only have one value, so duplicates are removed.
for /f "usebackq delims=" %%A in (%1) do (
  set "ln=%%A"
  setlocal enableDelayedExpansion
  for %%L in ("!LF!") do set "ln=!ln:%SOH%=%%~L!"
  for /f "eol== tokens=1* delims==" %%B in ("!ln!") do (
    if "!!"=="" endlocal
    if "%%C" neq "" for %%D in (%~2) do if "%%B"=="%%D" set "tag_%%B%SOH%%%C%SOH%=1"
  )
)

:: Iterate the defined tag_nn variables, parsing out the tag values. Write the
:: values to the appropriate tag file.
del tag_*.txt 2>nul
for %%A in (%~2) do (
  >"tag_%%A.txt" (
    for /f "tokens=2 delims=%SOH%" %%B in ('set tag_%%A') do echo %%B
  )
)

:: Print out the results to the screen
for %%F in (tag_*.txt) do (
  echo(
  echo %%F:
  type "%%F"
)
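
If you want to cross-check the batch output, the same idea (split each line on SOH, split each field on the first =, keep one copy of each value per requested tag) can be sketched in Python. This is only a sketch under assumptions: the file name `distinct_tags.py`, the function name `distinct_values` and the latin-1 encoding are my choices, not part of the script above. Unlike Option 1, it compares values case-sensitively and keeps everything after the first =.

```python
import sys

SOH = "\x01"


def distinct_values(lines, wanted_tags):
    """Return {tag: [distinct values in first-seen order]} for wanted_tags."""
    found = {tag: {} for tag in wanted_tags}  # dicts keep insertion order
    for line in lines:
        for field in line.rstrip("\r\n").split(SOH):
            # Split on the first '=' only, so values may contain '='.
            tag, sep, value = field.partition("=")
            if sep and value and tag in found:
                found[tag][value] = None
    return {tag: list(values) for tag, values in found.items()}


# Hypothetical usage: python distinct_tags.py fix.log 49 56
if __name__ == "__main__" and len(sys.argv) > 2:
    log_path, tags = sys.argv[1], sys.argv[2:]
    with open(log_path, encoding="latin-1") as handle:
        for tag, values in distinct_values(handle, tags).items():
            print(f"tag_{tag}:")
            for value in values:
                print(value)
```

As with the batch script, the first field of each line still carries the log prefix (for example `07:00:32 -SEND:8`), so tag 8 will not match unless that prefix is stripped first.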

Option 2

This script has almost no limitations, but it is significantly slower. The only limitation I can see is that it will not allow a value to start with = (the leading = will be discarded).

I create a temporary "search.txt" file to be used with the FINDSTR /G: option. I use a file instead of a command line search string because of FINDSTR limitations. Command line search strings cannot match many characters > decimal 128. Also, the escape rules for literal backslashes are inconsistent on the command line. See "What are the undocumented features and limitations of the Windows FINDSTR command?" for more info.

The SOH definition must be fixed again, and the 1st and 2nd arguments are the same as with the 1st script.

@echo off
setlocal disableDelayedExpansion

:: Fix the definition of SOH before running this script
set "SOH=<SOH>"
set lf=^


:: The above 2 blank lines are necessary to define LF, do not remove.

:: Read each line and replace SOH with LF to allow iteration and parsing
:: of each tag/value pair. If the tag matches one of the target tags, then
:: check if the value already exists in the tag file. If it doesn't exist
:: then append it to the tag file.
del tag_*.txt 2>nul
for /f "usebackq delims=" %%A in (%1) do (
  set "ln=%%A"
  setlocal enableDelayedExpansion
  for %%L in ("!LF!") do set "ln=!ln:%SOH%=%%~L!"
  for /f "eol== tokens=1* delims==" %%B in ("!ln!") do (
    if "!!"=="" endlocal
    set "search=%%C"
    if defined search (
      setlocal enableDelayedExpansion
      >search.txt (echo !search:\=\\!)
      endlocal
      for %%D in (%~2) do if "%%B"=="%%D" (
        findstr /xlg:search.txt "tag_%%B.txt" || >>"tag_%%B.txt" echo %%C
      ) >nul 2>nul
    )
  )
)
del search.txt 2>nul

:: Print out the results to the screen
for %%F in (tag_*.txt) do (
  echo(
  echo %%F:
  type "%%F"
)
dbenham
  • EDIT - Added discussion about need for temporary search.txt file in option 2. – dbenham Jul 22 '12 at 16:18
  • Option 1 is practically PERFECT - provides a flexibility to pass only required tag IDs as parameters; features both screen and log output for those tags' distinct values; and has a very good performance. GREATLY APPRECIATED! – Victor O. Jul 23 '12 at 01:29
  • @VictorO. - EDIT - The first option now supports `=` in the values. – dbenham Jul 23 '12 at 21:37

Try this batch file. Pass the log file name as a parameter, e.g.:

LISTTAG.BAT SOH.LOG

It will show each unique tag ID and value pair, e.g.:

9=387
12=abc
34=asb73
9=123
12=xyz

Files named tagNNlist.txt (where NN is the tag ID number) are created to track the unique values for each tag ID, and are left in place as reports when the batch file ends.

The {SOH} text shown in the code below stands for the actual SOH character (ASCII 0x01), so after you copy and paste the code, change it to a real SOH character. I had to substitute that character because it is stripped by the server. Use Wordpad to generate the SOH character by typing 0001 and then pressing ALT+X. Then copy and paste that character into Notepad with the batch file code.

One thing to note is that the code only processes each line starting at column 16. The 07:00:32 -SEND: in your example line is ignored. I'm assuming that all lines start with that fixed-length text.
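
If the Wordpad route is awkward, the placeholder can also be swapped for the real byte with a few lines of Python. This is only a sketch: the helper names `fill_placeholder` and `patch_file` are made up here, and it assumes the batch file was saved with the literal text {SOH} exactly as shown in this answer.

```python
from pathlib import Path

SOH = b"\x01"


def fill_placeholder(script: bytes, placeholder: bytes = b"{SOH}") -> bytes:
    """Return the script with every placeholder replaced by the SOH byte."""
    return script.replace(placeholder, SOH)


def patch_file(path: str, placeholder: bytes = b"{SOH}") -> int:
    """Rewrite the file in place and return how many placeholders were found."""
    target = Path(path)
    original = target.read_bytes()
    count = original.count(placeholder)
    target.write_bytes(fill_placeholder(original, placeholder))
    return count
```

For example, `patch_file("LISTTAG.BAT")` should report 1 for the code below, since {SOH} appears only in the `delims=` option. Working on bytes avoids any guess about the file's text encoding.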
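
If the leading text is not always 15 characters long, one alternative is to look for the start of the message itself instead of counting columns. The Python sketch below assumes that every message begins with tag 8 and a value starting with FIX (as in the sample line), and that this text does not occur in the log prefix. The name `strip_log_prefix` is made up for this illustration.

```python
BEGIN_STRING = "8=FIX"  # assumed start of every message, as in the sample line


def strip_log_prefix(line: str) -> str:
    """Drop whatever precedes the first '8=FIX' in a log line.

    Lines without that marker are returned unchanged.
    """
    start = line.find(BEGIN_STRING)
    return line[start:] if start >= 0 else line


if __name__ == "__main__":
    sample = "07:00:32 -SEND:8=FIX.4.0\x019=55\x0135=0"
    # repr() makes the SOH characters visible as \x01
    print(repr(strip_log_prefix(sample)))
```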

Changes:

  • Changed generated tag list file into separate files by tag IDs. e.g.: tag12list.txt, tag52list.txt, etc.

  • Removed tag ID numbers from the generated tag list files, e.g.: 12=abc becomes abc.

LISTTAG.BAT:

@echo off
setlocal enabledelayedexpansion
if "%~1" == "" (
  echo No source file specified.
  goto :eof
)
if not exist "%~1" (
  echo Source file not found.
  goto :eof
)
echo Warning! All "tagNNlist.txt" file in current
echo directory will be deleted and overwritten.
echo Note: The "NN" is tag id number 0-99. e.g.: "tag99list.txt"
pause
echo.
for /l %%a in (0,1,99) do if exist tag%%alist.txt del tag%%alist.txt
for /f "usebackq delims=" %%a in ("%~1") do (
  rem *****the two lines below strip the first 15 characters (up to "-SEND:")
  rem *****and keep the rest of the line, whatever its length
  set x=%%a
  set x=!x:~15!
  rem *****9 tags per line
  for /f "tokens=1,2,3,4,5,6,7,8,9 delims={SOH}" %%b in ("!x!") do (
    call :dotag "%%b"
    call :dotag "%%c"
    call :dotag "%%d"
    call :dotag "%%e"
    call :dotag "%%f"
    call :dotag "%%g"
    call :dotag "%%h"
    call :dotag "%%i"
    call :dotag "%%j"
  )
)
echo.
echo Done.
goto :eof

rem dotag "{id=value}"
:dotag
for /f "tokens=1,2 delims==" %%p in (%1) do (
  set z=0
  if exist tag%%plist.txt (
    call :chktag %%p "%%q"
  ) else (
    rem>tag%%plist.txt
  )
  if !z! == 0 (
    echo %%q>>tag%%plist.txt
    echo %~1
  )
)
goto :eof

rem chktag {id} "{value}"
:chktag
for /f "delims=" %%y in (tag%1list.txt) do (
  if /i "%%y" == %2 (
    set z=1
    goto :eof
  )
)
goto :eof
Jay