
I have a batch file that takes input from a text file that looks like this:

```
Microsoft (R) Windows Script Host Version 5.8
Copyright (C) Microsoft Corporation. All rights reserved.


Server name lak-print01
Printer name Microsoft XPS Document Writer
Share name 
Driver name Microsoft XPS Document Writer
Port name XPSPort:
Comment 
Location 
Print processor WinPrint
Data type RAW
Parameters 
Attributes 64
Priority 1
Default priority 1
Average pages per minute 0
Printer status Idle 
Extended printer status Unknown 
Detected error state Unknown 
Extended detected error state Unknown 

Server name lak-print01
Printer name 4250_Q1
Share name 4250_Q1
Driver name Canon iR5055/iR5065 PCL5e
Port name IP_192.168.202.84
Comment Audit Department in Lakewood Operations
Location Operations Center
Print processor WinPrint
Data type RAW
Parameters 
Attributes 10826
Priority 1
Default priority 0
Average pages per minute 0
Printer status Idle 
Extended printer status Unknown 
Detected error state Unknown 
Extended detected error state Unknown 

Server name lak-print01
Printer name 3130_Q1
Share name 3130_Q1
Driver name Canon iR1020/1024/1025 PCL5e
Port name IP_192.168.202.11
Comment Canon iR1025 
Location Operations Center
Print processor WinPrint
Data type RAW
Parameters 
Attributes 10824
Priority 1
Default priority 0
Average pages per minute 0
Printer status Idle 
Extended printer status Unknown 
Detected error state Unknown 
Extended detected error state Unknown 
```

and parses it to extract certain fields, such as server name, printer name, driver name, etc. It then writes each block as its own comma-delimited row. This gives me one row per block of text, with each column holding a particular piece of information. Some of these text files have 100+ entries. The parsing step takes 5-10 minutes for each file.
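For reference, the first half of that job (cutting the file into blocks) only needs the rule "a blank line ends a block". Here is a minimal Python sketch of just that step. It is not a replacement for the batch code below. `split_blocks` is a hypothetical name, and the sketch assumes that blocks are separated by blank lines and that every printer block starts with `Server name`, as in the sample above.

```python
from typing import Iterable


# Hypothetical helper, not taken from the question or the answers.
def split_blocks(lines: Iterable[str]) -> list[list[str]]:
    """Group lines into blocks; a blank line ends the current block."""
    blocks: list[list[str]] = []
    current: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip():
            current.append(line)
        elif current:
            # First blank line after some content closes the block.
            blocks.append(current)
            current = []
    if current:
        # The file may not end with a blank line.
        blocks.append(current)
    # Keep only printer blocks; this drops the two-line banner at the top.
    return [b for b in blocks if b[0].startswith("Server name")]
```

With a real file, this would be called as `split_blocks(open("lak-print01.txt"))`, assuming the file is in the current directory.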

The parsing code is as follows:

```bat
:Parselak-print01
SETLOCAL enabledelayedexpansion
:: remove variables starting $
FOR  /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a="
(FOR /f "delims=" %%a IN (lak-print01.txt) DO CALL :analyse "%%a")>lak-print01.csv
attrib +h lak-print01.csv
GOTO :EOF

:analyse
SET "line=%~1"
SET /a fieldnum=0
FOR %%s IN ("Server name" "Printer name" "Driver name"
            "Port name" "Location" "Comment" "Printer status"
            "Extended detected error state") DO CALL :setfield %%~s
GOTO :eof

:setfield
SET /a fieldnum+=1
SET "linem=!line:*%* =!"
SET "linet=%* %linem%"
IF "%linet%" neq "%line%" GOTO :EOF 
IF "%linem%"=="%line%" GOTO :EOF
SET "$%fieldnum%=%linem%"
IF NOT DEFINED $8 GOTO :EOF 
SET "line="
FOR /l %%q IN (1,1,7) DO SET "line=!line!,!$%%q!"
ECHO !line:~1!
:: remove variables starting $
FOR  /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a="
GOTO :eof
```

The output I get is:

```
lak-print01,Microsoft XPS Document Writer,Microsoft XPS Document Writer,XPSPort:,,,Idle 
lak-print01,4250_Q1,Canon iR5055/iR5065 PCL5e,IP_192.168.202.84,Operations Center,Audit Department in Lakewood Operations,Idle 
lak-print01,3130_Q1,Canon iR1020/1024/1025 PCL5e,IP_192.168.202.11,Operations Center,Canon iR1025 ,Idle 
lak-print01,1106_TRN,HP LaserJet P2050 Series PCL6,IP_172.16.10.97,Monroe,HP P2055DN,Idle 
lak-print01,1101_TRN,HP LaserJet P2050 Series PCL6,IP_10.3.3.22,Burlington,Training Room printer,Idle 
lak-print01,1096_Q3,Canon iR1020/1024/1025 PCL5e,IP_192.168.96.248,Silverdale,Canon iR 1025,Idle 
lak-print01,1096_Q2,Kyocera Mita KM-5035 KX,IP_192.168.96.13,Silverdale,Kyocera CS-5035 all in one,Idle 
lak-print01,1096_Q1,HP LaserJet P4010_P4510 Series PCL 6,IP_192.168.96.12,Silverdale,HP 4015,Idle 
lak-print01,1095_Q3,HP LaserJet P4010_P4510 Series PCL 6,IP_192.168.95.247,Sequim,HP LaserJet 4015x,Idle 
```

Everything is perfect and the code works as intended, but it's just super freaking slow!

How do I speed this up? The problem is that there is no true delimiter and the number of tokens varies. For instance, the comment needs token 2, but the printer name needs token 3.

Any help speeding up the parsing would be appreciated. The program works perfectly; it is just very slow during parsing.
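The varying token count stops mattering if the known label text itself serves as the delimiter. That is essentially what `!line:*%* =!` does in `:setfield` above. Here is a minimal Python sketch of the idea, assuming the label list below (copied from the fields the batch code looks for). `split_label` and `WANTED` are hypothetical names.

```python
from typing import Optional

# Hypothetical names; the labels are the ones listed in the batch FOR loop.
WANTED = [
    "Server name", "Printer name", "Driver name", "Port name",
    "Location", "Comment", "Printer status", "Extended detected error state",
]


def split_label(line: str, labels: list[str]) -> Optional[tuple[str, str]]:
    """Return (label, value) if the line starts with one of the labels, else None."""
    # Try longer labels first, so that a label which is a prefix of another
    # (for example "Printer" vs "Printer name") can never steal the match.
    for label in sorted(labels, key=len, reverse=True):
        if line == label or line.startswith(label + " "):
            # Everything after the label is the value, however many words it has.
            return label, line[len(label):].strip()
    return None
```

Matching is done on whole labels, so `Printer name` and `Printer status` never collide even though they share a first word. A multi-word value such as the comment also stays intact.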

Alkemdah

3 Answers


If speed is what you need, I'd suggest Marpa, a general BNF parser, in Perl: code, output.

It takes some time to get used to, but it does the job and gives you a very powerful tool that is easy to use. Note how closely the grammar resembles the input.
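Marpa's own API is not reproduced here. As a rough analogue of "write a grammar that mirrors the input", the Python sketch below spells out one printer block as a single verbose regular expression and captures only the wanted fields. It assumes the 18-line block order shown in the question and `\n` line endings (Python's text mode converts `\r\n`). `RECORD`, `COLUMNS` and `parse_records` are hypothetical names. Unlike a real BNF parser, a regex cannot report where a malformed block went wrong; it simply does not match that block.

```python
import re

# Hypothetical names. The layout of one printer block, line by line.
# Only the fields that go into the CSV are captured; the rest are matched and ignored.
RECORD = re.compile(
    r"""
    ^Server\ name\ (?P<server>.*)\n
    Printer\ name\ (?P<printer>.*)\n
    Share\ name.*\n
    Driver\ name\ (?P<driver>.*)\n
    Port\ name\ (?P<port>.*)\n
    Comment\ ?(?P<comment>.*)\n
    Location\ ?(?P<location>.*)\n
    (?:.*\n){7}              # Print processor ... Average pages per minute
    Printer\ status\ (?P<status>.*)\n
    Extended\ printer\ status.*\n
    Detected\ error\ state.*\n
    Extended\ detected\ error\ state.*$
    """,
    re.MULTILINE | re.VERBOSE,
)

# Same column order as the CSV output in the question.
COLUMNS = ("server", "printer", "driver", "port", "location", "comment", "status")


def parse_records(text: str) -> list[list[str]]:
    """Return one row per printer block that matches RECORD."""
    return [[m[name].strip() for name in COLUMNS] for m in RECORD.finditer(text)]
```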

Hope this helps.

rns

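Using `CALL` is very slow. The original code makes nine `CALL`s for every input line (one to `:analyse` plus eight to `:setfield`), and each `CALL :label` makes cmd search the batch file for that label. See if this gives you the output you need; it will be interesting to hear how much quicker it is in comparison.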

```bat
@echo off
:Parselak-print01
SETLOCAL enabledelayedexpansion
(FOR /f "delims=" %%a IN (lak-print01.txt) DO (
   for /f "tokens=1,2,*" %%b in ("%%a") do (
      if "%%b"=="Server"   set "server=%%d"
      if "%%b"=="Printer"  if "%%c"=="name" (set "printer=%%d") else (set "printerstatus=%%d")
      if "%%b"=="Driver"   set "driver=%%d"
      if "%%b"=="Port"     set "port=%%d"
      if "%%b"=="Location" for /f "tokens=1,*"   %%e in ("%%a") do set "location=%%f"
      if "%%b"=="Comment"  for /f "tokens=1,*"   %%e in ("%%a") do set "comment=%%f"
      if "%%b"=="Extended" for /f "tokens=1-4,*" %%e in ("%%a") do if "%%f"=="detected" set "extendeddetected=%%i"
   )
   if defined extendeddetected (
      echo !server!,!printer!,!driver!,!port!,!location!,!comment!,!printerstatus!,!extendeddetected!
      set "server="
      set "printer="
      set "driver="
      set "port="
      set "location="
      set "comment="
      set "printerstatus="
      set "extendeddetected="
   )
))>lak-print01.csv
attrib +h lak-print01.csv
pause
```
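The same single-pass idea can be sketched in Python for comparison. It dispatches on the first one or two words of each line, emits a row when the last field of a block (`Extended detected error state`) arrives, and then resets. `rows_flushed_on_last_field`, `COLUMNS` and `LAST_LABEL` are hypothetical names. Unlike the batch version, this sketch emits a row even if that last value is empty.

```python
from typing import Iterable, Iterator

# Hypothetical names; columns follow the echo order of the batch answer.
COLUMNS = ["server", "printer", "driver", "port", "location",
           "comment", "status", "extended"]
LAST_LABEL = "Extended detected error state"


def rows_flushed_on_last_field(lines: Iterable[str]) -> Iterator[list[str]]:
    """Yield one row per block, emitted when the block's last line is seen."""
    fields: dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        first, _, rest = line.partition(" ")    # "Server", "name lak-print01"
        second, _, tail = rest.partition(" ")   # "name", "lak-print01"
        if first == "Server":
            fields["server"] = tail
        elif first == "Printer":
            # "Printer name ..." vs "Printer status ..."
            fields["printer" if second == "name" else "status"] = tail
        elif first == "Driver":
            fields["driver"] = tail
        elif first == "Port":
            fields["port"] = tail
        elif first == "Location":
            fields["location"] = rest   # value starts at the second word
        elif first == "Comment":
            fields["comment"] = rest
        elif line.startswith(LAST_LABEL):
            # Last line of the block: finish the row and reset, like the batch
            # version clearing its variables after the echo.
            fields["extended"] = line[len(LAST_LABEL):]
            yield [fields.get(c, "").strip() for c in COLUMNS]
            fields = {}
```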
foxidrive
  • So... I parse all 11 printer text files in about 1 minute or less. This is an IMMENSE improvement!! THANK YOU SOOOO MUCH – Alkemdah Oct 01 '14 at 00:36
  • @Alkemdah: Excuse me. I am pretty sure that [my solution](http://stackoverflow.com/questions/26107314/speed-up-my-batch-file-parsing/26112519#26112519) below should run faster than this one, but I may be wrong. It would be very useful if you may post the timing of both programs because this comparison would help us to improve the methods we use in our solutions. TIA – Aacini Oct 01 '14 at 01:59
  • @Aacini Your solution is very clever Antonio, and if the file format is fixed then it is faster (but it misses the last data point which is only listed in the OP code "Extended detected error state"). On a file of `1 million lines` your code takes `59 seconds` and my version takes `93 seconds` – foxidrive Oct 01 '14 at 03:07
  • @foxidrive: Thanks for the timing results! In order to add the last data field in the output, just add `set "row[18]=Extended detected error state"` in the desired rows, and the same variable in the `echo` command... – Aacini Oct 01 '14 at 04:17
  • I think that @rns solution should be faster still - processing text files is what Perl does best – mvp Oct 01 '14 at 06:00
  • @mvp Yes, for sure, batch is not built for speed - but batch is native to Windows and is what the OP asked about. :) – foxidrive Oct 01 '14 at 06:10
  • But the question was about performance as the most important issue. Given that, installing Perl does not seem like a big impediment – mvp Oct 01 '14 at 06:33
  • @mvp: Talking about performance: How much time do you think the OP would take to download Perl and Marpa, and learn both well enough to solve this problem? Assuming he knows nothing about Perl... – Aacini Oct 01 '14 at 16:51
  • @Aacini: in this case, rns was kind enough to provide full working solution. So OP would need like 5-10 min to get it working – mvp Oct 01 '14 at 16:54
  • @mvp: The [output](https://gist.github.com/rns/eab040af2e9e7336b1f1#file-output) produced by such "full working solution" is not the one requested by the OP. Also, I don't see where the input file name is placed in the Perl code ("lak-print01.txt" in this case)... – Aacini Oct 01 '14 at 20:28
  • @Aacini Your code works as well; I simply chose one on a whim and tested it. The reworked code from both of you is exceptionally fast. I appreciate everyone's help on this! – Alkemdah Oct 01 '14 at 21:26

The solution below assumes that the input file has a fixed format: two header lines followed by blocks of 18 lines, always in the same order. The blank line between blocks does not count, because `FOR /F` skips empty lines. If this is true, this solution generates the output very quickly; otherwise, it must be modified accordingly...

```bat
@echo off
setlocal EnableDelayedExpansion

rem Create the array of variable names for the *desired rows* of data in the file
set "row[1]=Server name"
set "row[2]=Printer name"
set "row[4]=Driver name"
set "row[5]=Port name"
set "row[6]=Comment"
set "row[7]=Location"
set "row[15]=Printer status"

set i=0
(for /F "skip=2 delims=" %%a in (lak-print01.txt) do (
   set /A i+=1
   if defined row[!i!] (
      set "line=%%a"
      for %%i in (!i!) do for /F "delims=" %%v in ("!row[%%i]!") do set "%%v=!line:*%%v =!"
   )
   if !i! equ 18 (
      echo !Server name!,!Printer name!,!Driver name!,!Port name!,!Location!,!Comment!,!Printer status!
      set i=0
   )
)) > lak-print01.csv
```
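The fixed-format assumption can be sketched in Python as well. The sketch drops empty lines (as `FOR /F` does), skips the two header lines, cuts the rest into chunks of 18 lines, and picks lines by position, using the same row numbers as the `row[]` array above. `fixed_format_rows`, `WANTED_ROWS` and `BLOCK_SIZE` are hypothetical names. An incomplete trailing block is ignored, just as the batch loop never reaches `i equ 18` for it.

```python
from itertools import islice
from typing import Iterable, Iterator

# Hypothetical names. (1-based line number inside a block, label) for each
# output column, in output order; the numbers match the batch row[] array.
BLOCK_SIZE = 18
WANTED_ROWS = [
    (1, "Server name"), (2, "Printer name"), (4, "Driver name"), (5, "Port name"),
    (7, "Location"), (6, "Comment"), (15, "Printer status"),
]


def fixed_format_rows(lines: Iterable[str]) -> Iterator[list[str]]:
    """Yield one row per complete 18-line block; assumes the layout never changes."""
    # Drop empty lines, as FOR /F does, then skip the two header lines.
    non_empty = (s for s in (raw.rstrip("\r\n") for raw in lines) if s)
    body = islice(non_empty, 2, None)
    while True:
        block = list(islice(body, BLOCK_SIZE))
        if len(block) < BLOCK_SIZE:
            return  # end of input (a partial block is ignored)
        # Position decides the field; the label is only cut off the front.
        yield [block[pos - 1][len(label):].strip() for pos, label in WANTED_ROWS]
```

Adding the `Extended detected error state` column, as suggested in the comments below, would be one more `(18, "Extended detected error state")` entry.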
Aacini