
My scenario is the following:

I have a huge file with more than 700,000 lines that I have to work with. I will call this file trc.txt.

The structure of each line of this file is like so:

20958 191014 07:43:57.08 CCComRPC DCMSGCFW_E PID:00000864.00001F40 Data:23
< PREP_FIXED::Process 0

I have a second file with 300 lines, which I call classID.txt. Each line has the following structure:

ID_Key;ClassName
720;ComEFM
721;CCComRPC
725;ComSSL
730;WOSA-CRD
731;WOSA-PIN

The aim is to count how often each class occurs in trc.txt.

The possible class names are stored in classID.txt. In trc.txt, the class name is the fourth element from the left on each line.

My approach so far is to store the possible class names in a list variable. For this I use the following for loop (based on this post):

set trcClasses=
for /f "tokens=2 delims=;"  %%i in (classID.txt) do set trcClasses=!trcClasses!,%%i

This seems to work perfectly.

To reach my goal, I iterate through the big file trc.txt line by line and check each time whether one of the elements of trcClasses occurs in it. If so, I increment a simple counter for that class by one. I am using the following code:

for /f "tokens=4 delims= "  %%t in (trc.txt) do (
set "dataRow=%%~t"
set "break="
    for %%l in (%trcClasses%) do if not defined break (
        if not "!dataRow:%%l=!"=="!dataRow!" (
            set /a kumSum%%l+=1
            set "break=1"
        )
    )
)

I then return my values with this:

for %%l in (%trcClasses%) do (
    if (!kumSum%%l! NEQ 0) echo %%l !kumSum%%l!
)

First problem: the console has problems with some items in classID.txt. I receive errors like these:

Error: Division by zero.
Missing operator

I suspect this is caused by some of the names in classID.txt, such as WOSA-PTR or TCP/IP.

The bigger problem: Running the code takes approx. 12 minutes!

Any suggestions would be appreciated.

Compo
SRel
  • `Set /A` is for performing basic 32-bit signed integer math. You're not doing math, so remove the `/A` option. I also don't understand the need for a second [tag:for-loop]. Why not just use `@For /F "UseBackQ Tokens=2 Delims=;" %%I In ("%temp_dir%\ModulID.txt") Do @Set "kumSum%%I=0"`? – Compo Jan 29 '20 at 13:33
  • Thanks, I will try it. I only made the first one to create my local variable, as I sometimes need the list afterwards. – SRel Jan 29 '20 at 13:41
  • @Compo I do need to perform arithmetic operations, as I want to increment the counter later on. – SRel Jan 29 '20 at 13:49
  • For us to help you with a solution for an issue, and for it to be within the context of your whole task, we need to see enough of your script to meet that requirement. Please [edit your question](https://stackoverflow.com/posts/59968084/edit) to create such an environment for us to work with. That said, my initial comment still stands: do not use `Set /A` when you're not performing math. Also, your error messages can only be the result of a mathematical process, and the only mathematical command you have shown is `set /a kumSum%%l=0 `, which, as I've stated, should not be used with `/A`. – Compo Jan 29 '20 at 14:49
  • I edited the code, @Compo. I hope this makes my aim clearer. – SRel Jan 29 '20 at 18:35

1 Answer


You didn't specify your desired output format, so I had to guess. Reading each line with a for /f loop is slow, so it's better to loop over the 300-line file than over the 700,000-line file and let find scan the big file for each class name (the caching system of modern computers will help a lot).

@echo off
setlocal
for /f "skip=1 tokens=2 delims=;" %%a in (classID.txt) do (
  <nul set /p "=%%a;"
  <trc.txt find /c "%%a"
)

I added skip=1 to skip the header line in classID.txt.
The downside is, you have to read the big file ~300 times, but that still should be faster than processing it line by line (I'd appreciate some feedback about speed comparison of the two methods)

Output with your example files:

ComEFM;0
CCComRPC;1
ComSSL;0
WOSA-CRD;0
WOSA-PIN;0
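
One caveat with the script above: find /c counts every line that contains the string anywhere, so a class name that is part of a longer name, or that happens to appear in the data part of a line, is counted as well. A minimal variation, assuming the class name in trc.txt is always surrounded by single spaces as in the sample line (this narrows the match, but a space-delimited occurrence elsewhere on the line would still count):

```batch
@echo off
setlocal
for /f "skip=1 tokens=2 delims=;" %%a in (classID.txt) do (
  rem Print the class name without a trailing line break.
  <nul set /p "=%%a;"
  rem The spaces inside the quotes are part of the search string.
  <trc.txt find /c " %%a "
)
```

With the example files this should still report one hit for CCComRPC, because the name has a space on both sides in the sample line.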

PS: I assume you want to have the result in a file. Don't write it line by line (>> out.txt inside the for loop). That takes ages because the file has to be opened, read until "end of file", appended and closed again for each single line. Instead redirect the whole loop at once:

(for /f "skip=1 tokens=2 delims=;" %%a in (classID.txt) do (
  <nul set /p "=%%a;"
  <trc.txt find /c "%%a"
))>out.txt
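
Regarding the first problem in the question: when a class name is part of the variable name inside set /a, the - in WOSA-PTR and the / in TCP/IP are read as arithmetic operators, which is the likely cause of both error messages. The following is only a sketch of a single-pass alternative that keeps the name out of the arithmetic expression. It assumes that the class name is always the fourth space- or tab-delimited token in trc.txt and that no name contains =, !, :, a closing parenthesis or spaces. The cnt[...] naming and the helper variable n are made up for illustration.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Single pass over the big file; token 4 is assumed to be the class name.
rem Lines with fewer than four tokens do not run the loop body.
for /f "tokens=4" %%t in (trc.txt) do (
    rem Expand the old count first, so only digits reach set /a.
    rem An undefined counter expands to nothing, leaving "n=+1".
    set /a "n=!cnt[%%t]!+1"
    rem Plain set stores the result; the name is never evaluated as math.
    set "cnt[%%t]=!n!"
)

rem Report only the classes listed in classID.txt (header line skipped).
for /f "skip=1 tokens=2 delims=;" %%a in (classID.txt) do (
    if defined cnt[%%a] (echo %%a;!cnt[%%a]!) else echo %%a;0
)
```

This reads trc.txt only once and compares the fourth token exactly instead of searching for a substring, at the cost of a for /f iteration for every line of the big file.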
Stephan
  • (about 6 minutes for 300/700000 lines) – Stephan Jan 29 '20 at 19:29
  • I could offer a 45sec solution, depending on some prerequisites (`ClassName` always the third word in the line, no other lines, all ClassNames counted (not only those in `classID.txt`), ClassNames don't contain characters that are used for arithmetic in `set /a`) – Stephan Feb 04 '20 at 19:20
  • This sounds interesting, could you maybe give a little hint? In ModulID the name of the class is always the second word, and in the other file it is also always in the same place. However, there are also some dummy lines with no value. – SRel Feb 04 '20 at 19:29
  • Would be nice, if we could filter those dummy lines. What do they have in common (how to distinguish them from the "good" lines)? – Stephan Feb 04 '20 at 19:35