4

I have a .txt document with over 32,000 lines of commented machine code. It looks like this:

Display menu window
C0/000E:    E220        SEP #$20
C0/0010:    C210        REP #$10
C0/0012:    20640B      JSR $0B64
C0/0015:    20750B      JSR $0B75
C0/0018:    C220        REP #$20
C0/001A:    A90001      LDA #$0100

I need to convert the code as follows for compiling purposes:

; Display menu window
SEP #$20
REP #$10
JSR $0B64
JSR $0B75
REP #$20
LDA #$0100

Specifically, that means:

  • Blank lines must remain unchanged.
  • If a line starts with "C0/" then the first 18 characters are to be deleted, including tabs.
  • Otherwise, it's a function title, so add a semi-colon followed by a space at the beginning (not mandatory).

Any help would be greatly appreciated.

UnknownOctopus
Sheldon M.
  • Strong suggestion: use a "real language" - not a .bat file! You can probably do it with a .bat file ... just as you can probably eat string beans through your nose instead of your mouth. It's just not recommended ;) SUGGESTION: Perhaps a [Powershell script](http://windows-powershell-scripts.blogspot.com/2009/06/awk-equivalent-in-windows-powershell.html)? – paulsm4 Jul 09 '15 at 01:30
  • This could also be done very easily in [Java](http://stackoverflow.com/tags/java/info). Let me know if you choose this language and I'll be happy to help :D – UnknownOctopus Jul 09 '15 at 01:33
  • I'm unfortunately not yet familiar with any other language. If you can provide code for a different language and that can easily be converted into an executable of any sort, then I'll gladly accept that suggestion. :P – Sheldon M. Jul 09 '15 at 01:37
  • @SheldonM. I could provide the code for a Java program that does what you ask, but in order to compile and run it you would need to have the [JDK](http://www.oracle.com/technetwork/java/javase/downloads/index.html) installed. – UnknownOctopus Jul 09 '15 at 01:45
  • I can download JDK if need be. But you'll have to replace every "C0/" check with "C3/". (Sorry, I provided the wrong value for privacy reasons, I was going to make that change myself to the BAT code.) – Sheldon M. Jul 09 '15 at 01:50
  • @SheldonM. Do you want the output to be printed to a different file, or to override the current text in the original? – UnknownOctopus Jul 09 '15 at 02:01
  • It doesn't really matter, either way I'll keep a backup copy just in case. :P – Sheldon M. Jul 09 '15 at 02:04
  • @SheldonM. I've got the code in Java, do you still want it? (Sorry for the late reply, I got sidetracked) – UnknownOctopus Jul 09 '15 at 02:54
  • Of course! I've been waiting to get my hands on your code before actually starting the JDK download. :) – Sheldon M. Jul 09 '15 at 03:01
  • Ok, and do you mind adding java to the tag list so the syntax highlighting shows up? – UnknownOctopus Jul 09 '15 at 03:15
  • Your requirements don't match your output. The first 18 characters of `C0/001A: A90001 LDA #$0100` are `C0/001A: A90001`, leaving ` LDA #$0100` remaining. But your example has the leading spaces removed. – indiv Jul 09 '15 at 03:16
  • Added the tag. My actual code has tabs in it, but this site converts them into spaces, which explains the discrepancy. The actual format: C3/XXXX: – Sheldon M. Jul 09 '15 at 03:24

5 Answers

2

The following Java code reads the text from the file you provide and processes it line by line. If a line starts with C3/, it is printed with the first 18 characters removed and the whitespace at the beginning and end trimmed off. If a line does not start with C3/, it is printed as is. (FYI, this Java code is probably faster than a batch file at processing your enormous text file, which is why I recommended Java in the first place :P)

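import java.io.*;

public class ClassName {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes both files even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader("OriginalFileName.txt"));
             PrintWriter fileOut = new PrintWriter("OutputFileName.txt")) {

            String line;
            while ((line = br.readLine()) != null) {
                // startsWith is safe on blank or very short lines, unlike substring(0, 3)
                if (line.startsWith("C3/")) {
                    // Drop the first 18 characters (address and opcode bytes), then trim
                    String out = line.length() > 18 ? line.substring(18).trim() : "";
                    fileOut.println(out);
                } else {
                    // Blank lines and function titles are copied unchanged
                    fileOut.println(line);
                }
            }
        }
    }
}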

Of course, replace OutputFileName.txt and OriginalFileName.txt with the names of your text files. To compile and run this you will need to install and set up the JDK; numerous tutorials on the web explain how. After you set up the JDK, save this code as ClassName.java, compile it with javac ClassName.java, and run it with java ClassName. Make sure the program is saved in the same folder as your input/output files.

Note: Normally I wouldn't give out code like this, but I was bored and was feeling nice :)

Also, I highly recommend you try programming in Java a bit yourself. It's a very interesting and versatile language. If you have any other questions, feel free to let me know :D.

Example input:

Display menu window
C3/000E:    E220        SEP #$20
C3/0010:    C210        REP #$10
C3/0012:    20640B      JSR $0B64
C3/0015:    20750B      JSR $0B75
C3/0018:    C220        REP #$20
C3/001A:    A90001      LDA #$0100

Example output:

Display menu window
SEP #$20
REP #$10
JSR $0B64
JSR $0B75
REP #$20
LDA #$0100
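
The program above leaves title lines untouched, while the question also asks (optionally) for a "; " prefix on them. Here is a minimal sketch of that variant, assuming the real file uses the C3/ marker and an 18-character prefix as stated in the question's comments. The class name ListingConverter, the convertLine helper, and the choice of ISO-8859-1 (picked only so that unexpected bytes cannot make the reader fail) are assumptions for illustration, not part of the code above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical class name, chosen for this sketch only.
final class ListingConverter {
    static final String MARKER = "C3/";  // marker named in the question's comments
    static final int PREFIX_LEN = 18;    // assumption: prefix length in the tab-separated file

    /** Applies the three rules from the question to a single line. */
    static String convertLine(String line) {
        if (line.trim().isEmpty()) {
            return line;                  // blank lines stay unchanged
        }
        if (line.startsWith(MARKER)) {
            // Code line: drop the address/opcode prefix and surrounding whitespace
            return line.length() > PREFIX_LEN ? line.substring(PREFIX_LEN).trim() : "";
        }
        return "; " + line;               // function title becomes a comment
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ListingConverter <input> <output>");
            return;
        }
        // ISO-8859-1 maps every byte to a character, so odd bytes cannot abort the read
        try (BufferedReader in = Files.newBufferedReader(
                     Paths.get(args[0]), StandardCharsets.ISO_8859_1);
             PrintWriter out = new PrintWriter(Files.newBufferedWriter(
                     Paths.get(args[1]), StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(convertLine(line));
            }
        }
    }
}
```

After compiling, it would be run as java ListingConverter OriginalFileName.txt OutputFileName.txt. Keeping the per-line rule in its own method makes it easy to change the marker or the prefix length later.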
UnknownOctopus
  • I feel like a jerk for wasting your time, but I ended up getting the batch file I was looking for, so I won't bother downloading Java...for now at least. :P I'll probably look into C or C++ when I have the time, though. Hopefully someone out there will find a use for the code you posted. – Sheldon M. Jul 09 '15 at 04:45
  • It's alright, didn't take me that long to do. I'm glad you got an easier solution :). – UnknownOctopus Jul 09 '15 at 04:48
2

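A regular expression replace will solve your problem in a single line:

sed -i -- 's/^C0\/.....................//' <your_file_name>

That of course assumes you have sed. I did this on Linux, and the contents of test.txt were replaced as you required. The ^ anchors the match to the start of the line, so a C0/ appearing later in a line is left alone. The run of dots matches the sample as pasted here, where the tabs show up as spaces; for the tab-separated original with its 18-character prefix, 's/^C0\/.\{15\}//' should do the same job.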

You can try the Windows version of sed from this site:

http://gnuwin32.sourceforge.net/packages/sed.htm
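
If sed is not available, the same anchored regular-expression replace can be sketched in Java. This is only an illustration: the class name SedLikeStrip and the strip helper are made up for the example, and the pattern assumes the 18-character prefix described in the question ("C0/" plus 15 more characters).

```java
import java.util.regex.Pattern;

// Hypothetical class name, chosen for this sketch only.
final class SedLikeStrip {
    // "C0/" followed by any 15 characters, anchored at the start of the line (18 in total)
    private static final Pattern PREFIX = Pattern.compile("^C0/.{15}");

    /** Removes the prefix if the line has one; other lines are returned unchanged. */
    static String strip(String line) {
        return PREFIX.matcher(line).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(strip("C0/0012:  20640B  JSR $0B64")); // prints "JSR $0B64"
        System.out.println(strip("Display menu window"));         // prints the title unchanged
    }
}
```

Like the sed command, this only deletes the prefix; it does not add "; " to title lines.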

Arundale Ramanathan
  • Thanks for your time, but I don't have sed, and I ended up getting the simple batch file solution I was hoping for. – Sheldon M. Jul 09 '15 at 04:42
  • You are welcome. I do appreciate your reply, but I think you need not bother with explanations on Stack Overflow. From the upvote above, it looks like it's useful to others. – Arundale Ramanathan Jul 09 '15 at 18:18
2

The Batch file below is a different approach that may run faster than other similar methods, but this largely depends on the size of the file:

@echo off

for /F "tokens=1-2*" %%a in ('findstr /N "^" test.txt') do (
   for /F "tokens=1,2 delims=:/" %%d in ("%%a") do (
      if "%%e" equ "C3" (
         echo %%c
      ) else if "%%e" neq "" (
         echo ; %%e %%b %%c
      ) else (
         echo/
      )
   )
)

However, the fastest method is via a Batch-JScript hybrid script. Save the file below with .bat extension:

@set @Batch=1    /*
@cscript //nologo //E:JScript "%~F0" < test.txt
@goto :EOF & rem */

WScript.Stdout.Write(WScript.Stdin.ReadAll().replace
   (/^C3\/.{15}|^(..)/gm,function(A){return A.length==2?"; "+A:""}));
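
For readers who want to see what the regular expression in the hybrid script does, here is a rough Java equivalent. It is a sketch, not a port: the class name WholeFileRewrite and the rewrite helper are invented for the example, it needs Java 9 or later for Matcher.replaceAll with a function, and it tells the two alternatives apart by checking the capture group instead of the match length.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, chosen for this sketch only.
final class WholeFileRewrite {
    // Alternative 1: "C3/" plus 15 more characters at the start of a line.
    // Alternative 2: the first two characters of any other line (captured in group 1).
    // MULTILINE makes ^ match at the start of every line, not just the start of the text.
    private static final Pattern RULE =
            Pattern.compile("^C3/.{15}|^(..)", Pattern.MULTILINE);

    /** Rewrites the whole text in one pass, like the ReadAll().replace(...) call. */
    static String rewrite(String text) {
        return RULE.matcher(text).replaceAll(m ->
                m.group(1) == null
                        ? ""                                            // prefix: delete it
                        : Matcher.quoteReplacement("; " + m.group(1))); // title: prepend "; "
    }

    public static void main(String[] args) {
        String sample = "Display menu window\nC3/0012:  20640B  JSR $0B64\n\nREP #$10";
        System.out.print(rewrite(sample));
    }
}
```

Blank lines are left alone because "." never matches a line break, so neither alternative can match on an empty line. As in the script, a title that is only one character long would not get the "; " prefix.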
Aacini
  • Wow, that second option is indeed very fast, and works just as well as the other solutions I've tried. (My document is 1.5 MB, by the way.) Would you mind modifying your code so that I can drag-and-drop a .txt file on the .bat file to generate a new file with the modifications applied? – Sheldon M. Jul 09 '15 at 16:59
1

This batch file should meet your requirements. Just save it as whatever.cmd and run it with whatever.cmd file_to_process. Save the output by redirecting stdout, like whatever.cmd file_to_process > processed_file.

@echo off
set "DEL_TOKEN=C0/"
set "DEL_TOKEN_LEN=3"
set "CHARS_TO_REMOVE=18"
set "FILENAME=%~1"

SETLOCAL DisableDelayedExpansion
FOR /F "usebackq delims=" %%a in (`"findstr /n ^^ %FILENAME%"`) do (
    set "LINE=%%a"
    SETLOCAL EnableDelayedExpansion
    set "LINE=!LINE:*:=!"
    if not "!LINE!"=="" (
        if "!LINE:~0,%DEL_TOKEN_LEN%!"=="%DEL_TOKEN%" (
            set "LINE=!LINE:~%CHARS_TO_REMOVE%!"
        ) else (
            set "LINE=; !LINE!"
        )
    )
    echo(!LINE!
    ENDLOCAL
)

Line reader courtesy of jeb.

Community
indiv
1

I generally use JREPL.BAT to do regular expression text modification within the Windows command line.

JREPL.BAT is a pure script (hybrid JScript/batch) utility that runs natively on any Windows machine from XP onward. Full documentation is embedded within the script.

A single line is all that is needed for your problem. Assuming your file is "test.in" and your output is "test.out", then:

jrepl "^C0/.{15}|^." "|; $&" /t "|" /f test.in /o test.out

If you want to overwrite the original, then use /o - instead.

The JREPL solution is very fast.

If you want pure batch, then you could use the following optimized solution:

@echo off
setlocal enableDelayedExpansion
for /f %%N in ('find /c /v "" ^<test.in') do set "cnt=%%N"
<test.in >test.out (
  for /l %%N in (1 1 %cnt%) do (
    set "ln="
    set /p "ln="
    if "!ln:~0,3!" == "C0/" (set "ln=!ln:~18!") else if defined ln set "ln=; !ln!"
    echo(!ln!
  )
)

If you want to overwrite the original, then add the following line to the very end:

move /y test.out test.in >nul
dbenham
  • Thank you very much, this worked perfectly. You guys saved me at least a dozen hours of repetitive work. I compared the output with one created with jeb's code, and yours trimmed trailing whitespaces, which may or may not be a good thing depending on context. – Sheldon M. Jul 09 '15 at 04:38
  • @SheldonM.- The JREPL solution will not strip trailing whitespace. I forgot to mention that the batch solution will strip trailing control characters (including tabs). But it does not strip trailing spaces. Not an issue for your application, but the batch solution also is limited to 1021 characters per line. – dbenham Jul 09 '15 at 10:48