I have been to a few Stack Overflow pages and can't seem to find the right answer.

I have the following data per line in a txt file.

0320024                       |CYLINDER, TWISTLOCK, DOUBLE ACTING--                                                                                                                                                                                                            |385508-105          |KK1-39                 |21-AUG-17|NEW                           |PIECE  

How do I process the txt file with a batch script so that each line comes out like this?

0320024|CYLINDER, TWISTLOCK, DOUBLE ACTING-- |385508-105|KK1-39|21-AUG-17|NEW|PIECE  

I have tried the following to read the lines of the txt file:

for /F "tokens=*" %%A in (filename.txt) do [process] %%A

Would appreciate any help, thanks!

Roll IWSP

2 Answers

The easiest method is using REPL.BAT written by Dave Benham.

type "input.txt" | repl.bat "[ \t]+(?:(\|)|$)" "$1" >"output.csv"

The search regular expression [ \t]+(?:(\|)|$) means:

[ \t]+ ... find 1 or more spaces or horizontal tabs.

(?:...) ... non-capturing group for the OR expression inside this group.

(\|)|$ ... find a literal pipe character (escaped with a backslash) and capture it if found, OR match the end of the line without matching the newline characters.

The replacement string $1 references the captured pipe character if the search expression found one; otherwise it is an empty string.

In other words, this regular expression finds 1 or more spaces or tabs to the left of a pipe character and removes that whitespace, OR finds trailing spaces/tabs at the end of a line and removes them too.
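The behaviour of this search/replace pair can be tried outside of batch as well. The following is only a minimal sketch in Python, not part of REPL.BAT; the function name strip_padding and the shortened sample line are made up for illustration. It assumes Python 3.5 or later, where an unmatched group in the replacement is substituted with an empty string, and uses re.MULTILINE so that $ matches at the end of every line.

```python
import re

# Search pattern copied from the answer: 1 or more spaces/tabs followed by
# either a captured pipe character or the end of a line.
PATTERN = re.compile(r"[ \t]+(?:(\|)|$)", re.MULTILINE)


def strip_padding(text: str) -> str:
    """Remove spaces/tabs left of a pipe and at the end of each line.

    Hypothetical helper for illustration only. \\1 puts the captured pipe
    back; at end of line the group did not take part in the match and
    Python (3.5+) substitutes an empty string for it.
    """
    return PATTERN.sub(r"\1", text)


# Shortened, made-up variant of the line from the question.
sample = "0320024      |CYLINDER, TWISTLOCK, DOUBLE ACTING--    |385508-105   |PIECE  "
print(strip_padding(sample))
# → 0320024|CYLINDER, TWISTLOCK, DOUBLE ACTING--|385508-105|PIECE
```

Note that the single spaces inside "CYLINDER, TWISTLOCK, DOUBLE ACTING--" are kept, because they are followed neither by a pipe character nor by the end of the line.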

Next, use the command move /Y "output.csv" "input.txt" to overwrite the input file with the produced output file.

It is of course also possible to use the latest version of JREPL.BAT, also written by Dave Benham.

To write the output to output.csv:

jrepl.bat "[ \t]+(?:(\|)|$)" "$1" /f "input.txt" /o "output.csv"

To do the replaces directly on input file:

jrepl.bat "[ \t]+(?:(\|)|$)" "$1" /f "input.txt" /o -

You have to use the command CALL when calling repl.bat or jrepl.bat from within your own batch file and there is more to do afterwards, because otherwise processing does not return to your batch file once the called batch file finishes. In this case I suggest using the following instead of just repl.bat or jrepl.bat:

... call "%~dp0repl.bat" ...
call "%~dp0jrepl.bat" ...

The batch file for the replace operation is now called with the full path of the directory of your batch file, so repl.bat or jrepl.bat must be stored in the same directory as your batch file. It then does not matter what the current directory is when your batch file runs.

Even better is the search regular expression string [ \t]+(?=\||$), which uses an OR expression inside a positive lookahead to match 1 or more spaces/tabs only when the next character is the pipe character or the spaces/tabs are at the end of a line. The replace string is in this case simply an empty string, as only the spaces/tabs are matched by the search string.

Example:

call "%~dp0jrepl.bat" "[ \t]+(?=\||$)" "" /f "input.txt" /o -
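To see that the lookahead variant consumes only the spaces/tabs and leaves the pipe character untouched, here is a small sketch in Python rather than JScript. It is not how JREPL.BAT works internally; the helper name strip_padding_lookahead and the test string are invented for this example, and re.MULTILINE is assumed so that $ matches at the end of every line.

```python
import re

# Lookahead pattern from the answer: spaces/tabs are matched only when the
# next character is a pipe or the end of a line; the pipe itself is not
# part of the match, so the replacement can be an empty string.
LOOKAHEAD = re.compile(r"[ \t]+(?=\||$)", re.MULTILINE)


def strip_padding_lookahead(text: str) -> str:
    """Hypothetical helper: delete the matched spaces/tabs."""
    return LOOKAHEAD.sub("", text)


# Made-up two-line input with spaces and a tab as padding.
lines = "A  |B \t|C  \nD|E   "
print(repr(strip_padding_lookahead(lines)))
# → 'A|B|C\nD|E'
```

The line break between the two lines survives because [ \t] cannot match newline characters.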
Mofi
  • You can simplify the regex and replace expressions to `call jrepl "\s+(?=\||$)" "" ...` – dbenham Aug 24 '17 at 21:26
  • Also, I'd appreciate if people always use JREPL.BAT instead of REPL.BAT when composing answers (or comments). – dbenham Aug 24 '17 at 21:30
  • @dbenham You are right. JREPL.BAT uses JScript which uses JavaScript which supports lookahead but not lookbehind. Therefore I could have thought by myself about the better solution using a search regular expression string with a lookahead and an empty replace string. I added this better solution to my answer. I favor `[ \t]` for matching spaces/tabs over `\s` as `\s` matches any whitespace character according to Unicode specification which means it matches also carriage returns and line-feeds. That does not occur here on this task and `\s` can be used, but safer is usage of `[ \t]`. Thanks. – Mofi Aug 25 '17 at 06:00
  • This type of replacement is best done with Regular Expressions.
  • Here it seems all blank space in front of a vertical bar should be deleted (the exception is the one space following ACTING--).
  • Both can be done with lookarounds: a negative lookbehind (?<!-) and a positive lookahead \s+(?=\|).
  • As batch itself has no RegEx support (aside from findstr's limited RE), another scripting language or tool like JScript/VBScript, PowerShell, or sed is needed.

From Windows 7 on, PowerShell is included, so this should do:

powershell -Nop -C "(gc .\filename.txt) -replace '(?<!-) \s+(?=\|)'|sc NewName.txt"
  • gc is an alias for Get-Content and sc, well you guessed that, Set-Content

> type NewName.txt  
0320024|CYLINDER, TWISTLOCK, DOUBLE ACTING-- |385508-105|KK1-39|21-AUG-17|NEW|PIECE  

powershell -Nop -C "(gc .\filename.txt) -replace '(?<!-) \s+(?=\||$)'|sc NewName.txt"

Changed the positive lookahead to also check for the line end $, using an OR | after the literal \| (stolen from Mofi ;-)

> type NewName.txt
0320024|CYLINDER, TWISTLOCK, DOUBLE ACTING-- |385508-105|KK1-39|21-AUG-17|NEW|PIECE
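How the negative lookbehind keeps exactly one space after -- can be checked with a short sketch in Python, given here only as an illustration and not as a replacement for the PowerShell command. The names trim_line and trim_text are hypothetical, and the sample is a shortened, made-up variant of the line from the question. The text is processed line by line, on the assumption that this mirrors how -replace is applied to each line returned by Get-Content.

```python
import re

# Pattern from the PowerShell command: a space not preceded by "-",
# followed by 1 or more whitespace characters, followed (lookahead only)
# by a pipe character or the end of the line.
PATTERN = re.compile(r"(?<!-) \s+(?=\||$)")


def trim_line(line: str) -> str:
    """Hypothetical helper: apply the replacement to a single line."""
    return PATTERN.sub("", line)


def trim_text(text: str) -> str:
    """Hypothetical helper: process a text line by line."""
    return "\n".join(trim_line(line) for line in text.splitlines())


sample = "0320024    |CYLINDER, TWISTLOCK, DOUBLE ACTING--      |385508-105   |NEW   |PIECE  "
print(trim_line(sample))
# → 0320024|CYLINDER, TWISTLOCK, DOUBLE ACTING-- |385508-105|NEW|PIECE
```

The first space after -- is preceded by a hyphen, so the match can only start at the second space and one space remains. Because the pattern needs a space plus at least one more whitespace character, a field padded with only a single space (for example "A |B") is left unchanged.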
  • Nice job preserving a space after `-`, as the OP showed in the example. But I suspect that was an error in the OP's sample output. I doubt the trailing space was intentional. – dbenham Aug 24 '17 at 22:02
  • @dbenham Agreed, but this was **the** occasion to demonstrate the just fully understood lookarounds ;-) –  Aug 24 '17 at 22:14
  • Yes, look-arounds can be mighty handy. I wish JSCRIPT supported look-behind, it would be nice to have that feature in JREPL.BAT. But no such luck :-( Only look-aheads are supported in JSCRIPT (and JREPL.BAT). – dbenham Aug 24 '17 at 22:29
  • Hi @LotPings, thanks for your answer. It helped me a lot. Another favor to ask, how do I remove the white space next to PIECE? – Roll IWSP Aug 28 '17 at 01:20
  • See appendix to my answer. –  Aug 28 '17 at 06:51