1

I have a file containing two columns of text. Using a batch file, I would like to extract the second column, compute its string length, and write the length and the text to an output file. The step that challenges me is determining the length of a string that contains special characters. For example, the input file looks like:

escitalopram CN(C)CCC[C@@]1(C2=C(CO1)C=C(C=C2)C#N)C3=CC=C(C=C3)F
ibuprofen CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
keflex CC1=C(N2[C@@H]([C@@H](C2=O)NC(=O)[C@@H](C3=CC=CC=C3)N)SC1)C(=O)O 
aspirin CC(=O)OC1=CC=CC=C1C(=O)O 
linoleic_acid CCCCC/C=C\C/C=C\CCCCCCCC(=O)O

I can read the file and extract the two tokens using a batch command line and argument %1. I have tried a few of the subroutines I found in discussion groups, but I cannot get them to work. The "=" sign, and perhaps the other special characters, cause problems. I am looking for a solution that would produce an output file like the following, ignoring the "@", "/" and "\" signs:

escitalopram 49
ibuprofen 29 
keflex 58 
aspirin 24
linoleic_acid 25 

My program thus far looks like:

@echo off
setLocal EnableDelayedExpansion enableextensions


set arg1=%1

FOR /F "tokens=1,2 delims= " %%r IN (%1) DO (
set teststring="%%s"
echo "Passing     " %%s
call :GetStrLength %%s
echo.%%s
goto :EOF
)
  ::========================
  :GetStrLength
  setlocal enableextensions

set s=%1
echo " counting.... " %1

:: Get the length of the quoted string assuming a max of 255
set charCount=0
for /l %%c in (0,1,255) do (
  set si=!s:~%%c!
  if defined si set /a charCount+=1)
if %charCount% EQU 256 set charCount=0
echo The length of "%s%" is %charCount% characters
endlocal & goto :EOF

Any help would be appreciated.

Donal Fellows
  • The string length is working fine - it is the passing of the parameter to the subroutine that is broken. What happens if you don't use a subroutine and just put the string length inline? – Jerry Jeremiah Feb 16 '14 at 22:53
  • Try just using `call :GetStrLength "%%s"` and `set s=%~1` so that the parameter is quoted – Jerry Jeremiah Feb 16 '14 at 23:01
  • @JerryJeremiah No, you can't handle every possible string this way, CALL will break parameters-by-value. You need to use by-reference. – jeb Feb 16 '14 at 23:41
  • Cool - SMILES strings. I haven't worked with those in ages :-) – dbenham Feb 16 '14 at 23:56

4 Answers

2

You can use a strlen function, but you should pass parameters by reference instead of by value.

This function can handle any string, and it always needs exactly 13 loop iterations to determine the length.
As a batch variable cannot contain more than 8191 characters, this is enough.
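
The 13 iterations suffice because the probe offsets 4096, 2048, and so on down to 1 sum to 8191, so any length up to that limit can be built from them. The probe itself is a one-character substring test under delayed expansion: an offset past the end of the string expands to nothing. A minimal sketch of that test, using a made-up sample string and only a few offsets:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Sample value chosen for illustration; the trailing # plays the role of the sentinel.
set "s=abcdef#"
rem An empty pair of brackets means the offset lies beyond the end of the string.
for %%P in (8 4 2 1) do echo offset %%P: [!s:~%%P,1!]
```

Each offset that still returns a character is added to the running length and the string is shortened by that amount, which is how the function narrows in on the exact length.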

@echo off
set "myString=Any content"
call :strlen result myString
echo %result%
exit /b

:strlen <resultVar> <stringVar>
(   
    setlocal EnableDelayedExpansion
    set "s=!%~2!#"
    set "len=0"
    for %%P in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
        if "!s:~%%P,1!" NEQ "" ( 
            set /a "len+=%%P"
            set "s=!s:~%%P!"
        )
    )
)
( 
    endlocal
    set "%~1=%len%"
    exit /b
)
jeb
1

The = causes problems because it is not quoted, and the batch parser treats = as a token delimiter. When you pass an unquoted string containing = as a parameter, the string is broken at each = into multiple parameters. It should be possible to fix your code with the addition of some strategically placed quotes, as well as use of the ~ parameter expansion modifier to remove enclosing quotes as needed. This is not a general solution, but it should work in your case because I don't think SMILES strings ever contain the " character. Note that a quoted string containing quotes would contain some portion of the string that is effectively not quoted.
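
A small sketch of the difference, assuming a hypothetical helper label `:show` that simply prints its first two arguments:

```batch
@echo off
rem Unquoted: = acts as a delimiter, so the callee receives the string in pieces.
call :show CC(=O)O
rem Quoted: the callee receives one argument; %~1 strips the enclosing quotes.
call :show "CC(=O)O"
exit /b

rem :show is a made-up helper for this illustration only.
:show
echo arg1=[%~1] arg2=[%~2]
exit /b
```

The unquoted call should show the string split around the =, while the quoted call should arrive as a single argument that `%~1` unwraps.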

Here is your code fixed. I've removed some of the unnecessary code and some of the diagnostic messages.

@echo off
setlocal

FOR /F "tokens=1,2 delims= " %%r IN (%1) DO (
  echo Passing     "%%s"
  call :GetStrLength "%%s"
)
:: Exit after the loop so every line is processed and execution does not fall into the subroutine
goto :EOF

::========================
:GetStrLength
setlocal enableDelayedExpansion

set "s=%~1"
echo counting.... %1

:: Get the length of the quoted string assuming a max of 255
set charCount=0
for /l %%c in (0,1,255) do (
  set si=!s:~%%c!
  if defined si set /a charCount+=1
)
if %charCount% EQU 256 set charCount=0
echo The length of "%s%" is %charCount% characters
endlocal & goto :EOF

Below is a fully working script that computes the length of each SMILES string after removing the stereochemistry characters. (I'm curious why you want that value.) It uses a corrected version of the very fast strlen function in jeb's answer. I added the USEBACKQ option to the initial FOR /F loop just in case a user passes a quoted file name that contains spaces.

@echo off
setlocal enableDelayedExpansion

for /f "usebackq tokens=1,2 delims= " %%A IN (%1) do (
  set "SMILES=%%B"
  for %%C in (@ / \) do set "SMILES=!SMILES:%%C=!"
  call :strlen len SMILES
  echo %%A !len!
)
exit /b

:strlen <resultVar> <stringVar>
setlocal enableDelayedExpansion
set "s=!%~2!#"
set "len=0"
for %%P in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
  if "!s:~%%P,1!" NEQ "" (
    set /a "len+=%%P"
    set "s=!s:~%%P!"
  )
)
endlocal&set "%~1=%len%"
exit /b
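
Since the question asks for an output file, one way to get it is to redirect the script's output. A sketch, assuming the script above is saved as smiles_len.bat and the data as smiles.txt (both names, and lengths.txt, are placeholders):

```batch
@echo off
rem smiles_len.bat, smiles.txt and lengths.txt are placeholder names, not from the answer.
call smiles_len.bat smiles.txt >lengths.txt
rem Show what was written.
type lengths.txt
```
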
dbenham
  • The script works as promised, so thanks. You recognized the SMILES strings as well as the stereochemistry notations. As you can appreciate, the length of the SMILES string is of little value on its own. However, I hope to be able to build on this in the future. – user3245262 Feb 17 '14 at 03:33
0

To get the length of the string, I find the following method quite efficient.

@echo off
setLocal EnableDelayedExpansion

set s=%*
set length=0

:count
if defined s (
    if "!s:~0,1!" NEQ "@" if "!s:~0,1!" NEQ "/" if "!s:~0,1!" NEQ "\" set /A length += 1
    set "s=%s:~1%"
    goto count
)

echo %length%
unclemeat
  • This does not work on the inputs the OP asked about. – mbroshi Feb 16 '14 at 23:03
  • This method fails when inputted with `!` and `=`, which the OP has not asked to exclude. Furthermore, this method **does** count `@`, `\` and `/` which the OP has asked to exclude from the count. – Monacraft Feb 16 '14 at 23:04
  • @Monacraft I'll work on making it exclude those characters, do you have any suggestions to make it include `!` and `=`? – unclemeat Feb 16 '14 at 23:08
  • It now won't include those characters. My apologies for not reading the question thoroughly. – unclemeat Feb 16 '14 at 23:12
  • it also includes `=` but not `!`. – unclemeat Feb 16 '14 at 23:14
  • @unclemeat You might be interested in [this question](http://stackoverflow.com/questions/21818298/wierd-results-using-script-to-find-length-of-a-string). – Monacraft Feb 16 '14 at 23:36
0
@ECHO OFF
SETLOCAL
FOR /f "tokens=1*delims= " %%a IN (q21817684.txt) DO (
 SET /a count=0
 SET "chemical=%%a"
 SET "formula=%%b"
 CALL :report
)
GOTO :EOF

:report
SET "formula=%formula:@=%"
SET "formula=%formula:\=%"
SET "formula=%formula:/=%"
:reportl
IF DEFINED formula (
 SET "formula=%formula:~1%"
 SET /a count +=1
 GOTO reportl
)
ECHO %chemical% %count%

GOTO :eof

I used a file named q21817684.txt for my testing. Your data has a trailing space after the formula for keflex and aspirin. I eliminated that for my testing, but adding

SET "formula=%formula: =%"

at the obvious point should be equivalent.

Magoo
  • This strlen method will be much slower than the OP's method, as backward GOTOs will always read the complete file, _for each character_. – jeb Feb 16 '14 at 23:46