I currently have a text file (file.txt) that looks like this sample:

String [date]
data1

String [another date]
data2

String [another date]
data3

I would like a batch file that strips the date from the first line of each block and keeps only the string. The date is different each time. Here is an example of the output:

String
data1

String
data2

String
data3

Since the text file has over 95,000 lines, I think I have to search for the lines that contain a specific string, then delete everything on them except the string I searched for.

opharmos19

1 Answer

String manipulation is one thing that batch files have always been terrible at. The cleanest way is to use a tool that understands regex and string search/replace. For example, using GNU Sed:

sed -e "s/^String.*$/String/" file.txt > output.txt
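As a quick check, here is a minimal sketch that runs the same substitution on the sample from the question. It assumes a POSIX shell with sed available (for example Git Bash or WSL), so the expression is single-quoted; under cmd.exe use double quotes instead, because an unquoted ^ is cmd's escape character and would be dropped. The bracketed dates are placeholders copied from the question, not real data.

```shell
# Recreate the sample input from the question (dates are placeholders).
cat > file.txt <<'EOF'
String [date]
data1

String [another date]
data2

String [another date]
data3
EOF

# Replace every line that starts with "String" by the bare word "String".
# Lines that do not match, including blank ones, pass through unchanged.
sed -e 's/^String.*$/String/' file.txt > output.txt

cat output.txt
```

Because the pattern is anchored with ^, a data line that merely contains the word "String" somewhere in the middle is left alone.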

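Or using PowerShell, which ships with Windows (the full cmdlet name is used instead of the % alias because a lone % is stripped when the command runs from a batch file):

powershell -NoProfile -Command "Get-Content 'file.txt' | ForEach-Object { $_ -replace '^String.*','String' } | Set-Content 'output.txt'"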

This particular problem doesn't need the full power of regex and can be done in plain batch. The script goes through the file line by line. For each line, if it contains "String " (the substitution test used here does not check where in the line the text occurs), it writes "String" to the output; otherwise it writes the original line.

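@echo off
rem Create (or truncate) the output file.
copy nul output.txt > nul
rem findstr /n prefixes each line with "N:", so for /f does not skip blank lines.
for /f "tokens=1* delims=:" %%a in ('findstr /n "^" file.txt') do call :do_line "%%b"
goto :eof

:do_line
set line=%1
rem If removing "String " changes nothing, this is not a header line: copy it as is.
if {%line:String =%}=={%line%} (
  >> output.txt echo.%~1
  goto :eof
)
>> output.txt echo String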

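The for /f line of the script uses a findstr /n trick from this thread to preserve blank lines: every line gets a "number:" prefix, so for /f, which normally skips empty lines, never sees one. The downside is that : is also the delimiter, so if any line in the original file starts with :, that colon will be stripped off.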

Ryan Bemrose