3

I am trying to create a batch file that will edit a .csv and remove the first column, and any summary lines contained in the file. I am, however, fairly new to programming batch files, so I am not sure the best way to start this, and it would be great if you could include a basic explanation of how the code works so I can be self-sustaining in the future!

,Type,Date,Num,Name,Memo,Member,Clr,Split,Alias,Value,Balance
ABB - Egypt,,,,,,,,,,,
ElAin EL-Sokhna,,,,,,,,,,,
,Invoice,09-06-10,12005,ABB - EL-Sokhna,,Accounts Receivable,,Training Income,15000,,15000
,Invoice,09-14-11,12005,ABB - EL-Sokhna,“ElAin EL-Sokhna“ Trainer for OTS Application: First two weeks,Training Income,,Accounts,,150001,0
Total ElAin EL-Sokhna,,,,,,,,,241194,210400,301794
ABB - Egypt - Other,,,,,,,,,,,

There are various iterations of this file, as they come from a monthly report, I need to remove the first (empty) column, and any rows that look like ABB - Egypt,,,,,,,,,,, or Total ElAin EL-Sokhna,,,,,,,,,241194,210400,301794

So the output should be:

Type,Date,Num,Name,Memo,Member,Clr,Split,Alias,Value,Balance
Invoice,09-06-10,12005,ABB - EL-Sokhna,,Accounts,,Training Income,15000,,15000
Invoice,09-14-11,13002,ABB - EL-Sokhna,“ElAin EL-Sokhna“ Trainer for OTS Application: First two weeks,Training Income,,Accounts,,150001,0

Thanks for the input!

EDIT: It seems I wasn't clear enough in my OP (Sorry, first time here).

There are two processes that need to happen here, in every file the first column must be deleted, and any lines that are either title lines ABB - Egypt,,,,,,,,,,, or summary lines Total ElAin EL-Sokhna,,,,,,,,,241194,210400,301794 need to be removed.

All lines that need to be kept will be mostly filled in, such as ,Type,Date,Num,Name,Memo,Member,Clr,Split,Alias,Value,Balance or ,Invoice,09-06-10,12005,ABB - EL-Sokhna,,Accounts Receivable,,Training Income,15000,,15000 Notice that, as in the second line, it is possible for there to be some missing values in them, so doing a search for something like ",," will not work.

Tim
  • 88
  • 1
  • 5
  • Please explain the precise method of determining inclusion/exclusion. Why for instance is the "Income" line to be excluded? The "ABB-" lines might be being excluded because they end with a series of commas; the Invoice lines included because they start with a comma. What are the rules? – Magoo Jan 22 '15 at 00:29
  • Ooh! Ooh! I have a [great utility script](http://www.dostips.com/forum/viewtopic.php?f=3&t=6184) for this! – rojo Jan 22 '15 at 01:29
  • 1
    Why not do your editing in Excel and record a VBA macro? Then you can add the code to Open a CSV, perform the edits and save the results as a new file. – user4317867 Jan 22 '15 at 05:33

5 Answers5

1

A way will be to define all your rules in a variable which will be used against findstr. The rules must be defined like this :

/c:"String which exclude the line" /c:"Another string which exclude the Line" /c: "etc.."

This rules must be exact (That they can't be found in a line who must stay).

For the empty first colonne you can use a substitution the way i made it in the code with

,Type=Type ,Invoice=Invoice

Test.bat :

@echo off&cls
setlocal enabledelayedexpansion

Rem The rules   
set $String_To_Search=/c:"ABB - Egypt," /c:"Total ElAin El-Sokhna," /c:"ElAin EL-Sokhna," /c:"ABB - Egypt - Other,"

for /f "delims=" %%a in (test.csv) do (
 set $line=%%a

 Rem the substitutions for the first Column
 set $Line=!$Line:,Type=Type!
 set $line=!$Line:,Invoice=Invoice!

 Rem the test and the ouput if nothing was found
 echo !$Line! | findstr /i %$String_To_Search% >nul || echo !$Line!
))>Output.csv

I used a file test.csv for my test.

The ouput is redirected to Output.csv

SachaDee
  • 9,245
  • 3
  • 23
  • 33
1

Perhaps is this what you want?

@echo off
setlocal EnableDelayedExpansion

for /F "delims=" %%a in (input.csv) do (
   set "line=%%a"
   if "!line:~0,1!" equ "," echo !line:~1!
)

When a problem is not enough explained we can only guess the missing details. In this case, I assumed that you just want the lines that start with comma, deleting it. The output is the same as your output example...

EDIT: Output example added

Type,Date,Num,Name,Memo,Member,Clr,Split,Alias,Value,Balance
Invoice,09-06-10,12005,ABB - EL-Sokhna,,Accounts Receivable,,Training Income,15000,,15000
Invoice,09-14-11,12005,ABB - EL-Sokhna,“ElAin EL-Sokhna“ Trainer for OTS Application: First two weeks,Training Income,,Accounts,,150001,0
Aacini
  • 65,180
  • 12
  • 72
  • 108
  • My mistake for not explaining the issue well enough - I went back and edited the OP to try and make it clearer. dbenham responded with a great answer so I think I've got it figured out now! – Tim Jan 22 '15 at 18:43
  • Excuse me. Did you tested my solution? It fails in any way? – Aacini Jan 22 '15 at 19:50
  • This only completes the first portion of the updates needed on the file, by removing the comma before the rows that I want to keep, but it doesn't remove the extra lines that are for titles and summary (total). – Tim Jan 22 '15 at 23:38
  • I tested my program with your data and posted the output above. The output have the header line and two lines that start in `Invoice,`. Which ones are the extra lines that it doesn't remove? I am afraid I don't follow you... `:(` – Aacini Jan 23 '15 at 01:12
1

Batch is a terrible language for modifying text files. There are a great many special cases that require arcane knowledge to work around the problem. You may have a script that seems to do what you want, and then some wrinkle appears in your data, and the entire script may have to be redesigned.

With regard to your specific problem, it appears to me that you only want to preserve rows that begin with a comma, meaning the first column is empty. Of those remaining rows, you want to remove the first (empty) column.

Assuming none of the rows you want to keep have an empty value for the second column, then there is a really trivial solution:

@echo off
>"%~1.new" (for /f "delims=, tokens=*" %%A in ('findstr "^," %1') do echo %%A)
move /y "%~1.new" %1 >nul

The script expects the file to be passed as the first and only argument. So if your script is named "fixCSV.bat", and the file to be modified is "c:\test\file.csv", then you would use:

fixCSV "c:\test\file.csv"

The %1 expands to the value of the first argument, and %~1 is the same, except it also strips any enclosing quotes that may or may not be present.

The FINDSTR command reads the file and writes out only lines that begin with a comma. The FOR /F command iterates each line of output. The "delims=, tokens=*" options effectively strip all leading commas from each line, and the result is in variable %%A, which is then ECHOed. The entire construct is enclosed in parentheses and stdout is redirected to a temporary file. Finally, the temporary file is moved over top of the original file, thus replacing it.

If the 2nd column may be empty, then the result will be corrupted because it removes all leading commas (both columns 1 and 2 in this case). The script must be more complicated to compensate. You would need to set a variable and then use delayed expansion to get the sub-string, skipping the first character. But delayed expansion will corrupt expansion of the %%A variable if it contains the ! character. So delayed expansion must be toggled on and off. You are beginning to see what I mean by lots of special cases.

@echo off
setlocal disableDelayedExpansion
>"%~1.new" (
  for /f "delims=" %%A in ('findstr "^," %1') do (
    set "ln=%%A"
    setlocal enableDelayedExpansion
    echo !ln:~1!
    endlocal
  )
)
move /y "%~1.new" %1 >nul

As the batch scripts become more complicated, they become slower and slower. It may not be an issue for most files, but if the file is really large (say hundreds of megabytes) then it can become an issue.

I almost never use pure batch to modify text files anymore. Instead, I use a hybrid JScript/batch utility that I wrote called JREPL.BAT. The utility is pure script that runs natively on any Windows machine from XP onward. JREPL.BAT is able to efficiently modify text files using regular expression replacement. Regular expressions can appear to be mysterious, but they are well worth the investment in learning.

Assuming you have JREPL.BAT somewhere within your PATH, then the following command is all that you would need:

jrepl "^,(.*)" "$1" /jmatch /f "yourFile.csv" /o -

The /F option specifies the file to read.

The /O option with value of - specifies that the output should replace the original file.

The /JMATCH option specifies that each replacement value is written out to a new line. All other text is dropped.

The first argument is the search expression. It matches any line that begins with a comma, and everything after that is captured in a variable named $1.

The second argument specifies the replacement value, which is simply the captured value in variable $1.

dbenham
  • 127,446
  • 28
  • 251
  • 390
0

I would start here to learn this: How can you find and replace text in a file using the Windows command-line environment?

It covers many details of substitution from Windows command line and many ways to do it, some requiring only what's built into Windows, and some requiring other downloadable software.

Magoo is right, more criteria is needed, but there might be enough information in the linked page for you to get past the main hurdles.

Community
  • 1
  • 1
Br.Bill
  • 637
  • 5
  • 12
0
@ECHO OFF
SETLOCAL
(FOR /f "tokens=*delims=," %%a IN ('findstr /b /l "," q28079306.txt') DO ECHO %%a)>newfile.txt

GOTO :EOF

I used a file named q28079306.txt containing your data for my testing.
Produces newfile.txt

Magoo
  • 77,302
  • 8
  • 62
  • 84