9

I am struggling to write a batch script which can read a CSV file such as below

Name:, City:, Country:
Mark, London, UK
Ben, Paris, France
Tom, Athens, Greece

There will be a heading row in the CSV file. It should output to a text file as below:

Name:Mark
City:London
Country:UK

Name:Ben
City:Paris
Country:France

Name:Tom
City:Athens
Country:Greece

The field separator (:) in the above output is expected to be provided in the header row itself. So all that I need to do is concatenate the field heading and its value.

The number of columns in this CSV file is not fixed, so the script should not limit to 3 tokens. Kindly help!

Chris
  • 3,438
  • 5
  • 25
  • 27
Benny
  • 639
  • 3
  • 11
  • 25
  • language agnostic (I can provide pointers on how to solve this in perl)? – Drav Sloan Jun 24 '11 at 16:11
  • Thanks coding in perl is not troubling me.`use strict; use Data::Dumper; my $filename = $ARGV[0]; open FILE, $filename or die; my $header = ; my @keys = split /,/, $header; my @data; for my $line (){ chomp; my @values = split /,/, $line; my %data_rows; @data_rows{@keys} = @values; push @data, %data_rows; } print Dumper(\\@data);` But I need a Windows batch script solution. I find it difficult to fully express with batch script. – Benny Jun 24 '11 at 17:24
  • 1. Are there always spaces after commas? 2. Can tokens (whether headers or values) contain spaces? 3. Is there a maximum possible number of tokens? 4. Can tokens contain commas? – Andriy M Jun 24 '11 at 17:37
  • 1.There will be no spaces after comma, although it would be better to catch that case. 2. Tokens can contain spaces. 3. The no. of tokens is not fixed. However, if you are referring to maximum permissible tokens yeah it can be hardcoded to some reasonable value (10 or something). 4. Tokens strictly cannot contain any commas. – Benny Jun 25 '11 at 03:09
  • The perl solution I figured out can be done with a one-liner itself. `perl -F, -lane "if ($.==1) {@keys = @F} else{print @keys[$_].$F[$_] foreach(0..$#F)} " ` Why I need batch script is because it is easier to share with my colleagues who may not have the priveleges to install perl on their machines. – Benny Jun 25 '11 at 08:20
  • @Benny: Unless your colleagues have absurdly old Windows machines, they will have access to VBScript and JScript with no need to install anything new. Either of these is vastly nicer to program in than batch language. – John Y Jun 25 '11 at 16:04
  • @John, True... for some reason that didn't cross my mind. Even though I am not proficient at either of those, I will try to make a code out of it. Meanwhile if you have something ready to boot, please share. This kind of task is quite common and repeating, I am sure lot of people will find it useful. – Benny Jul 08 '11 at 16:37
  • OK, I got around to do a Vbscript for this function (I just modified a code sample available on the net). I will share it so that others can use it. – Benny Oct 20 '11 at 13:14

3 Answers3

8
@ECHO OFF
IF "%~1"=="" GOTO :EOF
SET "filename=%~1"
SET fcount=0
SET linenum=0
FOR /F "usebackq tokens=1-10 delims=," %%a IN ("%filename%") DO ^
CALL :process "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i" "%%j"
GOTO :EOF

:trim
SET "tmp=%~1"
:trimlead
IF NOT "%tmp:~0,1%"==" " GOTO :EOF
SET "tmp=%tmp:~1%"
GOTO trimlead

:process
SET /A linenum+=1
IF "%linenum%"=="1" GOTO picknames

SET ind=0
:display
IF "%fcount%"=="%ind%" (ECHO.&GOTO :EOF)
SET /A ind+=1
CALL :trim %1
SETLOCAL ENABLEDELAYEDEXPANSION
ECHO !f%ind%!!tmp!
ENDLOCAL
SHIFT
GOTO display

:picknames
IF %1=="" GOTO :EOF
CALL :trim %1
SET /a fcount+=1
SET "f%fcount%=%tmp%"
SHIFT
GOTO picknames

This batch scipt:

  • accepts one parameter, the name of the file to process;

  • does not verify the presence of : at the end of a header token, and when the values are displayed they are placed immediately after the corresponding header tokens;

  • trims all the leading spaces (but not the trailing ones);

  • considers the first row to be the header row, which also defines the number of tokens to process in subsequent rows;

  • supports up to 10 tokens, and the two areas highlighted in bold italics are responsible for that (so when you need to change the maximum number, modify both areas: if you increase the number, you must expand the "%%a" "%%b" "%%c" … list, and, likewise, if you decrease the number, then shrink the list).

Andriy M
  • 76,112
  • 17
  • 94
  • 154
  • Also to note that, it doesn't handle CSV files containing double quotes in it. I suppose one might come across such CSV files more frequently if they are created using Excel or other spreadsheet software. Can this be modified to handle this case? – Benny Jun 25 '11 at 15:13
  • Sorry I deleted my first comment since it was a wrong observation. The code works great! There are some edge cases though, but I will happily take it. – Benny Jun 25 '11 at 15:22
  • @Benny: Yeah, double quotes were on my mind, but I quickly dismissed the notion of them, and that on the pretext that you never said a word about them in your post, how convenient, isn't it? :) But I must say that if I did try to account for them in the script, it might take me much longer to come up with a working solution. – Andriy M Jun 25 '11 at 15:33
  • @Benny: Re: The system cannot find the file – Hard to say anything without at least having a look at those batch files. Does this script behaves the same? – Andriy M Jun 25 '11 at 15:38
  • @Benny: One more thing, just in case you are not aware. Loop variables in batch scripts are case sensitive. So if you are extending the highlighted list, make sure all the var names are lower-case. – Andriy M Jun 25 '11 at 15:42
  • Thanks Andriy! Yeah I forgot about mentioning them in my post, but this is still good. Personally if it is on my PC, I can resort to perl or python where it is easier to handle such edge cases. For others I will just advise to watch out. Reg, Loop variables, does that mean that I can chose to have any name? All these days I assumed that those were pre-defined variables for catching the token (like $1, $2 in perl) and I was about to ask you how to extend beyond 26 variables. Now I got my answer :) – Benny Jun 25 '11 at 15:49
  • @Andriy, Well I faced the same issue with your script also. It ran fine on my Work PC. Would it help if I post the contents of my Environment variables here? – Benny Jun 25 '11 at 15:50
  • @Benny: I've never encountered that kind of problem, so wouldn't like to waste your time. That is, I wouldn't like to waste mine either, but the thing is, I'd be glad to help, it's just that the issue is unknown to me. I can't even tell you at the moment what I myself would have looked at in the first instance if I'd faced the problem. – Andriy M Jun 25 '11 at 16:03
  • @Benny: Is the error displayed only when you are using an incomplete path/name? – Andriy M Jun 25 '11 at 16:05
  • @Benny: In any event you could try posting a question about the issue here or maybe on [superuser](http://superuser.com/). – Andriy M Jun 25 '11 at 16:08
  • lol! It's OK I try to will deal with it on my own. Hearty thanks for the code! Will try to learn a few tricks from it. Cheers! – Benny Jun 25 '11 at 16:09
  • @Andriy Yup, I was also thinking about superuser. But let me exhaust myself first! And no It didn't work even with complete path name either. – Benny Jun 25 '11 at 16:10
7

I know this is an old question, but this type of question is my favorite one so here it is my answer:

@echo off
setlocal EnableDelayedExpansion

rem Create heading array:
set /P headingRow=< %1
set i=0
for %%h in (%headingRow%) do (
    set /A i+=1
    set heading[!i!]=%%~h
)

rem Process the file:
call :ProcessFile < %1
exit /B

:ProcessFile
set /P line=
:nextLine
    set line=:EOF
    set /P line=
    if "!line!" == ":EOF" goto :EOF
    set i=0
    for %%e in (%line%) do (
        set /A i+=1
        for %%i in (!i!) do echo !heading[%%i]!%%~e
    )
goto nextLine
exit /B

This program have not any limit in the number of fields. This version requires to enclose in quotes the elements that may have spaces or other Batch delimiters, but this restriction may be easily fixed.

Aacini
  • 65,180
  • 12
  • 72
  • 108
  • I don't see where you are looking for commas in your script. I mean, both the example in the original post and the term *CSV* used by Benny indicate that the items in the source file are delimited with commas, not with spaces. (In fact, as the OP says in a comment, there should be no spaces after commas in source files, although handling them would be a nice bonus.) Having said that, I still think your answer should stay, as it may turn out useful to someone either as it is or as a starting point to a more elaborate solution. (And perhaps someone's upvote has already proven exactly that.) – Andriy M Dec 17 '11 at 19:50
  • 3
    The standard delimiters for Batch file parameters and FOR sets may be the comma, semicolon or equal sign (besides spaces), so there is no necessity to explicitly check for commas. However, this may be a problem if the headings or elements may have these characters (as I wrote in my answer). – Aacini Dec 18 '11 at 03:19
  • I've just made a couple of simple tests with strings containing commas, and it worked nice! Didn't know that (non-escaped) commas are treated in the `IN()` clause the same way as spaces, thanks! – Andriy M Dec 18 '11 at 20:55
2

Python makes this so easy it should be regulated by the government.

from csv import DictReader

with open('file', 'rb') as file:
    reader = DictReader(file)

    for line in reader:
        for field in reader.fieldnames:
            print '{0}{1}'.format(field.strip(), line[field].strip())

         print '\n'

Edit: I guess you need something native to the Windows command shell. Oh well.

Chris
  • 3,438
  • 5
  • 25
  • 27
  • I will give it an upvote for the elegant solution in Python. Haven't tested personally yet but I hope it can catch most of the edge cases well. – Benny Jun 25 '11 at 16:17