
Hello, I am trying to output a CSV file and I keep getting part of my string written onto a new line.

The overall script reads in a CSV file which has a time stamp, converts it, appends the Epoch time to the end of each line as a new variable, and outputs the file.

#!/bin/bash 
OLDIFS=$IFS 
IFS=","
cat test.csv | while read Host AName Resource MName TimeStamp Integer_Value Epoch; 
do 

Epoch=$(date -d "$TimeStamp GMT" +%s)

if [ -z "$Epoch" ]
then
    (echo "$Host, $AName, $Resource, $MName, $TimeStamp, $Integer_Value, Epoch,";) >> target.csv

else
    (echo "$Host, $AName, $Resource, $MName, $TimeStamp, $Integer_Value, $Epoch,";) >> target.csv

fi

done

I am trying to set a header and then write out the appended variable, except this only happens with the new value: it drops the appended variable onto a new line.

#Host, AName, Resource, MName, Actual Start Time, Integer Value
, Epoch,
ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00.0, 0
, 1388998800,

Instead of

#Host, AName, Resource, MName, Actual Start Time, Integer Value, Epoch,
ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00.0, 0, 1388998800,

When I move the order around it doesn't happen. Sorry, I know this is probably simple; I'm new to Unix scripting.

EDIT

I have now changed the code to:

#!/bin/bash 
OLDIFS=$IFS 
IFS=","
while read Host AName Resource MName TimeStamp Integer_Value Epoch
do 

Epoch=$(date -d "$TimeStamp GMT" +%s)

if [ -z "$Epoch" ]
then
    echo "$Host, $AName, $Resource, $MName, $TimeStamp, $Integer_Value, Epoch,"

else
    echo "$Host, $AName, $Resource, $MName, $TimeStamp, $Integer_Value, $Epoch,"

fi

done < test.csv  > target.csv

And I am still getting the same problem.

Also, as an additional question: does anyone know why I get ": command not found" and "date: invalid date `Actual Start TimeStamp GMT'" when running the date part? It still produces the correct date and the script runs.

QuinsUK
  • Try removing all the semi-colons, and the brackets around the "echo" statements. Then remove the "cat test.csv" and add "< test.csv" after the "done". Then remove both the ">>" and add "> target.csv" after the "done". – Mark Setchell Jan 14 '14 at 15:30
  • Hi Mark, thank you for your response, I have tried that an it doesn't make a difference, it still happens. – QuinsUK Jan 14 '14 at 15:35
  • Have you run the script with `bash -x script`? The invalid date messages are surprising; the `bash -x` will (should) show which invocation of the `date` command generates the message. The newline is a little puzzling. I suggest trying `IFS=$',\n'`. If that works, then the explanation is that the `read` includes the newline in the input string, but by excluding newline from IFS, you prevent the shell from ignoring the newline. – Jonathan Leffler Jan 14 '14 at 16:04
  • Hi Jonathan, Thank you for your response, unfortunately changing the IFS hasn't made a difference, it still inputs a new line. – QuinsUK Jan 14 '14 at 16:10
  • You still have a semi-colon on the "while read" line. – Mark Setchell Jan 14 '14 at 16:11
  • Hi Mark, I have removed the semi-colon and it doesn't make a difference. – QuinsUK Jan 14 '14 at 16:15
  • You can remove the whole `if...fi` block and replace with simpler one line `echo "$Host, $AName, $Resource, $MName, $TimeStamp, $Integer_Value, ${Epoch:-Epoch},"`. If you move $Integer_Value at the beginning of the line, does the newline follow it? – alvits Jan 14 '14 at 16:36
  • Hi Alvits, thank you for the suggestion that help my learning but unfortunately the extra line is still there. – QuinsUK Jan 14 '14 at 16:39
  • How about showing us your input file? It may have spurious MS-DOS CR-LF in it. You can test it yourself with "cat -vet rbs.csv" and see if you see "^M" at the ends of the lines. – Mark Setchell Jan 14 '14 at 16:43
  • I'm suspecting the column for `Integer_Value` contains `^L`. But in the sample output, the header already has newline between `Integer Value` and `,`. Now I am suspecting line width. – alvits Jan 14 '14 at 16:48
  • @alvits: while it might be line width, the line width would be quite short — 63-64 characters if I counted correctly. – Jonathan Leffler Jan 14 '14 at 16:50
  • Can you show the output of `bash -x yourscript` on a 3-line `test.csv` file? Can you show the 3-line `test.csv` as well? Have you run both the input and the output past a hex dump or octal dump program (`od -c` for an example octal dump). What is your locale setting (LANG, LC_ALL etc environment variables)? – Jonathan Leffler Jan 14 '14 at 16:51
  • Thank you for your help, I have added additional information to the post. – QuinsUK Jan 14 '14 at 17:04
  • Mark- It does have ^M$ at the end of each of the lines. I am assuming that this would cause the additional carriage returns. – QuinsUK Jan 14 '14 at 17:11
  • Better remove those puppies! http://stackoverflow.com/questions/800030/remove-carriage-return-in-unix – Mark Setchell Jan 14 '14 at 17:19
  • Thanks for the trace output. It shows that (1) you don't want to submit the first line of the file to processing by `date` and (2) you have CRLF (DOS/Windows) line endings confusing things. If, after you've removed the CR `'\r'` from the lines you still get odd behaviour, revisit my `$IFS` suggestion (which may still not help, but which might help). And, if removing CRLF is a nuisance, consider (meaning 'test') `IFS=$',\r\n'`. – Jonathan Leffler Jan 14 '14 at 17:37
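
Pulling the thread of comments together: the input file has DOS/Windows CRLF line endings, so the last field read from each line carries a stray carriage return, and when that field is echoed the \r is what makes the appended column show up on what looks like a new line. A minimal sketch of the check and clean-up suggested above (assuming the standard cat and tr utilities; test.csv is the input file from the question):

    # Show non-printing characters; DOS line endings appear as ^M before the $ at end of line
    cat -vet test.csv

    # Strip the carriage returns into a clean copy before running the script
    tr -d '\r' < test.csv > test_unix.csv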

2 Answers


Try this script:

IFS=$',\r'   # split on commas and treat the DOS carriage return as a separator too
while read Host AName Resource MName TimeStamp Integer_Value Epoch
do
   # ignore first line with headers
   [[ "$Host" == \#* ]] && continue

   Epoch=$(date -d "$TimeStamp GMT" +%s)

   if [ -z "$Epoch" ]; then
     echo "$Host, $AName, $Resource, $MName, $TimeStamp, $Integer_Value, Epoch,"
   else
     echo "$Host, $AName, $Resource, $MName, $TimeStamp, $Integer_Value, $Epoch,"
   fi
done < test.csv > target.csv

It does 2 things differently:

  1. It treats \r as a field separator, so the carriage return does not end up inside the variables that read sets
  2. It skips the first line, which is the header of the input CSV file
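
In case the header test above is unfamiliar: inside [[ ]] the == operator performs glob matching, so the pattern \#* matches any value that starts with a literal #, which is how the "#Host, AName, ..." header line from the question gets skipped. A tiny standalone illustration (hypothetical values, just to show the pattern):

    Host="#Host"
    [[ "$Host" == \#* ]] && echo "header line - skipped"    # prints: header line - skipped

    Host="ABCD89A"
    [[ "$Host" == \#* ]] || echo "data line - processed"    # prints: data line - processed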
anubhava

I would personally use awk; here is how:

awk  -F"," '{timestamp=$5;  gsub(":"," ",timestamp); gsub("-"," ",timestamp);   EPOCH=(mktime(timestamp)*1000)} {print $0","EPOCH}' 1.csv 

Produces:

ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0,1388998800000

A one-liner that does all of what you need: so long as your time stamps are in that exact format, the gsub calls in the awk remove : and - from the date field, pass it to mktime to produce a timestamp (multiplied by 1000 here, i.e. milliseconds; drop the *1000 if you want plain epoch seconds as in the question), and finally print each whole line ($0","EPOCH) with the converted time value appended. Here it is expanded for readability:

 awk  -F"," '{ 
     timestamp=$5;  
     gsub(":"," ",timestamp); 
     gsub("-"," ",timestamp);   
     EPOCH=(mktime(timestamp)*1000)
     } 
     {
      print $0","EPOCH
      }' your_File.csv 

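One caveat worth adding: mktime() is one of GNU awk's time functions, so these one-liners assume gawk rather than a minimal POSIX awk, and the timestamp is interpreted in the local time zone. A quick sanity check of the conversion on a single sample line from the question might look like this (a sketch, dropping the *1000 so the result is plain epoch seconds):

    # Requires GNU awk: mktime() is a gawk extension (local time zone applies)
    echo "ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0" |
    awk -F"," '{
        timestamp = $5                    # " 2014-01-06 09:00:00"
        gsub(":", " ", timestamp)         # mktime() wants "YYYY MM DD HH MM SS"
        gsub("-", " ", timestamp)
        print $0 ", " mktime(timestamp)   # appends the epoch seconds
    }'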

Now, to extend this so that you read the file, parse it through awk, and then push the output back into the same file, you could do something like this (capturing the output in a variable first avoids truncating the file before awk has read it, which is what a plain "awk ... 1.csv > 1.csv" redirection would do):

cp 2.csv 1.csv
cat 1.csv 
ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0
ABCD89N, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0
file="1.csv"; output=$(awk  -F"," '{ timestamp=$5;gsub(":"," ",timestamp);gsub("-"," ",timestamp);EPOCH=(mktime(timestamp));}{print $0", "EPOCH;}' $file 2>&1);  echo "$output" > $file
cat 1.csv 
ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0, 1388998800
ABCD89N, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0, 1388998800

Now, to extend this method so that you do not overwrite a file which already has the time in seconds appended, you could run something like this:

cp 2.csv 1.csv
 cat $file
ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0
ABCD89N, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0
 file="1.csv"; output=$(awk  -F"," '{ if (NF==7) { print "ERROR"; next; }else{timestamp=$5;gsub(":"," ",timestamp);gsub("-"," ",timestamp);EPOCH=(mktime(timestamp));}{print $0", "EPOCH;}}' $file 2>&1); if echo "$output"|grep -q "ERROR"; then  echo "$output"; else echo "$output" > $file; fi
 file="1.csv"; output=$(awk  -F"," '{ if (NF==7) { print "ERROR"; next; }else{timestamp=$5;gsub(":"," ",timestamp);gsub("-"," ",timestamp);EPOCH=(mktime(timestamp));}{print $0", "EPOCH;}}' $file 2>&1); if echo "$output"|grep -q "ERROR"; then  echo "$output"; else echo "$output" > $file; fi
ERROR
ERROR
 cat $file
ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0, 1388998800
ABCD89N, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0, 1388998800

You will notice that on the second run it outputs ERROR and does not actually overwrite the file...

This way you could automate a script to come along and do this, and feel safe that it won't append extra columns to CSVs that have already been processed.
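
For readability, here is the same guarded command laid out as a small script (a sketch of the one-liner above, keeping the example file name 1.csv and assuming GNU awk for mktime()):

    #!/bin/bash
    file="1.csv"

    # Refuse to append a second epoch column if a line already has 7 fields
    output=$(awk -F"," '{
        if (NF == 7) { print "ERROR"; next }       # already processed - bail out
        timestamp = $5
        gsub(":", " ", timestamp)
        gsub("-", " ", timestamp)
        print $0 ", " mktime(timestamp)            # append epoch seconds
    }' "$file" 2>&1)

    if echo "$output" | grep -q "ERROR"; then
        echo "$output"                             # leave the file untouched
    else
        echo "$output" > "$file"                   # write the converted lines back
    fi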

Or use a temp file for massive CSV files. (The tee below is a pointless step; I was only testing whether I could tee back into the same file, which I found worked on odd occasions - really bizarre.)

(awk  -F"," '{ timestamp=$5;gsub(":"," ",timestamp);gsub("-"," ",timestamp);EPOCH=(mktime(timestamp));}{print $0", "EPOCH;}' 1.csv 2>&1|tee /tmp/a; mv /tmp/a 1.csv;)

since this could have just been:

(awk  -F"," '{ timestamp=$5;gsub(":"," ",timestamp);gsub("-"," ",timestamp);EPOCH=(mktime(timestamp));}{print $0", "EPOCH;}' 1.csv >/tmp/a; mv /tmp/a 1.csv;)

The first method, using $output, stores the CSV in memory as a variable and then pushes it back into the file. The last method uses a temp file for the processing. Which one you choose could depend on the size of your CSV file: if we are talking gigabytes and it's not a very powerful machine, then temp files are the way to go. Keeping it in memory is cleaner and should be the fastest of all.
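
If you do go the temp-file route, it may also be worth using mktemp rather than a fixed /tmp/a, so two runs cannot clobber each other's scratch file (a sketch, not from the original answer; again this assumes GNU awk for mktime()):

    # Write to a unique temp file, then move it over the original only if awk succeeded
    tmp=$(mktemp) || exit 1
    awk -F"," '{ timestamp=$5; gsub(":"," ",timestamp); gsub("-"," ",timestamp)
                 print $0", "mktime(timestamp) }' 1.csv > "$tmp" && mv "$tmp" 1.csv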

It's just my input on this - it may come in handy for someone else wishing to do something similar.

V H
  • Thank you for your response, how would you have used awk? Unfortunately that hasn't helped; it still has the same problems. – QuinsUK Jan 14 '14 at 15:55
  • I have updated the answer to use a 1 liner in awk to produce what you need. – V H Jan 15 '14 at 10:01