1

This is the input .csv file

"item1","10/11/2017 2:10pm",1,2, ...
"item2","10/12/2017 3:10pm",3,4, ...
.
.
.

Now, I want to convert the second column (date) to this specific format date -d '10/12/2017 2:10pm' +'%Y/%m/%d %H:%M:%S', so that "10/12/2017 2:10pm" converts to "2017/10/12 14:10:00"

Expecting output file

"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
.
.
.

I know it can be done by using bash or python, but I want to do it in one-line command. Any ideas? Is there a way to pass date result to sed?

Marc Lambrichs
  • 2,864
  • 2
  • 13
  • 14
Clinton Lam
  • 687
  • 1
  • 8
  • 27

3 Answers3

3

One-liner awk approach.

awk -F',' '{gsub(/"/,"",$2); cmd="date -d\""$2"\" +\\\"%Y/%m/%d\\ %T\\\"";
    cmd |getline $2; close(cmd) }1' OFS=, infile #>>outfile
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...

This will output changes in your Terminal, you need to redirect the output to a file if you need record the output or use FILENAME to redirect the output to the input infile itself.

awk -F',' '{gsub(/"/,"",$2); cmd="date -d\""$2"\" +\\\"%Y/%m/%d\\ %T\\\"";
    cmd |getline $2; close(cmd); print >FILENAME }' OFS=, infile

Or with GNU awk implementations which does support -i inplace identifier for in-place replace. see 'awk' save modifications in place

αғsнιη
  • 2,627
  • 2
  • 25
  • 38
  • 1
    However, you do have a large problem. You are losing the quotation marks around the fields. See if that can be corrected. – David C. Rankin Oct 17 '17 at 07:16
  • @ClintonLam No, this works on a file called `infile`. Checked it myself. And about the quotes, I wouldn't mind those, because your csv doesn't seem to have any. If it does, this can be fixed. αғsнιη, take it away. – Marc Lambrichs Oct 17 '17 at 07:40
  • `"item1","10/11/2017 2:10pm",77.300,14127044,23190700,77.700,77.320,77.300,77.750,56.600,79.650,76.127,71.239,142.415,36.709,0.000,,1575.53B,"A"` `"item2","10/12/2017 3:10pm",192.200,614721,942324,190.800,190.400,190.100,192.200,136.200,195.300,183.516,168.717,23.157,12.046,0.000,,367.49B,"B"` Failed on this set of text, only two rows here, date of 2nd row not replaced. – Clinton Lam Oct 17 '17 at 07:54
  • please see my update, @DavidC.Rankin that's corrected. – αғsнιη Oct 17 '17 at 08:37
  • @αғsнιη Yes, only 3 terms with quotes. But still, only 1st row was converted, all rows after that does not changes. – Clinton Lam Oct 17 '17 at 08:41
  • it works without any problem, please check your input file I for valid date in second column. Please avoid commenting clarifications here and directly [edit](https://stackoverflow.com/posts/46782016/edit) and update into your question. Thanks – αғsнιη Oct 17 '17 at 09:07
  • 1
    @αғsнιη Great job. This would be much more efficient than a blended shell, awk, sed solution. I have got to work with awk more... – David C. Rankin Oct 17 '17 at 20:55
1

You can do it in one line, but that begs the question -- "How long of a line do you want?" Since you have it labeled 'shell' and not bash, etc., you are a bit limited in your string handling. POSIX shell provides enough to do what you want, but it isn't the speediest remedy. You are either going to end up with an awk or sed solution that calls date or a shell solution that calls awk or sed to parse old date from the original file and feeds the result to date to get your new date. You will have to work out which provides the most efficient remedy.

As far as the one-liner goes, you can do something similar to the following while remaining POSIX compliant. It simply uses awk to get the 2nd field from the file, pipes the result to a while loop which uses expr length "$field" to get the length and uses that within expr substr "$field" "2" <length expression - 2> to chop the double-quotes from the end of the original date olddt, followed by date -d "$olddt" +'%Y/%m/%d %H:%M:%S' to get newdt and finally sed -i "s;$olddt;$newdt;" to perform the substitution in place. Your one-liner (shown with auto line-continuations for readability)

$ awk -F, '{print $2}' timefile.txt | 
while read -r field; do 
olddt="$(expr substr "$field" "2" "$(($(expr length "$field") - 2))")"; 
newdt=$(date -d "$olddt" +'%Y/%m/%d %H:%M:%S'); 
sed -i "s;$olddt;$newdt;" timefile.txt; done

Example Input File

$ cat timefile.txt
"item1","10/11/2017 2:10pm",1,2, ...
"item2","10/12/2017 3:10pm",3,4, ...

Resulting File

$ cat timefile.txt
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...

There are probably faster ways to do it, but this is a reasonable length one-liner (relatively speaking).

David C. Rankin
  • 81,885
  • 6
  • 58
  • 85
0

Revised less ugly sed method:

sed 's/^.*,"\|",.*//g;h;s#.*#date "+%Y/%m/%d %T" -d "&"#e;H;g;s#\n\|$#,#g;s/^/s,/' input.csv | sed -f - input.csv 

Spread out, (it works the same):

sed 's/^.*,"\|",.*//g
     h;
     s#.*#date "+%Y/%m/%d %T" -d "&"#e;
     H;
     g;
     s#\n\|$#,#g;
     s/^/s,/' input.csv | sed -f - input.csv 

Output:

"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...

How it works:

The first sed block uses the evaluate command to run date, the output of which is used to generate some new sed substitute commands. To show the new s commands, temporarily replace the shell script | pipe with a # comment:

s,10/11/2017 2:10pm,2017/10/11 14:10:00,
s,10/12/2017 3:10pm,2017/10/12 15:10:00,

These are piped to the second sed.

agc
  • 7,973
  • 2
  • 29
  • 50
  • I don't think any one-liner for this problem will be 'pretty', but I like both or your and Marc's solutions which even a `sed` with 5 sub-expressions or the single `awk` solution are likely to be quicker than the combination of the two. – David C. Rankin Oct 17 '17 at 06:06
  • @DavidC.Rankin, Code that generates brute force code is often not too speedy, but sometimes it's easier/faster to code. But it's relevant to the OP's question about how to run `date` via `sed` (or *GNU* `sed` anyway). I'm thinking my `printf` can be removed, (as well as the backquotes around `date`), with an abler use of `sed` commands... – agc Oct 17 '17 at 06:17
  • I suspect you are correct. I would rather a single call to `sed` or `awk` over the mixed brute force approach. I'm still tinkering with `awk` to get rid of `expr`. But since we know it is always the 2nd field a single `sed` that looks for the first `,"` and captures `\([^"]*\)` as a back-reference would handle that which would then simply need the conversion to the new date format and substitute in place. Heck of a one-liner.... – David C. Rankin Oct 17 '17 at 06:22
  • This is quite an art! There is so much for me to learn. Thank you. – Clinton Lam Oct 17 '17 at 07:35
  • 1
    @agc A small fix, in `s#.*#date "+%Y/%M/%d %T" -d e;` `%M` should be `%m` instead. – Clinton Lam Oct 17 '17 at 12:21