0

I have a Very large (13 GiB) csv file (3856321 rows and 1698) where as expected some of the dates are differently formated. The file looks like ::

2013/01/08 2:11:30 AM,abdc,good time ...
2015/12/28 8:19:30 PM,abdc,good time ...
2/15/2016 10:46:30 AM,kdafh,almost as good ...
12/13/2014 10:46:00 PM,asjhdk,not that good ...
02-Jan-2014,bad time,good time ...
1/1/2015,nomiss time,boy ...
10/15/2016 17:08:30,bad,boy ...

I want to convert it to a same time format and required output is ::

1/8/2013 2:11:30,abdc,good time
12/28/2015 20:19:30,abdc,good time
2/15/2016 10:46:30,kdafh,almost as good
12/13/2014 22:46:00,asjhdk,not that good
1/2/2014 00:00:00,bad time,good time
1/1/2015 00:00:00,nomiss time,boy
10/15/2016 17:08:30,bad,boy

I managed to format the time using the following scripts

 awk -F ',' 'BEGIN{FS=OFS=","}{split($1,a," "); 
 if(a[3]=="PM") 
 {  split(a[2],b,":"); 
    b[1]=b[1]+12    
    a[2]=b[1]":"b[2]":"b[3]
 };
 if(a[2]=="")
 {
        a[2]="00:00:00"
 }
tmp=a[1];
# tmp2=system("date -d `tmp` +%m/%d/%Y");
# print tmp2
$1=tmp" "a[2]
 }1' time_input.csv

I borrowed the idea of formatting dates from question https://unix.stackexchange.com/questions/177888/how-to-convert-date-format-in-file which is commented out in the second last line. However, this does not work in my case. I get an error

date: invalid date ‘+%m/%d/%Y’

Is there an easier and better way to do this? Thanks in advance

Community
  • 1
  • 1
discipulus
  • 2,665
  • 3
  • 34
  • 51

3 Answers3

1

Awk sure is one great way of doing it but since it's really early morning here I don't like to think about all those ifs so here is one in php, since it's got a really nice strtotime function:

$ cat program.php
<?php
  $handle = fopen("file", "r");
  if ($handle) {
    while (($line = fgets($handle)) !== false) {
      // process the line read.

      $arr = explode(",", $line, 2);                     
      echo date("m/d/Y H:i:s", strtotime($arr[0])), ",", $arr[1];

    }
    fclose($handle);
  } else {
  // error opening the file.
}

Run it:

$ php -f program.php
01/08/2013 02:11:30,abdc,good time
12/28/2015 20:19:30,abdc,good time
02/15/2016 10:46:30,kdafh,almost as good
12/13/2014 22:46:00,asjhdk,not that good
01/02/2014 00:00:00,bad time,good time
01/01/2015 00:00:00,nomiss time,boy
10/15/2016 17:08:30,bad,boy

The read line by line loop comes from here: How to read a file line by line in php. I only added lines with explode and strtotime.

The explode splits the line to pieces by the first , and stores them to array $arr. strtotime function is applied to the first element $arr[0]. $arr[1] is later outputed as-is.

Community
  • 1
  • 1
James Brown
  • 36,089
  • 7
  • 43
  • 59
  • Thanks! Although it does what I asked for, I am accepting above python example because that seemed a tad faster for larger files. Thanks! – discipulus Nov 02 '16 at 04:16
1

With Python, using the dateutils and csv modules:

import dateutil.parser as parser
import csv

with open('time_input.csv', 'rb') as inputfile, open('time_output.csv', 'w') as outputfile:

  reader = csv.reader(inputfile, delimiter=',')
  writer = csv.writer(outputfile)

  for row in reader:
    row[0] = parser.parse(row[0]).strftime('%m/%d/%Y %H:%M:%S')
    writer.writerow(row)

The result is output to time_output.csv file.

SLePort
  • 15,211
  • 3
  • 34
  • 44
1

You can try below awk command -

Input

vipin@kali:~$ cat kk.txt
2013/01/08 2:11:30 AM,abdc,good time
2015/12/28 8:19:30 PM,abdc,good time
2/15/2016 10:46:30 AM,kdafh,almost as good
12/13/2014 10:46:00 PM,asjhdk,not that good
02-Jan-2014,bad time,good time
1/1/2015,nomiss time,boy
10/15/2016 17:08:30,bad,boy

filtering -

vipin@kali:~$  awk -F"," '{split($1,a," "); printf ("%s,%s,%s",$2,$3,",");system("date -d \""a[1]" "a[2]"\" +\"%m/%d/%Y %H:%M:%S\"")}'  kk.txt
abdc,good time,,01/08/2013 02:11:30
abdc,good time,,12/28/2015 08:19:30
kdafh,almost as good,,02/15/2016 10:46:30
asjhdk,not that good,,12/13/2014 10:46:00
bad time,good time,,01/02/2014 00:00:00
nomiss time,boy,,01/01/2015 00:00:00
bad,boy,,10/15/2016 17:08:30

Move the filtered output to file kk.txt2

vipin@kali:~$  awk -F"," '{split($1,a," "); printf ("%s,%s,%s",$2,$3,",");system("date -d \""a[1]" "a[2]"\" +\"%m/%d/%Y %H:%M:%S\"")}'  kk.txt > kk.txt2

Output

vipin@kali:~$ awk -F"," '{print $NF,$1,$2}' OFS="," kk.txt2
01/08/2013 02:11:30,abdc,good time
12/28/2015 08:19:30,abdc,good time
02/15/2016 10:46:30,kdafh,almost as good
12/13/2014 10:46:00,asjhdk,not that good
01/02/2014 00:00:00,bad time,good time
01/01/2015 00:00:00,nomiss time,boy
10/15/2016 17:08:30,bad,boy

Explanation -

Use Split function on column 1 and put it in a and then use system function of awk to format the date as per our need.

I can print the output in order but it was printing a leading zero so i am printing formatted date in last column that is why i am moving the data in another file. and finally you can print the column in your order.

VIPIN KUMAR
  • 3,019
  • 1
  • 23
  • 34
  • Thanks! Since I had multiple columns (>1500, which I didn't mention in my question, Apologies ). This print wouldn't work very well but I got the idea of using date in awk with dynamic variables. However, since the python idea above was easier, I am accepting tha – discipulus Nov 02 '16 at 04:15