Linux script to filter lines in text file by date time pattern in the line

Question

I am not a Linux scripting expert. I know I could do this with some programming, but I am trying to find an easy way to do it with a command line or a simple script. Any help will be appreciated.

I have a CSV file that contains the following lines:

Date,Time,Global_active_power,Global_reactive_power,Voltage,Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3

16/12/2006,18:00:00,2.79,0.18,237.52,11.8,0,0,18
16/12/2006,18:01:00,2.624,0.144,238.2,11,0,0,17
16/12/2006,18:02:00,2.772,0.118,238.28,11.6,0,0,17
16/12/2006,18:03:00,3.74,0.108,236.93,16.4,0,16,18
16/12/2006,18:04:00,4.928,0.202,235.01,21,0,37,16
16/12/2006,18:05:00,6.052,0.192,232.93,26.2,0,37,17
16/12/2006,18:06:00,6.752,0.186,232.12,29,0,36,17
16/12/2006,18:07:00,6.474,0.144,231.85,27.8,0,37,16
16/12/2006,18:08:00,6.308,0.116,232.25,27,0,36,17
16/12/2006,18:09:00,4.464,0.136,234.66,19,0,37,16
16/12/2006,18:10:00,3.396,0.148,236.2,15,0,22,18
16/12/2006,18:11:00,3.09,0.152,237.07,13.8,0,12,17
16/12/2006,18:12:00,3.73,0.144,235.78,16.4,0,27,17
16/12/2006,18:13:00,2.308,0.16,237.43,9.6,0,1,17
16/12/2006,18:14:00,2.388,0.158,237.26,10,0,1,17
16/12/2006,18:15:00,4.598,0.1,234.25,21.4,0,20,17
16/12/2006,18:16:00,4.524,0.076,234.2,19.6,0,9,17
16/12/2006,18:17:00,4.202,0.082,234.31,17.8,0,1,17
16/12/2006,18:18:00,4.472,0,233.29,19.2,0,1,16
16/12/2006,18:19:00,2.852,0,235.61,12,0,1,17
16/12/2006,18:20:00,2.928,0,235.25,12.4,0,1,17
16/12/2006,18:21:00,2.94,0,236.04,12.4,0,2,17
16/12/2006,18:22:00,2.934,0,235.51,12.4,0,1,17
16/12/2006,18:23:00,2.926,0,235.68,12.4,0,1,17
16/12/2006,18:24:00,3.452,0,235.2,15.2,0,1,17

As shown above, the second field is the time of day in hh:mm:ss format (at one-minute resolution), so a record is created every minute. I want to create a script that does the following:

For the date field (the first field), give two dates: a starting date "SDate" and an ending date "EDate".
For the time field (the second field), give two hours: a starting hour "SHour" and an ending hour "EHour".
The script will generate 5 new data files:
- The first file will contain all the records whose minute of the hour is 00, 05, 10, 15, 20 ... 55
- The second file will contain all the records whose minute of the hour is 01, 06, 11, 16, 21 ... 56
- The third file will contain all the records whose minute of the hour is 02, 07, 12, 17, 22 ... 57
- The fourth file will contain all the records whose minute of the hour is 03, 08, 13, 18, 23 ... 58
- The fifth file will contain all the records whose minute of the hour is 04, 09, 14, 19, 24 ... 59

score 1 · Answer 1 · answered Jun 29 '14 at 19:57

awk -F: 'NR>1{print > ("file" (1+$2%5))}' input

This should do what you want. After running it, you should see 5 files: file1 through file5.
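To try the one-liner without touching real data, here is a minimal sketch on a six-row sample. The file name `sample.csv` is a placeholder, and the redirection target is parenthesized here because some awk implementations parse an unparenthesized concatenation after `>` differently.

```shell
# Build a small sample with the same layout as the question's file (hypothetical name).
cat > sample.csv <<'EOF'
Date,Time,Global_active_power,Global_reactive_power,Voltage,Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3
16/12/2006,18:00:00,2.79,0.18,237.52,11.8,0,0,18
16/12/2006,18:01:00,2.624,0.144,238.2,11,0,0,17
16/12/2006,18:02:00,2.772,0.118,238.28,11.6,0,0,17
16/12/2006,18:03:00,3.74,0.108,236.93,16.4,0,16,18
16/12/2006,18:04:00,4.928,0.202,235.01,21,0,37,16
16/12/2006,18:05:00,6.052,0.192,232.93,26.2,0,37,17
EOF

# Split on ":" so that $2 is the minute; NR>1 skips the header row.
# minute % 5 is 0..4, so adding 1 selects file1..file5.
awk -F: 'NR>1 { print > ("file" (1 + $2 % 5)) }' sample.csv

# Show how many rows landed in each file.
wc -l file1 file2 file3 file4 file5
```

With this sample, the 18:00 and 18:05 rows both go to file1, and each of the other files receives one row.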

Kent

score 0 · Answer 2 · answered Jun 29 '14 at 18:39

Use awk with multiple delimiters and use a separate line for each file.

awk -F'[:,]' '$3 % 5 == 0' Input.txt > File1
awk -F'[:,]' '$3 % 5 == 1' Input.txt > File2
awk -F'[:,]' '$3 % 5 == 2' Input.txt > File3
awk -F'[:,]' '$3 % 5 == 3' Input.txt > File4
awk -F'[:,]' '$3 % 5 == 4' Input.txt > File5
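The `-F'[:,]'` option makes awk treat both `:` and `,` as separators, which is why the minute ends up in `$3`. A small sketch to confirm the field positions on one record from the question; the labels and the `fields.txt` copy kept by `tee` are only for illustration:

```shell
# Split one sample record on both ":" and "," and label the first four fields.
# "bucket" is minute % 5, the value each of the five commands tests against.
printf '%s\n' '16/12/2006,18:03:00,3.74,0.108,236.93,16.4,0,16,18' |
awk -F'[:,]' '{ printf "date=%s hour=%s minute=%s second=%s bucket=%d\n", $1, $2, $3, $4, $3 % 5 }' |
tee fields.txt
```

Because a non-numeric string evaluates to 0 in awk arithmetic, the header row also satisfies `$3 % 5 == 0` and is written to File1. If that is not wanted, prefixing each condition with `NR > 1 &&` skips it.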

merlin2011

Sorry, I am new to posting on Stack Overflow. How do I accept this answer? – bajie88 Jul 02 '14 at 10:10
The answer worked great. – bajie88 Jul 02 '14 at 10:10
This works, but it starts 5 processes and reads the same file 5 times. – Kent Jul 02 '14 at 10:18
@bajie88 just click the green check mark. – merlin2011 Jul 02 '14 at 16:16
@Kent That is a very fair point, but there is a tradeoff between understandability and efficiency, and this answer favors the former over the latter. Whether that choice makes sense depends on the OP's input size and technical level. – merlin2011 Jul 02 '14 at 16:45
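Neither answer uses the SDate, EDate, SHour and EHour bounds from the question. A possible single-pass sketch that adds them, assuming the bounds are inclusive, that dates are written dd/mm/yyyy, and that hours are whole numbers from 0 to 23. The helper `ymd`, the file `sample2.csv` and the `out` prefix are hypothetical names chosen for this example:

```shell
# Small sample covering dates and hours inside and outside the range (hypothetical file).
cat > sample2.csv <<'EOF'
Date,Time,Global_active_power
15/12/2006,18:00:00,1.0
16/12/2006,17:59:00,1.1
16/12/2006,18:00:00,2.79
16/12/2006,18:06:00,6.752
16/12/2006,19:03:00,3.0
17/12/2006,18:05:00,4.0
18/12/2006,18:00:00,5.0
EOF

# Remove outputs from an earlier run so stale files are not mistaken for results.
rm -f out1 out2 out3 out4 out5

awk -F'[:,]' -v SDate='16/12/2006' -v EDate='17/12/2006' -v SHour=18 -v EHour=18 '
# Turn dd/mm/yyyy into the number yyyymmdd so dates compare chronologically.
function ymd(d,   p) { split(d, p, "/"); return (p[3] p[2] p[1]) + 0 }
BEGIN { s = ymd(SDate); e = ymd(EDate) }
NR > 1 {
    day = ymd($1); hour = $2 + 0
    # Keep the record only if both the date and the hour fall inside the bounds.
    if (day >= s && day <= e && hour >= SHour && hour <= EHour)
        print > ("out" (1 + $3 % 5))
}' sample2.csv
```

On this sample only three rows pass the filter: the 18:00 and 18:05 rows land in out1 and the 18:06 row in out2. Output files are created only for buckets that receive at least one record.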
