
I would like to use awk (though I am open to Python/pandas solutions) to pull everything but a specific day from a time-series dataset. That day appears only occasionally in the file: it is the leap day, which is only present if records were being taken during a leap year.

The dataset looks like this; here is an example of where a leap-day row occurs:

02-28   HammondBay  139 279 30  49.23281860 -123.96769714   4   5150    69.9
02-29   HammondBay  139 279 30  49.23281860 -123.96769714   1   1437    50.9
03-01   HammondBay  139 279 30  49.23281860 -123.96769714   4   5754    59.0
03-02   HammondBay  139 279 30  49.23281860 -123.96769714   4   5732    54.8
03-03   HammondBay  139 279 30  49.23281860 -123.96769714   4   5724    128.5

So, just to be clear, the intended outcome is a file with every instance of 02-29 removed from this tab-delimited time-series dataset.

geokrowding

3 Answers


I came across some methods for removing (stripping out) lines on this site.

Using awk, the solution to the problem above is:

awk '!/02-29/' file > temp && mv temp file
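A slightly stricter variant compares the first column exactly instead of matching 02-29 anywhere on the line. This is a minimal sketch, assuming the date is always the first whitespace-separated field; data.tsv and its three rows are hypothetical stand-ins for the real file.

```shell
#!/bin/sh
# Minimal sketch: drop rows whose first column is exactly 02-29.
# data.tsv and its rows are hypothetical stand-ins for the real dataset.
dir=$(mktemp -d)
file="$dir/data.tsv"
printf '02-28\tHammondBay\t139\n02-29\tHammondBay\t139\n03-01\tHammondBay\t139\n' > "$file"

# With the default field separator, awk splits on runs of blanks and tabs,
# so $1 is the date column whether the file uses tabs or spaces.
# The original file is replaced only if awk exits successfully.
awk '$1 != "02-29"' "$file" > "$file.tmp" && mv "$file.tmp" "$file"

result=$(cat "$file")
printf '%s\n' "$result"
rm -rf "$dir"
```

Because the default field splitting treats tabs and spaces alike, no -F option is needed for this data.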
geokrowding

awk '!/02-29/' your_file.txt | tee new_file.txt

How about grep:

grep -Ev '02-29' your_file.txt > new_file.txt
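Both patterns above match 02-29 anywhere on the line, not only in the date column. The sketch below contrasts an unanchored and an anchored inverse match; its third row, whose station name contains 02-29, is invented purely for illustration and is not from the real data.

```shell
#!/bin/sh
# Sketch: unanchored vs anchored inverse match with grep.
# The 03-01 row is hypothetical: "02-29" appears in a later column, not the date.
data=$(printf '02-28\tHammondBay\t139\n02-29\tHammondBay\t139\n03-01\tSite02-29\t139\n')

# -v inverts the match; -E is not needed for a plain literal pattern like this.
unanchored=$(printf '%s\n' "$data" | grep -v '02-29')
# ^ ties the pattern to the start of the line, i.e. the date column.
anchored=$(printf '%s\n' "$data" | grep -v '^02-29')

printf 'unanchored:\n%s\n' "$unanchored"
printf 'anchored:\n%s\n' "$anchored"
```

The unanchored version also drops the 03-01 row, while the anchored one removes only the leap-day row.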
Vidhya G
  • Ooh, good use of grep. I like the inverse-matching method. Thank you for the different methods! As an aside, is "tee" basically like using the > operator? If so, is one more efficient? – geokrowding Mar 04 '15 at 02:00
  • Thanks. With >, you redirect a program's output to a file. With a pipe (|), you pass that output as input to another program, tee, which saves it to a file and also displays it on the screen. "Tee" as in a "T junction": here I split the output of awk so that it is both displayed and saved. – Vidhya G Mar 04 '15 at 09:57
  • Good explanation. I am about to post another question in a couple minutes that I think you will have a solution for, stay tuned :) – geokrowding Mar 04 '15 at 10:00
  • Thank you for the tip. Does the upvoting thing go both ways? ;) – geokrowding Mar 05 '15 at 08:29
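To illustrate the difference between tee and > discussed in the comments, this sketch runs the same awk filter both ways on a hypothetical three-row file and checks what reaches stdout and what lands in the file. On efficiency, > avoids starting an extra tee process, though for a file of this size the difference is unlikely to be noticeable; piping into `tee new_file.txt > /dev/null` behaves like a plain redirect.

```shell
#!/bin/sh
# Sketch: tee copies its input to a file AND to stdout; > only writes the file.
# in.txt, out1.txt and out2.txt are hypothetical file names.
dir=$(mktemp -d)
printf '02-28\tA\n02-29\tA\n03-01\tA\n' > "$dir/in.txt"

# Redirect: awk's stdout goes to the file, so nothing is captured here.
screen_redirect=$(awk '!/^02-29/' "$dir/in.txt" > "$dir/out1.txt")

# tee: the same rows go to the file and are also echoed to stdout.
screen_tee=$(awk '!/^02-29/' "$dir/in.txt" | tee "$dir/out2.txt")

# Both approaches should leave identical files behind.
if cmp -s "$dir/out1.txt" "$dir/out2.txt"; then same_file=yes; else same_file=no; fi
printf 'redirect printed %d chars, tee printed %d chars, files identical: %s\n' \
  "${#screen_redirect}" "${#screen_tee}" "$same_file"
rm -rf "$dir"
```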

To remove, in place, all lines that start with the 02-29 prefix, you could use sed -i:

$ sed -i '/^02-29/d' input.txt 
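Note that a bare -i is a GNU sed form; BSD/macOS sed expects a backup-suffix argument after -i. A minimal sketch of a form that both accept is to attach the suffix directly (-i.bak) and delete the backup afterwards; the sample input.txt here is hypothetical.

```shell
#!/bin/sh
# Sketch: in-place delete that works with both GNU sed and BSD/macOS sed.
# Attaching the backup suffix directly to -i (no space) is accepted by both.
# input.txt and its rows are hypothetical sample data.
dir=$(mktemp -d)
file="$dir/input.txt"
printf '02-28\tHammondBay\n02-29\tHammondBay\n03-01\tHammondBay\n' > "$file"

sed -i.bak '/^02-29/d' "$file"   # edits in place, keeps input.txt.bak
rm -f "$file.bak"                # drop the backup once the result looks right

result=$(cat "$file")
printf '%s\n' "$result"
rm -rf "$dir"
```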

Or using grep + sponge:

$ grep -v '^02-29' input.txt | sponge input.txt

where the sponge utility from moreutils soaks up all of its input before writing, which lets you overwrite the file that is also used as the pipeline's input.
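Why sponge (or a temporary file) is needed: in `grep ... input.txt > input.txt` the shell truncates input.txt before grep ever reads it. This sketch, using hypothetical file names, contrasts that naive redirect with a temp-file-and-mv approach, one way to get a similar result without installing moreutils. One caveat, noted in the comments: mktemp creates its file readable only by the owner, so the replaced file's permissions may change.

```shell
#!/bin/sh
# Sketch: why sponge (or a temp file) is needed when rewriting a file in place.
# naive.txt and safe.txt are hypothetical sample files.
dir=$(mktemp -d)
make_sample() {
  printf '02-28\tHammondBay\n02-29\tHammondBay\n03-01\tHammondBay\n' > "$1"
}

# Naive: the shell truncates naive.txt before grep reads it, so it ends up empty.
# grep exits non-zero here (GNU grep may also complain that the input file is
# the output); "|| true" keeps the sketch running.
make_sample "$dir/naive.txt"
grep -v '^02-29' "$dir/naive.txt" > "$dir/naive.txt" 2>/dev/null || true
naive_lines=$(wc -l < "$dir/naive.txt")

# Temp-file version of what sponge achieves: write elsewhere, then rename.
# Caveat: mktemp creates the file with owner-only permissions.
make_sample "$dir/safe.txt"
tmp=$(mktemp "$dir/safe.XXXXXX")
grep -v '^02-29' "$dir/safe.txt" > "$tmp" && mv "$tmp" "$dir/safe.txt"
safe_lines=$(wc -l < "$dir/safe.txt")

printf 'naive: %s lines, temp file: %s lines\n' "$naive_lines" "$safe_lines"
rm -rf "$dir"
```

With the temp-file version, keep in mind that grep exits with status 1 when it selects no lines at all, in which case the && skips the mv.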

jfs