I am trying to get the first and last line of a set of files in a directory.

I have a set of .log files and want to get the first and last line of each (there is no whitespace/gap at the top/bottom of each file).

I want to grab the first and last line from the oldest file first and keep repeating until it has reached the most recent file (ordered by modification time), collecting the results from all of them into one file.

e.g. below is a set of files. I want a line before each file's output so I know which lines belong to which log file. I have manually entered the log files in order, but would rather the script chose oldest to newest automatically.

echo "S26" >> RunTimes.txt ; sed -n -e '1p;$p' S26.log >> RunTimes.txt ;
echo "S27" >> RunTimes.txt ; sed -n -e '1p;$p' S27.log >> RunTimes.txt ;
echo "S28" >> RunTimes.txt ; sed -n -e '1p;$p' S28.log >> RunTimes.txt ;
echo "S29" >> RunTimes.txt ; sed -n -e '1p;$p' S29.log >> RunTimes.txt ;
echo "S30" >> RunTimes.txt ; sed -n -e '1p;$p' S30.log >> RunTimes.txt ;
echo "S31" >> RunTimes.txt ; sed -n -e '1p;$p' S31.log >> RunTimes.txt ;
echo "S32" >> RunTimes.txt ; sed -n -e '1p;$p' S32.log >> RunTimes.txt ;
echo "S33" >> RunTimes.txt ; sed -n -e '1p;$p' S33.log >> RunTimes.txt ;

There has to be a more efficient way of doing this; any help is much appreciated.

Thanks

EDIT

Thanks to @JorgeBellon for the heads up, when I try and convert

sed -n '1p' to '1p;$p'

I receive -bash: $0 >> RunTimes.txt: command not found

This is the complete query below:

ls -t | xargs -n1 bash -c 'echo $0 >> RunTimes.txt; sed -n "1p;$p" $0 >> RunTimes.txt'

Not sure if it is because of using bash that it does not like how it is formatted?

As a workaround I tried using

head -n1 && tail -n1

In the hope of getting first and last line but no success.

If I use double quotes, so "1p;$p" as opposed to '1p;$p', the query runs, but I only get the first line back for each log.

Regards

rdbmsNoob

3 Answers


This might work for you (GNU sed, tac and ls):

sed -ns '1p;$p' $(ls -t | tac)

The sed command line option -s makes sed treat each input file separately, so line addresses such as 1 and $ apply to every file rather than to the concatenated input.

The input file set is the names of the files in the current directory sorted by modification time, oldest first: ls -t lists the newest first and tac reverses that order.

The result is a single stream of output holding the first and last lines of all files in the directory, from oldest to newest.

To get the file name too, use:

sed -ns '1F;1p;$p' $(ls -t | tac)
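The two follow-up problems raised in the comments (the output file being read as input, and the ordering) can be handled in the same one-liner. The sketch below is only one way to do it: `log_summary` is a hypothetical name, GNU sed is assumed for `-s` and `F`, and `ls -rt` stands in for `ls -t | tac`. Restricting the glob to `*.log` keeps RunTimes.txt out of the input.

```shell
# Hypothetical helper: print "<file name>, first line, last line" for every
# .log file in the current directory, oldest first. Assumes GNU sed.
log_summary() {
  # ls -rt lists the oldest file first, so tac is not needed.
  # The unquoted $(...) is split on whitespace, so file names that
  # contain spaces or newlines will break this.
  sed -ns '1F;1p;$p' $(ls -rt -- *.log)
}

# Example: log_summary > RunTimes.txt
```

Because the output file does not end in .log, it is never picked up as input, even when it sits in the same directory.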
potong
  • this is great as is JorgeBellon one he suggested. The difficulty I am having is having the name of the file before each one that is read. For example I want the name of the file being read, then the first/last line and then the subsequent next file. Is there a workaround that you know of? Thanks again – rdbmsNoob Aug 24 '20 at 09:22
  • much appreciated! – rdbmsNoob Aug 25 '20 at 10:41

To sort the files by modification time you can use ls -t, which lists the newest file first. If you want the reverse ordering (oldest first, as the question asks), use ls -rt.

For a given file, you can get the first and last lines of that file with sed -n '1p;$p'.

You can feed each file in ls to sed using xargs:

ls -t | xargs -n1 sed -n '1p;$p'

You need to pass the -n1 argument to xargs so that sed gets one file at a time. Otherwise sed treats all the files as one concatenated stream and prints only the first line of the first file and the last line of the last file.

If you want to print the name of the file before its contents, you need something more elaborate, like using another shell and storing the argument from xargs:

ls -t | xargs -n1 bash -c 'echo $0 >> RunTimes.txt; sed -n "1p;\$p" $0 >> RunTimes.txt'
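The backslash in "1p;\$p" matters because of how the shell treats `$` inside double quotes. The snippet below is a small illustration of that rule only; the variable names `single`, `double` and `escaped` are made up for the demonstration.

```shell
unset p                  # make sure no shell variable named p is set

single='1p;$p'           # single quotes: nothing is expanded
double="1p;$p"           # double quotes: $p expands to an empty string
escaped="1p;\$p"         # the backslash keeps the dollar sign literal

# Show what sed would receive in each case.
printf '%s\n' "$single" "$double" "$escaped"
```

The second value comes out as `1p;`, which is why the double-quoted version without the backslash only ever printed the first line of each file.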

You can possibly get something simpler with awk.

... | bash -c 'awk "BEGIN{print \"$0\"}NR==1{print}END{print}" $0'
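awk can also do the whole job in a single process, since it exposes the name of the file being read as `FILENAME`. The following is a sketch rather than a drop-in replacement: `first_last_awk` is a hypothetical name, the `.log` suffix is assumed, and empty files produce no output at all.

```shell
# Hypothetical helper: for each file given, print its name without ".log",
# then its first line, then its last line.
first_last_awk() {
  awk '
    FNR == 1 {
      if (NR > 1) print last        # last line of the previous file
      name = FILENAME
      sub(/\.log$/, "", name)       # drop the extension
      print name
      print                         # first line of the current file
    }
    { last = $0 }                   # remember the most recent line
    END { if (NR > 0) print last }  # last line of the final file
  ' "$@"
}

# Example (oldest first, simple file names only):
# first_last_awk $(ls -rt -- *.log) > RunTimes.txt
```

A file with a single line has that line printed twice, which matches what sed -n '1p;$p' does.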

If the result file is already created and you want to skip it, just filter the input for xargs:

ls -t | grep -v RunTimes.txt | xargs ...

Or, if you also want to skip directories, recurse into subdirectories, etc., you can use find (which does not sort by date) and then feed that to ls:

find . -type f -not -name RunTimes.txt | xargs ls -t | xargs ...
Jorge Bellon
  • That's certainly a lot easier. Is there any way of having xargs exclude a file? I am outputting the results to a text file in the same directory, and it looks like it gets read as part of this, as I get an additional line. Is there any way of adding a line for each file that gets read to the output file? Currently I have to echo and manually type the name of the file minus the .log; is there any way to incorporate this into the script? Thanks again! – rdbmsNoob Aug 21 '20 at 11:52
  • If you want to skip files I suggest swapping `ls` for `find` or doing a `grep`, such that any input `xargs` receives is what you are actually interested in. I'll expand the question. – Jorge Bellon Aug 21 '20 at 11:54
  • Let's link the ultimate [why not parse ls and what to do instead](https://unix.stackexchange.com/questions/128985/why-not-parse-ls-and-what-to-do-instead). Of course the solution without `ls` like: `find -type f -printf '%T@ %p\0' | sort -zk 1nr | sed -z 's/^[^ ]* //' | xargs -0 sed ..` doesn't look particularly nice. – KamilCuk Aug 21 '20 at 12:08
  • @KamilCuk thanks. In the examples I've shown the output of `ls` is just path names, so there isn't any complicated parsing necessary. In any more complex cases such as skipping directories I agree using `find` is the way to go. – Jorge Bellon Aug 21 '20 at 13:01
  • @JorgeBellon this all but gives me what I need: ls -t | xargs -n1 bash -c 'echo $0 >> RunTimes.txt; sed -n "1p;$p" $0 >> RunTimes.txt' But I don't get the last line added, only the first, for every log file. – rdbmsNoob Aug 21 '20 at 13:02
  • Then do `sed -n '1p' $0` or the better alternative `head -n1 $0`. – Jorge Bellon Aug 21 '20 at 13:06
  • Sorry, when I run this: ls -rt | xargs -n1 bash -c 'echo $0 >> RunTimes.txt; sed -n "1p;$p" $0 >> RunTimes.txt' It should give me both the first line and the last line, but it only returns the first line as opposed to returning both. Is there something I am missing? – rdbmsNoob Aug 21 '20 at 13:34
  • Good stuff. Don't forget that awk stores the current filename being processed as `FILENAME`, so `printf("%s|%s|%s\n", FILENAME, dat1, dat2)` might give you some ideas. – shellter Aug 21 '20 at 14:17
  • @shellter Nice. I didn't know about that. `awk` is awesome. – Jorge Bellon Aug 21 '20 at 14:28
  • @JorgeBellon if I convert the sed -n '1p' to '1p;$p' so it includes both first and last line, I receive -bash: $0 >> RunTimes.txt: command not found Note: This is the complete query: ls -rt | xargs -n1 bash -c 'echo $0 >> RunTimes.txt; sed -n '1p;$p' $0 >> RunTimes.txt' I have also tried using head -n1 && tail -n1 in the hope of getting first and last line, but no success. Could this be an issue with the way formatting is done? If I use double quotes, so "1p;$p" as opposed to '1p;$p', the query runs, but I only get the first line back for each log. – rdbmsNoob Aug 21 '20 at 14:45
  • @IanPorritt : Impossible to read/understand code embedded in comments. It's perfectly OK to add a section `EDIT` in your Q above with your new test and flag it in the comments so an individual is made aware of it. Good luck. – shellter Aug 22 '20 at 14:18
  • Thanks @shellter will give it a try now. – rdbmsNoob Aug 24 '20 at 08:18
  • @JorgeBellon have made an edit to my original question with your command, is there any reason why only the first line of a file is read and not the end? Not sure if it is the introduction of the bash command that is causing the issue? Thanks again by the way! – rdbmsNoob Aug 24 '20 at 09:40
  • Hi. When you use double quotes in bash, any `$` followed by an alphanumeric name is interpreted as a variable. You can escape this by using a backslash. I missed this in the last snippet (you don't need this when using single quotes), so the sed expression was being evaluated as `1p;`. If you change the sed command to be `"1p;\$p"` it will work. I've updated the answer with the fix. – Jorge Bellon Aug 24 '20 at 16:35
  • @JorgeBellon awesome! it worked. Cheers again! Also is there a way of removing the extension name of a file the script goes through. For example the script looks at a set of files oldest to newest and prints the filename as the echo in the script. Is there a way of it only printing the name of the file not the full file name, so Test as opposed to Test.log? – rdbmsNoob Aug 25 '20 at 10:33
  • Yes it is possible, but I bet that is already answered in another question. If your problem is solved, consider marking one of the answers as the solution. It took me less than a minute to find a solution: https://stackoverflow.com/a/30007867/5809597 – Jorge Bellon Aug 25 '20 at 11:56

The easiest and at the same time the wrong approach would be to parse the output of ls -tr1, which outputs a list of files sorted by date. However, when file names contain spaces, newlines or other unusual characters, this breaks (Never Parse LS).

The robust solution is using GNU find and GNU sort to create a sorted list of filenames with the modification time in it:

$ find -type f -printf "%T@ %p\0" | sort -nz

This creates a NUL-delimited list of records, each holding the modification time, a space, and the file name. It can now be processed with a simple loop in the following way:

$ while IFS= read -r -d '' line; do
     file="${line#* }"   # strip the leading "<mtime> " field
     # do magic with file
     command "$file"
  done < <(find -type f -printf "%T@ %p\0" | sort -nz)

This answer is based on the detailed BashFAQ#003

The loop could also be replaced with a single pipeline:

$ find -type f -printf "%T@ %p\0" | sort -nz | sed 's/\(^\|\x0\)[0-9.]* /\1/g' | xargs -0 -n1 command 
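Applied to the question, the placeholder `command` could print the file name without its extension followed by the first and last line. The sketch below assumes GNU find, sort, sed and xargs; `summarize_logs` is a hypothetical name, and `sed -z` (as suggested in a comment on another answer) is used here to strip the timestamp instead of the expression above.

```shell
# Hypothetical helper: first and last line of every *.log file under a
# directory, oldest first, each preceded by the file name without ".log".
summarize_logs() {
  find "${1:-.}" -type f -name '*.log' -printf '%T@ %p\0' |
    sort -zn |                 # oldest first (numeric sort on the timestamp)
    sed -z 's/^[^ ]* //' |     # drop the "<mtime> " prefix of each record
    xargs -0 -r -n1 sh -c '
      name=${1##*/}                 # strip leading directories
      printf "%s\n" "${name%.log}"  # strip the extension
      sed -n "1p;\$p" "$1"          # first and last line of the file
    ' sh
}

# Example: summarize_logs . > RunTimes.txt
```

Since every record is NUL-terminated from find through to xargs, file names containing spaces are passed through intact, and RunTimes.txt is never read because only *.log files are selected.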
kvantour