0

There are some rotated gz log files in the log directory. They are rotated every twenty minutes by logrotate with dateformat '.%s', as follows:

ls -l /var/log/app/h323server.log.[1-9][0-9]*  |head
-rw-r--r-- 1 root adm   2063852 Mar 19 02:00 /var/log/app/h323server.log.1584554401.gz
-rw-r--r-- 1 root adm   2093937 Mar 19 02:20 /var/log/app/h323server.log.1584555601.gz

I want to print the log content between the start_time and end_time timestamps. There are a few steps:

1. Find the matching log files and put them into an array named totalfile.

2. Use a for loop to read totalfile and print. The first and last elements need to be filtered by the start and end timestamps; the rest of the files are printed directly. I want to use for (( i=1; i<${arraylength}+1; i++ )); to achieve this, but something goes wrong.

The Bash script is as follows:

#!/bin/bash

oldifs="$IFS"
IFS=$'\n'
declare -a filetime
declare -a filename
declare -a totalfile
index_1=0
index_2=0

for line in $(ls -l /var/log/app/h323server.log.[1-9][0-9]* |awk '{split($NF,a,".");print a[3],$NF}')
do
        filetime[${index_1}]=$(echo ${line} |awk '{print $1}')
        filename[${index_2}]=$(echo ${line} |awk '{print $2}')
        ((index_1++))
        ((index_2++))
done
IFS="$oldifs" 

index=0
timesys_s=1584945601
timesys_e=1584948001

# store the corresponding delaycompress and compress file to totalfile array
while [ ${index} -le $((${#filetime[@]}-1)) ]
do
        if [ ${index} -eq 0 ]
        then
                if [[ ${filetime[${index}]} -ge ${timesys_s} ]] || \
                   [[ ${filetime[${index}]} -le ${timesys_s} ]] || \
                   [[ (${filetime[${index}-1]} -ge ${timesys_s}) && (${filetime[${index}]} -le ${timesys_e}) ]]
                then
                        totalfile[${index}]=${filename[${index}]}
                fi
        else
                if [[ (${filetime[${index}-1]} -le ${timesys_s}) && (${filetime[${index}]} -ge ${timesys_s}) ]] || \
                   [[ (${filetime[${index}-1]} -ge ${timesys_s}) && (${filetime[${index}]} -le ${timesys_e}) ]] || \
                   [[ (${filetime[${index}-1]} -le ${timesys_e}) && (${filetime[${index}]} -ge ${timesys_e}) ]]
                then
                        totalfile[${index}]=${filename[${index}]}
                fi
        fi
        ((index++))
done

echo "length of totalfile:"
echo ${#totalfile[@]}
echo "content of totalfile:"
echo ${totalfile[@]}

# get length of totalfile
arraylength=${#totalfile[@]}

# use for loop to read all values and indexes
for (( i=1; i<${arraylength}+1; i++ ));
do
  echo $i " / " ${arraylength} " : " ${totalfile[$i-1]}
done

# can only print the first and last value when using ${array[index]} to loop
echo "the length of totalfile is: ${arraylength}"
echo "the 1st element: ${totalfile[0]}"
echo "the 2st element: ${totalfile[1]}"
echo "the 3st element: ${totalfile[2]}"
echo "the 4st element: ${totalfile[3]}"
echo "the 5st element: ${totalfile[-1]}"

The output is as follows:

length of totalfile:
5
content of totalfile:
/var/log/app/h323server.log.1584554401.gz /var/log/app/h323server.log.1584945601.gz /var/log/app/h323server.log.1584946801.gz /var/log/app/h323server.log.1584948001.gz /var/log/app/h323server.log.1584949201.gz
1  /  5  :  /var/log/app/h323server.log.1584554401.gz
2  /  5  : 
3  /  5  : 
4  /  5  : 
5  /  5  : 
the length of totalfile is: 5
the 1st element: /var/log/app/h323server.log.1584554401.gz
the 2st element: 
the 3st element: 
the 4st element: 
the 5st element: /var/log/app/h323server.log.1584949201.gz

The question is:

There are five elements in the totalfile array, but only "${totalfile[0]}" and "${totalfile[-1]}" print normally, while "${totalfile[1]}", "${totalfile[2]}" and "${totalfile[3]}" print nothing at all.

One more thing: when I use "${totalfile[-4]}", "${totalfile[-3]}" and "${totalfile[-2]}", it works.

Using -4, -3, -2 instead of 1, 2, 3:

echo "the length of totalfile is: ${arraylength}"
echo "the 1st element: ${totalfile[0]}"
echo "the 2st element: ${totalfile[-4]}"
echo "the 3st element: ${totalfile[-3]}"
echo "the 4st element: ${totalfile[-2]}"
echo "the 5st element: ${totalfile[-1]}"

Output:

the length of totalfile is: 5
the 1st element: /var/log/app/h323server.log.1584554401.gz
the 2st element: /var/log/app/h323server.log.1584945601.gz
the 3st element: /var/log/app/h323server.log.1584946801.gz
the 4st element: /var/log/app/h323server.log.1584948001.gz
the 5st element: /var/log/app/h323server.log.1584949201.gz

The OS is "Ubuntu 14.04.5 LTS".

I don't understand how this happens, and I would appreciate it if anyone could explain it to me.

xoyabc
  • Run `declare -p totalfile` to see the contents of the array. – codeforester Mar 23 '20 at 14:51
  • @codeforester let me try. – xoyabc Mar 23 '20 at 14:52
  • It does not work; the error is `line 7: declare: totalfile: not found`. I changed `declare -a totalfile` to `declare -a totalfile` – xoyabc Mar 23 '20 at 14:53
  • The index values changed as well: they are not 0-4 but 0, 326, 327, 328, 329. declare -a totalfile='([0]="/var/log/app/h323server.log.1584554401.gz" [326]="/var/log/app/h323server.log.1584945601.gz" [327]="/var/log/app/h323server.log.1584946801.gz" [328]="/var/log/app/h323server.log.1584948001.gz" [329]="/var/log/app/h323server.log.1584949201.gz")' – xoyabc Mar 23 '20 at 14:56
  • Should it be `print a[2],$NF` ? – Philippe Mar 23 '20 at 15:10
  • @Philippe the initial index is `1` of the array generated by `split` in awk,you can try and see. – xoyabc Mar 23 '20 at 15:13
  • Can you run `declare -p totaltime` ? – Philippe Mar 23 '20 at 15:16
  • @Philippe I have run it; it seems the index values are not correct, and I'm fixing that. – xoyabc Mar 23 '20 at 15:20
  • Add `set -x` to your script and debug it. [how we debug bash scripts](https://unix.stackexchange.com/questions/155551/how-to-debug-a-bash-script). I don't understand the script - what is going on in the `while [ ${index} -le $((${#filetime[@]}-1)) ]` loop? – KamilCuk Mar 23 '20 at 15:36
  • @KamilCuk The `filetime` array stores the timestamps of all log files, while `filename` stores the file names of all log files. The indexes of `filetime` and `filename` are the same, and the `while [ ${index} -le $((${#filetime[@]}-1)) ]` loop finds the log files between start_time and end_time – xoyabc Mar 23 '20 at 15:57
  • @KamilCuk Thanks for your deleted answer and the advice. At the very beginning I used the approach in your answer, but when the number of gz files is large, `start_time < $2 && $2 < stop_time` is evaluated for every gz file, which takes a long time, so I use the current approach – xoyabc Mar 23 '20 at 16:01
  • @KamilCuk oops, you added it again, thanks – xoyabc Mar 23 '20 at 16:02
  • `,it will takes many time` - bash comparisons will be slow compared with `awk`, where this will be very fast. `[` is a builtin in bash, so it does not start a separate process, but each test is still interpreted one at a time. Use `[[` or better `((` for a little more speed. Or best of all, just run a single process, like `awk`. – KamilCuk Mar 23 '20 at 16:05
  • @KamilCuk The script prints the log content between start_time and end_time. Part of the script that I did not post prints the log content using `zcat -f ${totalfile[index_value]}`. If `start_time < $2 && $2 < stop_time` is evaluated, it turns out to be very slow; I added `time` to test this before. BTW, I did use `-x`, but it does not print the indexes of the totalfile array. They show up when running `declare -p totalfile`, which I did not know about before. Thanks again, all of you. – xoyabc Mar 23 '20 at 16:17
  • @Philippe I added a variable named `index_t`, set its value to `0`, and then added `((index_t++))` inside the time-comparison `if` branches, `while [ ${index} -le $((${#filetime[@]}-1)) ] --> if [[ ${filetime[${index}]} -ge ${timesys_s} ]] and if [[ (${filetime[${index}-1]} -le ${timesys_s}) && (${filetime[${index}]} -ge ${timesys_s}) ]] `. It works now. – xoyabc Mar 23 '20 at 17:15
  • @KamilCuk Since awk can't print the content of a `gz` file, I have to use `zcat` instead, but I don't want to compare the time of every log file. I only compare start_time and end_time for the first and the last one; for the rest of the gz files I just use `zcat` to print the log content – xoyabc Mar 23 '20 at 17:30
  • No, it can't. So use `zcat` on the resulting list. You _filter_ the file names with `awk`. _Then_ you print with `xargs zcat`. The Unix philosophy is that one tool does one job well. – KamilCuk Mar 23 '20 at 17:56
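
The `declare -p` output quoted in the comments shows the cause: the selection loop assigns `totalfile[${index}]`, so the array is sparse (indexes 0, 326, 327, 328, 329) even though it holds five elements. Index 1 was never assigned, and negative subscripts count back from one more than the highest index, which is why -4 to -1 happen to land on 326 to 329. The following is a minimal sketch that reproduces this with three of those entries; the `packed` array is a name made up for the illustration.

```shell
#!/usr/bin/env bash
# Reproduce the symptom with a deliberately sparse array.
# File names and index values are copied from the declare -p output above.
declare -a totalfile
totalfile[0]=/var/log/app/h323server.log.1584554401.gz
totalfile[326]=/var/log/app/h323server.log.1584945601.gz
totalfile[327]=/var/log/app/h323server.log.1584946801.gz

echo "count:   ${#totalfile[@]}"    # number of set elements: 3
echo "indexes: ${!totalfile[*]}"    # 0 326 327
echo "slot 1:  '${totalfile[1]}'"   # empty, index 1 was never assigned
echo "slot -2: ${totalfile[-2]}"    # (327 + 1) - 2 = index 326

# Appending with += keeps the indexes contiguous (hypothetical array name).
declare -a packed=()
for f in "${totalfile[@]}"; do
    packed+=("$f")
done
echo "packed indexes: ${!packed[*]}"   # 0 1 2
```

In the question's script the same effect could probably be had by writing `totalfile+=("${filename[index]}")` in the selection loop instead of assigning to `totalfile[${index}]`; that is a suggestion, not something tested against the real log directory.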

2 Answers

1

Storing state can be complicated in bash. Just parse the stream as it goes.

start_time='now -2 hour'
stop_time='now -1 hour'

# convert to seconds since epoch
start_time=$(date --date="$start_time" +%s)
stop_time=$(date --date="$stop_time" +%s)

# get list of files
( cd /var/log/app/ && find . -type f -name 'h323server.log.*.gz' ;) |
# extract the number
sed 's/\.\([0-9]*\).gz$/& \1/' |
# compare and print the filename
awk -v start_time="$start_time" -v stop_time="$stop_time" \
     'start_time < $2 && $2 < (stop_time + 20 * 60) { print $1 }' 
# I guess maybe also `(start_time - 20 * 60)` to fetch the previous one

Notes:

  • Nice script!
  • Use for ((i = 0; i < ${#array[@]}; ++i)) to iterate over array indexes; this assumes the indexes are contiguous and start at 0. Or just for i in "${!array[@]}", which also works when the array is sparse.
  • I prefer arithmetic evaluation: instead of if [[ ${filetime[${index}]} -ge ${timesys_s} ]] I would write if (( ${filetime[${index}]} >= ${timesys_s} )).
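
As a rough illustration of the last two notes, the sketch below walks the existing indexes with `"${!filetime[@]}"` and compares with `(( ))`. The timestamps are made-up sample values, `selected` is a hypothetical array name, and the test is a plain "inside the window" check rather than the overlap logic of the question's script.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: rotation timestamps in epoch seconds, oldest first.
filetime=(1584944401 1584945601 1584946801 1584948001 1584949201)
timesys_s=1584945601
timesys_e=1584948001

selected=()
# "${!filetime[@]}" expands to the indexes that actually exist,
# so the loop behaves the same for sparse arrays.
for i in "${!filetime[@]}"; do
    # Inside (( )) variables are referenced by name and compared as integers.
    if (( filetime[i] >= timesys_s && filetime[i] <= timesys_e )); then
        selected+=("${filetime[i]}")
    fi
done

printf '%s\n' "${selected[@]}"
```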

Or for example get the file before and after the match:

find . -type f -name 'h323server.log.*.gz' |
# extract the number
sed 's/\.\([0-9]*\).gz$/& \1/' |
# sort on numbers
sort -n -k2 |
# important - the input is sorted
# compare and print the filename 
awk -v start_time="$start_time" -v stop_time="$stop_time" '
    # Because I do not want to write stop_time > $2 && $2 > start_time everywhere, I cache it in the cond variable
    # clear cond variable
    { cond=0 }
    stop_time > $2 && $2 > start_time {
        cond_was_true=1; # remember that at least once the condition was met
        cond=1; # if the condition is met, set cond variable
    }
    # so, if the condition is met
    cond {
        # output the previous line before the match if any
        # if we did not output the previous line yet (oncelast)
        # and the previous line length is not empty
        if (!oncelast && length(last) != 0) {
            # remember that we output the previous line and output it
            oncelast=1
            print last;
        }
        # output the current line
        print $1;
        # there is nothing interesting below
        next;
    }
    # remember the previous line
    # the !cond could just be removed; it will not be executed when cond is set because of the next above
    !cond { last=$1; }
    # print one more line after the condition is true
    # if the condition was true before
    # but is no longer true
    # then only once output the next line after the condition was met
    cond_was_true && !cond && !once { once=1; print $1; }
'

To print the content of the resulting files, add | xargs -d$'\n' zcat to the end of the scripts.
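
A small sketch of that last stage, assuming GNU xargs (for `-d`) and a `zcat` that reads gzip files. The temporary directory and the two sample files are created only for the illustration.

```shell
#!/usr/bin/env bash
# Create two throwaway gzip files (hypothetical names and content).
tmpdir=$(mktemp -d)
printf 'first\n'  | gzip > "$tmpdir/h323server.log.100.gz"
printf 'second\n' | gzip > "$tmpdir/h323server.log.200.gz"

# One file name per line on stdin; -d '\n' makes xargs split on newlines only,
# so names containing spaces would still be passed as single arguments.
output=$(printf '%s\n' \
    "$tmpdir/h323server.log.100.gz" \
    "$tmpdir/h323server.log.200.gz" |
    xargs -d '\n' zcat)

echo "$output"
rm -rf "$tmpdir"
```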

After sort -n -k2 the input is sorted by timestamp. The condition is stop_time > $2 && $2 > start_time, and I am interested in the one line before and the one line after the range of input lines for which the condition is met.

Above I used the cond variable just to avoid writing stop_time > $2 && $2 > start_time over and over again. Here is an attempt at a simpler version, but it is untested:

awk -v start_time="$start_time" -v stop_time="$stop_time" '

    stop_time > $2 && $2 > start_time {
        # if the condition wasnt true, output the previous line
        if (!cond_was_true &&
               # and the previous_line is not empty
               length(previous_line) != 0) {
            print previous_line;
        }
        # remember that the condition was true
        cond_was_true = 1;
        # output the current line
        print $1;
    }

    # remember the previous line
    { previous_line = $1; }

    # if the condition was true
    # but is no longer met
    # output the next line
    # but output it only once
    cond_was_true && 
             !(stop_time > $2 && $2 > start_time) &&
             !output_next_line_once { 
         output_next_line_once = 1;
         print $1;
    }
'
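
Since the simpler version is marked as untested, here is one way it might be checked against a tiny hand-made input. The file names and timestamps are invented, the input stands in for the output of the find, sed and sort stages, and the sketch uses a single variable, `previous_line`, for the remembered line.

```shell
#!/usr/bin/env bash
# Hypothetical, already sorted "name timestamp" lines.
input='a.gz 100
b.gz 200
c.gz 300
d.gz 400
e.gz 500'

start_time=150
stop_time=350

result=$(printf '%s\n' "$input" |
awk -v start_time="$start_time" -v stop_time="$stop_time" '
    stop_time > $2 && $2 > start_time {
        # first match: emit the line just before the range, if there is one
        if (!cond_was_true && length(previous_line) != 0) {
            print previous_line
        }
        cond_was_true = 1
        print $1
    }
    # remember the current name for the next input line
    { previous_line = $1 }
    # first line after the range: emit it once
    cond_was_true && !(stop_time > $2 && $2 > start_time) && !output_next_line_once {
        output_next_line_once = 1
        print $1
    }
')

echo "$result"
```

With this input the range matches b.gz and c.gz, so the expected selection is a.gz, b.gz, c.gz and d.gz: one file before the window, the files inside it, and one file after it.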
KamilCuk
  • It won't need `20 * 60` when using an array, which also works when the rotation interval is not 20 minutes. – xoyabc Mar 23 '20 at 16:29
  • What does `last` mean in the second approach? When I ran it, it displayed the file name and timestamp on the first line, while the remaining lines showed only the file name. ./h323server.log.1584974401.gz 1584974401 ./h323server.log.1584975601.gz ./h323server.log.1584976801.gz ./h323server.log.1584978001.gz ./h323server.log.1584979201.gz – xoyabc Mar 23 '20 at 17:20
  • There was a bug, it should have been `last=$1`. I understood you need one file before and one file after the match. `last` is used to store the line right before the match. – KamilCuk Mar 23 '20 at 17:59
  • Can you tell me how `!oncelast && length(last) != 0` and `!cond { last=$1; }` work, I only understand the `stop_time > $2 && $2 > start_time` pattern and action. – xoyabc Mar 24 '20 at 12:54
  • I've added some comments. `!oncelast && length(last) != 0` - I use the `oncelast` variable to check if the previous line was output already. If it was, then I don't want to output the previous line. If it wasn't, then set `oncelast` to true and output it. The `length(last) != 0` checks if the length of the content of variable `last` is nonzero. In case there is no previous line before the condition is met, there is nothing to output. The `{last=$1;}` is used to remember the line before the condition was met. By "condition" I mean `stop_time > $2 && $2 > start_time`. The `!cond` could be removed... – KamilCuk Mar 24 '20 at 13:09
  • Thanks for your detailed explanation. `!cond { last=$1; }` is only there to remember the line before the condition was met, so it can be removed, is that right? And is `next;` used to end the current iteration and go on to the next one, like `continue` in the `C` programming language? – xoyabc Mar 25 '20 at 00:22
  • That's yes and yes. – KamilCuk Mar 25 '20 at 00:24
  • Once `!cond { last=$1; }` is removed, it can't output the previous line, because the initial value of `last` is empty. `!cond { last=$1; }` stores the previous line's $1 in `last`; when `cond` is true and `oncelast` is not, the previous line before the match is output. So `!cond { last=$1; }` can't be removed. – xoyabc Mar 25 '20 at 02:45
  • I have tried it: with `!cond { last=$1; }` the previous line is output, while without it the previous line is not output – xoyabc Mar 25 '20 at 03:05
  • I'm sorry. `!cond` can be removed, not `!cond { last=$1; }`. – xoyabc Mar 25 '20 at 07:10
0

I think you should use readarray instead of a for loop to read the values; see "How to use 'readarray' in bash to read lines from a file into a 2D array".
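
A minimal sketch of that idea, assuming bash 4 or later (where `readarray` is available as a synonym for `mapfile`). The `list_files` function is a hypothetical stand-in for listing /var/log/app/h323server.log.[1-9][0-9]*, and the timestamp is cut out with parameter expansion instead of `ls -l | awk`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for: printf '%s\n' /var/log/app/h323server.log.[1-9][0-9]*
list_files() {
    printf '%s\n' \
        /var/log/app/h323server.log.1584554401.gz \
        /var/log/app/h323server.log.1584555601.gz
}

# One line per element; -t strips the trailing newline from each element.
readarray -t filename < <(list_files)

filetime=()
for f in "${filename[@]}"; do
    ts=${f%.gz}     # drop the .gz suffix
    ts=${ts##*.}    # keep what follows the last remaining dot
    filetime+=("$ts")
done

declare -p filename filetime
```

Because both arrays are filled in order by appending, their indexes stay aligned and contiguous, and there is no need to change and restore IFS.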

But if you just want to get the first and last lines, sed is likely a good option instead:

$ sed -n -e '1p' -e '$p' /etc/passwd
below cmd output started 2020 Mon Mar 23 08:19:32 AM PDT
root:x:0:0:root:/root:/bin/bash
apacheds:x:124:131::/var/lib/apacheds:/bin/bash

BTW, what do you require for a log file with only one line in it? Should it print the same line twice?

dstromberg
  • I want to loop over all the log files. There are 5 log files here, but the problem is that the 2nd, 3rd and 4th are not printed when using `for (( i=1; i<${arraylength}+1; i++ ));`. The first and last elements need to be filtered by the start and end timestamps, while the rest of the files are printed directly. – xoyabc Mar 23 '20 at 15:32