You can use the bash date command as $(date +%Y-%m-%d):
For example, the command below will look for the /user/hdfs/eventlog/2017-01-04.snappy log file, and the output will be stored in the /user/hdfs/eventlog_output/2017-01-04 HDFS directory:

hadoop jar EventLogsSW.jar EventSuspiciousWatch /user/hdfs/eventlog/$(date +%Y-%m-%d).snappy /user/hdfs/eventlog_output/$(date +%Y-%m-%d)
To get a specific date format, see this answer, or type man date to learn more about the date command.
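As a quick sanity check of the format string, you can confirm that %Y-%m-%d always yields a ten-character ISO date (a minimal sketch; the printed date will of course differ on your machine):

```shell
#!/bin/sh
# print today's date in ISO format, e.g. 2017-01-04
today=$(date +%Y-%m-%d)
echo "$today"
# YYYY-MM-DD is always 10 characters long
echo ${#today}
```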
Update after more details were provided:

1. Explanation:
$ file=$(hadoop fs -ls /user/cloudera/*.snappy|grep $(date +%Y-%m-%d)|awk '{print $NF}')
$ echo $file
/user/cloudera/xyz.snappy
$ file_out=$(echo $file|awk -F '/' '{print $NF}'|awk -F '.' '{print $1}')
$ echo $file_out
xyz
$ hadoop jar EventLogsSW.jar EventSuspiciousWatch $file /user/hdfs/eventlog_output/$file_out
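The two awk pipelines that derive file_out can also be written with plain shell parameter expansion, which avoids spawning extra processes; a sketch using the same example path:

```shell
#!/bin/sh
file=/user/cloudera/xyz.snappy
# strip the directory part: xyz.snappy
name=${file##*/}
# strip the extension: xyz
file_out=${name%%.*}
echo "$file_out"
```

This behaves the same as the awk version for simple names, with the caveat that %%.* removes everything after the first dot, so a file like a.b.snappy would yield "a".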
2. Make a shell script to reuse these commands daily, in a more logical way:

This script can process more than one file in HDFS for the current system date:
#!/bin/sh
#get today's snappy files
files=$(hadoop fs -ls /user/hdfs/eventlog/*.snappy | grep $(date +%Y-%m-%d) | awk '{print $NF}')
counter=0
#only process if today's file(s) are available...
#(note: $? after the pipeline reflects awk, which succeeds even when grep
# matches nothing, so test the variable itself instead)
if [ -n "$files" ]
then
    #file(s) found; now create today's dir
    hadoop fs -mkdir -p /user/hdfs/eventlog/$(date +%Y-%m-%d)
    #move each file to today's dir
    for file in $files
    do
        hadoop fs -mv $file /user/hdfs/eventlog/$(date +%Y-%m-%d)/
        counter=$((counter + 1))
    done
    #run hadoop job
    hadoop jar EventLogsSW.jar EventSuspiciousWatch /user/hdfs/eventlog/$(date +%Y-%m-%d) /user/hdfs/eventlog_output/$(date +%Y-%m-%d)
fi
echo "Total processed file(s): $counter"
echo "Done processing today's file(s)..."
This script processes more than one file - one file at a time - in HDFS for the current system date:
#!/bin/sh
#get today's snappy files
files=$(hadoop fs -ls /user/hdfs/eventlog/*.snappy | grep $(date +%Y-%m-%d) | awk '{print $NF}')
counter=0
#only process if today's file(s) are available...
if [ -n "$files" ]
then
    for file in $files
    do
        echo "Processing file: $file ..."
        #derive the output dir name from the file name
        file_out=$(echo $file | awk -F '/' '{print $NF}' | awk -F '.' '{print $1}')
        #run hadoop job ($file is already the full HDFS path from -ls)
        hadoop jar EventLogsSW.jar EventSuspiciousWatch $file /user/hdfs/eventlog_output/$file_out
        counter=$((counter + 1))
    done
fi
echo "Total processed file(s): $counter"
echo "Done processing today's file(s)..."
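To reuse the script daily without running it by hand, you could schedule it with cron; the script path and log location below are assumptions, not part of the original setup:

```shell
# hypothetical crontab entry: run the event-log job every day at 01:00
0 1 * * * /home/hdfs/process_eventlog.sh >> /var/log/eventlog_job.log 2>&1
```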