I'm trying to filter out all duplicates of a list, ignoring the first n columns, preferably using awk (but I'm open to other implementations).
I've found a solution for a fixed number of columns, but since I don't know how many columns there will be, I need a range. I found that solution here.
For clarity:
What I'm trying to achieve is an alias for history
which will filter out duplicates but leave the history_id intact, preferably without messing with the order.
The history is in this form:
ID DATE HOUR command
5612 2019-07-25 11:58:30 ls /var/log/schaubroeck/audit/2019/May/
5613 2019-07-25 12:00:22 ls /var/log/schaubroeck/
5614 2019-07-25 12:11:30 ls /etc/logrotate.d/
5615 2019-07-25 12:11:35 cat /etc/logrotate.d/samba
5616 2019-07-25 12:11:49 cat /etc/logrotate.d/named
So this command works for commands up to four words long, but I need to replace the fixed columns with a range to account for all cases:
history | awk -F "[ ]" '!keep[$4 $5 $6 $7]++'
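
To make "a range" concrete, what I have in mind is roughly this sketch (untested; seen is just my name for the key array), which builds the key from every field from $4 through $NF and keeps a separator between fields so that, say, ab c and a bc don't collapse into the same key:

history | awk '{k=""; for (i=4; i<=NF; i++) k=k $i FS; if (!seen[k]++) print}'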
I feel @kvantour is getting me on the right path, so I tried:
history | awk '{t=$0;$1=$2=$3=$4="";k=$0;$0=t}_[k]++' | grep cd
But this still yields duplicate lines:
1102 2017-10-27 09:05:07 cd /tmp/
1109 2017-10-27 09:07:03 cd /tmp/
1112 2017-10-27 09:07:15 cd nagent-rhel_64/
1124 2017-11-07 16:38:50 cd /etc/init.d/
1127 2017-12-29 11:13:26 cd /tmp/
1144 2018-06-21 13:04:26 cd /etc/init.d/
1161 2018-06-28 09:53:21 cd /etc/init.d/
1169 2018-07-09 16:33:52 cd /var/log/
1179 2018-07-10 15:54:32 cd /etc/init.d/
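
For what it's worth, I suspect two problems with my attempt: without a ! in front of _[k]++, awk prints the second and later occurrences instead of the first, and blanking $4 wipes the command word itself out of the key. A corrected sketch of the same idea (untested) would be:

history | awk '{t=$0; $1=$2=$3=""; k=$0; $0=t} !seen[k]++' | grep cd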