Get substring based on position and delimiter in unix

Question

I have files in some path. When I run ls -lrt, I get:

20160401_RM_ARN_MAPPING-M_RTL_NORTH_DELH_101.csv
20160401_RM_ARN_MAPPING-M_RTL_NORTH_DELH_102.csv
20160401_RM_ARN_MAPPING-M_BND_NORTH_DELH_102.csv
20160405_RM_ARN_MAPPING-M_RTL_NORTH_DELH_101.csv
20160405_RM_ARN_MAPPING-M_RTL_NORTH_DELH_102.csv
20160401_MAP_RTL_BANK-M_RTL_NORTH_DELH_101.csv
20150401_RM_ARN_MAPPING-M_RTL_NORTH_DELH_101.csv

I want the distinct file names after the date and before the "-" delimiter.

I tried

ls -lrt | awk '{print $9}' | sed '1d' | awk -F'-' '{print $1}'

It gives

20160401_RM_ARN_MAPPING
20160401_RM_ARN_MAPPING
20160401_RM_ARN_MAPPING
20160405_RM_ARN_MAPPING
20160405_RM_ARN_MAPPING
20160401_MAP_RTL_BANK
20150401_RM_ARN_MAPPING

But I want only

RM_ARN_MAPPING
MAP_RTL_BANK

as output, i.e. the distinct names after removing the date. The first 8 characters are fixed and are always in YYYYMMDD format.

Are you using bash? Are you on Linux? These things may be relevant. – fedorqui, Jun 07 '16 at 12:51

score 3 · Answer 1 · edited May 23 '17 at 11:47

Do not parse ls. Instead, loop through the elements in your directory and keep track of the new names with an array. To get the clean data, use shell parameter expansion:

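var=()
for file in your_dir/*; do
   name=${file##*/}                # remove the directory part
   no_date=${name#*_}              # remove up to the first _
   no_dash=${no_date%%-*}          # remove from the first -
   # append only if the name is not already in the array
   [[ " ${var[*]} " == *" $no_dash "* ]] || var+=("$no_dash")
done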

Then, check the elements with:

$ printf "%s\n" "${var[@]}"
RM_ARN_MAPPING
MAP_RTL_BANK
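
The membership test above rescans the whole array for every file. A minimal sketch of the same parameter-expansion idea with an associative array as the "seen" set, assuming bash 4 or later; the function name distinct_names and the *.csv glob are illustrative choices, not part of the original answer:

```shell
# Hypothetical helper: print each distinct name once, in first-seen order.
# Assumes bash 4+ (associative arrays) and files named DATE_NAME-REST.csv.
distinct_names() {
    local dir=$1 file name
    local -A seen=()                 # keys are the names already printed
    for file in "$dir"/*.csv; do
        [[ -e $file ]] || continue   # the glob matched nothing
        name=${file##*/}             # drop the directory part
        name=${name#*_}              # drop the date, up to the first _
        name=${name%%-*}             # drop everything from the first -
        if [[ -z ${seen[$name]+x} ]]; then
            seen[$name]=1
            printf '%s\n' "$name"
        fi
    done
}
```

Call it as distinct_names your_dir. The glob expands in sorted name order, so the output order follows the file names, not the modification times.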

answered Jun 07 '16 at 12:43

fedorqui

Note that the question specifies `ls -lrt` -- which is (reverse) sort based on modification time. – Michael Back Jun 08 '16 at 21:34
@MichaelBack I don't think the sorting is important here – fedorqui Jun 09 '16 at 06:27

score 1 · Answer 2 · answered Jun 07 '16 at 12:37

Add cut -d '_' -f 2-

That is

ls -lrt | awk '{print $9}' | sed '1d' | awk -F'-' '{print $1}' | cut -d '_' -f 2-

The 2- means the second field and everything after it. Append sort -u if the names also need to be distinct.
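
The same cut stages also work without ls, because a glob can hand the file names to the pipeline directly. A minimal sketch, assuming the file names contain no newline characters; the function name names_after_date is hypothetical, and sort -u is added here to remove duplicates:

```shell
# Hypothetical helper: distinct names between the date and the first "-".
# Assumes file names contain no newline characters.
names_after_date() {
    (
        cd "$1" || exit 1
        for f in *.csv; do
            # skip the literal pattern when the glob matches nothing
            [ -e "$f" ] && printf '%s\n' "$f"
        done
    ) | cut -d- -f1 | cut -d_ -f2- | sort -u
}
```

Call it as names_after_date /path/to/files. The output is in sorted order, not in modification-time order.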

nayana

@PavaniSrujana check the other answers and please choose one as accepted. The other ones are better; mine just dumbly adds the cut, but it's better not to use ls and parse its output. – nayana Jun 08 '16 at 09:48
@otoplosky I went through all the options and am in fact using 3 of them in many different places in my code. Thanks a lot. – Pavani Srujana Jun 09 '16 at 06:59
@PavaniSrujana Even if you use more than one, you asked about only one way, so choose the best answer for your question. That way you help other people seeking advice on the question you posted. Otherwise the question will forever remain unanswered, which would be sad. – nayana Jun 09 '16 at 07:33

Michael Back · Answer 3 · 2016-06-14T03:32:44.180

This answer avoids parsing ls output, which protects against file names containing odd characters. It emulates the -rt ordering safely: stat (GNU coreutils) prints each file's modification time and name as a NUL-terminated record, and sort -nz orders those records numerically, oldest first. The \0 can also be used as the awk record separator (GNU awk accepts this), so awk handles the rest of the text manipulation. The regex /^[^_]+_/ removes everything up to and including the first underscore, that is, the timestamp and the date. Repeated names are skipped via an associative array lookup.

stat --printf '%Y %n\0' *_*-*.csv |
    sort -nz |
    awk -v RS='\0' '{
            sub(/^[^_]+_/, "")
            sub(/-.*$/, "")
            if ($0 in y)
                next
            y[$0]=1
            print
    }'
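
The two sub() calls and the array lookup can also be tried on their own. A minimal sketch, assuming one file name per line on stdin, so unlike the NUL-delimited pipeline above it is not safe for names containing newlines; the function name strip_and_dedupe is hypothetical:

```shell
# Hypothetical filter: reads one file name per line on stdin and prints
# each distinct name once, in the order it is first seen.
strip_and_dedupe() {
    awk '{
        sub(/^[^_]+_/, "")      # drop the leading date and its underscore
        sub(/-.*$/, "")         # drop everything from the first dash
        if (!seen[$0]++) print  # print only the first occurrence
    }'
}
```

For example, printf '%s\n' *.csv | strip_and_dedupe prints the distinct names in glob order.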

sumitya · Accepted Answer · 2016-06-08T06:16:24.283

0

This can be done as follows:

ls -ltr|sed 1d|awk '{print $9}'|cut -d"-" -f1  |cut -d_ -f2-|sort|uniq

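explanation

ls -ltr --> list the files in long format, oldest first

sed 1d --> delete the first line (the "total" line)

awk '{print $9}' --> print the 9th column (the file name)

cut -d"-" -f1 --> keep the part before the first "-"

cut -d_ -f2- --> drop the date, keeping everything after the first "_"

sort|uniq --> sort the names and remove duplicates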

edited Jun 08 '16 at 06:16

answered Jun 08 '16 at 06:05

