
I want to extract the first column between these two lines (`%BLOCK positions_frac` and `%ENDBLOCK positions_frac`) in "file1".

%BLOCK positions_frac
Si        0.5303000000000000  0.0000000000000000  0.3333000000000000
Si        0.0000000000000000  0.5303000000000000  0.6666299999999999
Si        0.4697000000000000  0.4697000000000000  0.9999700000000000
O         0.1462000000000000  0.4142000000000000  0.8810000000000000
O         0.7320000000000000  0.5858000000000000  0.7856700000000000
O         0.5858000000000000  0.7320000000000000  0.2143300000000000
O         0.2680000000000000  0.8538000000000000  0.5476700000000000
O         0.4142000000000000  0.1462000000000000  0.1190000000000000
O         0.8538000000000000  0.2680000000000000  0.4523300000000000
%ENDBLOCK positions_frac

I can get that using:

awk '/%BLOCK\ positions_frac/{flag=1;next}/%ENDBLOCK\ positions_frac/{flag=0}flag' file1

Then I want to store the first column in an array, keeping only the distinct values.

expected output:

array= ["Si", "O"]
Caterina
  • So 1. filter the first column. 2. Sort with unique `sort -u` and 3. store into an array. – KamilCuk Jul 15 '19 at 13:08
  • See: [How do I assign the output of a command into an array?](https://stackoverflow.com/questions/9449417). Combine that with `sort -u` and you are off to a good start. – kvantour Jul 15 '19 at 13:18
  • OK, so I guess it is something like this: `awk '/%BLOCK\ positions_frac/{flag=1;next}/%ENDBLOCK\ positions_frac/{flag=0}flag {print $1}' file1 | sort -u`, but I need some help storing it in an array – Caterina Jul 15 '19 at 13:19
  • it's not a duplicate, they're using grep. I'm still not sure how to store what I found with awk in an array – Caterina Jul 15 '19 at 13:21
  • @Caterina it is a duplicate. Your problem is "How do I assign the output of a command into an array". The command is known: `awk '...' | sort -u`. The example of the duplicate is using `grep whatever` as command. – kvantour Jul 15 '19 at 13:27
  • yes but I didn't know about the sort -u command, that problem does not include it. Without asking it here I wouldn't have been able to figure it out. – Caterina Jul 15 '19 at 13:29
  • Having asked a question which is considered a duplicate is not something to be ashamed of. The question you asked is actually a double question: question 1: how do I sort an array. Question 2, how do I put the output of a command in an array. There are thousands of ways this can be answered. And there are a lot of similar questions around. We have answered your first question in a comment, and the second by pointing you to the source where you could find a possible solution. Your question is still good and should stay for other users of this forum to find help. – kvantour Jul 15 '19 at 13:41

2 Answers


This is how to write the awk part (squeeze it all back onto 1 line if you like):

$ awk '
    /%ENDBLOCK positions_frac/ { inBlock=0 }
    inBlock && !seen[$1]++     { print $1 }
    /%BLOCK positions_frac/    { inBlock=1 }
' file
Si
O

then it's just this to save the output in a shell array:

arr=( $(awk '...' ) )
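As a hedged alternative to the unquoted `arr=( $(...) )` form, here is a minimal sketch that reads the awk output with `mapfile`, so each output line becomes exactly one array element and nothing is subject to word splitting or glob expansion. It assumes bash 4 or later (for `mapfile`), and the temporary file is a hypothetical stand-in for "file1", filled with a few lines copied from the question.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "file1": a temporary file holding a few lines
# copied from the question's sample block.
sample=$(mktemp)
cat > "$sample" <<'EOF'
%BLOCK positions_frac
Si        0.5303000000000000  0.0000000000000000  0.3333000000000000
Si        0.0000000000000000  0.5303000000000000  0.6666299999999999
O         0.1462000000000000  0.4142000000000000  0.8810000000000000
O         0.7320000000000000  0.5858000000000000  0.7856700000000000
%ENDBLOCK positions_frac
EOF

# mapfile -t stores one array element per input line and strips the newline.
# The awk script is the one from the answer above.
mapfile -t arr < <(awk '
    /%ENDBLOCK positions_frac/ { inBlock=0 }
    inBlock && !seen[$1]++     { print $1 }
    /%BLOCK positions_frac/    { inBlock=1 }
' "$sample")

rm -f "$sample"

# Print one element per line: Si, then O.
printf '%s\n' "${arr[@]}"
```

For single-word values such as `Si` and `O` both forms give the same array; the difference only shows up when a value contains whitespace or glob characters such as `*`.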
Ed Morton
  • Can you explain this part to me: `inBlock && !seen[$1]++`? I am still new to bash, so I still don't get some things – Caterina Jul 15 '19 at 13:36
  • That has absolutely nothing to do with bash or any other shell, it's part of an awk script. Whenever you see an array named `seen[]` it is (or should be!) being used idiomatically to identify unique values. Initially `seen[foo]` for any value of `foo` is zero-or-null so `seen[foo]++` is also zero-or-null but that post-increment means that next time you test `seen[foo]` it has the value 1. In that way you can tell if a value is being seen for the first time or not. So that line of my code just says "if you're in the target block and it's the first time this $1 has been seen then print it". – Ed Morton Jul 15 '19 at 13:52
  • In case it helps: `awk '!seen[$0]++'` is equivalent to `uniq` for sorted input but will also print unique values even if the input is unsorted. Run these commands to see the behavior of each and the similarities/differences between them: 1) `printf 'a\na\nb\n' | awk '!seen[$0]++'` 2) `printf 'a\na\nb\n' | uniq` 3) `printf 'a\nb\na\n' | awk '!seen[$0]++'` 4) `printf 'a\nb\na\n' | uniq`. Note the output of "4" vs the first 3. – Ed Morton Jul 15 '19 at 13:59
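The four commands from the last comment can be run side by side with a short script. This is only a sketch: the variable names are made up for illustration, and the `${var//$'\n'/ }` expansion (bash-specific) is just there to show each result on one line.

```shell
#!/usr/bin/env bash
# Duplicates are adjacent (sorted input): awk and uniq agree.
sorted_awk=$(printf 'a\na\nb\n' | awk '!seen[$0]++')
sorted_uniq=$(printf 'a\na\nb\n' | uniq)

# Duplicates are not adjacent (unsorted input): uniq only collapses
# consecutive repeats, so the trailing "a" survives.
unsorted_awk=$(printf 'a\nb\na\n' | awk '!seen[$0]++')
unsorted_uniq=$(printf 'a\nb\na\n' | uniq)

# Replace newlines with spaces so each result fits on one line.
printf '1) awk,  sorted:   %s\n' "${sorted_awk//$'\n'/ }"     # a b
printf '2) uniq, sorted:   %s\n' "${sorted_uniq//$'\n'/ }"    # a b
printf '3) awk,  unsorted: %s\n' "${unsorted_awk//$'\n'/ }"   # a b
printf '4) uniq, unsorted: %s\n' "${unsorted_uniq//$'\n'/ }"  # a b a
```

Only case 4 differs, which is why the awk version needs no prior `sort`.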

So this solved it:

arr=($( awk '/%BLOCK\ positions_frac/{flag=1;next}/%ENDBLOCK\ positions_frac/{flag=0}flag {print $1}' file1 |sort -u))

Thanks for the suggestions. I realized I just had to use pipelines.

Caterina
  • Why are you doing `sed 's/:.*//'` when the output doesn't contain `:`s? You don't need `sort` btw, awk can print unique values just fine. There's also no reason to escape a blank char, it's not special in any way. – Ed Morton Jul 15 '19 at 13:29