I have a (previously sorted) text file in which every character is either a dash - or a single alphabetical character. I'd greatly appreciate any help in better understanding the proper awk syntax to move through each column of the text file and retain only the first non-dash character in each column if a non-dash character exists, or else to retain a dash if no alphabetical character exists in that column. The result in either situation would be a single row of text. Files are always formatted in such a way that every row has the same number of columns, and the first non-dash character (reading from the top row down) is always preferred, regardless of whether other alphabetical characters exist in 'lower' rows.
Two examples to clarify: given this text file:
# printf 't---k-\ncha---\n--nn--\n--ab-s\n'
t---k-
cha---
--nn--
--ab-s
the program would start in the first column and, because the first character is not a dash, it would retain a t. We'd then proceed to the next column, where the first row holds a dash, and thus advance to the second row, where an h is selected. We'd then advance to column three, where the second row again supplies the first non-dash character, an a, and then to column four, where we'd have to move to the third row to select the n character, etc. The expected string to report is:
thanks
In the second example, we have a very similar arrangement of text, with one exception:
# printf 't-----\ncha---\n--nn--\n--ab-s\n'
t-----
cha---
--nn--
--ab-s
Notice there is no alphabetical character present in the fifth column in this second example. Because no such character exists, we would return a dash in that position. Thus the expected output would be:
than-s
This post highlights a pandas approach somewhat similar to what I'm trying to achieve, and this post similarly offers a solution via numpy, but I believe they both require functions applicable for integers, whereas I have a data set consisting of alphabetical characters. This post similarly explains a method to apply a function in column-wise fashion using awk, which is closer to what I'm after, as does this other awk post. It seems to me that the awk method I'm after will similarly require me to declare a column-wise approach, which I think is stated in the beginning of the function as:
awk '{for (i=1;i<=NF;i++){
... where I'm stuck is working out what should come next inside the loop body, where I think I'm after some type of if/else statement. That's the part where I'm hoping to get further clarification.
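For illustration, here is a minimal sketch of how that if/else might be filled in. It is only one possible approach, and it rests on a few assumptions: the shell function name first_nondash is hypothetical, each character is read with substr rather than by splitting on NF (since the default field separator would treat each whole row as one field), and all rows are assumed to have the same width, as described above.

```shell
# Hypothetical helper (the name is an assumption, not an existing tool):
# for each column, print the first non-dash character found when reading
# rows from top to bottom, or a dash if the column contains only dashes.
first_nondash() {
  awk '
    {
      # Remember the widest row seen; rows are assumed to be equal width.
      if (length($0) > width) width = length($0)
      for (i = 1; i <= width; i++) {
        c = substr($0, i, 1)
        # Keep c only if this column has no character yet and c is not a dash.
        if (!(i in keep) && c != "-" && c != "")
          keep[i] = c
      }
    }
    END {
      out = ""
      # Columns that never saw a non-dash character fall back to a dash.
      for (i = 1; i <= width; i++)
        out = out ((i in keep) ? keep[i] : "-")
      print out
    }
  ' "$@"
}

printf 't---k-\ncha---\n--nn--\n--ab-s\n' | first_nondash   # prints: thanks
printf 't-----\ncha---\n--nn--\n--ab-s\n' | first_nondash   # prints: than-s
```

The function reads standard input when called without arguments, or the named files otherwise. Because it stores only one character per column, memory use depends on row width rather than on the number of rows.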
Perhaps the solution need not be done via awk. I'm certainly open to other strategies in any language, so if Python, Perl, or some other tool is clearly more appropriate, thank you for the education.
Thanks for your consideration.