Here is my file.dat:

1   A 1 4
2     2 4
3   4 4
3     7 B
1   U 2

Running awk '{print $2}' file.dat gives:

A
2
4
7
U

But I would like to keep the empty fields:

A

4

U

How can I do this?

I should add that:

  • between columns 1 and 2 the field separator is 3 whitespace characters

  • between columns 2 and 3, and between columns 3 and 4, the field separator is 1 whitespace character

So column 2 is missing 2 fields (lines 2 and 4) and column 4 is also missing 2 fields (lines 3 and 5).

olivier dadoun
  • It might help with GNU awk and mawk to set field separator to exact one space: `awk -F ' ?' '{print $2}' file` – Cyrus Feb 10 '19 at 16:57
  • In my file the exact field separator between two columns is not constant (it could be one, two, or several whitespace characters) – olivier dadoun Feb 10 '19 at 17:02
  • @olivierdadoun In that case, how do you define the `second column`? – Til Feb 10 '19 at 17:57
  • With `Procedural Text Edit` you can use `forEach line { select (firstN char 2) {remove} select (afterN char 1) {remove} }` – I3ck Feb 10 '19 at 18:14
  • If the field separator is not constant, how do you know that the second field in the second row is blank? It may well be that there are multiple whitespace characters between the first and second fields. – karakfa Feb 10 '19 at 19:00
  • @tiw this is a very good point :) I have checked, and the spacing between two columns is indeed constant. – olivier dadoun Feb 10 '19 at 19:11
  • @karakfa you are right, I will modify my example – olivier dadoun Feb 10 '19 at 19:12
  • Are the numbers in column 1 always single-digit numbers? Are the values in the other columns always a single character? – Jonathan Leffler Feb 10 '19 at 19:34
  • I suggest: Find the row with the most columns. In this row determine all column spacing and save it in an array. With this information you can find out in all rows where a column is empty. – Cyrus Feb 10 '19 at 19:36
  • In GNU Awk, the manual has a section on [Reading fixed-width data](https://www.gnu.org/software/gawk/manual/gawk.html#Constant-Size). It's hard to tell whether that will be helpful to you. – Jonathan Leffler Feb 10 '19 at 19:46
  • From your comments, we get the definition "3 spaces between columns 1 and 2; 1 space between any other columns". This means that in your current example, rows 1 and 5 have 4 columns, rows 2 and 4 have 5 columns, and row 3 has 3 columns. I don't think that's what you actually mean to happen, so your example and definition are inconsistent – jhnc Feb 10 '19 at 20:28
  • similar to this: https://stackoverflow.com/a/36011760/1435869 – karakfa Feb 10 '19 at 22:01
  • @Cyrus `awk -F ' ?' '{print $2}' file` means the same in any awk, and it doesn't mean "set field separator to exact one space"; it means set FS to zero or 1 blank chars, but YMMV with what any awk actually tries to do given that setting. To get 1 blank as the FS in any awk, use `awk -F'[ ]' '...'`. – Ed Morton Feb 10 '19 at 23:11
  • Your change makes it even fuzzier... Do all the lines have the same number of fields, i.e. the same columns? – Til Feb 11 '19 at 02:15

4 Answers

1

If this isn't all you need:

$ awk -F'[ ]' '{print $4}' file
A

4

U

then edit your question to provide a more truly representative example and clearer requirements.
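
The reason the value lands in `$4` rather than `$2` can be checked with a small sketch. It assumes the sample data from the question (3 spaces after column 1, 1 space elsewhere); the variable names `data` and `col2` are only for illustration.

```shell
# Sample data from the question.
data='1   A 1 4
2     2 4
3   4 4
3     7 B
1   U 2'

# With FS set to exactly one space, every separator space ends a field.
# The 3 spaces after column 1 therefore produce two empty fields ($2, $3),
# and the second column of the file is found in $4.
col2=$(printf '%s\n' "$data" | awk -F'[ ]' '{ print $4 }')
printf '%s\n' "$col2"
```

On rows where column 2 is blank, `$4` is one of the empty fields created by the longer run of spaces, so an empty line is printed.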

Ed Morton
1

If the input is fixed-width columns, you can use substr to extract the slice you want. I have assumed that you want a single character at index 5:

awk '{ print(substr($0,5,1)) }' file
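
As a hedged extension of the same idea, the start position and width can be passed in as variables and the padding trimmed, so that a blank column prints as an empty line instead of a space. The helper name `fixed_col` and the variables `start` and `width` are hypothetical, and the layout (column 2 at character 5, 1 character wide) is assumed from the sample data.

```shell
# Sample data from the question.
data='1   A 1 4
2     2 4
3   4 4
3     7 B
1   U 2'

# Hypothetical helper: fixed_col START WIDTH, reading from standard input.
fixed_col() {
    awk -v start="$1" -v width="$2" '{
        field = substr($0, start, width)   # slice the fixed-width column
        gsub(/^ +| +$/, "", field)         # drop leading and trailing spaces
        print field
    }'
}

# Assumed layout: column 2 starts at character 5 and is 1 character wide.
col2=$(printf '%s\n' "$data" | fixed_col 5 1)
printf '%s\n' "$col2"
```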
tripleee
0

Your awk command does not specify a field separator.

Your example file doesn't clearly show what that field separator is.

From observation your file appears to have 5 columns.

You need to determine what your field separator is first.

This example code expects \t which means <TAB> as the field separator.

awk -F'\t' '{print $3}' OFS='\t' file.dat

This prints the 3rd column of the file. -F'\t' sets the input field separator; OFS='\t' sets the output field separator, which only takes effect when print joins several fields or the record is rebuilt.

A

4

U
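
A minimal sketch of how a tab separator behaves, assuming the file really is tab-separated (the question does not confirm this). The rows in `tsv` are hypothetical, not the asker's data.

```shell
# Hypothetical tab-separated rows; the second field of row 2 is empty.
tsv=$(printf '1\tA\t1\n2\t\t2\n3\t4\t4')

# With a tab FS, two adjacent tabs delimit an empty field, so it is kept.
col2=$(printf '%s\n' "$tsv" | awk -F'\t' '{ print $2 }')
printf '%s\n' "$col2"

# OFS is only inserted between comma-separated print arguments.
pairs=$(printf '%s\n' "$tsv" | awk -F'\t' -v OFS='|' '{ print $1, $2 }')
printf '%s\n' "$pairs"
```

When a single field is printed, OFS has no visible effect; it appears only in the second command, where two fields are joined.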
tripleee
0

This is for GNU awk. It processes the file twice. On the first pass it examines all records to find the character positions that contain only spaces, and treats each continuous run of such positions as a separator, building up the FIELDWIDTHS variable. On the second pass it uses that for fixed-width processing of the data.

Each a[i] gets the value 0 or 1, and h (the header) for this input will be 100010101, which leads to FIELDWIDTHS="4 2 2 1":

1   A 1 4
2     2 4
3   4 4
3     7 B
1   U 2
|   | | |
100010101 - while(match(h,/10*/))
 \ /|/|/|     
  4 2 2 1

Script:

$ awk '
NR==FNR {
    for(i=1;i<=length;i++)                              # all record chars
        a[i]=((a[i]!~/^(0|)$/) || substr($0,i,1)!=" ")  # keep track of all space places
    if(--i>m)
        m=i                                             # max record length...
    next
}
BEGINFILE {
    if(NR!=0) {                                         # only do this once
        for(i=1;i<=m;i++)                               #  ... used here
            h=h a[i]                                    # h=100010101
        while(match(h,/10*/)) {                         # build FIELDWIDTHS
            FIELDWIDTHS=FIELDWIDTHS " " RLENGTH         # qnd
            h=substr(h,RSTART+RLENGTH)                       
        }
    }
}
{ 
    print $2                                            # and output 
}' file file

And output:

A

4 

U 

You need to trim off the space from the fields, though.
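
For awks without FIELDWIDTHS and BEGINFILE, the same two-pass idea can be sketched with substr, with the trimming built in. This is a variant under stated assumptions, not the script above: it assumes columns are separated by at least one character position that is blank in every row, and the names `used`, `first`, `last`, and `col` are made up for the example.

```shell
# Write the sample data from the question to a temporary file.
tmp=$(mktemp)
printf '%s\n' '1   A 1 4' '2     2 4' '3   4 4' '3     7 B' '1   U 2' > "$tmp"

col2=$(awk -v col=2 '
NR == FNR {                                  # pass 1: scan every record
    for (i = 1; i <= length($0); i++)
        if (substr($0, i, 1) != " ")
            used[i] = 1                      # position i holds data somewhere
    if (length($0) > max)
        max = length($0)                     # remember the longest record
    next
}
FNR == 1 {                                   # start of pass 2: derive columns
    for (i = 1; i <= max; i++) {
        if (used[i] && !used[i - 1])
            first[++n] = i                   # a run of data positions begins
        if (used[i])
            last[n] = i                      # extend the current run
    }
}
{
    field = substr($0, first[col], last[col] - first[col] + 1)
    gsub(/^ +| +$/, "", field)               # trim the padding spaces
    print field
}' "$tmp" "$tmp")
rm -f "$tmp"
printf '%s\n' "$col2"
```

Passing the file name twice makes awk read it twice; `NR == FNR` is true only during the first read, and `FNR == 1` is then reached only at the start of the second read.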

James Brown