Delete a field by position from a long line

Question

I have a long, semicolon-separated line of fields, 69 of them, to be precise.

I need to delete field 3, so I could, in a verbose manner, do:

awk -F\; '$1 == 3 { print $1";"$2";"$4 ... }' a.txt

Which would get really long. Is there a shortcut to say '$4 to the end', '$4 to $69' or maybe just 'delete $3'?

Related to the question: Repeating ";" all over the place is very inconvenient.

Of course, I could generate the command in part with:

echo -e "\b"{4..69}"\";\"$"

but while it looks clever, the result is a multiline command, which is not elegant to handle.

What is an elegant solution, preferably in pure awk?

I guess I can find a sed-solution fast, but I have more things to do (recalculate Field 5: if Field 1 == 2, Field5 = 5-Field5), which would be hard in sed, but I guess a good fit for awk.

I'm using Gnu-AWK 3.1.6, if it matters, but have, according to apropos:

awk
gawk
igawk
mawk
nawk
pgawk

ok, update:

I should have known better, and provided some test data right away, but of course, I will try out all your answers and upvote what looks promising.

3;03.2012;7228;0;1;3;1;3;4;3;1;3;4;3;2;0;4;4;1;1;4;2;1;1;1;1;1;1;1;1;1;1;1;1;0;0;0;1;1;3;0;3;1;3;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;
3;03.2012;7229;0;2;2;0;5;5;4;4;5;5;4;4;2;5;5;0;0;3;3;0;0;5;6;0;0;0;0;0;2;2;1;2;1;2;2;2;4;3;4;1;5;4;2;0;0;0;0;0;0;0;0;0;0;4;4;4;4;4;0;0;0;0;0;0;0;
3;03.2012;7230;0;2;2;2;4;3;4;4;4;3;3;3;2;4;6;1;1;1;6;5;1;6;6;1;1;1;1;1;2;2;1;2;2;0;2;2;3;4;2;1;4;3;2;0;0;0;0;0;0;0;0;0;0;4;3;3;4;4;0;0;0;0;0;0;0;
3;03.2012;7231;0;1;3;1;4;4;3;3;4;4;4;4;2;5;5;1;1;4;6;5;1;4;1;1;1;1;1;5;2;1;1;2;0;0;1;2;4;4;3;1;4;3;2;0;0;0;0;0;0;0;0;0;0;4;4;4;4;3;0;0;0;0;0;0;0;

just hold the line. :)

awk also has an [output field separator](http://www.gnu.org/software/gawk/manual/gawk.html#Output-Separators): `awk -F';' -v OFS=';' '{print $1, $2, $4}'` – glenn jackman Mar 27 '12 at 16:11

Colonel Panic · Answer 1 · 2012-03-27T18:09:01.703

5

I'm not sorry to interrupt this perverse game of golf. Do you masochists take pleasure in reinventing the wheel? Civilisation offers modern man such amenities as sewage collection and CSV libraries so he doesn't have to deal with—

How about csvfix? It's a command-line tool that works with text streamed in and out, i.e. the same environment as awk. The command you need is exclude:

csvfix exclude -f 3 -rsep ";" a.txt

edited Mar 27 '12 at 18:09

answered Mar 27 '12 at 17:00

Colonel Panic


1

+1. Yeah really. These awk-ward solutions are only appropriate for Unix (TM) system installations where you can't install any third party code. (Including GNU Awk: talk about a program whose extensions to the awk language are completely pointless, since if you can install GNU anything, you can just put on your rubber boots and wade through the 15 feet of gawk that you think is in your way to get to something else.) – Kaz Mar 27 '12 at 18:14
I'm sorry, not on my system and not in the repositories. Of course, I could install it, but then I can use `cut` too, or `sed`. But it looks nice and simple. That's fine. – user unknown Mar 28 '12 at 01:38

score 3 · Answer 2 · answered Mar 27 '12 at 15:46

3

One way:

awk '{ 
  split( $0, f, /;/ );
  delete f[3];
  for (i=1; i<=length(f); i++) { 
    printf "%s", f[i] ? f[i] ";" : "" 
  } 
}' <<<"one;two;three;four;five;six;seven"

With following output:

one;two;four;five;six;seven;

answered Mar 27 '12 at 15:46

Birei


+1 generic solution. however, the last ";" should not be there. ;) – Kent Mar 27 '12 at 15:55
The output is merged together in one big line, so I put a `printf "\n"` between the two `}}`. Now I see that all values of 0 are discarded. That is, I'm sorry, not acceptable. Otherwise it looks quite elegant. (I meanwhile added test data to my question above.) Instead of `delete f[3]`, we can delete that line, and do the test later: `printf "%s", i!=4 ? f[i] ";" : ""`? – user unknown Mar 28 '12 at 01:34
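
Both points raised in the comments can be addressed by skipping the field by position instead of testing its value, and by writing the separator before every field except the first. This is a minimal sketch, assuming a POSIX awk; the function name `delete_field3_split` is made up for illustration and is not part of the answer above.

```sh
# Hypothetical helper: reads lines on stdin, prints them without field 3.
delete_field3_split() {
  awk '{
    n = split($0, f, ";")      # n is fixed before the loop starts
    out = ""; sep = ""
    for (i = 1; i <= n; i++) {
      if (i == 3) continue     # skip by position, so a field holding 0 is kept
      out = out sep f[i]
      sep = ";"                # separator goes in front of every later field
    }
    print out                  # print adds the newline after each record
  }'
}
```

With `printf 'one;two;three;four\n' | delete_field3_split` this should print `one;two;four`. A trailing `;` in the input is kept only because it delimits a final empty field.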

Lee Netherton · Answer 3 · 2012-03-27T16:36:05.163

2

You could use the cut command instead:

cut -d';' -f1,2,4- a.txt

The list of fields can be a range, and can include an open-ended range (like the 4- used here).

And if you still need to process the result in awk you could pipe the output from this into it.
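
A sketch of that pipeline, applied to the recalculation mentioned in the question (if field 1 is 2, field 5 becomes 5 minus field 5). It assumes the field numbers in the question count from 1. Once `cut` has dropped field 3, the original field 5 arrives in awk as field 4. The function name `drop3_then_fix5` is hypothetical.

```sh
# Hypothetical helper: drop field 3 with cut, then recalculate in awk.
# File names are passed through to cut; with none given it reads stdin.
drop3_then_fix5() {
  cut -d';' -f1,2,4- "$@" |
    awk -F';' -v OFS=';' '
      $1 == 2 { $4 = 5 - $4 }   # original field 5 is field 4 after the cut
      1                         # print every line, changed or not
    '
}
```

For example, `printf '2;x;y;z;3;9\n' | drop3_then_fix5` should give `2;x;z;2;9`, while a line whose first field is not 2 only loses its third field.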

edited Mar 27 '12 at 16:36

answered Mar 27 '12 at 16:30

Lee Netherton


Works flawlessly. Is available. Is short and elegant. Is not awk, but deserves an upvote. :) – user unknown Mar 28 '12 at 00:17

Dimitre Radoulov · Answer 4 · 2012-03-28T11:15:10.677

You could use something like this:

awk -v fl=<field_list> 'BEGIN {
  n = split(fl, t, " ")
  for (i = 0; ++i <= n;)
    fa[t[i]]
  }
{
  for (i = 0; ++i <= NF;)
    if (!(i in fa))
      printf "%s", ($i (i < NF ? OFS : ORS))
  }'

Consider the following input:

zsh-4.3.14[t]% paste -sd\; < <(printf '%s\n' {1..10})
1;2;3;4;5;6;7;8;9;10

To remove the 3rd field:

zsh-4.3.14[t]% paste -sd\; < <(printf '%s\n' {1..10}) |
pipe>   awk -F\; -v fl=3 'BEGIN {
pipe quote>     n = split(fl, t, " ")
pipe quote>     for (i = 0; ++i <= n;)
pipe quote>       fa[t[i]]
pipe quote>     }
pipe quote>   {
pipe quote>     for (i = 0; ++i <= NF;)
pipe quote>       if (!(i in fa))
pipe quote>     printf "%s", ($i (i < NF ? OFS : ORS))
pipe quote>   }' OFS=\;
1;2;4;5;6;7;8;9;10

To remove a set of fields:

zsh-4.3.14[t]% paste -sd\; < <(printf '%s\n' {1..10}) |
pipe>   awk -F\; -v fl='7 4 3' 'BEGIN {
pipe quote>     n = split(fl, t, " ")
pipe quote>     for (i = 0; ++i <= n;)
pipe quote>       fa[t[i]]
pipe quote>     }
pipe quote>   {
pipe quote>     for (i = 0; ++i <= NF;)
pipe quote>       if (!(i in fa))
pipe quote>     printf "%s", ($i (i < NF ? OFS : ORS))
pipe quote>   }' OFS=\;
1;2;5;6;8;9;10

Let me know what the output should look like if you remove the last field (with or without the trailing FS).
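
In the script as written, `ORS` is only emitted when the loop reaches field `NF` and that field is kept, so deleting the last field would print neither a trailing separator nor a newline. One possible way around this, sketched here under the assumption that no trailing separator is wanted, is to put the separator in front of each kept field and end every record with `ORS` unconditionally. The function name `drop_fields` and the array name `skip` are illustrative only.

```sh
# Hypothetical helper. Usage: drop_fields '3 7' < infile
drop_fields() {
  awk -F';' -v OFS=';' -v fl="$1" '
    BEGIN {
      n = split(fl, t, " ")
      for (i = 1; i <= n; i++) skip[t[i]]   # referencing creates the key
    }
    {
      sep = ""
      for (i = 1; i <= NF; i++)
        if (!(i in skip)) {
          printf "%s%s", sep, $i            # separator before, not after
          sep = OFS
        }
      printf "%s", ORS                      # always terminate the record
    }'
}
```

With this variant, `printf '1;2;3\n' | drop_fields 3` should print `1;2` followed by a newline.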

Consider that with a single character field separator and for simple tasks cut could be sufficient:

zsh-4.3.14[t]% paste -sd\; < <(printf '%s\n' {1..10}) | cut -d\; -f 1-2,4-
1;2;4;5;6;7;8;9;10
zsh-4.3.14[t]% paste -sd\; < <(printf '%s\n' {1..10}) | cut -d\; -f 1-2,5-6,8-
1;2;5;6;8;9;10

[Edit: following the comments here]

Given the sample input:

3;03.2012;7228;0;1;3;1;3;4;3;1;3;4;3;2;0;4;4;1;1;4;2;1;1;1;1;1;1;1;1;1;1;1;1;0;0;0;1;1;3;0;3;1;3;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;
3;03.2012;7229;0;2;2;0;5;5;4;4;5;5;4;4;2;5;5;0;0;3;3;0;0;5;6;0;0;0;0;0;2;2;1;2;1;2;2;2;4;3;4;1;5;4;2;0;0;0;0;0;0;0;0;0;0;4;4;4;4;4;0;0;0;0;0;0;0;
3;03.2012;7230;0;2;2;2;4;3;4;4;4;3;3;3;2;4;6;1;1;1;6;5;1;6;6;1;1;1;1;1;2;2;1;2;2;0;2;2;3;4;2;1;4;3;2;0;0;0;0;0;0;0;0;0;0;4;3;3;4;4;0;0;0;0;0;0;0;
3;03.2012;7231;0;1;3;1;4;4;3;3;4;4;4;4;2;5;5;1;1;4;6;5;1;4;1;1;1;1;1;5;2;1;1;2;0;0;1;2;4;4;3;1;4;3;2;0;0;0;0;0;0;0;0;0;0;4;4;4;4;3;0;0;0;0;0;0;0;

and the following awk script:

zsh-4.3.14[t]% cat s.awk 
BEGIN {
  n = split(fl, t, " ")
  for (i = 0; ++i <= n;)
    fa[t[i]]
  }
{
  for (i = 0; ++i <= NF;)
    if (!(i in fa))
      printf "%s", ($i (i < NF ? OFS : ORS))
  }

With this command:

zsh-4.3.14[t]% awk -F\; -v fl=3 -f s.awk OFS=\; infile > outfile

... I get the following output:

zsh-4.3.14[t]% cat outfile
3;03.2012;0;1;3;1;3;4;3;1;3;4;3;2;0;4;4;1;1;4;2;1;1;1;1;1;1;1;1;1;1;1;1;0;0;0;1;1;3;0;3;1;3;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;
3;03.2012;0;2;2;0;5;5;4;4;5;5;4;4;2;5;5;0;0;3;3;0;0;5;6;0;0;0;0;0;2;2;1;2;1;2;2;2;4;3;4;1;5;4;2;0;0;0;0;0;0;0;0;0;0;4;4;4;4;4;0;0;0;0;0;0;0;
3;03.2012;0;2;2;2;4;3;4;4;4;3;3;3;2;4;6;1;1;1;6;5;1;6;6;1;1;1;1;1;2;2;1;2;2;0;2;2;3;4;2;1;4;3;2;0;0;0;0;0;0;0;0;0;0;4;3;3;4;4;0;0;0;0;0;0;0;
3;03.2012;0;1;3;1;4;4;3;3;4;4;4;4;2;5;5;1;1;4;6;5;1;4;1;1;1;1;1;5;2;1;1;2;0;0;1;2;4;4;3;1;4;3;2;0;0;0;0;0;0;0;0;0;0;4;4;4;4;3;0;0;0;0;0;0;0;

If I understand the requirement correctly, the output is correct.

To remove the fields from 1 to 5:

zsh-4.3.14[t]% awk -F\; -v fl='1 2 3 4 5' -f s.awk OFS=\; infile > outfile
zsh-4.3.14[t]% cat outfile
3;1;3;4;3;1;3;4;3;2;0;4;4;1;1;4;2;1;1;1;1;1;1;1;1;1;1;1;1;0;0;0;1;1;3;0;3;1;3;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;
2;0;5;5;4;4;5;5;4;4;2;5;5;0;0;3;3;0;0;5;6;0;0;0;0;0;2;2;1;2;1;2;2;2;4;3;4;1;5;4;2;0;0;0;0;0;0;0;0;0;0;4;4;4;4;4;0;0;0;0;0;0;0;
2;2;4;3;4;4;4;3;3;3;2;4;6;1;1;1;6;5;1;6;6;1;1;1;1;1;2;2;1;2;2;0;2;2;3;4;2;1;4;3;2;0;0;0;0;0;0;0;0;0;0;4;3;3;4;4;0;0;0;0;0;0;0;
3;1;4;4;3;3;4;4;4;4;2;5;5;1;1;4;6;5;1;4;1;1;1;1;1;5;2;1;1;2;0;0;1;2;4;4;3;1;4;3;2;0;0;0;0;0;0;0;0;0;0;4;4;4;4;3;0;0;0;0;0;0;0;

Am I missing something?

Hm. I saved the first code, from apostrophe to apostrophe exclusive, to a file del3b.awk. I called it with `awk -v fl=3 -f del3b.awk a.txt > h.txt`, but `xxdiff a.txt h.txt` showed no difference. Since this part is solved, I'm not too interested in correcting this not-as-elegant-as-expected code, but since I voted ltn100's cut solution up, I can't resist... – user unknown Mar 28 '12 at 00:55
No, but now I did, and the only difference from the original is the trailing `;`, which gets removed. – user unknown Mar 28 '12 at 11:05
Hi @user unknown, I've added an example with your sample file. – Dimitre Radoulov Mar 28 '12 at 11:15
Now it works, and I can't reconstruct what went wrong. :) Maybe I stumbled over one of your renames. `<field_list>` looked to me like a file redirection in the beginning, and I never talked about a list or the necessity to delete multiple fields at once. – user unknown Mar 28 '12 at 12:10

score 1 · Answer 5 · answered Mar 28 '12 at 11:58

1

Pure Bash:

IFS=';'
while read -r -a line ; do
  unset 'line[2]'
  echo "${line[*]}"
done < infile.dat

answered Mar 28 '12 at 11:58

Fritz G. Mehner


I would need to reset IFS later, wouldn't I? – user unknown Mar 29 '12 at 14:35

score 1 · Accepted Answer · answered Mar 29 '12 at 14:02

awk -F";" 'BEGIN{OFS=";"} {$3="";print }' file3|sed 's/;;/;/'

here is the test:

pearl.341> cat file3
3;03.2012;7228;0;1;3;1;3;4;3;1;3;4;3;2;0;4;4;1;1;4;2;1;1;1;1;1;1;1;1;1;1;1;1;0;0;0;1;1;3;0;3;1;3;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;
3;03.2012;7229;0;2;2;0;5;5;4;4;5;5;4;4;2;5;5;0;0;3;3;0;0;5;6;0;0;0;0;0;2;2;1;2;1;2;2;2;4;3;4;1;5;4;2;0;0;0;0;0;0;0;0;0;0;4;4;4;4;4;0;0;0;0;0;0;0;
3;03.2012;7230;0;2;2;2;4;3;4;4;4;3;3;3;2;4;6;1;1;1;6;5;1;6;6;1;1;1;1;1;2;2;1;2;2;0;2;2;3;4;2;1;4;3;2;0;0;0;0;0;0;0;0;0;0;4;3;3;4;4;0;0;0;0;0;0;0;
3;03.2012;7231;0;1;3;1;4;4;3;3;4;4;4;4;2;5;5;1;1;4;6;5;1;4;1;1;1;1;1;5;2;1;1;2;0;0;1;2;4;4;3;1;4;3;2;0;0;0;0;0;0;0;0;0;0;4;4;4;4;3;0;0;0;0;0;0;0;

output:

pearl.342> awk -F";" 'BEGIN{OFS=";"} {$3="";print }' file3 | sed 's/;;/;/'
3;03.2012;0;1;3;1;3;4;3;1;3;4;3;2;0;4;4;1;1;4;2;1;1;1;1;1;1;1;1;1;1;1;1;0;0;0;1;1;3;0;3;1;3;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;
3;03.2012;0;2;2;0;5;5;4;4;5;5;4;4;2;5;5;0;0;3;3;0;0;5;6;0;0;0;0;0;2;2;1;2;1;2;2;2;4;3;4;1;5;4;2;0;0;0;0;0;0;0;0;0;0;4;4;4;4;4;0;0;0;0;0;0;0;
3;03.2012;0;2;2;2;4;3;4;4;4;3;3;3;2;4;6;1;1;1;6;5;1;6;6;1;1;1;1;1;2;2;1;2;2;0;2;2;3;4;2;1;4;3;2;0;0;0;0;0;0;0;0;0;0;4;3;3;4;4;0;0;0;0;0;0;0;
3;03.2012;0;1;3;1;4;4;3;3;4;4;4;4;2;5;5;1;1;4;6;5;1;4;1;1;1;1;1;5;2;1;1;2;0;0;1;2;4;4;3;1;4;3;2;0;0;0;0;0;0;0;0;0;0;4;4;4;4;3;0;0;0;0;0;0;0;

You're late to the show, but the code works, is pretty elegant, easy to understand, not using confusing ad hoc , unwanted surplus, different language - just a little sed in the end, but all other posts had more minor or not so minor issues. :) — user unknown, Mar 29 '12 at 14:39
Sadly this will fail if you empty `$3` but `$2` is empty by default. — kvantour, Jul 01 '20 at 13:50
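
A variant that stays inside awk and does not rely on the `sed` clean-up step is to shift every later field one position to the left and then shorten the record. This is only a sketch: it assumes an awk that rebuilds `$0` when `NF` is decremented (gawk and mawk do, but not every awk is guaranteed to), and the function name `delete_field3_shift` is made up for illustration.

```sh
# Hypothetical helper: reads lines on stdin, prints them without field 3.
delete_field3_shift() {
  awk -F';' -v OFS=';' '
    NF >= 3 {
      for (i = 3; i < NF; i++) $i = $(i + 1)   # move fields 4..NF left by one
      NF--                                     # drop the duplicated last field
    }
    1                                          # print the (rebuilt) record
  '
}
```

Because no empty placeholder is ever written, empty neighbouring fields are left exactly as they were: `printf 'a;;c;d\n' | delete_field3_shift` should print `a;;d`.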

score 0 · Answer 7 · edited May 23 '17 at 11:56

0

I need to delete field 3...Is there a shortcut to say '$4 to the end'

Yes, and it's basically asking the same as this question: Print Field 'N' to End of Line

awk -F\; '{print $1 FS $2 FS substr($0, index($0, $4))}' temp.txt

This also handles the bonus question

FS is the field separator, so the output from my file of 7 fields delimited by ';' would be as follows

awk -F\; '{print $1 FS $2 FS substr($0, index($0,$4))}' temp2

$> field1;field2;field4;field5;field6;field7

note: that printing field N to the end retains the field separator naturally -- at least as far as I understand

edited May 23 '17 at 11:56

Community


answered Mar 27 '12 at 15:33

matchew


I'm irritated. For real data and the test data in my question your first code yields `3;03.2012;03.2012;7228;0;1;3;1;3;4...` which means, field 2 is repeated in the output - not field 3 deleted. But for simple tests with `echo "1;2;3;4;..."` it works. I have no idea what's going on there. It looks so fine and easy! I meanwhile observed that it is field 4 (I normally start counting at 0) which needs to be deleted, but that doesn't make a difference. – user unknown Mar 28 '12 at 01:05
this is a curious problem. I'll look into it. – matchew Mar 28 '12 at 16:40
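
A likely explanation for the repeated field: `index($0, $4)` returns the position of the first occurrence of the text of `$4` anywhere in the line, not the position of field 4. In the test data `$4` is `0`, and the first `0` in the line is the one that starts `03.2012`, so `substr` begins at field 2. The sketch below computes the offset from the lengths of the preceding fields instead. It assumes a single-character separator and at least four fields per line, and the function name `print_without_field3` is hypothetical.

```sh
# Hypothetical helper: prints fields 1 and 2, then everything from field 4 on.
print_without_field3() {
  awk -F';' '{
    # Start of field 4 = lengths of fields 1-3, their three separators, plus 1.
    start = length($1) + length($2) + length($3) + 3 + 1
    print $1 FS $2 FS substr($0, start)
  }'
}
```

On the first sample line this starts copying at the `0` that really is field 4, so the output should begin `3;03.2012;0;1;3;...`.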

user unknown · Answer 8 · 2012-03-28T11:08:06.667

0

While testing I found (as announced) a sed-solution by myself:

sed -r 's/(([^;]*;){3}).;(.*)/\1\3/' a.txt > g.txt

Not easy to read, but easy to write, if you know sed. It looks as if I'm going with 2 solutions for my problem: delete with one program, and transform with another one.

It deletes Field 3 (if we happen to count from 0, not 1) :) .
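
Note that the `.` in that expression matches exactly one character, so it only removes the field when it is one character wide, and `{3}` skips three fields first. A sketch that removes the third field (counting from 1) whatever its width could look like the following. It assumes a sed that accepts `-E` for extended regular expressions (GNU and BSD sed do) and that the field is followed by a `;`. The function name `sed_delete_field3` is illustrative.

```sh
# Hypothetical helper: file names are passed through to sed,
# with none given it reads stdin.
sed_delete_field3() {
  # ^(([^;]*;){2})  keeps the first two fields with their separators as \1
  # [^;]*;          is the third field plus its separator, which is dropped
  sed -E 's/^(([^;]*;){2})[^;]*;/\1/' "$@"
}
```

Changing the `{2}` to `{3}` would target the fourth field instead, which matches the zero-based counting mentioned above.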

edited Mar 28 '12 at 11:08

answered Mar 28 '12 at 00:34

user unknown

