1

I have to write an awk command that will find all lines in a specified file that are more than 30 characters long and split these lines into multiple lines with no more than 30 characters each.

I know I can find the length using

awk 'length>30' test.txt

But how do I post-process the file and split each long line?

For example, if my file looks like this:

qwertyuiopadfgghjkklkllllllvvvxxxx
jjjjfff
aaahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
ggggggggggggg
dddddddddddddd
gggggggggggggggggggg
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

After running the command, it should look like this (with no line containing more than 30 characters):

qwertyuiopadfgghjkklkllllllvv
vxxxx
jjjjfff
aaahhhhhhhhhhhhhhhhhhhhhhhhhh
hhhhhhhhhhhhhhhhhhh
ggggggggggggg
dddddddddddddd
gggggggggggggggggggg
aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
kvantour
Lokesh jain

6 Answers

2

Simple: there's a utility for that purpose, fold:

fold -w 30 text.txt

Wrap input lines in each FILE (standard input by default), writing to standard output.
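A quick way to check this behaviour without touching a file is to pipe a sample line into fold. The sketch below uses the first sample line from the question; the variable names `sample` and `wrapped` are only for illustration. Note that fold counts display columns by default (`-b` switches to bytes), which is the same thing for plain ASCII input like this.

```shell
# Wrap a 34-character sample line at 30 columns and capture the result.
sample='qwertyuiopadfgghjkklkllllllvvvxxxx'
wrapped=$(printf '%s\n' "$sample" | fold -w 30)

# Prints the first 30 characters on one line and the remaining "xxxx" on the next.
printf '%s\n' "$wrapped"
```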

If you need to stick to gawk, this solution, offered as a curiosity, is quite rigid but easy to follow, and it shows how FIELDWIDTHS works. With six 30-character fields it handles lines of up to 180 characters:

gawk 'BEGIN { FIELDWIDTHS = "30 30 30 30 30 30"}{for (i=1;i<=NF;i++){if ($i!=""){print $i}}}' text.txt
Juan Diego Godoy Robles
0

You could do the following for this:

awk '(length>30) { for(i=1;i<=length;i+=30) print substr($0,i,30)}' file

If you still want the other lines also, you can do:

awk '(length>30) { for(i=1;i<=length;i+=30) print substr($0,i,30); next} 1' file

Here we just print the substrings we are interested in. Each substring is at most 30 characters long (only the last one of a line can be shorter), and they start at indices 1, 31, 61, 91, ...

If you don't like the concept of recomputing the length all the time, you can do:

awk '{L=length} (L>30){ for(i=1;i<=L;i+=30) print substr($0,i,30)}' file
awk '{L=length} (L>30){ for(i=1;i<=L;i+=30) print substr($0,i,30); next}1' file

length[([s])]: Return the length, in characters, of its argument taken as a string, or of the whole record, $0, if there is no argument.

substr(s, m[, n ]): Return the at most n-character substring of s that begins at position m, numbering from 1. If n is omitted, or if n specifies more characters than are left in the string, the length of the substring shall be limited by the length of the string s.
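To see how `length` and `substr` combine when the width is not hard-coded, here is a minimal sketch that passes the width in with `-v`. The function name `wrap_lines` and the awk variable `w` are hypothetical names chosen for this example, not part of the answer above.

```shell
# wrap_lines WIDTH [FILE...]
# Print every input line in chunks of at most WIDTH characters.
wrap_lines() {
  width=$1
  shift
  awk -v w="$width" '
    length($0) <= w { print; next }          # short and empty lines pass through
    {
      for (i = 1; i <= length($0); i += w)   # chunk starts: 1, w+1, 2w+1, ...
        print substr($0, i, w)               # the last chunk may be shorter than w
    }
  ' "$@"
}

# A 10-character line at width 4 is split into 4 + 4 + 2 characters;
# the 3-character line is printed unchanged.
printf '%s\n' 'abcdefghij' 'xyz' | wrap_lines 4
```

Because `substr` stops at the end of the string, the loop needs no special case for the final, shorter chunk.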

kvantour
  • @Lokeshjain just at the end of the awk as you did in your question. – kvantour Dec 12 '18 at 09:56
  • The above commands only print the output to the console, but I want the processed lines to be written back to my file. Here the file content remains the same. – Lokesh jain Dec 12 '18 at 09:59
  • @Lokeshjain This is answered here : https://stackoverflow.com/questions/16529716/awk-save-modifications-in-place – kvantour Dec 12 '18 at 10:00
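For readers who do not want to follow the link, one common way to get the effect of an in-place edit is to write to a temporary file and copy it back. The sketch below assumes `mktemp` is available (it is widespread but not required by POSIX); the function name `wrap_in_place` is hypothetical. Recent gawk versions (4.1 and later) also offer `gawk -i inplace`, which is not used here.

```shell
# wrap_in_place FILE
# Rewrite FILE so that no line is longer than 30 characters.
# awk cannot safely read and write the same file at once, so the result
# goes to a temporary file and is copied back only if awk succeeded.
wrap_in_place() {
  file=$1
  tmp=$(mktemp) || return 1
  if awk '
       length($0) <= 30 { print; next }    # keep short and empty lines as they are
       { for (i = 1; i <= length($0); i += 30) print substr($0, i, 30) }
     ' "$file" > "$tmp"
  then
    cat "$tmp" > "$file"                   # overwrite contents, keep permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Usage (test.txt is the file name from the question):
#   wrap_in_place test.txt
```

Copying the contents back with `cat` instead of `mv` keeps the original file's permissions and ownership, at the cost of not being atomic.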
0

Could you please try the following.

awk '
{
  val=""
  # Peel off 30 characters at a time until the record is empty.
  while(length($0)){
    val=(val!=""?val ORS:"")substr($0,1,30)
    $0=substr($0,31)
  }
  print val
}'  Input_file

Output will be as follows.

qwertyuiopadfgghjkklkllllllvvv
xxxx
jjjjfff
aaahhhhhhhhhhhhhhhhhhhhhhhhhhh
hhhhhhhhhhhhhhhhhh
ggggggggggggg
dddddddddddddd
gggggggggggggggggggg
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
RavinderSingh13
0

How about with gsub:

$ awk '{gsub(/.{30}/,"&" ORS)}1' file

Output for that sample:

qwertyuiopadfgghjkklkllllllvvv
xxxx
jjjjfff
aaahhhhhhhhhhhhhhhhhhhhhhhhhhh
hhhhhhhhhhhhhhhhhh
ggggggggggggg
dddddddddddddd
gggggggggggggggggggg
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa

Naturally, if your RS is something other than \n, you need to deal with that, for example with RS="\r?\n".
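A regular-expression RS such as "\r?\n" is accepted by gawk but is not guaranteed by POSIX awk. As a more portable sketch, assuming the input has CRLF line endings, the carriage return can be stripped from each record before splitting. This version uses `substr` rather than `gsub`, so it does not depend on interval expressions and does not print an extra empty line when a line's length is an exact multiple of 30. The function name `wrap_crlf` is hypothetical.

```shell
# wrap_crlf [FILE...]
# Split possibly CRLF-terminated input into chunks of at most 30 characters.
wrap_crlf() {
  awk '{
    sub(/\r$/, "")                          # drop a trailing CR so it is not counted
    if (length($0) <= 30) { print; next }   # short and empty lines are unchanged
    for (i = 1; i <= length($0); i += 30)
      print substr($0, i, 30)
  }' "$@"
}

# A 33-character CRLF line becomes a 30-character line followed by "abc".
printf '123456789012345678901234567890abc\r\nxyz\r\n' | wrap_crlf
```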

James Brown
0

Perl solution:

> cat lokesh.txt
qwertyuiopadfgghjkklkllllllvvvxxxx
jjjjfff
aaahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
ggggggggggggg
dddddddddddddd
gggggggggggggggggggg
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
> perl -pe ' s/(.{30})/$1\n/g; ' lokesh.txt
qwertyuiopadfgghjkklkllllllvvv
xxxx
jjjjfff
aaahhhhhhhhhhhhhhhhhhhhhhhhhhh
hhhhhhhhhhhhhhhhhh
ggggggggggggg
dddddddddddddd
gggggggggggggggggggg
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
>
stack0114106
-1

I think this code can work, but unfortunately I cannot test it:

awk -F, 'length($0) > 30' /path/to/input > good_field_length.txt
Plazma