Command that finds all lines in a file that are more than 30 characters long and split them

Question

I have to write an awk command that will find all lines in a specified file that are more than 30 characters long and split these lines into multiple lines with no more than 30 characters each.

I know I can find the length using

awk 'length>30' test.txt

But how do I post-process the file and split each line?

For example, if my file looks like this:

qwertyuiopadfgghjkklkllllllvvvxxxx
jjjjfff
aaahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
ggggggggggggg
dddddddddddddd
gggggggggggggggggggg
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

After running the command, it should look like this (with no line containing more than 30 characters):

qwertyuiopadfgghjkklkllllllvv
vxxxx
jjjjfff
aaahhhhhhhhhhhhhhhhhhhhhhhhhh
hhhhhhhhhhhhhhhhhhh
ggggggggggggg
dddddddddddddd
gggggggggggggggggggg
aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa

Could you please post samples for it, on which basis we should split the lines. Please post samples in CODE TAGS and let us know then. — RavinderSingh13, Dec 12 '18 at 09:39

Juan Diego Godoy Robles · Accepted Answer · 2018-12-12T10:51:50.300

2

Simple: there's a utility for that purpose, fold:

fold -w 30 text.txt

Wrap input lines in each FILE (standard input by default), writing to standard output.
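
As a quick check of the fold approach, the sketch below feeds two of the question's sample lines through fold and then reuses the awk length test from the question to confirm that nothing longer than 30 characters remains. The here-document data and the variable name `wrapped` are only for illustration.

```shell
# Wrap two of the question's sample lines at 30 columns.
wrapped=$(fold -w 30 <<'EOF'
qwertyuiopadfgghjkklkllllllvvvxxxx
jjjjfff
EOF
)
printf '%s\n' "$wrapped"

# Reuse the question's filter: it prints nothing if every line is <= 30 characters.
printf '%s\n' "$wrapped" | awk 'length>30'
```

Note that `-w` counts columns; POSIX fold also defines `-b` to count bytes and `-s` to break at the last blank, which may matter for input containing tabs, spaces, or multibyte characters.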

If you need to stick to gawk, this solution, offered as a curiosity, is quite rigid but easy, and it gives you an idea of how FIELDWIDTHS works:

gawk 'BEGIN { FIELDWIDTHS = "30 30 30 30 30 30"}{for (i=1;i<=NF;i++){if ($i!=""){print $i}}}' text.txt

edited Dec 12 '18 at 10:51

answered Dec 12 '18 at 09:55

And this is why we love SO. Yet again a new command I never heard of. – kvantour Dec 12 '18 at 09:57
@klashxx Thanks but I am aware of fold. I am specifically looking to use awk. – Lokesh jain Dec 12 '18 at 09:57
I see, I just posted a simple `gawk` solution, but the max line length in the source file should be *controlled*. – Juan Diego Godoy Robles Dec 12 '18 at 10:15
Your awk solution is limited as it might be possible that you have a line with more than 180 characters. – kvantour Dec 12 '18 at 10:22
Yes @kvantour, it is just an example of `FIELDWIDTHS` use, as a curiosity. – Juan Diego Godoy Robles Dec 12 '18 at 10:50

kvantour · Answer 2 · 2018-12-12T09:57:46.183

You could do the following for this:

awk '(length>30) { for(i=1;i<=length;i+=30) print substr($0,i,30)}' file

If you still want the other lines also, you can do:

awk '(length>30) { for(i=1;i<=length;i+=30) print substr($0,i,30); next} 1' file

Here we just print the substrings we are interested in. Those substrings are at most 30 characters long (only the last one can be shorter) and start at indices 1,31,61,91,...

If you don't like the concept of recomputing the length all the time, you can do:

awk '{L=length} (L>30){ for(i=1;i<=L;i+=30) print substr($0,i,30)}' file
awk '{L=length} (L>30){ for(i=1;i<=L;i+=30) print substr($0,i,30); next}1' file

length[([s])]: Return the length, in characters, of its argument taken as a string, or of the whole record, $0, if there is no argument.

substr(s, m[, n ]): Return the at most n-character substring of s that begins at position m, numbering from 1. If n is omitted, or if n specifies more characters than are left in the string, the length of the substring shall be limited by the length of the string s.
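
To make the index arithmetic concrete, here is a small sketch that wraps the same loop in a shell function with the width passed in as a parameter. The function name `wrap_awk` and the awk variable `w` are made up for this illustration, and a width of 4 is used only so the chunk boundaries are easy to see.

```shell
# Hypothetical helper: wrap_awk WIDTH < input
wrap_awk() {
  awk -v w="$1" '
    {
      n = length($0)                 # compute the record length once
      if (n <= w) { print; next }    # short (or empty) lines pass through
      for (i = 1; i <= n; i += w)    # chunk starts: 1, w+1, 2w+1, ...
        print substr($0, i, w)       # the last chunk may be shorter than w
    }'
}

printf '%s\n' 'abcdefghij' 'xyz' | wrap_awk 4
```

With a width of 4, the ten-character line is cut at indices 1, 5 and 9, and the three-character line is printed unchanged. Calling `wrap_awk 30 < file` should behave like the second command above.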

@Lokeshjain just at the end of the awk as you did in your question. — kvantour, Dec 12 '18 at 09:56
The above commands only print the output to the console, but I want the processed lines to be changed in my file. Here the file content remains the same. — Lokesh jain, Dec 12 '18 at 09:59
@Lokeshjain This is answered here : https://stackoverflow.com/questions/16529716/awk-save-modifications-in-place — kvantour, Dec 12 '18 at 10:00
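
Since awk writes to standard output, one common way to change the file itself is to write to a temporary file and then copy the result back. A minimal sketch, assuming `mktemp` is available; the function name `wrap_file_in_place` and the demo file are hypothetical. GNU awk 4.1 or later also offers `gawk -i inplace`, which is not used here so that the sketch stays portable.

```shell
# Hypothetical helper: rewrite FILE so that no line exceeds 30 characters.
wrap_file_in_place() {
  f=$1
  tmp=$(mktemp) || return 1
  if awk 'length($0)>30 { for (i=1; i<=length($0); i+=30) print substr($0,i,30); next } 1' "$f" > "$tmp"
  then
    cat "$tmp" > "$f"    # overwrite the contents, keeping the file's own permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Demo on a throwaway file: one 35-character line and one short line.
demo=$(mktemp)
awk 'BEGIN { s = sprintf("%35s", ""); gsub(/ /, "a", s); print s; print "short" }' > "$demo"
wrap_file_in_place "$demo"
cat "$demo"
```

Redirecting straight back to the input (`awk ... file > file`) would truncate the file before awk reads it, which is why the temporary file is needed.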

score 0 · Answer 3 · answered Dec 12 '18 at 10:10

Could you please try the following.

awk '
{
  val=""
  # Take 30 characters at a time until nothing is left of the record
  while(length($0)){
    val=(val?val ORS:"")substr($0,1,30)
    $0=substr($0,31)
  }
  print val
}'  Input_file

Output will be as follows.

qwertyuiopadfgghjkklkllllllvvv
xxxx
jjjjfff
aaahhhhhhhhhhhhhhhhhhhhhhhhhhh
hhhhhhhhhhhhhhhhhh
ggggggggggggg
dddddddddddddd
gggggggggggggggggggg
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa

score 0 · Answer 4 · answered Dec 12 '18 at 10:46

How about with gsub:

$ awk '{gsub(/.{30}/,"&" ORS)}1' file

Output for that sample:

qwertyuiopadfgghjkklkllllllvvv
xxxx
jjjjfff
aaahhhhhhhhhhhhhhhhhhhhhhhhhhh
hhhhhhhhhhhhhhhhhh
ggggggggggggg
dddddddddddddd
gggggggggggggggggggg
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa

Naturally, if your RS is something other than \n, you need to deal with that, for example with RS="\r?\n" in an awk that accepts a regular expression as RS, such as gawk.
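
One edge case worth checking with the gsub approach: when a line's length is an exact multiple of 30, the last replacement appends an ORS and the final print adds another, which produces an extra empty line. One possible way to handle that is to strip a trailing ORS before printing, as in the sketch below. The function name `wrap_gsub` is hypothetical, and the sketch assumes the default newline ORS and an awk whose regular expressions support interval expressions such as `.{30}` (some older awks do not).

```shell
# Hypothetical helper: gsub-based wrapping without the extra blank line.
wrap_gsub() {
  awk '{
    gsub(/.{30}/, "&" ORS)   # insert ORS after every full 30-character chunk
    sub(ORS "$", "")         # drop the ORS added when the length is a multiple of 30
  } 1'
}

# A line of exactly 30 characters should come out as one line, not two.
printf '%s\n' '123456789012345678901234567890' | wrap_gsub | wc -l
```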

score 0 · Answer 5 · answered Dec 12 '18 at 11:20

Perl solution:

> cat lokesh.txt
qwertyuiopadfgghjkklkllllllvvvxxxx
jjjjfff
aaahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
ggggggggggggg
dddddddddddddd
gggggggggggggggggggg
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
> perl -pe ' s/(.{30})/$1\n/g; ' lokesh.txt
qwertyuiopadfgghjkklkllllllvvv
xxxx
jjjjfff
aaahhhhhhhhhhhhhhhhhhhhhhhhhhh
hhhhhhhhhhhhhhhhhh
ggggggggggggg
dddddddddddddd
gggggggggggggggggggg
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
>

score -1 · Answer 6 · answered Dec 12 '18 at 09:57

I think this code can work, but unfortunately I cannot test it:

awk -F, 'length($0) > 30' /path/to/input > good_field_length.txt

Plazma

I don't get what you want to do here. Why do you want to split the entries on a `,`? – kvantour Dec 12 '18 at 09:58
Also in `awk` the command `length($0)` is equivalent to `length` – kvantour Dec 12 '18 at 09:59
