How to get first n characters of each line in unix data file

Question

I am trying to get the first 22 characters from a Unix data file. My data looks like the sample below.

The first 12 characters are column 1 and the next 10 characters are column 2.

000000000001199998000180000     DUMMY RAG #         MFR NOT ST            1999980    ZZ-            0        0              0ZZ-
000000000002199998000180000     DUMMY RAG #         MFR NOT ST            1999980    ZZ-            0        0              0ZZ-
000000000003199998000180000     DUMMY RAG #         MFR NOT ST            1999980    ZZ-            0        0              0ZZ-
000000000004199998000180000     DUMMY RAG #         MFR NOT ST            1999980    ZZ-            0        0              0ZZ-
000000000005199998000180000     DUMMY RAG #         MFR NOT ST            1999980    ZZ-            0        0              0ZZ-
000000000006199998000180000     DUMMY RAG #         MFR NOT ST            1999980    ZZ-            0        0              0ZZ-

Do you want the first 22 characters of the file, or the first 22 characters of each line? You should modify the question if you want data from each line. As asked, `dd` is the tool you want to get the first 22 characters from the file. – William Pursell, Jan 23 '13 at 21:41
I came across this while looking for what the OP originally described in the title, not what he really wanted to do (the title as I have edited it reflects that a bit better; @WilliamPursell explained the difference between the two intents). However, should anyone come across this question searching for how to _take the first n characters of a file_, [here](https://stackoverflow.com/questions/8262758/copy-n-bytes-of-data-x-to-file) is the answer, which makes use of `dd` as William suggested. – Enlico, Jul 15 '20 at 13:09
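
For readers who do want the first n bytes of the whole file, as discussed in the comments above, here is a minimal sketch with `dd`. The sample file is a temporary file created only for this illustration, and the variable names are assumptions. Note that `dd` counts bytes, not characters, so the result may differ for multibyte text.

```shell
# Create a temporary sample file so the example is self-contained.
sample=$(mktemp)
printf '%s\n' '000000000001199998000180000     DUMMY RAG #' > "$sample"

# Copy 22 blocks of 1 byte each, i.e. the first 22 bytes of the file.
# dd prints its transfer statistics on stderr, so they are discarded here.
first22=$(dd if="$sample" bs=1 count=22 2>/dev/null)
echo "$first22"

rm -f "$sample"
```

With the sample line above this should print `0000000000011999980001`.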

Chris Seymour · Accepted Answer · 2013-01-22T16:06:02.290

123

With cut:

$ cut -c-22 file
0000000000011999980001
0000000000021999980001
0000000000031999980001
0000000000041999980001
0000000000051999980001
0000000000061999980001

If I understand the second requirement correctly, you want to split the first 22 characters into two columns of length 10 and 12. sed is the best choice for this:

$ sed -r 's/(.{10})(.{12}).*/\1 \2/' file
0000000000 011999980001
0000000000 021999980001
0000000000 031999980001
0000000000 041999980001
0000000000 051999980001
0000000000 061999980001
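
The question describes the columns as 12 characters followed by 10, so the same substitution can be adjusted to those widths. The following is a sketch, not part of the original answer; it uses `-E`, which both GNU and BSD sed accept for extended regular expressions, and the `line` and `split` variable names are only for illustration. Lines shorter than 22 characters do not match the pattern and pass through unchanged.

```shell
# One record from the question's sample data (shortened on the right).
line='000000000001199998000180000     DUMMY RAG #         MFR NOT ST'

# Capture 12 characters, then 10 characters, and drop the rest of the line.
split=$(printf '%s\n' "$line" | sed -E 's/^(.{12})(.{10}).*/\1 \2/')
echo "$split"
```

For this record the output should be `000000000001 1999980001`.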

edited Jan 22 '13 at 16:06

answered Jan 22 '13 at 15:54

What does the -r do? I've tried this sed command but I get an error informing me that -r is an illegal option – interstellar Dec 30 '15 at 16:07
@interstellar with GNU sed it switches on extended regular expressions, the equivalent option for BSD sed is -E. – Chris Seymour Dec 30 '15 at 16:30
Based on the performance, this seems to read in the entire file before taking the first `n` characters. Is there a way to avoid reading in the entire file? – Joshua Pinter Mar 21 '18 at 17:44
@JoshuaPinter you can use `head` to read the first n lines only, for example, to limit to the first 20 lines: `head -n 20 file | cut -c-22` – Chris Seymour Mar 21 '18 at 17:49
@ChrisSeymour Thanks. I tried that as well but it turns out the file that we're crunching is on one single line (!) but is millions of characters long. – Joshua Pinter Mar 22 '18 at 00:33
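
For the single-line case raised in the last comment, one option (a sketch, not something proposed in the thread) is `head -c`, which stops after a given number of bytes instead of working line by line. The `-c` option is available in GNU and BSD `head`; whether a given system supports it is an assumption to check. The temporary file below is created only for this example.

```shell
# Temporary file holding one long line with no trailing newline.
sample=$(mktemp)
printf '%s' '000000000001199998000180000     DUMMY RAG #         MFR NOT ST' > "$sample"

# Read only the first 22 bytes; the rest of the file is not needed.
first22=$(head -c 22 "$sample")
echo "$first22"

rm -f "$sample"
```

Like `dd`, `head -c` counts bytes rather than characters.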

score 3 · Answer 2 · answered Jan 22 '13 at 16:20

sudo_O has provided nice cut and sed solutions; I am just adding an awk one-liner (FIELDWIDTHS is a GNU awk feature):

awk 'BEGIN{FIELDWIDTHS="22"} {print $1}' file

echo "000000000001199998000180000     DUMMY RAG"|awk 'BEGIN{FIELDWIDTHS="22"} {print $1}'
0000000000011999980001

If the input contains blank spaces, the result depends on your requirement: whether you want to skip the spaces or include and count them in your output.

If blank spaces should be counted and displayed in the output as well, you don't have to change the command above:

echo "0 0 0 0 00000001199998000180000"|awk 'BEGIN{FIELDWIDTHS="22"} {print $1}'
0 0 0 0 00000001199998

If you want to skip those spaces (quick and dirty):

echo "0 0 0 0 00000001199998000180000"|sed 's/ //g'|awk 'BEGIN{FIELDWIDTHS="22"} {print $1}'
0000000000011999980001
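
Because `FIELDWIDTHS` is specific to GNU awk, a portable alternative is the standard `substr` function. This is a sketch rather than part of the answer above; the `line` and `first22` names are only for illustration.

```shell
line='000000000001199998000180000     DUMMY RAG'

# substr(string, start, length) uses 1-based positions in awk.
first22=$(printf '%s\n' "$line" | awk '{ print substr($0, 1, 22) }')
echo "$first22"
```

This should print `0000000000011999980001`, the same result as the `FIELDWIDTHS` version.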

Jody Bruchon · Answer 3 · 2022-01-07T05:22:17.330

This can actually be done in Bash without using any external programs, using the expression ${VARIABLE:offset:length} (where :length is optional). Scripts using this must start with #!/bin/bash instead of #!/bin/sh and will not be POSIX shell compliant:

#!/bin/bash

STR="123456789"

echo ${STR:0:1}
echo ${STR:0:5}
echo ${STR:0:10}
echo ${STR:5:10}
echo ${STR:8:10}

will have this output:

1
12345
123456789
6789
9

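Note that the start offset begins at zero and the length should be at least one to get any output. If the requested length runs past the end of the string, you simply get everything up to the end. You can also offset from the right side of the string using a negative offset in parentheses: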

echo ${STR:(-5):4}

5678

To read a file, take the first 8 characters of each line, and print them to the terminal, use a while loop like this:

# IFS= and -r keep leading blanks and backslashes in each line intact
while IFS= read -r LINE
    do echo "${LINE:0:8}"
done < "/path/to/the/text_file"
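
The loop can also be wrapped in a small function so the width and file are parameters. This is a sketch under assumptions: `first_n_chars` is a hypothetical helper name, and the sample file is created only for the demonstration.

```shell
#!/bin/bash
# first_n_chars is a hypothetical helper name used only for this sketch.
# Usage: first_n_chars N FILE
first_n_chars() {
    local n=$1 file=$2 line
    # IFS= keeps leading/trailing blanks, -r keeps backslashes literal.
    # The "|| [[ -n $line ]]" part also handles a last line with no newline.
    while IFS= read -r line || [[ -n $line ]]; do
        printf '%s\n' "${line:0:$n}"
    done < "$file"
}

# Self-contained demo with a temporary sample file.
sample=$(mktemp)
printf '%s\n' 'abcdefghijkl' 'xyz' > "$sample"
result=$(first_n_chars 8 "$sample")
echo "$result"
rm -f "$sample"
```

Lines shorter than the requested width, such as `xyz` here, are printed unchanged. For large files a pure Bash loop is likely to be slower than `cut`, so this is mainly useful when avoiding external programs matters.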

An extremely useful resource for all you'll need to know about Bash string manipulation is here: https://tldp.org/LDP/abs/html/string-manipulation.html
