Using Awk to remove whitespace

Question

I have a file in the form of:

Firstname LastName; 123-4567; Job Title
    Firstname LastName;   123-4567;      Job Title
Firstname LastName;      123-4567; Job Title
...

I am trying to use awk to parse the file into a form readable by makedbm (to make a custom NIS map). Field separator is a semicolon. I need to be able to remove all leading whitespace from each field on each line, but leave the spaces in the name field and the title field. Thanks.

Do you mean to leave the space in between `Firstname Lastname` and `Job Title`, or the leading spaces on these fields? — jaypal singh, Jan 07 '12 at 00:23
@Johnsyweb, I love that link. From now on, I still won't do any investigation before asking a question but I will ensure I place the phrase "I checked NSURLconnection. I cannot intergrate that code." into all of my questions. Do you think I'll get a better response that way? :-) — paxdiablo, Jan 07 '12 at 00:45

jaypal singh · Answer 1 · 2012-01-07T00:48:50.803

If you want to remove the leading spaces from every field but keep the spaces inside the name and job title fields, you can do something like this:

awk -F";" -v OFS=";" '{for (i=1;i<=NF;i++) gsub (/^ */,"",$i);print}' INPUT_FILE

Test:

[jaypal:~/Temp] cat file
Firstname LastName; 123-4567; Job Title
    Firstname LastName;   123-4567;      Job Title
Firstname LastName;      123-4567; Job Title

[jaypal:~/Temp] awk -F";" -v OFS=";" '{for (i=1;i<=NF;i++) gsub (/^ */,"",$i);print}' file
Firstname LastName;123-4567;Job Title
Firstname LastName;123-4567;Job Title
Firstname LastName;123-4567;Job Title

johnsyweb · Answer 2 · 2012-01-07T00:44:46.390

This can be done far more easily with sed:

sed 's/^ *//; s/; */;/g'

This assumes that all of your whitespace is just space characters. To include all whitespace characters, look at POSIX character classes, viz:

sed 's/^[[:space:]]*//; s/;[[:space:]]*/;/g'

Demo (on OSX):

% echo 'Firstname LastName; 123-4567; Job Title
    Firstname LastName;   123-4567;      Job Title
Firstname LastName;      123-4567; Job Title' | sed 's/^[[:space:]]*//; s/;[[:space:]]*/;/g'
Firstname LastName;123-4567;Job Title
Firstname LastName;123-4567;Job Title
Firstname LastName;123-4567;Job Title
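
The demo above only contains space characters. As a minimal sketch of why the POSIX class matters, the input below mixes tabs and spaces; `trim_fields` is a hypothetical helper name introduced only for this illustration, and the sed expression is the one shown above.

```shell
# trim_fields is a hypothetical wrapper around the POSIX-class sed command above.
trim_fields() {
    sed 's/^[[:space:]]*//; s/;[[:space:]]*/;/g'
}

# printf expands \t to a real tab, so this line mixes tabs and spaces.
printf '\tFirstname LastName;\t 123-4567; \tJob Title\n' | trim_fields
# prints: Firstname LastName;123-4567;Job Title
```

The space-only version (`s/^ *//; s/; */;/g`) would leave the tabs in place, since ` *` matches space characters only.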

If your version of sed doesn't support separating statements with semicolons, you can issue separate commands using -e:

% echo 'Firstname LastName; 123-4567; Job Title
    Firstname LastName;   123-4567;      Job Title
Firstname LastName;      123-4567; Job Title' | sed -e 's/^[[:space:]]*//' -e 's/;[[:space:]]*/;/g'
Firstname LastName;123-4567;Job Title
Firstname LastName;123-4567;Job Title
Firstname LastName;123-4567;Job Title

paxdiablo · Answer 3 · 2012-01-07T00:42:25.860

Simply execute a gsub on your field number, such as in:

gsub (/^ */, "", $1);

This will substitute all leading spaces with nothing, while leaving all other spaces intact. The gsub function does a global substitution of a given pattern with a new value on a specified variable.

In this case, the pattern is ^ *, meaning the start of string followed by zero or more spaces. The replacement pattern is an empty string, and the variable being operated on is the first field in the row, $1.

The following transcript shows this in action, for all columns in the row, controlled by the i variable. NF is the number of fields in the current row and $i refers to the field at position i.

$ cat file | awk -F\; -vOFS=\; '{
    for (i = 1; i <= NF; i++) {
        gsub (/^ */, "", $i);
    };
    print}'
Firstname LastName;123-4567;Job Title
Firstname LastName;123-4567;Job Title
Firstname LastName;123-4567;Job Title
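
Because the pattern is anchored with `^`, it can match at most once per field, so `sub` should behave the same as `gsub` here. A minimal sketch under that assumption; `trim_leading` is a hypothetical function name used only for illustration.

```shell
# trim_leading is a hypothetical name; the loop mirrors the transcript above.
trim_leading() {
    awk -F';' -v OFS=';' '{
        for (i = 1; i <= NF; i++) {
            sub(/^ */, "", $i)   # anchored pattern: at most one match per field
        }
        print
    }'
}

printf '    Firstname LastName;   123-4567;      Job Title\n' | trim_leading
# prints: Firstname LastName;123-4567;Job Title
```

Assigning to any field makes awk rebuild the record with OFS, which is why the output is joined with bare semicolons.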

edited Jan 07 '12 at 00:42

answered Jan 07 '12 at 00:22

paxdiablo

The OP said explicitly that he wants to "remove all leading whitespace from *each* field on each line". – lhf Jan 07 '12 at 00:30
@lhf, my misunderstanding, I thought columns 1 and 3 were to be left totally untouched. On re-reading, you're right, it's only the _non-leading_ spaces that should be left alone - modified to suit. – paxdiablo Jan 07 '12 at 00:36
But now it looks a lot more like Jaypal's answer so I'll probably delete it in a while. – paxdiablo Jan 07 '12 at 00:37
@paxdiablo LOL … and I looked at your answer and thought maybe I misunderstood his question and posted a second option to remove leading space only from second column. :) – jaypal singh Jan 07 '12 at 00:40

Kent · Answer 4 · 2012-01-07T00:54:38.277

There are many ways to achieve your goal.

Here is one more, just for fun:

awk -v OFS=";" -F'; *' '{gsub(/^ */,"")}$1=$1' file

Even shorter:

awk -v OFS=";" -F'; *' 'gsub(/^ */,"", $1)' file

Test:

kent$  echo "Firstname LastName; 123-4567; Job Title
    Firstname LastName;   123-4567;      Job Title
Firstname LastName;      123-4567; Job Title
"|awk -v OFS=";" -F'; *' '{gsub(/^ */,"")}$1=$1'
Firstname LastName;123-4567;Job Title
Firstname LastName;123-4567;Job Title
Firstname LastName;123-4567;Job Title


kent$  echo "Firstname LastName; 123-4567; Job Title
    Firstname LastName;   123-4567;      Job Title
Firstname LastName;      123-4567; Job Title
"|awk -v OFS=";" -F'; *' 'gsub(/^ */,"",$1)'
Firstname LastName;123-4567;Job Title
Firstname LastName;123-4567;Job Title
Firstname LastName;123-4567;Job Title
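
One caveat with using `$1=$1` as the pattern: awk prints the record only when the assigned value is true, so a line whose first field ends up empty (or is the number 0) should be skipped. A minimal sketch that does the same trimming but prints unconditionally; `rebuild_record` is a hypothetical name, and the sample input with an empty name field is made up for illustration.

```shell
# rebuild_record is a hypothetical name for illustration.
rebuild_record() {
    awk -v OFS=';' -F'; *' '{
        gsub(/^ */, "")   # trim leading spaces from the whole record
        $1 = $1           # assignment forces awk to rebuild $0 using OFS
        print             # unconditional, so an empty first field still prints
    }'
}

# Made-up record with an empty first field.
printf '  ;   123-4567;      Job Title\n' | rebuild_record
# prints: ;123-4567;Job Title
```

For the sample data in the question the two forms should give the same output; the difference only shows up on records like the one above.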

score 0 · Answer 5 · answered Jan 07 '12 at 00:21

Try this awk script:

{
    gsub(";  *",";")
    gsub("^  *","")
    print
}
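
The script above is only the awk program body. One way to run it, sketched here under the assumption that it is passed inline rather than saved to a file, is to wrap it in a shell function; `strip_spaces` is a hypothetical name.

```shell
# strip_spaces is a hypothetical name; the awk body is the script above.
strip_spaces() {
    awk '{
        gsub(";  *", ";")   # one or more spaces after a semicolon
        gsub("^  *", "")    # one or more spaces at the start of the line
        print
    }' "$@"
}

printf '    Firstname LastName;   123-4567;      Job Title\n' | strip_spaces
# prints: Firstname LastName;123-4567;Job Title
```

With no arguments the function reads standard input; with file names (for example `strip_spaces file`) awk reads those files instead. No `-F` is needed because both substitutions operate on the whole line.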

answered Jan 07 '12 at 00:21

lhf