69

I'm trying to remove leading and trailing space in 2nd column of the below input.txt:

Name, Order  
Trim, working
cat,cat1

I have used the below awk to remove leading and trailing space in 2nd column but it is not working. What am I missing?

awk -F, '{$2=$2};1' input.txt

This gives the output as:

Name, Order  
Trim, working
cat,cat1

Leading and trailing spaces are not removed.

codeforester
  • 39,467
  • 16
  • 112
  • 140
Marjer
  • 1,313
  • 6
  • 20
  • 31

8 Answers8

99

If you want to trim all spaces, only in lines that have a comma, and use awk, then the following will work for you:

awk -F, '/,/{gsub(/ /, "", $0); print} ' input.txt

If you only want to remove spaces in the second column, change the expression to

awk -F, '/,/{gsub(/ /, "", $2); print$1","$2} ' input.txt

Note that gsub substitutes the character in // with the second expression, in the variable that is the third parameter - and does so in-place - in other words, when it's done, the $0 (or $2) has been modified.

Full explanation:

-F,            use comma as field separator 
               (so the thing before the first comma is $1, etc)
/,/            operate only on lines with a comma 
               (this means empty lines are skipped)
gsub(a,b,c)    match the regular expression a, replace it with b, 
               and do all this with the contents of c
print$1","$2   print the contents of field 1, a comma, then field 2
input.txt      use input.txt as the source of lines to process

EDIT I want to point out that @BMW's solution is better, as it actually trims only leading and trailing spaces with two successive gsub commands. Whilst giving credit I will give an explanation of how it works.

gsub(/^[ \t]+/,"",$2);    - starting at the beginning (^) replace all (+ = zero or more, greedy)
                             consecutive tabs and spaces with an empty string
gsub(/[ \t]+$/,"",$2)}    - do the same, but now for all space up to the end of string ($)
1                         - ="true". Shorthand for "use default action", which is print $0
                          - that is, print the entire (modified) line
Floris
  • 45,857
  • 6
  • 70
  • 122
  • 1
    Can you please explain this. – Marjer Dec 15 '13 at 23:28
  • 3
    Look at @EdMorton's answer for a single `gsub` solution. It also use the character class for space which is a better thing to do. – codeforester Dec 07 '17 at 21:46
  • Ed Morton's answer is [here](https://stackoverflow.com/a/33380957/6862601). – codeforester Feb 04 '19 at 20:16
  • used the regex.. gsub(/^[ \t]+/,"",$2) – H S Rathore May 07 '19 at 08:58
  • 2
    This isn't the right answer, because it removes all spaces, not just leading and trailing. – Peter Mar 26 '20 at 07:19
  • With "Test, Trim only leading spaces" this answer gives "Test,Trimonlyleadingspaces" – Geoff Jul 26 '20 at 03:21
  • Fails, try " Multiple spaces , and here ". – Geoff Jul 26 '20 at 05:07
  • @Geoff did you read the description at the start of the answer - “if you want to remove ALL the spaces...”. – Floris Jul 26 '20 at 12:00
  • @floris, yes I did see that, but the Question is about leading & trailing spaces in the second column. I actually believe @BMW’s Answer should be the accepted Answer. I did also note that you very helpfully explained @BMW’s Answer, and I built on your input to edit it, so learning loads in the process. As someone getting into this I found so many ‘answers’ just a bit unhelpful. I doubt many people would want to strip out all of the spaces between words. – Geoff Jul 26 '20 at 12:09
  • @Geoff i acknowledged that the other answer was better - I can’t help that this one was accepted. It’s been a long time... – Floris Jul 26 '20 at 12:12
  • @floris, I know, hey ho! – Geoff Jul 26 '20 at 12:37
39

remove leading and trailing white space in 2nd column

awk 'BEGIN{FS=OFS=","}{gsub(/^[ \t]+/,"",$2);gsub(/[ \t]+$/,"",$2)}1' input.txt

another way by one gsub:

awk 'BEGIN{FS=OFS=","} {gsub(/^[ \t]+|[ \t]+$/, "", $2)}1' infile
BMW
  • 42,880
  • 12
  • 99
  • 116
  • I like how you are removing only leading and trailing spaces by using the `^` and `$` anchors in two separate `gsub` commands. More complex, but on the money. – Floris Dec 16 '13 at 14:05
  • 1
    I just think it directly, and thanks for detail explanation for Ganz Ricanz – BMW Dec 16 '13 at 23:23
  • 2
    Feel free to improve your answer by writing your explanation of what you did (or copy mine). I think your answer should be accepted - but it is always better when it is "complete". – Floris Dec 16 '13 at 23:44
  • This is the only answer that works properly, so it should be the accepted answer. – Geoff Jul 26 '20 at 05:10
  • Is there a way to extend this code to trim leading/trailing spaces from all fields (for a file with lots of fields), other than a separate call to `gsub` for every field? Thanks. – Josh Aug 13 '20 at 13:53
  • 1
    yes, and the command will be simpler. `awk '$1=$1' infile > outfile` – BMW Aug 13 '20 at 21:36
  • @BMW Wow, this black magic actually - but only when you don't specify a custom FS. Any solution for that case? – Jann Poppinga Sep 08 '20 at 10:48
  • Nit-pick - this only removes blanks and tabs, not all white space (which would include newline, carriage return, formfeed and vertical tab). Also, `awk '$1=$1'` [in the comments](https://stackoverflow.com/questions/20600982/trim-leading-and-trailing-spaces-from-a-string-in-awk/33380957#comment112113785_20601998) will fail if/when `$1` contains `0` so don't do that - if you want to trim white space from all fields in a CSV it'd be `awk -F'[[:space:]]*,[[:space:]]*' -v OFS=',' '{$1=$1; gsub(/^[[:space:]]+|[[:space:]]+$/,"")} 1'`. – Ed Morton Jan 15 '22 at 12:56
22

Warning by @Geoff: see my note below, only one of the suggestions in this answer works (though on both columns).

I would use sed:

sed 's/, /,/' input.txt

This will remove on leading space after the , . Output:

Name,Order
Trim,working
cat,cat1

More general might be the following, it will remove possibly multiple spaces and/or tabs after the ,:

sed 's/,[ \t]\?/,/g' input.txt

It will also work with more than two columns because of the global modifier /g


@Floris asked in discussion for a solution that removes trailing and and ending whitespaces in each colum (even the first and last) while not removing white spaces in the middle of a column:

sed 's/[ \t]\?,[ \t]\?/,/g; s/^[ \t]\+//g; s/[ \t]\+$//g' input.txt

*EDIT by @Geoff, I've appended the input file name to this one, and now it only removes all leading & trailing spaces (though from both columns). The other suggestions within this answer don't work. But try: " Multiple spaces , and 2 spaces before here " *


IMO sed is the optimal tool for this job. However, here comes a solution with awk because you've asked for that:

awk -F', ' '{printf "%s,%s\n", $1, $2}' input.txt

Another simple solution that comes in mind to remove all whitespaces is tr -d:

cat input.txt | tr -d ' '
Geoff
  • 151
  • 8
hek2mgl
  • 152,036
  • 28
  • 249
  • 266
  • I don't think your current `awk` solution removes _trailing_ spaces…? – Floris Dec 16 '13 at 00:08
  • While the main point in my answer is to show that `sed` is better suited for this job than awk - because it's a text editing task, the `awk` script should work. I've tested it. What is wrong with that? – hek2mgl Dec 16 '13 at 00:10
  • Your `sed` command doesn't appear to remove trailing spaces either. Just put `> output.txt` at the end, open the result in a text editor and you will see... – Floris Dec 16 '13 at 00:12
  • There are no *trailing* spaces before the first column. The task is more: Remove the spaces after the `,` – hek2mgl Dec 16 '13 at 00:13
  • trailing spaces at the end of the second column. Quoting from the question "_I'm trying to remove leading and trailing space in 2nd column_ " – Floris Dec 16 '13 at 00:14
  • You just read the headline. It's no problem to modify the sed pattern to do this. But it's just no necessary having given input – hek2mgl Dec 16 '13 at 00:15
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/43225/discussion-between-floris-and-hek2mgl) – Floris Dec 16 '13 at 00:16
  • None of these actually work, the closest is "sed 's/[ \t]\?,[ \t]\?/,/g; s/^[ \t]\+//g; s/[ \t]\+$//g' input.txt" (just needed to specify the input file). – Geoff Jul 26 '20 at 03:34
  • @Geoff Make sure you are using GNU sed – hek2mgl Jul 26 '20 at 07:34
  • @hek2mgl, I have "sed (GNU sed) 4.2.2" (2012). and "sed (GNU sed) 4.4" (2017). Try " Multiple spaces , 2 spaces before here ". – Geoff Jul 26 '20 at 08:52
  • I recommend that you open a question with your particular problem. If you want, ping me. (but normally others will have answered within minutes :) ) – hek2mgl Jul 26 '20 at 09:33
22

I just came across this. The correct answer is:

awk 'BEGIN{FS=OFS=","} {gsub(/^[[:space:]]+|[[:space:]]+$/,"",$2)} 1'
Ed Morton
  • 188,023
  • 17
  • 78
  • 185
4

just use a regex as a separator:

', *' - for leading spaces

' *,' - for trailing spaces

for both leading and trailing:

awk -F' *,? *' '{print $1","$2}' input.txt
Ilya Kharlamov
  • 3,698
  • 1
  • 31
  • 33
  • Fails to remove the trailing spaces from colum 2. Instead removes the trailing spaces from column 1. – Geoff Jul 26 '20 at 04:59
2

Simplest solution is probably to use tr

$ cat -A input
^I    Name, ^IOrder  $
  Trim, working  $
cat,cat1^I  

$ tr -d '[:blank:]' < input | cat -A
Name,Order$
Trim,working$
cat,cat1
Fredrik Pihl
  • 44,604
  • 7
  • 83
  • 130
2

The following seems to work:

awk -F',[[:blank:]]*' '{$2=$2}1' OFS="," input.txt
Håkon Hægland
  • 39,012
  • 21
  • 81
  • 174
-1

If it is safe to assume only one set of spaces in column two (which is the original example):

awk '{print $1$2}' /tmp/input.txt

Adding another field, e.g. awk '{print $1$2$3}' /tmp/input.txt will catch two sets of spaces (up to three words in column two), and won't break if there are fewer.

If you have an indeterminate (large) number of space delimited words, I'd use one of the previous suggestions, otherwise this solution is the easiest you'll find using awk.

Andrew
  • 906
  • 7
  • 9
  • Fails badly, and I won't even try to explain, but try " Multiple spaces , and here ". – Geoff Jul 26 '20 at 05:04
  • Again, this particular solution assumes only ONE set of spaces between TWO columns as SPECIFIED by the actual problem. If you want to pretend a complex gsub regex replacement is 'easier' than a simple awk print statement, I won't even try to explain why your're wrong. – Andrew Aug 02 '20 at 17:09