Removing the last character of every word in files

Question

I have multiple files, each containing just one line of simple text. I want to remove the last character of every word in each file. The number of words differs from file to file.

The closest I got is to edit one file:

awk '{ print substr($1, 1, length($1)-1); print substr($2, 1, length($2)-1); }' file.txt

But I cannot figure out how to make this general, for files with different word counts.

To be sure: *1 line of simple text* and *every word* mean there is one line per file, with zero to several words in it, and each word has to be modified (I see a lot of replies removing only the last character of the line). – NeronLeVelu, Dec 15 '16 at 13:18

score 3 · Answer 1 · answered Dec 15 '16 at 12:07

awk '{for(x=1;x<=NF;x++)sub(/.$/,"",$x)}7' file

This should do the removal.

Once you have tested it and want to overwrite your file, you can do:

awk '{for(x=1;x<=NF;x++)sub(/.$/,"",$x)}7' file > tmp && mv tmp file

Example:

kent$ awk '{for(x=1;x<=NF;x++)sub(/.$/,"",$x)}7' <<<"foo bar foobar"
fo ba fooba
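
Since the question is about several files, the same one-liner can be wrapped in a loop. This is only a sketch: `trim_words_in_place` is a made-up helper name, and it assumes the files are small enough to be rewritten through a temporary file.

```shell
# Sketch: apply the awk one-liner to several files, replacing each in place.
# trim_words_in_place is a hypothetical helper name, not part of the answer.
trim_words_in_place() {
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    # Only touch the original file if awk finished without an error.
    if awk '{for(x=1;x<=NF;x++)sub(/.$/,"",$x)}7' "$f" > "$tmp"; then
      cat "$tmp" > "$f"   # overwrite the contents, keeping the file's permissions
    fi
    rm -f "$tmp"
  done
}

# Usage (the glob is an assumption about where the files live):
# trim_words_in_place ./*.txt
```

Writing back with `cat` instead of `mv` is a design choice here: it keeps the original file's permissions and ownership, at the cost of not being atomic.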

Kent


Funny to use `7` for printing a line. Does 7 have a special meaning, instead of the more common 1 for this purpose? – NeronLeVelu Dec 15 '16 at 14:22
Yes, the special meaning is: lucky number! :-D Just kidding. 7 is easier for me to reach; I think the right index finger is more convenient than the left little finger. You know, a heavy vim user cares about keystrokes. – Kent Dec 15 '16 at 14:25

Inian · Accepted Answer · 2016-12-15T12:22:59.693

Use awk to loop over every field in the row, from 1 up to NF, and apply the substr function to each one.

awk '{for (i=1; i<=NF; i++) {printf "%s ", substr($i, 1, length($i)-1)}}END{printf "\n"}' file

For a sample input file containing

ABCD ABC BC

the awk command produces the output

ABC AB B

Another way is to set the output record separator (ORS) to the empty string and just use print:

awk 'BEGIN{ORS="";}{for (i=1; i<=NF; i++) {print substr($i, 1, length($i)-1); print " "}}END{print "\n"}' file
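
Both commands above leave a trailing space, and they print the newline only in the END block, which is fine for one-line files. A possible variant that builds each output line in a variable and prints it once is sketched below; `trim_fields_substr` is a hypothetical wrapper name used only to make the example self-contained.

```shell
# Sketch: same substr() idea, but join the trimmed fields with single
# spaces and print one output line per input line (no trailing space).
trim_fields_substr() {
  awk '{
    out = ""
    for (i = 1; i <= NF; i++)
      out = out (i > 1 ? " " : "") substr($i, 1, length($i) - 1)
    print out
  }' "$@"
}

printf 'ABCD ABC BC\n' | trim_fields_substr
```

For the sample input this prints `ABC AB B`, the same words as above without the trailing space.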

score 2 · Answer 3 · edited May 23 '17 at 12:17

I would go for a Bash approach:

${var%?} removes the last character of a variable's value:

$ var="hello"
$ echo "${var%?}"
hell

And you can use the same approach on arrays:

$ arr=("hello" "how" "are" "you")
$ printf "%s\n" "${arr[@]%?}"
hell
ho
ar
yo

What about going through the files, reading their only line (you said the files consist of just one line) into an array, and using the expansion above to remove the last character of each word:

for file in dir/*; do
   # split the file's single line into an array of words
   read -r -a myline < "$file"
   printf "%s " "${myline[@]%?}"
   echo   # end each file's output with a newline
done
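
The same `${var%?}` expansion also works in plain POSIX sh, which has no arrays. The following is a rough sketch of that variant, assuming plain ASCII text; `trim_line` is a hypothetical helper name.

```shell
# Sketch: POSIX sh variant (no arrays). Prints the first line of a file
# with the last character of every whitespace-separated word removed.
trim_line() {
  # read fails at EOF when there is no trailing newline, but still fills $line
  IFS= read -r line < "$1" || :
  out= sep=
  set -f                 # disable globbing so a word like "*" is not expanded
  for w in $line; do     # unquoted on purpose: split on whitespace
    out="$out$sep${w%?}"
    sep=' '
  done
  set +f
  printf '%s\n' "$out"
}

# Usage, with the same directory layout as the loop above:
# for file in dir/*; do trim_line "$file"; done
```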


answered Dec 15 '16 at 12:12

fedorqui


My only concern was the size constraint of the files when using pure `bash` logic like this. Is it really slow when processing huge files compared to `awk`? – Inian Dec 15 '16 at 12:16
@Inian we should test it. However, parsing a number of one-line files does not seem to be a very CPU intensive task, so bothering about performance is more of an academic debate. – fedorqui Dec 15 '16 at 12:17
That being said, it is also advisable to keep this answer in mind: [Why is using a shell loop to process text considered bad practice?](http://unix.stackexchange.com/a/169765/40596). – fedorqui Dec 15 '16 at 12:27
Thanks! I asked the question in the first place with that particular topic in mind :) – Inian Dec 15 '16 at 12:29

NeronLeVelu · Answer 4 · 2016-12-15T15:10:59.000

Sed version, assuming words are composed only of letters (if not, adapt the class [[:alpha:]] to your needs) and are separated by spaces and punctuation:

sed 's/$/ /;s/[[:alpha:]]\([[:blank:][:punct:]]\)/\1/g;s/ $//' YourFile

awk version (gawk, in fact, because of the \> word-boundary operator):

gawk '{gsub(/.\>/, "");print}' YourFile

# or, as optimized by @Kent ;-) thanks for the tip
gawk '4+gsub(/.\>/, "")' YourFile
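
The definition of a "word" matters once punctuation is involved. As a small illustration (the helper names `trim_alpha` and `trim_ws_words` are made up for this comparison), the sed command above can be run next to a whitespace-based awk command on the same input:

```shell
# Hypothetical wrappers: the letter-based sed from this answer versus a
# whitespace-based awk that treats every field as a word.
trim_alpha()    { sed 's/$/ /;s/[[:alpha:]]\([[:blank:][:punct:]]\)/\1/g;s/ $//' "$@"; }
trim_ws_words() { awk '{for (i = 1; i <= NF; i++) sub(/.$/, "", $i)} 1' "$@"; }

printf 'foo, bar.\n' | trim_alpha      # drops the letter before each punctuation mark
printf 'foo, bar.\n' | trim_ws_words   # drops the punctuation mark itself
```

The first command prints `fo, ba.` and the second prints `foo bar`, so which one is right depends on whether punctuation counts as part of a word.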

edited Dec 15 '16 at 15:10

answered Dec 15 '16 at 13:11

NeronLeVelu

If you golf the gawk line a bit, you can do `gawk '7+gsub(...)' file` – Kent Dec 15 '16 at 14:27
Didn't know about `7+gsub` as a pattern, nice tip – NeronLeVelu Dec 15 '16 at 15:11
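
Why the constant is needed: gsub() returns the number of replacements it made, so used alone as a pattern it is false (0) for lines where nothing matched, and those lines are not printed. Adding any non-zero constant keeps the pattern true. A small sketch with a plain regex (so it does not need gawk):

```shell
# gsub() alone as a pattern: lines with no replacement are dropped.
printf 'foo\nbar\n' | awk 'gsub(/o/, "0")'

# Adding a non-zero constant makes the pattern always true,
# so every line is printed, changed or not.
printf 'foo\nbar\n' | awk '7+gsub(/o/, "0")'
```

The first command prints only `f00`; the second prints `f00` and `bar`.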

James Brown · Answer 5 · 2016-12-16T09:16:29.047

$ cat foo
word1
word2 word3
$ sed 's/\([^ ]*\)[^ ]\( \|$\)/\1\2/g' foo
word
word word

A word is any run of characters other than a space (that is, [^ ]).

EDIT: If you want to enforce POSIX (--posix), you can use:

$ sed --posix 's/\([^ ]*\)[^ ]\([ ]\{,1\}\)/\1\2/g' foo
word
word word

Here ` \|$` changes to `[ ]\{,1\}`, i.e. there is an optional space at the end.
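
If alternation and interval expressions are both a concern, one possible workaround is to split the job into two simple substitutions. This is only a sketch under the same assumption that words are separated by single spaces; `trim_words` is a hypothetical wrapper name.

```shell
# Sketch: first delete a non-space character that is followed by a space,
# then delete a non-space character at the end of the line.
trim_words() { sed 's/[^ ] / /g;s/[^ ]$//' "$@"; }

printf 'word1\nword2 word3\n' | trim_words
```

For the sample file above this prints `word` and `word word`, using only basic bracket expressions and the `$` anchor.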

edited Dec 16 '16 at 09:16

answered Dec 15 '16 at 13:48

James Brown


With GNU sed, --posix fails on the `\|` (which is the reason for adding and removing the space in my sed). – NeronLeVelu Dec 15 '16 at 14:10
Didn't notice posix being required. – James Brown Dec 15 '16 at 15:43
Not a criticism, just pointing out a limitation, even though sed is used more and more on Linux, where this limitation does not apply. Here, half of my systems are still AIX (or Sun) without gawk or GNU sed, and I have to make do with that. – NeronLeVelu Dec 16 '16 at 06:40
@NeronLeVelu Fixed (hopefully :). – James Brown Dec 16 '16 at 09:17
