1

I want to remove all whitespace from a set of text files (about 200,000 of them).

I know how to remove whitespace for one of them:

cat file.txt | tr -d " \t\n\r" 

How can I do that for every file in a folder?

Maroun
neringab

3 Answers

3

One way is to iterate over the files in the current directory:

for file in *; do
   # Print the file's contents with all whitespace stripped
   tr -d ' \t\n\r' < "$file"
done

Or, to edit the files in place (note that this removes only spaces, not tabs or newlines):

for file in *; do
   sed -i 's/ //g' "$file"
done
Maroun
  • you need to redirect the output after the `tr`/`sed` commands or add `-i.bak` for `sed` – Allan Jan 21 '18 at 08:24
  • @Maroun: Globbing (`*`) becomes problematic with thousands of files. – Cyrus Jan 21 '18 at 08:30
  • @Cyrus Can you please explain why? Or, what else do you recommend (I'll edit my answer). – Maroun Jan 21 '18 at 08:32
  • @Maroun: The length of the command line is limited to the value (in bytes) displayed with `getconf ARG_MAX`. – Cyrus Jan 21 '18 at 08:38
  • What's the point of having `sed` create a backup file if you immediately remove it? Also please fix the errors reported by http://shellcheck.net/ – tripleee Jan 21 '18 at 08:42
  • The poster has stated that the pattern `" \t\n\r"` is correct. – Ed Heal Jan 21 '18 at 08:45
  • @tripleee I'm on macOS, where `sed` works a bit differently. OP can get rid of the `-i` if he's using Ubuntu. – Maroun Jan 21 '18 at 08:53
  • You can use `-i ""` on MacOS and related platforms (*BSD etc). – tripleee Jan 21 '18 at 09:02
  • @Cyrus The `ARG_MAX` limit applies to the `exec` system call. The shell can, and typically does, handle wildcard expansion in a way where this limit does not apply. (Running an external command on the result of the expansion will still fail, of course; but that is not the scenario here.) – tripleee Jan 21 '18 at 09:04
  • With second option new lines are still not deleted. :/ – neringab Jan 21 '18 at 11:42
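
The difference raised in the last comment can be checked on a small sample. The following is only a sketch: the sample text and variable names are made up for illustration, and the byte counts are printed rather than assumed.

```shell
#!/bin/sh
# Hypothetical sample: contains a space, a tab, a carriage return and newlines.
sample=$(mktemp)
printf 'a b\tc\r\nd e\n' > "$sample"

# Bytes in the original sample.
orig_bytes=$(wc -c < "$sample" | tr -d ' ')

# s/ //g removes literal spaces only; tabs, CRs and newlines survive.
sed_bytes=$(sed 's/ //g' "$sample" | wc -c | tr -d ' ')

# tr -d deletes every character in the listed set.
tr_out=$(tr -d ' \t\n\r' < "$sample")
tr_bytes=$(printf '%s' "$tr_out" | wc -c | tr -d ' ')

printf 'original: %s bytes\n' "$orig_bytes"
printf 'sed:      %s bytes\n' "$sed_bytes"
printf 'tr:       %s bytes (%s)\n' "$tr_bytes" "$tr_out"
rm -f "$sample"
```

Here `sed` only drops the two spaces, while `tr` leaves just the letters.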
2

I would use find for this:

find . -type f -exec sed -i ':a;$!N;$!ba;s/[[:space:]]//g' {} \;

This reads each whole file into the pattern space before substituting, so it assumes the files are of reasonable size :-/
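
With around 200,000 files, starting one process per file can be slow. A possible variant, sketched here with `tr` instead of `sed` so that it does not depend on GNU extensions, lets `find` pass file names in batches via `{} +`. The demo directory, file names and the `scratch` variable are hypothetical and exist only to make the sketch runnable without touching real data.

```shell
#!/bin/sh
# Hypothetical demo tree, including names with spaces.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub dir"
printf 'one two\n' > "$demo_dir/a.txt"
printf 'x\ty\r\nz\n' > "$demo_dir/sub dir/b c.txt"

# Scratch file kept outside the tree so find never visits it.
scratch=$(mktemp)
export scratch

# "{} +" hands many file names to each sh invocation; the inner loop
# strips whitespace into the scratch file, then copies the result back.
find "$demo_dir" -type f -exec sh -c '
  for f do
    tr -d " \t\n\r" < "$f" > "$scratch" && cat "$scratch" > "$f"
  done
' sh {} +

result_a=$(cat "$demo_dir/a.txt")
result_b=$(cat "$demo_dir/sub dir/b c.txt")
printf '%s\n%s\n' "$result_a" "$result_b"
rm -rf "$demo_dir" "$scratch"
```

Copying back with `cat` rather than `mv` keeps each file's original permissions, at the cost of not being atomic; which trade-off is right depends on the data.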

sjsam
1

Just use a for loop, for example:

for i in *
do
   # Write the stripped text to a temporary file, then replace the original
   tr -d ' \t\n\r' < "$i" > "$i.tmp" && mv "$i.tmp" "$i"
done
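
A minimal sketch of why the loop goes through a temporary file: redirecting the output straight back to the input file makes the shell truncate it before `tr` reads anything. The working directory and file names below are hypothetical, and default shell settings (no `noclobber`) are assumed.

```shell
#!/bin/sh
# Hypothetical scratch directory with two identical sample files.
work=$(mktemp -d)
printf 'a b\n' > "$work/bad.txt"
printf 'a b\n' > "$work/good.txt"

# Pitfall: "> bad.txt" empties the file before tr gets to read it.
tr -d ' \t\n\r' < "$work/bad.txt" > "$work/bad.txt"

# Safer: write to a temporary file, and rename only if tr succeeded.
tr -d ' \t\n\r' < "$work/good.txt" > "$work/good.txt.tmp" &&
  mv "$work/good.txt.tmp" "$work/good.txt"

bad_size=$(wc -c < "$work/bad.txt" | tr -d ' ')
good_content=$(cat "$work/good.txt")
printf 'bad.txt size: %s\n' "$bad_size"
printf 'good.txt content: %s\n' "$good_content"
rm -rf "$work"
```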
Ed Heal
  • Better to avoid `cat` for this. See [here](http://www.catb.org/jargon/html/U/UUOC.html). – Maroun Jan 21 '18 at 08:35
  • @Maroun - I agree. Was going to update my answer - but you beat me to it – Ed Heal Jan 21 '18 at 08:37
  • The link to the *Jargon file* has pretty bad formatting errors. Perhaps instead link to https://stackoverflow.com/questions/11710552/useless-use-of-cat – tripleee Jan 21 '18 at 08:45