9

I have four files:

one_file.txt

abc | def

two_file.txt

ghi | jkl

three_file.txt

mno | pqr

four_WORD.txt

xyz| xyz

I want to concatenate all of the files ending with "file.txt" (i.e. all except four_WORD.txt) in order to get:

abc | def
ghi | jkl
mno | pqr

To accomplish this, I run:

cat *file.txt > full_set.txt

However, full_set.txt comes out as:

abc | defmno | pqrghi | jkl

Any ideas how to do this correctly and efficiently so that each ends up on its own line? In reality, I need to do the above for a lot of very large files. Thank you in advance for your help.

TotPeRo
user3890260
  • Add end-of-line characters to the end of each file. They are supposed to be there. – n. m. could be an AI Jul 30 '14 at 06:32
  • @n.m. Depending on the files, that might not be at all feasible. There are many scenarios where you would like to be able to concatenate files with newlines between them without adding them to the input files. – tripleee Jul 30 '14 at 06:38
  • @tripleee A text file is a sequence of lines. This is specified by POSIX. A line ends with the newline character. (An alternative point of view, that the newline character separates, rather than terminates, lines, is theoretically possible but results in a mess). – n. m. could be an AI Jul 30 '14 at 06:59
  • Regardless of POSIX, there will be situations where what you have is not a POSIX text file which would nevertheless be useful to manipulate using standard tools. – tripleee Jul 30 '14 at 07:04
  • @tripleee If it's not a text file, then calling it `*txt` is probably not a good idea. Anyway, if you have non-standard files, you will have issues when using them with standard tools. Handling the issues one by one is one way of dealing with the problem, switching to a standard format is another, neither is universally good. – n. m. could be an AI Jul 30 '14 at 07:16

6 Answers

14

Try:

awk 1 *file.txt > full_set.txt

This is less efficient than a bare cat, but it adds the missing \n at the end of any file that lacks one.
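A minimal sketch of why this works, assuming the inputs really do lack a final newline as the question's output suggests. The scratch directory and the `printf` setup are only there to reproduce the situation; they are not from the question. Note that the glob expands alphabetically, so three_file.txt comes before two_file.txt; to get the order shown in the question, list the files explicitly instead of using the wildcard.

```shell
# Work in a scratch directory (illustrative setup, not part of the question).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Recreate the inputs without a trailing newline.
printf 'abc | def' > one_file.txt
printf 'ghi | jkl' > two_file.txt
printf 'mno | pqr' > three_file.txt

# The pattern "1" is always true, so awk prints every record; each record is
# written followed by a newline, even if the input line was unterminated.
awk 1 *file.txt > full_set.txt

cat full_set.txt
# abc | def
# mno | pqr
# ghi | jkl
```

Files that already end in a newline pass through unchanged, so no blank lines are introduced.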

Sylvain Leroux
3

Many tools will add newlines if they are missing. Try e.g.

sed '' *file.txt >full_set.txt

but this depends on your sed version. Others to try include Awk, grep -ho '.*' *file.txt, etc.

tripleee
0

This works for me:

for file in $(ls *file.txt) ; do cat $file ; echo ; done > full_set.txt

I hope this will help you.

linibou
  • This adds a space between the output of each file (i.e. between the lines). To avoid this, remove the `echo ;` – Simply_me Jul 30 '14 at 07:50
  • If the newline is missing from a file, `echo` adds it, which is the point here. – linibou Jul 30 '14 at 07:55
  • See the author's desired output and check my solution. – Simply_me Jul 30 '14 at 08:06
  • Yes, it doesn't work: `juju@juju-laptop:~/tmp$ for f in *_file.txt; do (cat "${f}") >> full_set.txt; done juju@juju-laptop:~/tmp$ cat full_set.txt abc | defmno | pqrjhi | jkljuju@juju-laptop:~/tmp$ ` Well, on my computer it doesn't work; there is no ending newline in any of the files! – linibou Jul 30 '14 at 08:11
  • It works in my bash; I guess we have different versions. Your solution adds an empty line between the `cat` outputs in my bash, which isn't what the author asked for. – Simply_me Jul 30 '14 at 08:14
  • It's weird. Does your `cat *file.txt` produce the same output as the author's? – linibou Jul 30 '14 at 08:17
  • Yes, my output file is the same as in my answer and the same as in the question. I'm using CentOS 6.3. Weird. – Simply_me Jul 30 '14 at 08:19
  • This is a [useless use of `ls`](http://www.iki.fi/era/unix/award.html#ls) – tripleee Sep 24 '19 at 16:12
  • @tripleee You would be right if that was `ls *`, but it's `ls *file.txt`. If you try `for file in *file.txt ...` you won't get the expected result (I'm using bash 5.0.9 on Arch Linux). – linibou Sep 27 '19 at 12:57
  • Why do you think that matters? How do you think the result from `for file in *file.txt` is incorrect? The shell expands the wildcard before `ls` runs. Read the link I provided. – tripleee Sep 27 '19 at 13:43
  • Also, the command substitution will wreck any spaces or literal wildcard characters in the output (so then it won't matter that [your quoting is broken](/questions/10067266/when-to-wrap-quotes-around-a-shell-variable), too). – tripleee Sep 27 '19 at 13:53
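For reference, the loop from this answer can be written with the glob directly and with the variable quoted, which addresses the `ls` and quoting points raised in the comments. This is only a sketch; the two file names (one containing a space) are made up for illustration. It also shows the behaviour the comments disagree about: `echo` runs unconditionally, so a file that already ends in a newline is followed by an empty line.

```shell
# Scratch directory with two illustrative inputs (names are hypothetical).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf 'abc | def'   > 'one file.txt'   # no final newline, space in the name
printf 'ghi | jkl\n' > 'two file.txt'   # already ends with a newline

# The shell expands the glob itself; quoting "$file" keeps each name intact.
for file in *file.txt; do
    cat "$file"
    echo            # always emits a newline, whether or not one was missing
done > full_set.txt

cat full_set.txt
```

The first file gets the newline it lacked, while the second is followed by a blank line, which matches both sides of the discussion above: the result depends on whether the inputs already end in a newline.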
0

You can loop over each file and check whether its last line ends in a newline, outputting one if it doesn't.

for file in *file.txt; do
    cat "$file"
    [[ $(tail -c 1 "$file") == "" ]] || echo
done > full_set.txt
John B
  • This won't work because `[\x0a]` is not a valid way to represent a newline character, and the command substitution will trim any final newline from the output anyway. I suppose it could be worked around by using `tail | xxd` and comparing the hex output to `*0a` but this is getting pretty tortured already. – tripleee Sep 27 '19 at 13:50
  • @tripleee Indeed, thanks. Fixed (minimally). Other answers are still probably better. – John B Sep 27 '19 at 22:01
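The check in this answer relies on command substitution stripping trailing newlines: if the last byte of the file is a newline, `$(tail -c 1 "$file")` expands to the empty string. A small sketch that isolates the test, using a hypothetical helper name `ends_with_newline` and made-up file names:

```shell
# Hypothetical helper: succeeds when the last byte of the file is a newline.
# An empty file also counts as "terminated", since tail prints nothing for it.
ends_with_newline() {
    # Command substitution drops trailing newlines, so "\n" becomes "".
    [ -z "$(tail -c 1 "$1")" ]
}

tmpdir=$(mktemp -d)
printf 'abc | def'   > "$tmpdir/no_nl.txt"
printf 'ghi | jkl\n' > "$tmpdir/with_nl.txt"

for f in "$tmpdir/no_nl.txt" "$tmpdir/with_nl.txt"; do
    if ends_with_newline "$f"; then
        echo "${f##*/}: ends with a newline"
    else
        echo "${f##*/}: no final newline"
    fi
done
```

Because `tail -c 1` reads only the last byte, the check stays cheap even for very large files.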
-1

You can use a one-line for loop for this. The following line:

for f in *_file.txt; do (cat "${f}") >> full_set.txt; done

Yields the desired output:

$ cat full_set.txt 
abc | def
mno | pqr
ghi | jkl

Also, possible duplicate.

Community
Simply_me
  • -1 Running `cat` on one file at a time in a subprocess does nothing to add a newline if one is missing, and is highly wasteful. You are testing with input files which do not lack a final newline. – tripleee Sep 01 '14 at 08:31
  • @tripleee It works on the input example and desired output by the OP. It merges all of the files into one properly formatted file. – Simply_me Sep 04 '14 at 17:58
  • The possible duplicate you are linking to doesn't create a superfluous shell (`{ cat; echo; }` vs your `(cat)`) and uses `echo` to add a newline, even when one isn't missing. – tripleee Sep 05 '14 at 05:31
  • @tripleee it is somewhat expected that the OP will be able to extrapolate from a similar question, hence possible duplicate. – Simply_me Sep 05 '14 at 17:36
  • @tripleee worked on my CENTOS. Note that even your `cat` operation does not yield the same result as mine. – Simply_me Sep 05 '14 at 17:39
-3
find . -name "*file.txt" | xargs cat > full_set.txt
shlomi33
  • -1 this does nothing to resolve the OP's problem. It would be an excellent answer to an entirely different question. – tripleee Jul 30 '14 at 07:03
  • Some more explanation would be helpful to make clear what your method does. – Jens Jul 30 '14 at 07:04
  • @tripleee Did you try it? It does exactly what was needed. I ran it on my PC and the output is as expected. If you still think I am mistaken, please point out exactly what is wrong with my answer. – shlomi33 Sep 01 '14 at 05:27
  • @shlomi33 Then your input files do not match the OP's, or you have an incompatible `cat` which adds the missing newlines out of thin air. In which case the OP's much simpler `cat *file.txt` would have worked fine as well – tripleee Sep 01 '14 at 05:52
  • Plus if there are subdirectories with matching files it will pull in those as well, not just the ones in the current directory. – tripleee Sep 01 '14 at 05:53
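If `find` is wanted anyway, the two objections in the comments (no newline is added, and subdirectories are searched) could be addressed roughly as follows. This is a sketch, not a portable recipe: `-maxdepth`, `-print0`, `sort -z` and `xargs -0` are GNU/BSD extensions rather than POSIX, and the directory layout is invented for the example.

```shell
# Scratch directory with a matching file in a subdirectory (illustrative only).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf 'abc | def' > one_file.txt
printf 'ghi | jkl' > two_file.txt
mkdir sub
printf 'SHOULD NOT APPEAR' > sub/other_file.txt

# -maxdepth 1 keeps find in the current directory; the NUL-delimited pipeline
# preserves names containing spaces; sort -z fixes the order; awk 1 adds the
# missing newlines, which plain cat would not.
find . -maxdepth 1 -type f -name '*file.txt' -print0 |
    sort -z |
    xargs -0 awk 1 > full_set.txt

cat full_set.txt
# abc | def
# ghi | jkl
```

With very many files, `xargs` may split the list across several `awk` invocations; the output is still concatenated in order because all of them write to the same redirected stream.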