
I have hundreds of txt files, which are all in a single directory. I would like to be able to do the following:

  1. Join all the files into a single .txt file. When joining, the command should insert a symbol (such as §) together with each file's name.
  2. [I then do some work on the combined file, which consists of making changes. Some of these changes involve using proprietary software, which works better with one big file than with lots of little files].
  3. Use a second command to go through the joined file and split it back into separate files, naming each one with the file name that follows the symbol.

Example:

Before joining:

File 1: "Towns.txt"

Béthlem
Cabul
Corinthia
ruined lands
eshcol
Gabbatha
old town

File 2: "Fruits and Nuts.txt"

Apples
Pomegranates
Sycamore

After joining, but before I make changes:

(Single file)

§Towns.txt
Béthlem
Cabul
Corinthia
ruined lands
eshcol
Gabbatha
old town
§Fruits and Nuts.txt
Apples
Pomegranates
Sycamore

After joining and making changes:

(These changes are made manually in the single file)

§Towns.txt
Bethlehem
Cabul
Corinth
Ruined lands
Eshcol
Gabbatha
The Old Town
§Fruits and Nuts.txt
Apples
Pomegranates
Sycamore

After splitting:

File 1: "Towns.txt"

Bethlehem
Cabul
Corinth
Ruined lands
Eshcol
Gabbatha
The Old Town

File 2: "Fruits and Nuts.txt"

Apples
Pomegranates
Sycamore


Steps I have tried

Combining files

I reworked the answer in this thread into an awk command that joins the files together, writing each file's name, prefixed with the § symbol, before its contents.

awk '(FNR==1){print "§" FILENAME }1' * > ^0join.txt;

This seems to work well.
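One hedged caveat about the command above: once ^0join.txt exists, `*` matches it too, so a second run can make awk read the combined file while it is writing to it. A sketch that avoids this, assuming every input ends in .txt and that the combined file may have a non-.txt name (`combined.join` is hypothetical). It builds two sample files in a scratch directory so it can be run as-is:

```shell
#!/bin/sh
# Scratch directory with two sample files standing in for the real ones.
cd "$(mktemp -d)" || exit 1
printf 'Béthlem\nCabul\n' > 'Towns.txt'
printf 'Apples\nPomegranates\n' > 'Fruits and Nuts.txt'

# Join every .txt file, printing "§" and the file name before each one.
# combined.join (a hypothetical name) does not match *.txt, so running
# this again never feeds the combined file back into itself.
awk 'FNR == 1 { print "§" FILENAME } 1' *.txt > combined.join

cat combined.join
```

Because awk's `print` always ends a line with a newline, a file whose last line lacks one still leaves the next § marker on its own line.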

Splitting files

This thread provides a solution for splitting files. I reworked it for my needs to produce this:

awk -v RS='§' '{ outfile = "output_file_" NR; print > outfile}' ^0join.txt   

The only problem is that the output files are named "output_file_1", "output_file_2", etc. They also keep the file name at the top of each file, which I do not want. Also, sometimes when I use this command, it just puts everything in a single file called "outfile" instead of splitting it up.
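A likely contributor to the unreliable splitting, offered with some hedging: POSIX leaves awk's behaviour unspecified when RS is longer than one character, and in UTF-8 the § symbol is two bytes, not one. The awk shipped with macOS 10.14 is, as far as I know, byte-oriented, so RS='§' is not guaranteed to mean "split on §". The two bytes can be checked directly (assuming the script is saved as UTF-8):

```shell
#!/bin/sh
# Hex bytes of "§"; tr removes the spaces and newline that od adds.
bytes=$(printf '§' | od -An -tx1 | tr -d ' \n')
echo "§ is encoded as: $bytes"   # prints c2a7: a two-byte separator
```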

I also found this thread with another solution, which I reworked:

awk '{print $0 "file" NR}' RS='§'  ^0join.txt

However, this didn’t seem to do anything.

Notes

The § can be replaced with any other symbol. I am using macOS 10.14.6, so I need something that works in the macOS Terminal.
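Since the marker can be any symbol, one option (a sketch, not something from the thread) is to keep it in a shell variable and pass it to both commands with awk's `-v` option. Matching it with `index()` and `substr()` instead of a regular expression treats it as literal text, and an ASCII marker such as the hypothetical `#@#` keeps every marker character a single byte. The `out/` directory is also hypothetical, used so the originals are not overwritten:

```shell
#!/bin/sh
# Scratch directory with two sample files.
cd "$(mktemp -d)" || exit 1
printf 'Cabul\n' > 'Towns.txt'
printf 'Apples\n' > 'Fruits and Nuts.txt'

marker='#@#'   # hypothetical marker; any string that never starts a real line works

# Join: the marker comes from a shell variable instead of being hard-coded.
awk -v m="$marker" 'FNR == 1 { print m FILENAME } 1' *.txt > combined.join

# Split into out/. index() and substr() compare the marker literally, so
# characters that are special in regular expressions, such as $ or ., need no escaping.
mkdir out
awk -v m="$marker" '
index($0, m) == 1 { if (file != "") close(file); file = "out/" substr($0, length(m) + 1); next }
file != "" { print > (file) }
' combined.join
```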

big_smile

1 Answer


Could you please try the following.

For the joining command:

awk 'FNR==1{print "§" FILENAME}; 1' Towns.txt  "Fruits and Nuts.txt" > Output_file

For splitting files:

awk '/^§/{close(file);sub(/^§/,"");file=$0;next} {print > (file)}' Output_file
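The splitting command above can be read step by step in this expanded sketch, which behaves the same except for one hypothetical addition: lines that appear before the first § marker are reported on standard error and skipped, instead of reaching `print > ("")`, which awk treats as an error. It writes its own sample Output_file in a scratch directory:

```shell
#!/bin/sh
# Scratch directory with a small combined file in the answer's format
# (the contents are sample data, not the asker's real files).
cd "$(mktemp -d)" || exit 1
printf '§Towns.txt\nBethlehem\nCorinth\n§Fruits and Nuts.txt\nApples\nSycamore\n' > Output_file

awk '
/^§/ {                          # marker line: "§" followed by a file name
    if (file != "") close(file) # finish the previous output file
    file = $0
    sub(/^§/, "", file)         # keep only the name after the marker
    next                        # the marker line itself is not copied
}
file == "" {                    # text before the first marker has no target
    print "skipping line " NR ": no marker yet" > "/dev/stderr"
    next
}
{ print > (file) }              # every other line goes to the current file
' Output_file
```

To use it on a real combined file, drop the `cd` and `printf` setup lines and run the awk command in the directory where the split files should be written.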

NOTE: As per the OP's comments, if all the .txt files in a directory need to be passed to the command, put /complete/path/to/txt_files/*.txt after the awk code of the first (joining) command, in place of the individual file names (not tested, but it should work).
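A hedged sketch of the directory version: when the glob includes a path, awk's FILENAME includes it too, so each marker becomes §/complete/path/to/txt_files/Towns.txt rather than §Towns.txt. The splitting command then writes each file back to that same full path, so the round trip still works; running the join from inside the directory keeps the markers to bare names. The temporary directory below is a hypothetical stand-in for /complete/path/to/txt_files:

```shell
#!/bin/sh
# Hypothetical stand-in for /complete/path/to/txt_files, with two sample files.
dir=$(mktemp -d) || exit 1
printf 'Cabul\n' > "$dir/Towns.txt"
printf 'Apples\n' > "$dir/Fruits and Nuts.txt"

# Passing the full-path glob: FILENAME, and so each marker, carries the path.
awk 'FNR==1{print "§" FILENAME}; 1' "$dir"/*.txt > "$dir.with_paths"

# Running from inside the directory: markers hold bare file names.
# The output lands outside the directory, so *.txt never matches it.
( cd "$dir" && awk 'FNR==1{print "§" FILENAME}; 1' *.txt ) > "$dir.bare"

head -n 1 "$dir.with_paths" "$dir.bare"
```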

RavinderSingh13
  • It says "awk: can't open file Fruits input record number 25, file Fruits source line number 1". The output file only contains towns. – big_smile Sep 19 '19 at 15:02
  • @big_smile, since your file name has a space in it, I wrapped it in `"`. Try my edited code; I checked it and both commands are working fine. Let me know. – RavinderSingh13 Sep 19 '19 at 15:02
  • The join command works. However, I have 100s of files, so it would be better if it could run on the directory without needing the file names to be specified. The split command works, but it leaves the file name at the top of each document (e.g. the newly split Towns.txt document begins with Towns.txt at the top). – big_smile Sep 19 '19 at 15:08
  • @big_smile, check my edit to the split code; it shouldn't write the file names now. Let me know. – RavinderSingh13 Sep 19 '19 at 15:10
  • @big_smile, to run it on a directory, use my code(s) but, instead of listing the file names at the end, pass a glob like `awk '.........' /path/to/files/*.txt`. This will run on all the text files; you can adapt the command to your needs in the same way. – RavinderSingh13 Sep 19 '19 at 15:42