
I have a large directory of data files which I am in the process of manipulating to get them in a desired format. They each begin and end 15 lines too soon, meaning I need to strip the first 15 lines off one file and paste them to the end of the previous file in the sequence.

To begin, I have written the following code to separate the relevant data into manageable chunks:

#!/bin/bash

destination='media/user/directory/'
for file1 in `ls $destination*.ascii`
do
    echo $file1
    file2="${file1}.end"
    file3="${file1}.snip"
    sed -e '16,$d' $file1 > $file2
    sed -e '1,15d' $file1 > $file3
done

This worked perfectly, so the next step is the world's simplest cat command:

cat $file3 $file2 > outfile

However, what I actually need to do is append each file's file2 to the file3 of the previous file in the sequence.

The files are all sequential in time, as this excerpt of the directory listing shows:

*_20090412T235945_20090413T235944_*    ### April 13
*_20090413T235945_20090414T235944_*    ### April 14

So I need to take the 15 lines snipped off the April 14 file above and paste them to the end of the April 13 file.

This doesn't have to be part of the original script; in fact, it would probably be best if it weren't. I was just hoping someone could help me get this going.

Thanks in advance! If anything is unclear or needs further explanation, please let me know.

Vlad

3 Answers


"I need to strip the first 15 lines off one file and paste them to the end of the previous file in the sequence."

If I understand what you want correctly, it can be done with one line of code:

awk 'NR==1 || (FNR==16 && NR>FNR){close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3

When this has run, the files file1.new, file2.new, and file3.new will be in the new form with the lines transferred. Of course, you are not limited to three files: you may specify as many as you like on the command line.

Example

To keep our example short, let's just strip the first 2 lines instead of 15. Consider these test files:

$ cat file1
1
2
3
$ cat file2
4
5
6
7
8
$ cat file3
9
10
11
12
13
14
15

Here is the result of running our command:

$ awk 'NR==1 || (FNR==3 && NR>FNR){close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3
$ cat file1.new
1
2
3
4
5
$ cat file2.new
6
7
8
9
10
$ cat file3.new
11
12
13
14
15

As you can see, the first two lines of each file have been transferred to the preceding file.

How it works

awk implicitly reads each file line-by-line. The job of our code is to choose which new file a line should be written to based on its line number. The variable f will contain the name of the file that we are writing to.

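  • NR==1 || (FNR==16 && NR>FNR){close(f); f=FILENAME ".new"}

    When we are reading the first line of the first file, NR==1, or the 16th line of any file after the first, FNR==16 && NR>FNR, we close the file we were writing to and update f to be the name of the current file with .new added to the end. Closing each finished output file keeps awk from running out of file descriptors when there are many input files. The NR>FNR test skips line 16 of the first file: that file has no predecessor, and closing file1.new and then printing to it again with > would reopen and truncate it, losing the lines already written.

    For the short example, which transferred 2 lines instead of 15, we used the same code but with FNR==16 replaced with FNR==3.

  • print>f

    This prints the current line to file f.

    (In a shell script, every > redirection truncates the file, so we would need >> to append. In awk, > truncates a file only when it is opened; later print>f statements to the same open file append to it.)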
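To see that difference in practice, here is a small sketch assuming any POSIX awk (the file names out_open and out_closed are only for illustration): while a file stays open, repeated print> statements append to it, but after close() the next print> reopens the file and truncates it.

```bash
#!/bin/bash
# Work in a scratch directory so the demo files do not clutter anything.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# The file stays open for the whole run, so both lines are kept.
printf 'a\nb\n' | awk '{ print > "out_open" }'

# The file is closed after every line, so each reopen truncates it
# and only the last line survives.
printf 'a\nb\n' | awk '{ print > "out_closed"; close("out_closed") }'

cat out_open     # prints a and b
cat out_closed   # prints only b
```

This is why an output file should only be closed once the program has really moved on to a different output file.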

Using a glob to specify the file names

destination='media/user/directory/'
awk 'NR==1 || (FNR==16 && NR>FNR){close(f); f=FILENAME ".new"} {print>f}' "$destination"*.ascii
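If you need a different line count, the same idea can be wrapped in a shell function. This is a sketch: the name shift_heads is hypothetical, the count is passed to awk with -v, and the NR > FNR guard means the output is only switched from the second input file onward, so the first .new file is never closed and reopened (which would truncate it).

```bash
#!/bin/bash
# shift_heads N FILE...  (hypothetical helper)
# Writes FILE.new for every input; the first N lines of each file after
# the first are moved to the end of the preceding FILE.new.
shift_heads() {
    local n=$1
    shift
    awk -v n="$n" '
        # First line overall, or line n+1 of any file after the first:
        # close the current output and start writing to FILENAME.new.
        NR == 1 || (FNR == n + 1 && NR > FNR) { close(f); f = FILENAME ".new" }
        { print > f }
    ' "$@"
}
```

Usage: shift_heads 15 "$destination"*.ascii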
John1024
  • Thank you for what looks like a very simple fix. Just one question, if I wanted to call all of my files into this line how would I go about doing it as opposed to how you did with `file1 file2 file3` (since there are ~1500 files)? Would I use the same `for file1 in ls` command that I used for my initial script? Or is there a better way? – Vlad Aug 24 '16 at 06:42
  • @Vlad I just added to the end of the answer code to replace the `for file in...`. – John1024 Aug 24 '16 at 06:47
  • It appeared to be working but malfunctioned about 2/3 of the way through the directory with the awk error `too many open files`. Do you have any suggestions? – Vlad Aug 24 '16 at 06:56
  • @Vlad Sorry about that! Yes, I should have explicitly closed the files. Answer updated. – John1024 Aug 24 '16 at 06:59

You could store the previous $file3 value in a variable, and use an -n check to skip the first iteration, where there is no previous file yet:

#!/bin/bash

destination='media/user/directory/'
prev=""
for file1 in "$destination"*.ascii
do
    echo "$file1"
    file2="${file1}.end"
    file3="${file1}.snip"
    sed -e '16,$d' "$file1" > "$file2"
    sed -e '1,15d' "$file1" > "$file3"
    if [ -n "$prev" ]; then    # skip the first file: it has no predecessor
       cat "$prev" "$file2" > "${prev%.snip}.done"    # one output per file (.done is just an example suffix)
    fi
    prev=$file3
done
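The intermediate .end and .snip files are not strictly needed. As a sketch (the function name stitch is hypothetical, and the .done suffix is only borrowed from the comments), head and tail can build each combined file directly while the loop remembers the previous file name, as in the script above. Like that script, it drops the first file's first N lines and writes no .done file for the last file.

```bash
#!/bin/bash
# stitch N FILE...  (hypothetical helper)
# For each consecutive pair (prev, cur), write "${prev}.done" containing
# prev without its first N lines, followed by the first N lines of cur.
stitch() {
    local n=$1 prev="" cur
    shift
    for cur in "$@"; do
        if [ -n "$prev" ]; then             # skip the first iteration
            {
                tail -n +"$((n + 1))" "$prev"   # prev minus its first n lines
                head -n "$n" "$cur"             # first n lines of cur
            } > "${prev}.done"
        fi
        prev=$cur
    done
}
```

Usage: stitch 15 "$destination"*.ascii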
Krzysztof Krasoń
  • Hesitantly abstaining from downvoting, but you should really fix the [useless use of `ls`](http://www.iki.fi/era/unix/award.html#ls) and the broken quoting from the OP's attempt. – tripleee Aug 24 '16 at 06:40
  • Thank you for taking the time to answer me. I tried this code and defined my outfile as `file4="${file1}.done"`, as I had done elsewhere in the code. It ran without error, but only gave me one `.done` file, at the beginning, and didn't continue through the rest of the directory. – Vlad Aug 24 '16 at 08:30
  • @tripleee I tried to make minimal changes to make it work so other changes don't cover what was the real point. What do you mean by *broken quoting*? – Krzysztof Krasoń Aug 24 '16 at 08:39
  • http://stackoverflow.com/questions/10067266/when-to-wrap-quotes-around-a-variable – tripleee Aug 24 '16 at 08:52

Your task is not that difficult at all. You want to gather a list of all _end files in the directory (using a for loop and globbing, NOT looping on the results of ls). Once you have all the _end files, you simply parse the dates out of each filename using parameter expansion with substring removal, say into d1 and d2 for date1 and date2, in:

stuff_20090413T235945_20090414T235944_end
     |    d1  |      |    d2  |

Then you simply subtract 1 from d1 to get, say, d0, and construct the previous filename from d0 and d1, using _snip instead of _end (since this is plain integer arithmetic on YYYYMMDD, it only finds the previous day within the same month). Then just test whether the previous _snip file exists and, if it does, paste the lines from the current _end file onto the end of it, e.g.

#!/bin/bash

for i in *end; do         ## find all _end files
    d1="${i#*stuff_}"     ## isolate first date in filename
    d1="${d1%%T*}"
    d2="${i%T*}"          ## isolate second date
    d2="${d2##*_}"
    d0=$((d1 - 1))        ## subtract 1 from first, get snip d1
    prev="${i/$d1/$d0}"   ## create previous 'snip' filename
    prev="${prev/$d2/$d1}"
    prev="${prev%end}snip"
    if [ -f "$prev" ]     ## test that prev snip file exists
    then
        printf "paste to : %s\n" "$prev"
        printf "    from : %s\n\n" "$i"
    fi
done

Test Input Files

$ ls -1
stuff_20090413T235945_20090414T235944_end
stuff_20090413T235945_20090414T235944_snip
stuff_20090414T235945_20090415T235944_end
stuff_20090414T235945_20090415T235944_snip
stuff_20090415T235945_20090416T235944_end
stuff_20090415T235945_20090416T235944_snip
stuff_20090416T235945_20090417T235944_end
stuff_20090416T235945_20090417T235944_snip
stuff_20090417T235945_20090418T235944_end
stuff_20090417T235945_20090418T235944_snip
stuff_20090418T235945_20090419T235944_end
stuff_20090418T235945_20090419T235944_snip

Example Use/Output

$ bash endsnip.sh
paste to : stuff_20090413T235945_20090414T235944_snip
    from : stuff_20090414T235945_20090415T235944_end

paste to : stuff_20090414T235945_20090415T235944_snip
    from : stuff_20090415T235945_20090416T235944_end

paste to : stuff_20090415T235945_20090416T235944_snip
    from : stuff_20090416T235945_20090417T235944_end

paste to : stuff_20090416T235945_20090417T235944_snip
    from : stuff_20090417T235945_20090418T235944_end

paste to : stuff_20090417T235945_20090418T235944_snip
    from : stuff_20090418T235945_20090419T235944_end

(of course replace stuff_ with your actual prefix)

Let me know if you have questions.
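Because d1 - 1 is plain integer arithmetic, a file starting on 20090501 makes the script above look for a _snip file dated 20090500, so that pair is silently skipped. Since the timestamps in the names are fixed-width, bash's sorted glob expansion already lists the files in chronological order, so a sketch like the following can pair each _end file with the _snip of the file listed just before it. The function name pair_end_snip is hypothetical, and it assumes no day is missing from the sequence (a gap would be paired across, not skipped).

```bash
#!/bin/bash
# pair_end_snip FILE_end...  (hypothetical helper)
# Pair files by position in the sorted argument list instead of by date
# arithmetic. Assumes names like PREFIX_<start>_<stop>_end / _snip.
pair_end_snip() {
    local ends=( "$@" ) i prev
    for (( i = 1; i < ${#ends[@]}; i++ )); do
        prev="${ends[i-1]%end}snip"    # previous day's _snip file
        if [ -f "$prev" ]; then
            printf 'paste to : %s\n' "$prev"
            printf '    from : %s\n\n' "${ends[i]}"
        fi
    done
}

# Usage (bash sorts glob results): pair_end_snip *_end
```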

David C. Rankin
  • Thank you for your time, I will certainly put this to the test and report back. Just one question, where in this do I define the outfiles for each of the newly pasted files? – Vlad Aug 24 '16 at 07:06
  • My understanding is that, for each `_end` file, you take the lines you need to paste onto the previous `_snip` file. Since you already have the `_end` file, produce those lines in whatever manner you currently do. Then, using the dates from the `_end` file name, locate the `_snip` file that should receive them (that is the `$prev` file). If you simply need to add the lines to the end of the `_snip` file, just redirect them (e.g. `echo "$lines_from_end" >> "$prev"`). – David C. Rankin Aug 24 '16 at 15:14