17

I have a folder called foo. foo contains other folders, which may have subfolders and text files. I want to find every file whose name begins with year, read its Nth line, and print that line to a new file. For example, foo has a file called year1 and the subfolders have files called year2, year3, etc. The program should print the 1st line of year1 to a file called writeout, then the 2nd line of year2 to writeout, and so on.

I also don't really understand how to write a for loop over files.

So far I have:

#!/bin/bash

for year* in ~/foo
do
  Here I tried writing some code using the sed command but I can't think of something       else.
done

I also get a message in the terminal which says `year*' not a valid identifier. Any ideas?

captain

7 Answers

38

Sed can help you.

Recall that sed will normally process all lines in a file AND print each line in the file.

You can turn off that feature, and have sed only print lines of interest by matching a pattern or line number.

So, to print the 2nd line of file2, you can say

sed -n '2p' file2 > newFile2

To print the 2nd line and then stop processing add the q (for quit) command (you also need braces to group the 2 commands together), i.e.

sed -n '2{p;q;}' file2 > newFile2

(If you are processing large files, this can save a lot of time.)
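
To make the difference concrete, here is a minimal sketch of the two variants side by side. The function names are hypothetical and exist only for this comparison; both print line 2 of the file passed as the first argument.

```shell
#!/bin/bash
# Hypothetical helpers for comparison; both print line 2 of the file given as $1.
line2_scan_all()  { sed -n '2p' "$1"; }        # keeps reading to the end of the file
line2_then_quit() { sed -n '2{p;q;}' "$1"; }   # stops reading right after line 2
```

The output is the same either way. On a large file, something like `time line2_then_quit big.log` (big.log being a placeholder name) should return sooner than the scan-all variant, because sed exits as soon as the line has been printed.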

To make that more general, you can change the number to a variable that will hold a number, i.e.

  lineNo=3
  sed -n "${lineNo}{p;q;}" file3 > newFile3

If you want all of your sliced lines to go into 1 file, then use the shell's 'append-redirection', i.e.

 for lineNo in 1 2 3 4 5 ; do
     sed -n  "${lineNo}{p;q;}" file${lineNo} >> aggregateFile
 done
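
One thing to watch with `>>` is that running the loop a second time appends the same lines again. A sketch of one way around that, assuming file1 to file5 exist in the current directory as in the loop above (collect_lines is a hypothetical name, and the output file is passed as its first argument):

```shell
#!/bin/bash
# collect_lines is a hypothetical helper; $1 is the output file.
# It assumes file1 ... file5 exist in the current directory.
collect_lines() {
    : > "$1"    # truncate first so a rerun does not append duplicates
    for lineNo in 1 2 3 4 5; do
        sed -n "${lineNo}{p;q;}" "file${lineNo}" >> "$1"
    done
}
```

Calling `collect_lines aggregateFile` twice leaves five lines in aggregateFile rather than ten.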

The other answers, which use the output of find to drive the file list, are an excellent approach.

I hope this helps.

shellter
  • The grouping syntax works in GNU sed. – glenn jackman Nov 03 '11 at 16:29
  • @glennjackman : not sure of your point. grouping syntax works in sed on AIX and solaris too, and to my knowledge and belief is part of the original design of sed. Thanks for the feedback :-) – shellter Nov 03 '11 at 16:32
  • 1
    If you like Python over sed you can do... `python -c "import sys; print(sys.stdin.readlines()[int(sys.argv[1])-1]).strip()" ` (or of course define an alias for that big thing) – floer32 Jan 17 '14 at 20:26
6

Here is one way to do it:

awk "NR==$YEAR" "$file"
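
A slightly more defensive sketch of the same idea passes the line number in with -v instead of splicing it into the awk program, and exits once the line has been printed so the rest of the file is not read. The name nth_line_awk is hypothetical.

```shell
#!/bin/bash
# nth_line_awk is a hypothetical helper: print line $1 of file $2 using awk.
nth_line_awk() {
    # -v hands the number to awk as the variable n;
    # exit stops awk as soon as the wanted line has been printed.
    awk -v n="$1" 'NR == n { print; exit }' "$2"
}
```

For example, `nth_line_awk "$YEAR" "$file"` prints the same line as the one-liner above.
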
Karoly Horvath
3

Use find to locate the files you want, and then sed to extract what you want:

find foo -type f -name 'year*' |
while IFS= read -r file; do
    line=$(echo "$file" | sed 's/.*year\([0-9]*\)$/\1/')
    sed -n -e "${line}{p;q;}" "$file"
done

This approach:

  • Uses find to produce a list of files whose names start with the string "year". The pattern is quoted so that the shell passes it to find instead of expanding it itself.
  • Pipes the file list to a while loop to avoid long command lines.
  • Uses sed to extract the desired line number from the name of the file.
  • Uses sed to print just the desired line and then immediately quit. (You can leave out the q and just write ${line}p, which would work but be potentially less efficient if $file is big. Also, q may not be fully supported on all versions of sed.)

With the variables quoted, file names containing spaces are handled, but names containing newlines are still not.
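
One way to make the loop independent of which characters appear in file names is to let find hand the names straight to a small shell script through -exec ... +, so they never pass through a pipe. This is only a sketch: extract_year_lines_exec is a hypothetical name, it takes the directory and the output file as arguments, and it skips files such as yearbook whose names do not end in digits.

```shell
#!/bin/bash
# extract_year_lines_exec is a hypothetical helper.
# $1 = directory to search, $2 = output file (should live outside $1).
extract_year_lines_exec() {
    find "$1" -type f -name 'year*' -exec sh -c '
        for file do
            n=${file##*year}             # text after the last "year" in the path
            case $n in
                ""|*[!0-9]*) continue ;; # skip names without a purely numeric suffix
            esac
            sed -n "${n}{p;q;}" "$file"  # print line n, then stop reading
        done
    ' sh {} + > "$2"
}
```

For the layout in the question this would be called as `extract_year_lines_exec ~/foo writeout`. The order of the lines in the output follows the order in which find visits the files, which is not guaranteed to be sorted.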

Emil Sit
1
1. time head -5 emp.lst | tail -1
Execution time:
real 0m0.004s
user 0m0.001s
sys 0m0.001s

or

2. awk 'NR==5' emp.lst
Execution time:
real 0m0.003s
user 0m0.000s
sys 0m0.002s

or

3. sed -n '5p' emp.lst
Execution time:
real 0m0.001s
user 0m0.000s
sys 0m0.001s

or

4. Using a cute trick, we can also get this with the cut command:
cut -d "
" -f 5 emp.lst
# After typing -d and the opening quote, press Enter; this makes the delimiter a newline.
Execution time:
real 0m0.001s
Sébastien
parmeet
  • 1
    While your answer may solve the question, it is always better if you can provide a description of what the issue was and how your answer solves it. This is a suggestion for further improving this and future answers. – Luís Cruz Sep 26 '14 at 10:52
  • 1
    Can you elaborate how your answer is working & helpful? – Rajesh Ujade Sep 26 '14 at 11:04
1

One approach that works, provided you pass 2 arguments (a line number and a file name):

$ touch myfile
$ touch mycommand
$ chmod +x mycommand
$ touch yearfiles
$ find / -type f -name 'year*' >> yearfiles
$ nano mycommand
$ touch foo

Type this:

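#!/bin/bash
# Append the first $1 lines of file $2 to myfile.
head -n "$1" "$2" >> myfile
# The last line of myfile is now line $1 of $2; append it to foo.
tail -n 1 myfile >> foo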

Use ^X, y, and enter to save. Then run mycommand:

$ ./mycommand 2 yearfiles
$ cat foo
year2

Presuming your year files are:

year1, year2, year3

Now that this is set up, you only need to run ./mycommand LINENUMBER FILENAME from now on.

Okx
1

Here you go

sed "${index}q;d" "${input_file}" > "${output_file}"
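
In case the `q;d` part looks cryptic, here is the same command wrapped in a function with comments on what each piece does. The name nth_line_qd is hypothetical.

```shell
#!/bin/bash
# nth_line_qd is a hypothetical helper: print line $1 of file $2.
nth_line_qd() {
    # On line $1, q prints the current line (sed auto-prints on quit) and exits.
    # Every earlier line falls through to d, which deletes it without printing.
    sed "${1}q;d" "$2"
}
```

If the file has fewer lines than requested, every line is deleted and nothing is printed.
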
Karol Król
0

Your task has two sub-tasks: find the names of all the year files, and then extract the Nth line from each. Consider the following script:

for file in $(find foo -name 'year*'); do
     YEAR=$(echo "$file" | sed -e 's/.*year\([0-9]*\)$/\1/')
     head -n "$YEAR" "$file" | tail -n 1
done

The find call finds the matching files for you in the directory foo. The second line extracts only the digits at the end of the filename from the filename. The third line then extracts the first N lines from the file, keeping only the last of the first N lines (read: only the Nth line).
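
One caveat with head | tail: if the file has fewer than N lines, head returns the whole file and tail then prints its last line, which is not the Nth line. A sketch of a guard against that, using a hypothetical helper name:

```shell
#!/bin/bash
# nth_line_head_tail is a hypothetical helper: print line $1 of file $2,
# or print nothing and return 1 if the file is shorter than that.
nth_line_head_tail() {
    local n=$1 file=$2 total
    # wc -l counts newline characters, so a final line without a trailing
    # newline is not counted; the arithmetic expansion strips any padding.
    total=$(( $(wc -l < "$file") ))
    [ "$n" -le "$total" ] || return 1
    head -n "$n" "$file" | tail -n 1
}
```

Inside the loop above, the third line would then become `nth_line_head_tail "$YEAR" "$file"`.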

thiton