Iterating with awk over some thousand files and writing to the same files in one or two runs

Question

I have a lot of files in their own directory. All have the same name structure:

2019-10-18-42-IV-Friday.md
2019-10-18-42-IV-Saturday.md
2019-10-18-42-IV-Sunday.md
2019-10-18-43-43-IV-Monday.md
2019-10-18-42-IV Tuesday.md

and so on.

In detail, the pattern is: yyyy-mm-dd-week of year-current quarter-day of week.md

I want to insert one line into each file as its second line. With awk, I want to extract and expand the date parts from the file name and then write them to the corresponding file.

This is the point where I fail.

%!awk -F"-" '{print "Today is " $6 ", the " $3 "." $2 "." $1 ", Kw " $4 ", in the " $5 ". Quarter."}'

That works well, I get the sentence I want to write into the files.

So put the whole thing in a loop:

ze.sh  
 #!/bin/bash                                                                 
 for i in *.md;                                                              
       j = awk -F " " '{ print "** Today is " $6 ", the" $3"." $2"." $1", Kw " $4 ", in the " $5 ". Quarter. **"}' $i 
 Something with CAT, I suppose.                                                             

end

What do I have to do to make variable i iterate over all files, extract the values for j from $i, and then write $j to the second line of each file?

Thanks a lot for your help.

[Using manjaro linux and bash] GNU bash, Version 5.0.11(1)-release (x86_64-pc-linux-gnu) Linux version 5.2.21-1-MANJARO

In which format do you want the date to be written inside the file? Is it the same format as in the file name? – RavinderSingh13, Oct 18 '19 at 07:05
`2019-10-18-43-43-IV-Monday.md` is not the same format as the other lines. What is the second 43 in there? `2019-10-18-42-IV Tuesday.md` is also different from other lines, it has a space after the quarter. Are these correct or a copy-paste error? What are all the possible formats, and how do you want to handle them? — janos, Oct 18 '19 at 07:22

RavinderSingh13 · Answer 1 · 2019-10-18T07:47:47.397

1

Could you please try the following (I haven't tested it; GNU awk is needed for this). For writing the date on the 2nd line, I have chosen the same format that the date has in your file names.

awk -i inplace '
FNR==2{
  split(FILENAME,array,"-")
  print array[1]"-"array[2]"-"array[3]
}
1
' *.md

If possible, try it without the -i inplace option first, so that the changes are not saved into the input files. Once you are happy with the results, add the option back as shown above to make the changes in place.
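
A dry run along those lines could look like the sketch below. It is not part of the original answer: `preview_date_line` is a hypothetical helper name, and the `sub()` call that strips a leading directory from `FILENAME` is an added assumption so that the split still works when a path is passed. Without `-i inplace` the program uses nothing GNU-specific and only prints to the terminal.

```shell
#!/bin/bash
# Hypothetical helper: show what the awk program would write for one file,
# without modifying the file (no "-i inplace").
preview_date_line() {
    awk '
    FNR == 2 {
        # Assumption: drop any leading directory so only the bare name is split.
        name = FILENAME
        sub(/.*\//, "", name)
        split(name, parts, "-")
        # Print the date before the original second line.
        print parts[1] "-" parts[2] "-" parts[3]
    }
    1   # print every input line unchanged
    ' "$1"
}
```

Calling `preview_date_line 2019-10-18-42-IV-Friday.md` prints the file with `2019-10-18` inserted ahead of its original second line and leaves the file itself untouched.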

For the awk versions that support in-place updates, see the link posted by James Brown:

Save modifications in place with awk

edited Oct 18 '19 at 07:47

answered Oct 18 '19 at 07:10


GAWK (at least 4.1.4) does not have in place editing – dash-o Oct 18 '19 at 07:31

See here about versions: https://stackoverflow.com/questions/16529716/save-modifications-in-place-with-awk – James Brown Oct 18 '19 at 07:33

@JamesBrown, Thank you for sharing the same sir, added into post now, cheers. – RavinderSingh13 Oct 18 '19 at 07:50

score 0 · Answer 2 · answered Oct 18 '19 at 07:30

You probably want to feed the file name into the AWK script, using '-' to separate the components.

This script assumes that the AWK output only needs to be appended to the end of the file:

for i in *.md ; do
    # Quote "$i" so file names containing spaces survive word splitting
    echo "$i" | awk -F- 'AWK COMMAND HERE' >> "$i"
done

If the new text has to be inserted as the second line of the file, the sed program can be used to update the file (using the in-place edit option '-i'). Something like:

for i in *.md ; do
    # Build the text from the file name, then insert it before line 2 (GNU sed)
    mark=$(echo "$i" | awk -F- 'AWK COMMAND HERE')
    sed -i -e "2i$mark" "$i"
done
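
One way to fill in the `AWK COMMAND HERE` placeholder, as a sketch rather than the answerer's own code: `describe_name` is a hypothetical helper, the sentence is the one from the question, and the bracket expression `[-. ]` as field separator is an assumption so that the `.md` extension and the variant with a space before the weekday are split off as well. It does not handle the variant with the repeated week number.

```shell
#!/bin/bash
# Hypothetical helper: turn a file name into the sentence from the question.
describe_name() {
    # -F'[-. ]' splits on '-', '.' and ' ', so $6 is the bare weekday name.
    printf '%s\n' "$1" | awk -F'[-. ]' '{
        print "Today is " $6 ", the " $3 "." $2 "." $1 ", Kw " $4 ", in the " $5 ". Quarter."
    }'
}

describe_name "2019-10-18-42-IV-Friday.md"
# prints: Today is Friday, the 18.10.2019, Kw 42, in the IV. Quarter.
```

Inside the loop, `mark=$(describe_name "$i")` would then take the place of the `echo ... | awk` pipeline.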

score 0 · Accepted Answer · answered Oct 18 '19 at 07:52

For updating a file in-place, sed is better suited than awk, because:

You don't need a recent version; older versions can do it too
It works with both GNU and BSD flavors, so it is more portable

But first, to split a filename into its parts, you don't need an extra process; the read builtin can do it too. From your examples, we need to extract the year, month, day, and week number, a quarter string, and a weekday name string:

2019-10-18-42-IV-Friday.md
2019-10-18-42-IV-Saturday.md
2019-10-18-42-IV-Sunday.md
2019-10-18-43-43-IV-Monday.md
2019-10-18-42-IV Tuesday.md

For the first 3 lines, this simple expression would work:

IFS=-. read year month day week q dayname rest <<< "$filename"

The last line has a space before the weekday name instead of a -, but that's easy to fix:

IFS='-. ' read year month day week q dayname rest <<< "$filename"

Line 4 is harder to fix, because it has a different number of fields. To handle the extra field, we add an extra variable, ext:

IFS='-. ' read year month day week q dayname ext rest <<< "$filename"

Then, if we can assume that the second 43 on that line can be ignored and the remaining values simply shifted, we use a conditional on the value of $ext. For most lines, the value of ext will be md (the file extension). If the value is different, that means we have an extra field, and we should shift the values:

if [[ $ext != "md" ]]; then
    q=$dayname
    dayname=$ext
fi
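
To check how the three file name variants end up in the variables, the pieces so far can be wrapped in a small function. This is only a sketch: `parse_name` and the `|`-separated output are made up for illustration, and `read -r` is an added precaution that the answer does not use.

```shell
#!/bin/bash
# Hypothetical helper: split one file name and print the parts in a fixed order.
parse_name() {
    local year month day week q dayname ext rest
    IFS='-. ' read -r year month day week q dayname ext rest <<< "$1"
    if [[ $ext != "md" ]]; then
        # Extra field (the repeated week number): shift the last two values.
        q=$dayname
        dayname=$ext
    fi
    printf '%s|%s|%s|%s|%s|%s\n' "$year" "$month" "$day" "$week" "$q" "$dayname"
}

parse_name "2019-10-18-42-IV-Friday.md"      # 2019|10|18|42|IV|Friday
parse_name "2019-10-18-43-43-IV-Monday.md"   # 2019|10|18|43|IV|Monday
parse_name "2019-10-18-42-IV Tuesday.md"     # 2019|10|18|42|IV|Tuesday
```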

Now, we can use the variables to format the line you want to insert into the file:

line="Today is $dayname, the $day.$month.$year, Kw $week, in the $q. Quarter."

Finally, we can formulate a sed statement, for example to append our custom formatted line after the first one, ideally in a way that will work with both GNU and BSD flavors of sed.

This will work equivalently with both GNU and BSD versions:

sed -i.bak -e "1 a\\"$'\n'"$line"$'\n' "$filename" && rm "$filename.bak"

Notice that sed creates a .bak backup file here, which has to be removed explicitly; the rm at the end of the command does that once sed has succeeded.

If you don't want backup files to be created, then I'm afraid you need to use a slightly different format for the GNU and BSD flavors:

# GNU
sed -i'' -e "1 a\\"$'\n'"$line"$'\n' "$filename"

# BSD
sed -i '' -e "1 a\\"$'\n'"$line"$'\n' "$filename"

In fact, if you only need to support the GNU flavor, a simpler form works too:

sed -i'' "1 a$line" "$filename"
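
If the differences between the `-i` variants are a concern, one alternative that is not part of this answer is to skip `sed -i` altogether and rebuild the file through a temporary file. The sketch below assumes `mktemp`, `head` and `tail` are available; `insert_second_line` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: insert "$2" as the second line of file "$1"
# without relying on sed -i.
insert_second_line() {
    local file=$1 line=$2 tmp
    tmp=$(mktemp) || return 1
    {
        head -n 1 "$file"        # original first line
        printf '%s\n' "$line"    # new second line
        tail -n +2 "$file"       # everything from the original second line on
    } > "$tmp" &&
    cat "$tmp" > "$file"         # overwrite the contents, keeping the file's permissions
    local status=$?
    rm -f "$tmp"
    return "$status"
}
```

In the loop this would be called as `insert_second_line "$filename" "$line"`. It starts a few more processes per file than a single sed call, so it is likely somewhat slower over thousands of files.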

You can put all of that together in a for filename in *.md; do ...; done loop.
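
Put together, the loop could look roughly like this. Treat it as a sketch under a few assumptions: the function name `stamp_md_files`, the optional directory argument, `read -r`, and removing only the backup of the file just edited are additions for illustration, not part of the steps above.

```shell
#!/bin/bash
# Hypothetical wrapper: insert the formatted line as line 2 of every .md file
# in the given directory (default: current directory).
stamp_md_files() {
    local dir=${1:-.} path filename year month day week q dayname ext rest line
    for path in "$dir"/*.md; do
        [[ -f $path ]] || continue      # skip if the glob matched nothing
        filename=${path##*/}            # parse only the bare file name
        IFS='-. ' read -r year month day week q dayname ext rest <<< "$filename"
        if [[ $ext != "md" ]]; then     # extra field: shift the last two values
            q=$dayname
            dayname=$ext
        fi
        line="Today is $dayname, the $day.$month.$year, Kw $week, in the $q. Quarter."
        # Append after line 1; form intended for both GNU and BSD sed.
        sed -i.bak -e "1 a\\"$'\n'"$line"$'\n' "$path" && rm -- "$path.bak"
    done
}
```

Running `stamp_md_files /path/to/notes` (the path is a placeholder) on a copy of the directory first is a cheap way to check the result before touching the real files.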

score 0 · Answer 4 · answered Oct 18 '19 at 12:29

This is the best solution for me, especially because it copes with the different delimiters.

Many thanks to everyone who was interested in this question and especially to those who posted solutions.

I wish I hadn't made it so hard because I mistyped the example data.

This is now "my" variant of the solution:

for filename in *.md; do
  IFS='-. ' read year month day week q dayname rest <<< "$filename"
  line="Today is $dayname, the $day.$month.$year, Kw $week, in the $q. Quarter."
  sed -i.bak -e "1 a\\"$'\n'"$line"$'\n' "$filename" && rm *.bak
done

Because it handles multiple field separators, this solution works best for me.

But perhaps I am wrong, and the other solutions also offer the possibility of using different separators: At least '-' and '.' are required.

I am very surprised and pleased how quickly I received very good answers as a newcomer. Hopefully I can give something back.

And I'm also amazed how many different solutions are possible for the problems that arise.

If anyone is interested in what I've done, read on here: I've had a fatal autoimmune disease for two years. Little by little, my brain is destroyed, intermittently.

My memory especially has suffered a lot; I often don't remember what I did yesterday, what I learned, or what still has to be done.

That's why I created day files until 31.12.2030, with a markdown template for each day. There I then record what I have done and learned on those days and what still has to be done.

It was important to me to have the correct date within the individual file. Why no database, why markdown?

I want to have a format that I can use anywhere, on any device and with any OS. A format that doesn't belong to a company, that can change it or make it more expensive, that can take it off the market or limit it with licenses.

It's fast enough. The changes to 4,097 files as described above took less than 2 seconds on my i5 laptop (12 GB RAM, SSD).

Searching with fzf over all files is also very fast. I can simply have the files converted and output in whatever format I need.

My memory won't come back from this, but I have a chance to log what I forgot.

Thank you very much for your help and attention.
