I wrote an awk script that injects a function into certain definition files. It looks for lines containing a two-word phrase and, when it finds one, adds a paragraph after that line in which the second word is used. For example: if file.txt contains the words "foo bar" on a line, we want to add after it the lines "{", "func(bar)", and "}". awk should see $1 as "foo" (a static identifier) and $2 as "bar" (which can be anything, so we read it):

awk '{
    print $0
    if($1 == "foo" && $2 != "") {
        print "{"
        print "func(" $2 ")"
        print "}"
    }
}' file.txt > file_new.txt

The issue is that $2 gains an unwanted newline at the end. This seems to happen particularly when there's nothing else on the line after it. As a result, the produced file ends up looking like this:

foo bar
{
func(bar
)
}

Instead of the correct format:

foo bar
{
func(bar)
}

Everything else works fine; the only problem is this newline getting stored in the second field for some reason. What do you suggest to get rid of it?

Regarding separators: It's worth noting that the detected line can contain any mixture of spaces and tabs before, after, or between the two words. This means something like " foo bar " should still yield $1 as foo and $2 as bar.
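
As a quick check of the separator behavior described above, here is a minimal sketch. It assumes awk's default field separator (FS left unset) and feeds awk a single made-up line with a leading tab, mixed spaces and tabs between the words, and trailing spaces. The variable name `fields` is only for illustration.

```shell
# One sample line: leading tab, mixed blanks between the words, trailing spaces.
# With the default FS, awk ignores leading/trailing blanks and treats any run
# of spaces and tabs as a single separator.
fields=$(printf '\t foo \t bar  \n' | awk '{ print "[" $1 "][" $2 "] NF=" NF }')

# Show what awk saw as $1, $2 and the field count.
echo "$fields"
```

If the brackets sit tightly around `foo` and `bar`, the surrounding blanks did not end up in either field.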

Edit: For the example above, file.txt would contain something like the following lines. For different tests, try adding spaces or tabs before, between, or after foo and bar, as well as the empty line under it:

abc xyz
foo bar

123 789

This is the output of grep foo file.txt | od -c on the original file I'm having the issue with. That file is larger and contains different words, but the functionality is exactly the same as in the simplified example here:

0000000  \t   s   p   e   c   u   l   a   r   m   a   p  \t            
0000020       t   e   x   t   u   r   e   s   /   d   a   r   k   m   o
0000040   d   /   m   e   t   a   l   /   f   l   a   t   /   g   e   n
0000060   _   s   m   o   o   t   h   _   g   o   l   d   0   1   _   s
0000100   _   t   i   l   i   n   g   _   1   d  \n                   s
0000120   p   e   c   u   l   a   r   m   a   p  \t                   t
0000140   e   x   t   u   r   e   s   /   d   a   r   k   m   o   d   /
0000160   m   e   t   a   l   /   f   l   a   t   /   g   e   n   _   s
0000200   m   o   o   t   h   _   g   o   l   d   0   1   _   s   _   t
0000220   i   l   i   n   g   _   2   d  \n
0000231
MirceaKitsune
  • I might have seen it but don't think it helped. I don't want to ignore all newlines or no longer add them, only ensure the last parameter doesn't contain a newline it shouldn't have. Also I tried using `printf` instead of `print` but even then this issue occurs, only thing that changes is there's no newlines between the individual print commands which I obviously want. – MirceaKitsune Jan 10 '23 at 17:35
  • Edit your question to include the contents of the input file 'file.txt' – j_b Jan 10 '23 at 17:38
  • Worth noting: As a test I tried disabling the ORS by setting `BEGIN{ORS="";}` or `ORS=""` inside the function: This causes all my prints to appear on the same line, yet even then $2 still creates the bad newline. – MirceaKitsune Jan 10 '23 at 17:43
  • In the situation described in your comment using `BEGIN{ORS="";}`, awk is definitely not creating a newline. If it's in the output, it was in the input. You probably have DOS line endings, see https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it for one manifestation of that and how to deal with it. – Ed Morton Jan 10 '23 at 17:50
  • https://pastebin.com/RQ68FEj8 This was the initial full version of the script running on one file https://pastebin.com/bDCUNCyc This is an example of one of the text files being modified (name it in.mtr). This should create a full test case if one is needed. – MirceaKitsune Jan 10 '23 at 17:50
  • A full test case is indeed needed but not something linked in some external site, you're expected to create a [mcve] and include it in your question (no links, no images, just text). – Ed Morton Jan 10 '23 at 17:51
  • That makes sense. I don't understand why $2 is obtaining that newline to begin with, normally it seemed like a bug but it appears to be what awk is doing by default, there probably needs to be a way to tell it not to. I am trying to understand the cause as well. – MirceaKitsune Jan 10 '23 at 17:53
  • The issue is almost certainly your input file, not awk. Please read what I said at https://stackoverflow.com/questions/75073437/awk-adds-undesired-newline-at-the-end-of-last-detected-parameter#comment132482431_75073437 – Ed Morton Jan 10 '23 at 17:54
  • I ran some tests on lines `foo bar` and `foo bar something` and in both cases your script works as expected, so definitely looks/sounds like potential issue with the contents of a file ... either `file.txt` or the script containing the `awk` code; can you update the question with the output from `grep foo file.txt | od -c`? wondering if there's an odd character on the end of `bar` – markp-fuso Jan 10 '23 at 17:55
  • Submitted an edit with a better explanation of the text file for my example as well as the output for that command. – MirceaKitsune Jan 10 '23 at 18:01
  • If you want to make a human-readable display of a file with nonprintable content, `hexdump -C – Charles Duffy Jan 10 '23 at 18:06
  • StackOverflow doesn't seem to let me attach text tiles or I'd post a full isolated test case (it's from an open source project). Pastebin reformats it and would remove the issue if the file is the problem. Can you suggest a command that would remove the suspected DOS line endings first before any processing by awk? I will check it a bit later. – MirceaKitsune Jan 10 '23 at 18:11
  • You could try `dos2unix` – j_b Jan 10 '23 at 18:22
  • Again (last time), PLEASE read what I said at https://stackoverflow.com/questions/75073437/awk-adds-undesired-newline-at-the-end-of-last-detected-parameter#comment132482431_75073437, including the link I provided that describes the problem AND how to deal with it. – Ed Morton Jan 10 '23 at 18:22
  • another option would be to cut the sample file down to just a few lines, including the problematic line (verify on your end the script generates the extra `\n`), then `base64 sample_file` and post the b64 output in the question; we can use `base64 -d` to reconstitute the same file in our system; having said that ... I don't see anything in the `od -c` output that suggests any issues – markp-fuso Jan 10 '23 at 18:22
  • Regarding `StackOverflow doesn't seem to let me attach text tiles or I'd post a full isolated test case` - good, as we don't want/need to see whatever raw, probably large, file you have lying around, we need you to [edit] your question to post a [mcve] that you create that demonstrates your problem as concisely as possible. – Ed Morton Jan 10 '23 at 18:24
  • Regarding `Regarding separators: It's worth noting...` in your question - what you describe there is simply the documented behavior of awk when using the default FS. – Ed Morton Jan 10 '23 at 18:27
  • Regarding `This is the output of grep foo file.txt | od -c` in your question - no, it's not. There's no `foo` in that output. We can't help you further if you won't show us a real, textual [mcve]. – Ed Morton Jan 10 '23 at 18:30
  • BTW, I asked for `xxd` instead of `od` in part because `xxd` can be reversed back to a byte-identical copy of your original file. – Charles Duffy Jan 10 '23 at 18:40
  • I edited my question to post a minimal example. I'm aware SO has a culture of telling people to do something then once they do it's too much then it's too little again and so on. The script and an example text file are both provided in the question now, I don't know what else I could add. Strangely enough the simple test doesn't produce my issue, so indeed it must be something in the text file format. Indeed I have looked at the question you noted and that does seem like an explanation, I will try to apply it next and see if that succeeds. Thank you for the help so far. – MirceaKitsune Jan 10 '23 at 18:57
  • I redid the `grep word file.txt | od -c` test with a better understanding of what it does: Indeed the problematic lines are shown to end in `\r \n`, which matches the issue described in the question posted by @EdMorton. Sorry for the delay in finishing this test and glad I can confirm this indeed seems to be the issue. I will attempt the fixes suggested there next. – MirceaKitsune Jan 10 '23 at 19:03
  • `awk '{sub(/\r$/,"")}1' file` removes the problematic `\r` from the string which will hopefully fix this problem in the practical case. Was going to provide that as an answer but since this is closed now will just confirm it here for anyone else who finds this, very difficult issue to spot without being aware of it. – MirceaKitsune Jan 10 '23 at 19:12
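
To make the fix from the last comment concrete, here is a minimal sketch that folds the `sub(/\r$/, "")` call into the script from the question. The file name `crlf_demo.txt`, the temporary directory, and the variable names `tmpdir` and `result` are made up for illustration; the test file is just the question's sample input rewritten with DOS (`\r\n`) line endings, which is an assumption about what the real file looks like.

```shell
# Build a small test file with DOS (CRLF) line endings in a temp directory.
tmpdir=$(mktemp -d)
printf 'abc xyz\r\nfoo bar\r\n\r\n123 789\r\n' > "$tmpdir/crlf_demo.txt"

# Same logic as the script in the question, with one added line that strips
# a trailing carriage return from $0. Modifying $0 makes awk re-split the
# fields, so $2 no longer carries the "\r".
result=$(awk '{
    sub(/\r$/, "")
    print $0
    if ($1 == "foo" && $2 != "") {
        print "{"
        print "func(" $2 ")"
        print "}"
    }
}' "$tmpdir/crlf_demo.txt")

# Print the processed text and clean up.
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With the `sub()` line removed, `$2` on the `foo bar` line would be `bar` followed by `\r`, which is consistent with the stray break before `)` shown in the question. Note that the output of this sketch uses plain `\n` line endings, since the `\r` is dropped from every line.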

0 Answers