Is there any reason why my AWK functions work only on a shortened version of my file?

Question

I have a simple AWK function:

awk '
    BEGIN { FS=" "; RS="\n\n" ; OFS="\n"; ORS="\n" }
    /ms Response/ { print $0 }
    ' $FILE

The FILE is a large log that holds sections like this:

2021-10-13 12:15:12 CDT 526ms Request 
POST / HTTP/1.1 
Content-Type: application/x-www-form-urlencoded 
Host: xxxxxxxxxxxxxxxxxxx 
Content-Length: 279 

<query xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><product><name>drill</name><price>99</price><stock>5</stock></product>/query> 
2021-10-13 12:15:12 CDT 880ms Received

2021-10-13 12:15:12 CDT 896ms Response 
HTTP/1.1 200 OK 
Content-Type: application/xml 
Content-Length: 472

 <?xml version="1.0"?> 
<query type="c" xmlns="xxxxxxxxxxxxxx">  
<product>
<name>screwdriver</name>
<price>5</price>
<stock>51</stock>
</product>
</query>

2021-10-13 12:15:12 CDT 947ms Request 
POST / HTTP/1.1 
Content-Type: application/x-www-form-urlencoded 
Host: xxxxxxxxxxxxxxx
Content-Length: 515 
Expect: 100-continue

The above is just a snippet, the file continues for over 14000 lines, repeating the same pattern.

Now when I run my AWK function on the whole file, it just returns the whole file back. But when I run it on a file that was created with (cat $FILE | head -200), it works as expected, returning:

2021-10-13 12:15:12 CDT 896ms Response
HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 472

2021-10-13 12:15:13 CDT 075ms Response
HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 3207

2021-10-13 12:15:13 CDT 208ms Response
HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 4220

Why can I run this on a shortened file, but when I run it on a longer version it does not work, even though it's the same data in the file?

I am working on Ubuntu 18.04 LTS in Bash.

Thank you!

I replicated your sample input 357 times to create a file with 10,761 lines, then ran your `awk` script against said file; result was an output file with 357 blocks with the first line = `2021-10-13 12:15:12 CDT 896ms Response` — markp-fuso, Dec 17 '21 at 22:52
@markp-fuso that is so strange bc I get back a very different output. Can you tell me the line of code you used to run it? Maybe that's where I am off... — Hugobop, Dec 17 '21 at 22:59
I cut-n-pasted your code (above) into my console; just ran a test .... `unix2dos stuff.txt; awk '...' stuff.txt > stuff.out; wc -l stuff*` and guess what ... `stuff.txt` is the same size as `stuff.out`; at this point I'm wondering if your input file has windows/dos line endings (`\r\n`) and if so, can you remove them (eg, `dos2unix filename`) and run your script again? as to why the `cat|head` works ... I'm guessing something you're doing is converting the file to unix line endings (`\n`) — markp-fuso, Dec 17 '21 at 23:00
That is very possible, thank you! I will investigate that and post back on here what I find — Hugobop, Dec 17 '21 at 23:07
@markp-fuso You are a life saver! That totally was the culprit. Thank you SO much! — Hugobop, Dec 17 '21 at 23:12
anytime you're dealing with files created-in/copied-from windows you need to keep that pesky `\r` at the forefront of any troubleshooting ... it can cause problems with parsing data files ... it can cause problems with executable/sourcable shell script files; especially if running WSL ... very easy to get caught with files containing `\r` :-) — markp-fuso, Dec 17 '21 at 23:15
HTTP headers (e.g. output from `curl -i`) are often terminated by CR-LF. — , Dec 18 '21 at 03:19

score 0 · Accepted Answer · edited Dec 18 '21 at 00:54

@markp-fuso's comment helped me. My input file had Windows line endings and I just needed to run the below command prior to executing the AWK:

tr -d '\15\32' < OGfile.txt > unixFile.txt

Then it ran as expected.
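
As a comment under this answer notes, `tr -d '\15'` deletes every carriage return in the file, not only the ones at line ends. Below is a minimal sketch of a narrower approach. The sample data, the `tmpdir` location, and all variable names are made up for illustration. It counts the lines that end in `\r`, strips only that trailing `\r`, and compares how many blank-line-separated records awk sees before and after. Setting `RS` to the empty string is used here only as a portable way to split on blank lines.

```shell
# Hypothetical two-record sample written with CRLF line endings.
tmpdir=$(mktemp -d)
printf 'A Request\r\nPOST /\r\n\r\nB Response\r\nHTTP/1.1 200 OK\r\n' > "$tmpdir/dos.txt"

# Count lines ending in a carriage return; non-zero suggests CRLF endings.
cr_before=$(awk '/\r$/ { n++ } END { print n + 0 }' "$tmpdir/dos.txt")

# Records seen when splitting on blank lines. A line holding only "\r" is
# not blank, so the whole CRLF file collapses into a single record.
records_before=$(awk -v RS= 'END { print NR }' "$tmpdir/dos.txt")

# Strip only a trailing CR from each line; other bytes are left alone.
awk '{ sub(/\r$/, "") } 1' "$tmpdir/dos.txt" > "$tmpdir/unix.txt"

cr_after=$(awk '/\r$/ { n++ } END { print n + 0 }' "$tmpdir/unix.txt")
records_after=$(awk -v RS= 'END { print NR }' "$tmpdir/unix.txt")

echo "CR lines: $cr_before -> $cr_after; records: $records_before -> $records_after"
rm -rf "$tmpdir"
```

With a single record covering the whole file, a pattern such as `/ms Response/` matches that one record and prints everything, which is consistent with the symptom described in the question.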

I received additional syntax help from the following question: Convert line endings

answered Dec 17 '21 at 23:16 by Hugobop · edited Dec 18 '21 at 00:54 by Jeremy Caney

I can't imagine why you're deleting `\32`s and `tr` is the wrong tool for removing line-ending `\15`s (`\r`s), see [why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it](https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it). Since you're already using GNU awk, though, you could just change `RS` to accommodate `\r`s with `RS="(\r\n){2}"` or similar and then you don't need to do anything to remove the `\r`s. – Ed Morton Dec 18 '21 at 12:30
If you want to convert from windows newlines to unix newlines you might use [`dos2unix`](https://www.tutorialspoint.com/unix_commands/dos2unix.htm) – Daweo Dec 18 '21 at 15:09
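
The `RS` idea from the comments above could look roughly like the sketch below. It is a variant rather than the exact expression quoted: it uses `\r?\n\r?\n` so that the same command handles both CRLF and LF input, and it avoids the `{2}` interval, which some older awk builds may not accept. It assumes an awk that treats a multi-character `RS` as a regular expression (gawk and mawk do). The sample data and variable names are hypothetical.

```shell
# Hypothetical CRLF sample: one Request record and one Response record.
tmpdir=$(mktemp -d)
printf '12:15:12 526ms Request\r\nPOST / HTTP/1.1\r\n\r\n12:15:12 896ms Response\r\nHTTP/1.1 200 OK\r\n' > "$tmpdir/dos.txt"

# Split records on an empty line whether or not each newline is preceded
# by a CR. Lines inside a record keep their trailing CR, so remove those
# before printing.
response=$(awk -v RS='\r?\n\r?\n' '/ms Response/ { gsub(/\r/, ""); print }' "$tmpdir/dos.txt")

printf '%s\n' "$response"
rm -rf "$tmpdir"
```

This leaves the input file untouched, at the cost of relying on regex support in `RS`, which POSIX does not guarantee.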

score 0 · Answer 2 · answered Dec 19 '21 at 08:00

You can use this:

awk -v RS= -v ORS='\n\n' '/ms Response/'

Or this, to avoid a trailing blank line:

awk -v RS= '/ms Response/ && c++ {printf "\n"} /ms Response/'

If RS is an empty string, the record separator becomes two or more contiguous newlines, that is, one or more blank lines (awk's "paragraph mode").
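
A small sketch of that behaviour, using a made-up three-record log with LF line endings (the file contents and variable names are illustrative only). It shows that a run of several blank lines still counts as a single separator, and that setting the field separator to a newline makes each line of a record available as a field.

```shell
# Hypothetical log: records separated by one blank line, then three.
tmpdir=$(mktemp -d)
printf '1ms Request\nPOST /\n\n2ms Response\nHTTP/1.1 200 OK\n\n\n\n3ms Response\nHTTP/1.1 404 Not Found\n' > "$tmpdir/log.txt"

# Total records in paragraph mode.
total=$(awk -v RS= 'END { print NR }' "$tmpdir/log.txt")

# Records that match the pattern from the question.
hits=$(awk -v RS= '/ms Response/ { n++ } END { print n + 0 }' "$tmpdir/log.txt")

# With FS set to a newline, $1 is the first line of each record.
first_lines=$(awk -v RS= -F'\n' '/ms Response/ { print $1 }' "$tmpdir/log.txt")

echo "total=$total hits=$hits"
printf '%s\n' "$first_lines"
rm -rf "$tmpdir"
```

Note that paragraph mode only recognises truly empty lines, so input with CRLF line endings still needs its `\r` characters handled first.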
