Count the number of lines in a txt file when newlines are inside the data

Question

I have a txt file that contains the data below:

Name    mobile  url message text
test11  1234567890  www.google.com  "Data Test New
Date:27/02/2020
Items: 1
Total: 3
Regards
ABC DATa
Ph:091 : 123456789"
test12  1234567891  www.google.com  "Data Test New one
Date:17/02/2020
Items: 26
Total: 5
Regards
user test
Ph:091 : 433333333"

As you can see, the data in my last column contains newline characters, so when I use the command below

awk 'END{print NR}' file.txt

it reports a count of 15, but the actual number of records is 3. Please suggest a command that gives the correct count.

Edited part: the script below, taken from the answer given, does not work if there is no newline at the end of the input file:

awk -v RS='"[^"]*"' '{gsub(/\n/, " ", RT); ORS=RT} END{print NR "\n"}' test.txt

Also, my file may have 3-4 million records, so converting the file to Unix format would take time and is not my preference. Please suggest an efficient solution that works in both cases.

head 5.csv | cat -A

The above command gives me this output:

Name mobile url message text^M$

You cannot have a line with a newline character inside it: you automatically create a new line (which is the whole idea). — Dominique, Nov 27 '20 at 09:43
Does this answer your question? [What's the most robust way to efficiently parse CSV using awk?](https://stackoverflow.com/questions/45420535/whats-the-most-robust-way-to-efficiently-parse-csv-using-awk) — rethab, Nov 27 '20 at 09:44
So you want to know the number of newline characters outside `""`? — Daweo, Nov 27 '20 at 09:47
@Daweo I want to count the number of lines in this file, and for the given data it should give 3 — user13000875, Nov 27 '20 at 09:48
@user13000875, we understand that you want to count the number of lines, but our question is: what is the logic for counting multiple lines (separated by newlines) as 1 line? Kindly make that clear in your question. — RavinderSingh13, Nov 27 '20 at 09:49
What you're calling a line isn't a line in POSIX terms because a line is a string that ends in a newline and therefore cannot contain a newline. That's what's confusing everyone. What you should be calling it instead is a record and then you can define what you mean by a record, e.g. a string of text ending in a newline that can contain newlines within quoted fields. — Ed Morton, Nov 27 '20 at 15:19
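
To make the line-versus-record distinction from the comments concrete, here is a small sketch. It assumes a shortened, tab-separated sample shaped like the question's data (the helper name `make_sample` and the temporary file are illustrative, not from the thread) and shows that a plain line count includes the newlines inside the quoted fields.

```shell
# make_sample is a hypothetical helper: it writes a header plus two records
# whose quoted last field spans three physical lines each.
make_sample() {
  printf 'Name\tmobile\turl\tmessage text\n' > "$1"
  printf 'test11\t1234567890\twww.google.com\t"Data Test New\nDate:27/02/2020\nItems: 1"\n' >> "$1"
  printf 'test12\t1234567891\twww.google.com\t"Data Test New one\nDate:17/02/2020\nItems: 26"\n' >> "$1"
}

sample=$(mktemp)
make_sample "$sample"

# Physical lines: every newline counts, including those inside quotes.
awk 'END { print NR }' "$sample"        # prints 7, although there are 3 records

# Quote characters: two per quoted field, so 4 here means 2 multi-line fields.
tr -cd '"' < "$sample" | wc -c
```

The gap between 7 physical lines and 3 records is exactly the four newlines that sit inside the two quoted fields.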

anubhava · Accepted Answer · 2020-12-09T14:50:45.037

7

With GNU awk you can do this using a custom RS:

awk -v RS='"[^"]*"' '{gsub(/(\r?\n){2,}/, "\n"); n+=gsub(/\n/, "&")}
END {print n}' <(sed '$s/$//' file)

15001

Here:

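-v RS='"[^"]*"': Uses this regex as the input record separator; it matches a double-quoted string
gsub(/(\r?\n){2,}/, "\n"): Collapses runs of two or more consecutive line breaks into a single \n
n+=gsub(/\n/, "&"): Replaces each \n with itself (a no-op) and accumulates the number of replacements in the variable n
END {print n}: Prints n at the end
sed '$s/$//' file: Ensures the last line ends with a newline (in case it is missing)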
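
The counting idiom used above relies on gsub returning the number of substitutions it made. A minimal sketch of that idiom in isolation, counting commas instead of newlines (the function name `count_commas` is only illustrative):

```shell
# gsub returns how many matches it replaced; "&" re-inserts the matched text,
# so the input is left unchanged and only the return value is used.
count_commas() {
  printf '%s\n' "$1" | awk '{ print gsub(/,/, "&") }'
}

count_commas 'a,b,c,d'   # prints 3
```

This part works in any POSIX awk; only the regex RS in the command above needs GNU awk.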


edited Dec 09 '20 at 14:50

answered Nov 27 '20 at 09:49

anubhava

Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/225762/discussion-on-answer-by-anubhava-count-number-of-line-in-txt-file-when-new-line). – Samuel Liew Dec 10 '20 at 08:13

Sundeep · Answer 2 · 2020-11-27T10:21:59.683

With perl, assuming the last line always ends with a newline character:

$ perl -0777 -nE 'say s/"[^"]+"(*SKIP)(*F)|\n//g' ip.txt
3

-0777 slurps the entire input file as a single string, so this isn't suitable if the input file is very large
the s command returns the number of substitutions made, which is used here to get the count of newlines
"[^"]+"(*SKIP)(*F) causes newlines within double quotes to be ignored

You can use the command below if you want to count the last line even when it doesn't end with a newline character.

perl -0777 -nE 'say scalar split /"[^"]+"(*SKIP)(*F)|\n/' ip.txt
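
Because slurping is noted above as unsuitable for very large input, a line-at-a-time alternative may be worth considering. The following is only a sketch of a different technique (tracking quote parity), not the method of any answer here. It assumes that a double quote inside a field is written doubled (`""`), that blank lines outside quotes should count as records, and `count_records` is a hypothetical name.

```shell
# count_records FILE: count records, where a record ends at a newline that is
# outside a double-quoted field. Reads one physical line at a time.
count_records() {
  awk '
    {
      # Add the number of double quotes found on this physical line.
      quotes += gsub(/"/, "&")
      # An even running total means no quoted field is open,
      # so the end of this line is the end of a record.
      if (quotes % 2 == 0) records++
    }
    END { print records + 0 }
  ' "$1"
}

# Usage: count_records file.txt
```

Since the count never looks at the line terminator itself, CRLF endings do not change it, and common awk implementations (gawk, mawk, BWK awk) still read a final line that lacks a newline. For the sample data in the question this approach should report 3.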

Thor · Answer 3 · Nov 27 '20 at 13:10

0

Same idea as anubhava's answer, but with GNU sed:

<infile sed '/"/ { :a; N; /"$/!ba; s/\n/ /g; }' | wc -l

Output:

answered Nov 27 '20 at 13:10

Thor

