-1

I need to grep only the lines of a certain length, where the length includes the newline/line break. So the first line will be one char longer than the other one.

Example:

"Random text with certain length\n"
"Random text with certain length"
EOF

I used grep as follows:

grep -E "^.{length}$"

This prints both lines, because grep treats them as having the same char count: it doesn't count the \n as a char.

Thanks for any ideas.

Miracle
  • I'm not sure if it will work, but try `grep -E "^.{\`wc -c\`}$"`. `^.{length}$|^.{length-1}$\n` might also be what you're looking for – emsimpson92 Oct 31 '18 at 19:05
  • [How to “grep” for line length in a given range?](https://unix.stackexchange.com/q/184519/56041), etc. – jww Oct 31 '18 at 21:56

2 Answers

1

TL;DR

To me, the easiest way to get the suggested results is to fold the newlines you want to count into a placeholder character with sed before piping to grep, then unfold them afterwards if necessary.

$ echo -e '"Random text with certain length\n"\n"Random text with certain length"\n' | sed -e ':a;N;$!ba;s/\n"/+"/g' -e '/"+/s//"\n/g' | grep -E "^.{33}$"
"Random text with certain length"
$ echo -e '"Random text with certain length\n"\n"Random text with certain length"\n' | sed -e ':a;N;$!ba;s/\n"/+"/g' -e '/"+/s//"\n/g' | grep -E "^.{34}$"
"Random text with certain length+"
$ echo -e '"Random text with certain length\n"\n"Random text with certain length"\n' | sed -e ':a;N;$!ba;s/\n"/+"/g' -e '/"+/s//"\n/g' | grep -E "^.{34}$" | sed -e '/+"/s//\n"/g'
"Random text with certain length
"

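The three commands above can be wrapped into one helper. This is only a sketch under the same assumptions as the pipeline: GNU sed, every record starts with a double quote, and the text never contains the placeholder sequence `+"`. The function name `grep_folded` is made up for illustration.

```shell
# Hypothetical helper: fold, filter by length, unfold.
# $1 is the wanted length, counting each folded newline as one character.
grep_folded() {
  # Fold: join all lines, mark every newline-before-quote with '+',
  # then restore the real record boundaries (quote followed by '+').
  sed -e ':a;N;$!ba;s/\n"/+"/g' -e '/"+/s//"\n/g' |
    # Filter: keep records of exactly $1 characters.
    grep -E "^.{$1}\$" |
    # Unfold: turn the remaining '+' markers back into newlines.
    sed -e '/+"/s//\n"/g'
}

# Usage: prints only the record that contains the embedded newline.
printf '"Random text with certain length\n"\n"Random text with certain length"\n' |
  grep_folded 34
```

Calling it with 33 instead selects the record without the embedded newline.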
Thanks for clarifying the description. Some of what follows was in reference to the previous description, but seems like a waste to delete ...

I'm not sure I fully understand, so I made some assumptions.

  1. The lines all have double quotes, or at least something unique to fold/unfold the newlines you want to count.
  2. Either CR+LF or LF alone are what's being considered a 'newline/linebreak'
  3. In the description, \n (LF/$) could mean \r (CR/^M). That works with the reference to wc. Otherwise both grep and wc would not consider the lines the same length.

In other words, as stated, by default grep doesn't count newline (\n) as a character but does count carriage return (\r), whereas wc counts both as a character.

The following confirms that \n is the newline ($) and \r is the carriage return (^M).

\n = newline

$ echo -en '\n' | wc -c
1
$ echo -en '\n' | grep -E "^.{1}" | wc -c
0

\r = carriage return

$ echo -en '\r' | wc -c
1
$ echo -en '\r' | grep -E "^.{1}" | wc -c
2

To grep, carriage returns are an extra character. Newlines are not.
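One way to see this per line is to print each line's length as a line-oriented tool sees it. The sketch below uses awk's `length`, which, like grep's `.`, excludes the terminating newline but counts a trailing carriage return. The function name `line_lengths` is an illustrative choice.

```shell
# Print the length of each input line, excluding the terminating newline.
# A trailing carriage return is part of the line and is counted.
line_lengths() {
  awk '{ print length($0) }'
}

# 'hello\r' is 6 characters, 'world' is 5.
printf 'hello\r\nworld\n' | line_lengths
# prints:
# 6
# 5
```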

Stripping the carriage return with sed first produces the same character count and result in both cases.

$ echo -en '\n' | sed -e '/\r/s///g' | grep -E "^.{1}" | wc -c
0
$ echo -en '\r' | sed -e '/\r/s///g' | grep -E "^.{1}" | wc -c
0

Given the requirement to filter by line length including the newline, grep -E by itself can't do it, because it never counts a newline/LF as a character. Another example where both lines look the same length but actually aren't ...

$ echo -e 'hello\r\nworld\n'
hello
world
$ cat <<< "$(echo -e 'hello\r\nworld\n' | grep -E "^.{5}$")"
world
$ cat <<< "$(echo -e 'hello\r\nworld\n' | grep -E "^.{6}$")"
hello

... and with sed inserted into the pipeline to strip the carriage return, both lines have the same length {5}:

$ cat <<< "$(echo -e 'hello\r\nworld\n' | sed -e '/\r/s///g' | grep -E "^.{5}$")"
hello
world
$ cat <<< "$(echo -e 'hello\r\nworld\n' | sed -e '/\r/s///g' | grep -E "^.{6}$")"
<no output>
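Since grep never sees the line terminator, one option is to count it outside grep. The following is a minimal sketch, not a tested production tool: it reads the input line by line in the shell and adds 1 to the length only when the line really ended with a newline, so a final line without a newline comes out one shorter. The function name `lines_with_length` is hypothetical, and `${#line}` is assumed to be good enough for plain ASCII text (it counts characters or bytes depending on the shell and locale).

```shell
# Hypothetical helper: print the lines whose length, counting the
# terminating newline when there is one, equals $1.
lines_with_length() {
  want=$1
  while :; do
    if IFS= read -r line; then
      has_nl=1                 # read succeeded: the line ended with a newline
    elif [ -n "$line" ]; then
      has_nl=0                 # last line, not terminated by a newline
    else
      break                    # nothing left to read
    fi
    if [ $(( ${#line} + has_nl )) -eq "$want" ]; then
      printf '%s\n' "$line"
    fi
    [ "$has_nl" -eq 1 ] || break
  done
  return 0
}

# 'abc' ends with a newline (length 4), 'xyz' does not (length 3).
printf 'abc\nxyz' | lines_with_length 4
# prints: abc
```

With `lines_with_length 3` the same input prints only `xyz`.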
Joseph Tingiris
  • So what should I use to have that one '\n' difference if grep doesn't work that way? I basically only need to get char count including \n – Miracle Oct 31 '18 at 22:37
  • Now that I understand a little better, seems like you need to replace some \n but not all. I added a TL;DR based on your output. Maybe that's what you're trying to do? This may be of help, too. https://stackoverflow.com/questions/1251999/how-can-i-replace-a-newline-n-using-sed – Joseph Tingiris Oct 31 '18 at 22:59
0

Supposing you have the content saved in a file named file.txt, you can try something like this:

cat file.txt | awk 'length($0) > 38'

It will output only the lines longer than 38 chars:

"Random text with certain length\n" <br>

If you do:

cat file.txt | awk 'length($0) > 37'

then both lines are displayed, since both are longer than 37 chars...

Not sure if that's what you wanted in the first place... Give it a try anyway!

Bogdan Stoica