I'm trying to parse minimal mark-up text line by line. Currently I have a for loop that steps through the text character by character. See the code below:

Text:

<element id="myE">
This is some text that
represents accurately the way I 
have written my html
file.
</element>

code:

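var list = document.getElementById("myE").innerHTML;
var tallie = 0;

// Walk the string one character at a time, starting at index 0.
for (var i = 0; i < list.length; i++) {
  if (/*list[i] == " "*/ true) {
    tallie += 1; // increment the counter, not the string itself
    console.log(list[i]);
  }
}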

console.log(tallie);

As expected, the text embedded in the element renders in the DOM as though it were a continuous, properly formatted string. But what I'm finding is that the console recognizes the difference between a non-breaking space and a new line, where " " and

"
"

represent the two respectively.

Since the console appears to know the difference, it seems there should be a way to test for the difference. If you uncomment the condition, it will start testing for non-breaking spaces. I think there is another way to do this using the character encoding string (not &nbsp;, another one). It seems reasonable then to expect to be able to find a character code for a breaking space. Unfortunately I cannot find one.

Long story short, how can I achieve a true line by line parsing of an html file?

nastetajup
  • `'\n'` is a line break. Is that what you are looking for? – Felix Kling Aug 16 '16 at 14:49
  • You can't; the line break is dependent on many details: screen size, container size, font size, zoom, etc. What you see as a new line I may not. – Liam Aug 16 '16 at 14:49
  • _“But in the console, inline spaces appear as non-breaking spaces `" "`”_ – if `" "` is all you get to see, then how can you tell whether that’s a normal or a non-breaking space …? – CBroe Aug 16 '16 at 14:54
  • What about using the opposite approach? Replace all whitespace, tabs, carriage returns etc by nothing so you have one very long string. Then remove all the characters between html tags (<> ,>, />). The result is a string containing only the content of html tags. (which has a length, etc) – Shilly Aug 16 '16 at 14:55
  • @CBroe Where `This` and `is` separate, the console returns `" "`. Where `that` and `represents` separate, the console returns an alternative to `" "` that can't be displayed in the comments. But if you re-read the question you'll see what the alternative looks like. – nastetajup Aug 16 '16 at 15:00
  • @Shilly While that all sounds really cool, I don't see where it would help me parse the file line by line. – nastetajup Aug 16 '16 at 15:02
  • If you want to parse line by line, shouldn't you have `var list = document.getElementById("myE").innerHTML.split("\n");`? – bruceceng Aug 16 '16 at 15:04
  • @FelixKling omg Thank you! Jesus why wasn't that in any character code charts I could find? – nastetajup Aug 16 '16 at 15:05
  • @bruceceng I hadn't got that far yet. Was still trying to figure out how to target line breaks. – nastetajup Aug 16 '16 at 15:07
  • My point was that since the code shown only gives you the length of all the content of html tags, you might as well skip the "loop to parse line by line" part and just replace everything that you don't want to tally up. If we knew why you need to parse an html file line by line, we could offer more or maybe better advice. Just saying, since I've struggled a lot in the past with line-by-line parsing of HTML to get a template engine to work. Anyway, upvoted Kling's comment. – Shilly Aug 16 '16 at 15:19
  • @Shilly I see. Well my overall project currently is to build a locally contained media library engine which I can host for myself on a cloud drive. Basically I'm tired of relying on existing services (paid or otherwise) that consistently fall short of my needs for housing, accessing and discovering my obscure media collection/interests. The reason for my question is that I'm attempting to build a minimal mark-up list of media titles which can be easily iterated through and modified via a simple, unbloated, effective interface containing exactly the functions I need. – nastetajup Aug 16 '16 at 17:23

1 Answer

Newline characters are encoded as \n (line feed). Sometimes you will also find the combination of carriage return and line feed, \r\n (see Wikipedia on Newline). These should not be confused with a non-breaking space, &nbsp; or &#160;, which is used when you want the browser to display a space without wrapping the line at it, or to keep multiple spaces from collapsing together.
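
As a rough sketch of how this could be applied to the question, the snippet below splits a string on line terminators and reports which kind of whitespace a character is by its character code. The helper names `splitLines` and `classifyChar` are made up for illustration, and the sample string is an assumed stand-in for what the element in the question would return. Note that `innerHTML` serializes a real non-breaking space as the text `&nbsp;`, so reading `textContent` is probably the easier way to see code 160 directly.

```javascript
// Hypothetical helpers for illustration; the names are not from the question.

// Split on \r\n, \n, or a lone \r so files from any platform work.
function splitLines(text) {
  return text.split(/\r\n|\n|\r/);
}

// Name the whitespace character at the start of `ch` by its character code.
function classifyChar(ch) {
  switch (ch.charCodeAt(0)) {
    case 10:  return "line feed";          // "\n"
    case 13:  return "carriage return";    // "\r"
    case 32:  return "space";              // ordinary (breaking) space
    case 160: return "non-breaking space"; // "\u00A0", i.e. &nbsp; / &#160;
    default:  return "other";
  }
}

// In a browser the text would come from the element in the question, e.g.
//   var text = document.getElementById("myE").textContent;
// Here an assumed equivalent string is used so the sketch runs anywhere.
var text = "\nThis is some text that\nrepresents accurately the way I \n" +
           "have written my html\nfile.\n";

// Drop the empty entries produced by the leading and trailing line breaks.
var lines = splitLines(text).filter(function (line) {
  return line.trim() !== "";
});

lines.forEach(function (line, index) {
  console.log((index + 1) + ": " + line);
});
console.log(lines.length); // 4
```

The character between `This` and `is` is reported as an ordinary space (code 32), while the character between `that` and `represents` is a line feed (code 10), which is the difference the console was showing.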

bruceceng