-2

I'm working with a text file (containing vCards) and need to remove the ^M at the end of each line (leaving the newlines), except within the NOTES field (since it seems that there the ^M signals that the line continues and wraps around). See the enclosed screenshot of the file in the vi editor, where the newline characters are automatically shown in blue by vi.

[Image: screenshot of the file opened in vi]

If I read the lines of the file with a with statement (and write them after processing them), how should I process each line?

More broadly, what is the difference between ^M, $, and \n (newline) in Python? What is the role of each?

Note that the focus of my question is completely unrelated to:

Vcard parser with Python

Of course I can just remove the odd behavior in the NOTES field within vi by entering s/\n // or using re or even with serialize or even using the sortedChildren method within vobject. But this is not the focus of my question.

The focus is broader: to understand what this ^M character is, whether it is related to newlines, and whether it is related to Python or is just a vobject construct. If it is the latter, and ^M has no general meaning, why is it shown in blue by the vi editor? What I find a bit odd is that in vi these line breaks with the ^M are followed by a carriage return and a BLANK space, as if "^M$ " were a special sequence to denote "unintentional line break". Again, is this three-character sequence special to vobject, more generic, or just part of my imagination (and in any of these three cases, why is it blue in vi)?

What I'm trying to understand in the current question is why vi marks ^M as blue, what is the difference with $ in vi and whether these two characters have any special meaning in Python. I notice that vi shows ^M at the carriage returns in the "NOTES" field, whose lines seem to have a fixed length (regardless of the line breaks typed into the note). I'm trying to understand why, and I have not found any explanation for it.

  • ^M has no meaning in Python. $ has meaning only in a regular expression and means the end of a line. \n is a line break; most Python executables are compiled with universal newline support, so \n could be a carriage return or linefeed or one of each. – kindall Jan 08 '17 at 23:11
  • Thank you Kindall, this is very helpful. Do you know why vi shows the ^M in blue if it does not have any special meaning? Do you think that ^M is a special character defined by the vCard standard or by the vobject implementation? If so, should there be a special encoding for when we want to use ^M within a notes field? – Brian Barcelona Jan 10 '17 at 01:57
  • `vi` may be displaying embedded carriage returns in the data as `^M`. You didn't mention `vi` so I didn't mention it. – kindall Jan 10 '17 at 02:06
  • That's helpful, Kindall. I should have mentioned it, since I had the vi picture from the start (I have now updated the question to reflect this). I do not know what embedded carriage returns are, but let me search and learn about them. This is what I was trying to get out of this question. The whole treatment of carriage returns is a bit mind-blowing, especially the fact that vCards have the three-character sequence ^M, followed by $ and then by a blank space. Thank you again! – Brian Barcelona Jan 10 '17 at 02:18

2 Answers

2

RFC 6350 states:

Individual lines within vCard are delimited by […] a CRLF sequence (U+000D followed by U+000A).

[…]

Long logical lines of text can be split into a multiple-physical-line representation […] by inserting a CRLF immediately followed by a single white space character (space (U+0020) or horizontal tab (U+0009)).

You're only noticing this in the NOTES section because you found long lines of text there. You have to read the documentation for the file format you are trying to parse.
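As a rough illustration of the folding rule quoted above, here is a minimal Python sketch. The function name `unfold_vcard` and the sample text are made up for this example, and it only handles line folding; it is not a vCard parser.

```python
import re


def unfold_vcard(raw: str) -> str:
    """Undo RFC 6350 line folding, then turn CRLF line ends into LF."""
    # A fold is CRLF followed by one space or tab: drop all three characters
    # so the continuation is glued back onto the previous line.
    unfolded = re.sub(r"\r\n[ \t]", "", raw)
    # Any CRLF that is left is a real line end; keep only the line feed.
    return unfolded.replace("\r\n", "\n")


# Hypothetical sample: the NOTE value was folded in the middle of "note".
sample = "BEGIN:VCARD\r\nNOTE:a long no\r\n te value\r\nEND:VCARD\r\n"
print(unfold_vcard(sample))
```

When reading from a file in Python 3, the carriage returns only reach this function if the file is opened with `newline=""` (or in binary mode and decoded by hand); with the default text mode, `\r\n` is already translated to `\n` on read, and only the leading space of each continuation line remains.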

Josh Lee
1

The focus is broader: to understand what this ^M character is, whether it is related to newlines, and whether it is related to Python or is just a vobject construct. If it is the latter, and ^M has no general meaning, why is it shown in blue by the vi editor?

It marks ^M as blue based on your syntax coloring scheme.

To find the answer to your real question (what is ^M), have a look at What does ^M character mean in Vim?

The documentation for vim has a list of other digraphs and their meaning.

what is the difference with $ in vi and whether these two characters have any special meaning in Python.

In a regular expression, $ matches at the end of a line, as it does in vim. In Python's re module it matches at the end of the string (or just before a trailing newline) by default, and at the end of every line when re.MULTILINE is set. Outside a regular expression, $ has no special meaning in Python.
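To make the distinction concrete, here is a small sketch using only the standard library; the variable names and sample strings are illustrative.

```python
import re

# vi renders the carriage return character (code 13, "\r") as ^M.
carriage_return = "\r"
line_feed = "\n"
print(ord(carriage_return), ord(line_feed))  # 13 10

# $ is only special inside a regular expression.
# By default it anchors at the end of the whole string.
last_only = re.findall(r"\w+$", "first\nsecond")
# With re.MULTILINE it anchors at the end of every line.
every_line = re.findall(r"\w+$", "first\nsecond", re.MULTILINE)
# A carriage return is an ordinary character to the regex engine, so here it
# sits between "first" and the end of that line and blocks the match.
with_cr = re.findall(r"\w+$", "first\r\nsecond", re.MULTILINE)
print(last_only, every_line, with_cr)
```

The last case is why a pattern anchored with $ can silently stop matching on files with CRLF line ends unless the carriage return is stripped or matched explicitly (for example with `\r?$`).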

Burhan Khalid