87

I use Emacs (nxml-mode) to edit my XML files, and the machine-generated ones don't have any pretty formatting of the tags.

I have searched for a way to pretty-print an entire file with indentation and save it, but wasn't able to find an automatic one.

Is there a way? Or at least some editor on Linux that can do it?

cnu

15 Answers

118

You don't even need to write your own function: sgml-mode (a GNU Emacs core module) has a built-in pretty-printing function, sgml-pretty-print, which takes the beginning and end of a region as arguments.

If you are cutting and pasting XML and find that your terminal is chopping the lines in arbitrary places, you can use this pretty printer, which fixes broken lines first.
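
Since sgml-pretty-print expects a region, a small wrapper can save a keystroke or two. The following is only a minimal sketch, not part of Emacs: the command name my-xml-pretty-print-dwim is made up for illustration. It formats the active region if there is one, and the whole buffer otherwise.

```elisp
(require 'sgml-mode)  ; defines `sgml-pretty-print'

;; `my-xml-pretty-print-dwim' is a hypothetical name chosen for this sketch.
(defun my-xml-pretty-print-dwim ()
  "Pretty-print the active region, or the whole buffer if no region is active."
  (interactive)
  (if (use-region-p)
      ;; A region is selected: format only that part.
      (sgml-pretty-print (region-beginning) (region-end))
    ;; No selection: format everything.
    (sgml-pretty-print (point-min) (point-max))))
```

After evaluating it, M-x my-xml-pretty-print-dwim should work from nxml-mode as well, since sgml-pretty-print itself does not require switching modes.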

Juan Garcia
  • How do I execute a function passing the region as argument? – Alexandre Rademaker Dec 07 '10 at 23:33
  • 1
    (sgml-pretty-print (region-beginning) (region-end)) – ScootyPuff Jan 04 '11 at 15:57
  • 8
    I'm not sure how `sgml-mode` might have changed over time. Today, I invoked `C-x C-f foo.xml`, `M-x sgml-mode`, then `M-x sgml-pretty-print` and my xml file got pretty printed. (Well, emacs hanged for twenty seconds or more before completing. It was a one line file before the pretty print and 720 lines after.) – daveloyall Aug 10 '15 at 18:14
  • 1
    Actually, I also had to do `C-x h` to select the whole buffer as a region. – daveloyall Aug 10 '15 at 20:23
  • 3
    I didn't even have to switch to sgml-mode. It was a M-x command in nXML mode! – nroose Jul 04 '18 at 01:04
  • 2
    Using Emacs 26.2, I can stay in nXML mode, select the whole buffer `C-x h` and then `M-x sgml-pretty-print`. The xml will be pretty formatted now – Swedgin Aug 12 '19 at 13:03
  • In 2020 it seems like `sgml-pretty-print` is still incredibly slow for simple format improvements. – NetMage Nov 09 '20 at 20:56
89

If you only need pretty indenting without introducing any new line-breaks, you can apply the indent-region command to the entire buffer with these keystrokes:

C-x h
C-M-\
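
If you do this often, the two keystrokes can be wrapped in a command. This is a rough sketch under the assumption that the buffer is already in nxml-mode (or another mode with sensible indentation rules); my-indent-whole-buffer is a hypothetical name, not an Emacs built-in.

```elisp
;; `my-indent-whole-buffer' is a hypothetical name chosen for this sketch.
(defun my-indent-whole-buffer ()
  "Re-indent every line of the current buffer using the major mode's rules.
Unlike selecting the buffer by hand, this leaves point where it was."
  (interactive)
  (save-excursion
    (indent-region (point-min) (point-max))))
```

Like the keystrokes, this only changes leading whitespace; it never adds line breaks.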

If you also need to introduce line-breaks, so that opening and closing tags are on separate lines, you could use the following very nice elisp function, written by Benjamin Ferrari. I found it on his blog and hope it's ok for me to reproduce it here:

(defun bf-pretty-print-xml-region (begin end)
  "Pretty format XML markup in region. You need to have nxml-mode
http://www.emacswiki.org/cgi-bin/wiki/NxmlMode installed to do
this.  The function inserts linebreaks to separate tags that have
nothing but whitespace between them.  It then indents the markup
by using nxml's indentation rules."
  (interactive "r")
  (save-excursion
    (nxml-mode)
    (goto-char begin)
    (while (search-forward-regexp ">[ \t]*<" end t)
      (backward-char) (insert "\n") (setq end (1+ end)))
    (indent-region begin end))
  (message "Ah, much better!"))

This doesn't rely on an external tool like Tidy.

Kind Stranger
Christian Berg
  • 1
    Good defun, thanks. Removing the (nxml-mode) from the above pretty-print defun allows it to work in the sgml-mode that is built-in to emacs 22.2.1. But I modified it to do the entire buffer (point-min) to (point-max) because that's my main thing. Also, one bug: for each newline you insert, you will need to increment end. – Cheeso Jun 03 '09 at 15:33
  • How can I use this function in Emacs? I have copied and pasted the function code in *scratch* buffer and evaluated it. Now, how do I invoke this function? – Alexandre Rademaker Feb 21 '11 at 12:01
  • 1
    After evaluating the defun, you can invoke it like any other function: M-x bf-pretty-print-xml-region. (You don't have to type it all, of course, use tab completion: M-x bf should be enough.) You probably don't want to define the function every time you want to use it, so put it somewhere where it is loaded at start-time, e.g. in ~/.emacs.d/init.el – Christian Berg Feb 22 '11 at 16:50
  • 1
    How about breaking long attribute lists? – ceving Sep 21 '12 at 12:33
  • This is fabulous, because tidy complains about invalid character encodings and wants me to clean them up *before* it will reformat the file! Sometimes the point is to see the structure of a broken xml file and tidy will refuse to help. – TauPan May 23 '16 at 08:45
  • for every `(insert "\n")` you also need to increase `end` by 1 so you indent the whole region, otherwise you might miss the last few lines. This correction has already been added to the Benjamin Ferrari blog link provided in this answer. – Kind Stranger Aug 19 '20 at 13:40
35

Emacs can run arbitrary commands with M-|. If you have xmllint installed:

"M-| xmllint --format -" will format the selected region

"C-u M-| xmllint --format -" will do the same, replacing the region with the output
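
The C-u M-| variant can also be turned into a command. Below is a minimal sketch, assuming xmllint is on your PATH; my-xmllint-format-region is a hypothetical name. It checks for the executable first so that a missing xmllint gives a clear error instead of a shell message.

```elisp
;; `my-xmllint-format-region' is a hypothetical name chosen for this sketch.
(defun my-xmllint-format-region (begin end)
  "Replace the text between BEGIN and END with `xmllint --format' output."
  (interactive "r")
  (unless (executable-find "xmllint")
    (user-error "xmllint not found on PATH"))
  ;; A non-nil REPLACE argument (the fifth one) puts the command's
  ;; output in place of the region, like `C-u M-|'.
  (shell-command-on-region begin end "xmllint --format -" nil t))
```

Two things to keep in mind: xmllint typically emits an XML declaration at the top of its output, and if it rejects the input its error message may end up in the buffer, so be ready to undo.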

Tim Helmstedt
25

I use nXML mode for editing and Tidy when I want to format and indent XML or HTML. There is also an Emacs interface to Tidy.

Jeff Trull
Marcel Levy
  • By the end of 2013 tidy.el Version: 20111222.1756 fails to run on Emacs 24 with ```wrong type argument: stringp, nil``` – keiw Dec 21 '13 at 12:38
  • @keiw That's probably because you're doing it in a buffer that doesn't have a file name. Got the same error and traced it to that on my side at least. – Alf Jan 21 '14 at 13:10
22

To introduce line breaks and then pretty-print:

M-x sgml-mode
M-x sgml-pretty-print
Talespin_Kit
20

Thanks to Tim Helmstedt above, I made something like this:

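(defun nxml-pretty-format ()
  "Reformat the whole buffer with xmllint, then re-indent it in nxml-mode."
  (interactive)
  (save-excursion
    ;; Pipe the buffer through xmllint and replace it with the output.
    (shell-command-on-region (point-min) (point-max) "xmllint --format -" (buffer-name) t)
    (nxml-mode)
    ;; Re-indent the whole buffer using nxml's rules.
    (indent-region (point-min) (point-max))))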

Fast and easy. Many thanks.

Sean Allred
bubak
8

Here are a few tweaks I made to Benjamin Ferrari's version:

  • The search-forward-regexp didn't specify an end, so it would operate from the beginning of the region to the end of the buffer (instead of the end of the region).
  • It now increments end properly, as Cheeso noted.
  • It would insert a break between <tag></tag>, which modifies its value. Yes, technically we're modifying the values of everything here, but an empty start/end pair is much more likely to be significant. It now uses two separate, slightly stricter searches to avoid that.

Still has the "doesn't rely on external tidy", etc. However, it does require cl-lib for the cl-incf macro.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; pretty print xml region
(require 'cl-lib) ; provides `cl-incf'

(defun pretty-print-xml-region (begin end)
  "Pretty format XML markup in region. You need to have nxml-mode
http://www.emacswiki.org/cgi-bin/wiki/NxmlMode installed to do
this.  The function inserts linebreaks to separate tags that have
nothing but whitespace between them.  It then indents the markup
by using nxml's indentation rules."
  (interactive "r")
  (save-excursion
    (nxml-mode)
    (goto-char begin)
    ;; split <foo><foo> or </foo><foo>, but not <foo></foo>
    (while (search-forward-regexp ">[ \t]*<[^/]" end t)
      (backward-char 2) (insert "\n") (cl-incf end))
    ;; split <foo/></foo> and </foo></foo>
    (goto-char begin)
    (while (search-forward-regexp "<.*?/.*?>[ \t]*<" end t)
      (backward-char) (insert "\n") (cl-incf end))
    (indent-region begin end nil)
    ;; restore the buffer's usual major mode
    (normal-mode))
  (message "All indented!"))
Jason Viers
5

Here is one way of doing it. Suppose you have something in the format below:

<abc>     <abc><abc>   <abc></abc> </abc></abc>       </abc>

In Emacs, try

M-x nxml-mode
M-x replace-regexp RET  > *< RET >C-q C-j< RET 
C-M-\ to indent

This will indent the XML example above as follows:

<abc>
  <abc>
    <abc>
      <abc>
      </abc>
    </abc>
  </abc>
</abc>

In Vim you can do this with:

:set ft=xml
:%s/>\s*</>\r</g
ggVG=

Hope this helps.

rajashekar
3

As of 2017, Emacs already comes with this capability by default, but you have to add this little function to your ~/.emacs.d/init.el:

(require 'sgml-mode)

(defun reformat-xml ()
  (interactive)
  (save-excursion
    (sgml-pretty-print (point-min) (point-max))
    (indent-region (point-min) (point-max))))

Then just call M-x reformat-xml.

source: https://davidcapello.com/blog/emacs/reformat-xml-on-emacs/

ninrod
2

I took Jason Viers' version and added logic to put xmlns declarations on their own lines. This assumes that you have xmlns= and xmlns: with no intervening whitespace.

(require 'cl-lib) ; provides `cl-incf'

(defun cheeso-pretty-print-xml-region (begin end)
  "Pretty format XML markup in region. You need to have nxml-mode
http://www.emacswiki.org/cgi-bin/wiki/NxmlMode installed to do
this.  The function inserts linebreaks to separate tags that have
nothing but whitespace between them.  It then indents the markup
by using nxml's indentation rules."
  (interactive "r")
  (save-excursion
    (nxml-mode)
    ;; split <foo><bar> or </foo><bar>, but not <foo></foo>
    (goto-char begin)
    (while (search-forward-regexp ">[ \t]*<[^/]" end t)
      (backward-char 2) (insert "\n") (cl-incf end))
    ;; split <foo/></foo> and </foo></foo>
    (goto-char begin)
    (while (search-forward-regexp "<.*?/.*?>[ \t]*<" end t)
      (backward-char) (insert "\n") (cl-incf end))
    ;; put xml namespace decls on newline
    (goto-char begin)
    (while (search-forward-regexp "\\(<\\([a-zA-Z][-:A-Za-z0-9]*\\)\\|['\"]\\) \\(xmlns[=:]\\)" end t)
      (goto-char (match-end 0))
      ;; step back over the six characters of "xmlns=" or "xmlns:"
      (backward-char 6) (insert "\n") (cl-incf end))
    (indent-region begin end nil)
    (normal-mode))
  (message "All indented!"))
Cheeso
2
  1. Emacs nxml-mode can work on the presented format, but you'll have to split the lines.
  2. For longer files that simply isn't worth it. Run this stylesheet (ideally with Saxon, which IMHO gets the line indents about right) against longer files to get a nice pretty print. For any elements where you want to retain white space, add their names alongside 'programlisting', as in 'programlisting yourElementName'.

HTH

DaveP
1

Tidy looks like a good mode. Must look at it. Will use it if I really need all the features it offers.

Anyway, this problem had been nagging me for about a week and I wasn't searching properly. After posting, I started searching again and found a site with an elisp function that does it pretty well. The author also suggests using Tidy.

Thanks for the answer, Marcel (too bad I don't have enough points to upmod you).

Will post about it soon on my blog. Here is a post about it (with a link to Marcel's site).

cnu
1

I use xml-reformat-tags from xml-parse.el. Usually you will want to have the point at the beginning of the file when running this command.

It's interesting that the file is incorporated into Emacspeak. When I was using Emacspeak on a day-to-day basis, I thought xml-reformat-tags was an Emacs builtin. One day I lost it and had to search the internet for it, and thus landed on the wiki page mentioned above.

I'm also attaching my code to start xml-parse. I'm not sure this is the best piece of Emacs code, but it seems to work for me.

(when (file-exists-p "~/.emacs.d/packages/xml-parse.el")
  (let ((load-path load-path))
    (add-to-list 'load-path "~/.emacs.d/packages")
    (require 'xml-parse)))
Jarekczek
1

If you use spacemacs, just use command 'spacemacs/indent-region-or-buffer'.

M-x spacemacs/indent-region-or-buffer
JohnnyZ
0

I'm afraid I like Benjamin Ferrari's version much better. The internal pretty printer always places the end tag on a new line after the value, inserting an unwanted CR into the tag values.