Function parse_html5 

pub fn parse_html5(input: &str) -> XmlTree

Parses input as HTML using the html5ever spec-compliant parser and returns the resulting XmlTree.

Compared to XmlParser, this handles the full range of HTML5 content correctly:

  • Named and numeric entities (&amp;, &nbsp;, …) are decoded.
  • Void elements (<br>, <img>, <input>, …) are never given children.
  • Implicitly-closed block tags (<p>, <li>, …) are auto-closed per spec.
  • Unclosed tags at EOF are closed automatically.

Offset semantics: node offsets are synthetic (a monotonically increasing counter) and are not byte positions in the source string. This makes the tree unsuitable for persisting reading positions to disk. Use XmlParser when byte-accurate offsets are required.
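
The difference between the two kinds of offset can be illustrated with a minimal, self-contained sketch. It does not use this crate's API: TextNode, byte_offsets, and synthetic_offsets are hypothetical names invented for the illustration, and splitting on whitespace stands in for real parsing. The sketch only assumes that a synthetic offset behaves like a per-node counter, as described above.

```rust
/// Hypothetical stand-in for a tree node; not this crate's real node type.
#[derive(Debug, PartialEq)]
struct TextNode {
    text: String,
    offset: usize,
}

/// Byte-accurate offsets: each word records the byte index where it
/// starts in `source`, so the offset can be used to slice the source.
fn byte_offsets(source: &str) -> Vec<TextNode> {
    let mut nodes = Vec::new();
    let mut start: Option<usize> = None;
    for (i, ch) in source.char_indices() {
        if ch.is_whitespace() {
            if let Some(s) = start.take() {
                nodes.push(TextNode { text: source[s..i].to_string(), offset: s });
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    if let Some(s) = start {
        nodes.push(TextNode { text: source[s..].to_string(), offset: s });
    }
    nodes
}

/// Synthetic offsets: a monotonically increasing counter, one step per
/// node. The values order the nodes but say nothing about source position.
fn synthetic_offsets(source: &str) -> Vec<TextNode> {
    source
        .split_whitespace()
        .enumerate()
        .map(|(i, word)| TextNode { text: word.to_string(), offset: i })
        .collect()
}

fn main() {
    // 'é' occupies two bytes in UTF-8, so byte and counter offsets diverge.
    let source = "caf\u{e9} au lait";

    for node in byte_offsets(source) {
        // A byte offset can be mapped straight back into the source.
        let resolves = source
            .get(node.offset..)
            .map_or(false, |rest| rest.starts_with(node.text.as_str()));
        println!("byte      {:?} resolves: {}", node, resolves);
    }

    for node in synthetic_offsets(source) {
        // A counter value generally does not point at the node's text.
        let resolves = source
            .get(node.offset..)
            .map_or(false, |rest| rest.starts_with(node.text.as_str()));
        println!("synthetic {:?} resolves: {}", node, resolves);
    }
}
```

In this sketch only the byte-offset variant yields values that can be stored and later used to find the same place in the source text, which is the property the note above says parse_html5 does not provide.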