PHP Cross Reference of WordPress Trunk (Updated Daily)

/wp-includes/html-api/ -> class-wp-html-processor.php (source)
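
The class docblock in this file describes breadcrumb queries as matching the tail of the open-element path, with `*` standing in for exactly one element. The block below is a minimal standalone sketch of that rule, assuming the path is a plain array of upper-case tag names. `sketch_matches_breadcrumbs()` is a hypothetical helper written for illustration; it is not part of WordPress and does not use the processor's internal stack.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): reports whether the path of
 * open elements ends with the given breadcrumb query.
 *
 * @param string[] $open_elements Upper-case path from the root to the matched tag,
 *                                e.g. array( 'HTML', 'BODY', 'P', 'IMG' ).
 * @param string[] $breadcrumbs   Query such as array( 'P', 'IMG' ); '*' matches exactly one element.
 * @return bool Whether the query matches the end of the path.
 */
function sketch_matches_breadcrumbs( array $open_elements, array $breadcrumbs ) {
    // Everything matches when there are zero constraints.
    if ( 0 === count( $breadcrumbs ) ) {
        return true;
    }

    // A query longer than the path can never match.
    if ( count( $breadcrumbs ) > count( $open_elements ) ) {
        return false;
    }

    // Compare from the innermost element outward.
    $path  = array_reverse( $open_elements );
    $query = array_reverse( $breadcrumbs );

    foreach ( $query as $i => $crumb ) {
        if ( '*' !== $crumb && strtoupper( $crumb ) !== $path[ $i ] ) {
            return false;
        }
    }

    return true;
}

$path = array( 'HTML', 'BODY', 'DIV', 'SPAN', 'FIGURE', 'IMG' );

var_dump( sketch_matches_breadcrumbs( $path, array( 'figure', 'img' ) ) );       // bool(true)
var_dump( sketch_matches_breadcrumbs( $path, array( 'span', 'img' ) ) );         // bool(false)
var_dump( sketch_matches_breadcrumbs( $path, array( 'span', '*', 'img' ) ) );    // bool(true)
var_dump( sketch_matches_breadcrumbs( $path, array( 'HTML', 'BODY', 'IMG' ) ) ); // bool(false)
```

The last call shows why a query that starts at `HTML` rules out partial matches: it only succeeds when the whole path is accounted for.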

   1  <?php
   2  /**
   3   * HTML API: WP_HTML_Processor class
   4   *
   5   * @package WordPress
   6   * @subpackage HTML-API
   7   * @since 6.4.0
   8   */
   9  
  10  /**
  11   * Core class used to safely parse and modify an HTML document.
  12   *
  13   * The HTML Processor class properly parses and modifies HTML5 documents.
  14   *
  15   * It supports a subset of the HTML5 specification, and when it encounters
  16   * unsupported markup, it aborts early to avoid unintentionally breaking
  17   * the document. The HTML Processor should never break an HTML document.
  18   *
  19   * While the `WP_HTML_Tag_Processor` is a valuable tool for modifying
  20   * attributes on individual HTML tags, the HTML Processor is more capable
  21   * and useful for the following operations:
  22   *
  23   *  - Querying based on nested HTML structure.
  24   *
  25   * Eventually the HTML Processor will also support:
  26   *  - Wrapping a tag in surrounding HTML.
  27   *  - Unwrapping a tag by removing its parent.
  28   *  - Inserting and removing nodes.
  29   *  - Reading and changing inner content.
  30   *  - Navigating up or around HTML structure.
  31   *
  32   * ## Usage
  33   *
  34   * Use of this class requires three steps:
  35   *
  36   *   1. Call a static creator method with your input HTML document.
  37   *   2. Find the location in the document you are looking for.
  38   *   3. Request changes to the document at that location.
  39   *
  40   * Example:
  41   *
  42   *     $processor = WP_HTML_Processor::create_fragment( $html );
  43   *     if ( $processor->next_tag( array( 'breadcrumbs' => array( 'DIV', 'FIGURE', 'IMG' ) ) ) ) {
  44   *         $processor->add_class( 'responsive-image' );
  45   *     }
  46   *
  47   * #### Breadcrumbs
  48   *
  49   * Breadcrumbs represent the stack of open elements from the root
  50   * of the document or fragment down to the currently-matched node,
  51   * if one is currently selected. Call WP_HTML_Processor::get_breadcrumbs()
  52   * to inspect the breadcrumbs for a matched tag.
  53   *
  54   * Breadcrumbs can specify nested HTML structure and are equivalent
  55   * to a CSS selector comprising tag names separated by the child
  56   * combinator, such as "DIV > FIGURE > IMG".
  57   *
  58   * Since all elements find themselves inside a full HTML document
  59   * when parsed, the return value from `get_breadcrumbs()` will always
  60   * contain any implicit outermost elements. For example, when parsing
  61   * with `create_fragment()` in the `BODY` context (the default), any
  62   * tag in the given HTML document will contain `array( 'HTML', 'BODY', … )`
  63   * in its breadcrumbs.
  64   *
  65   * Despite containing the implied outermost elements in their breadcrumbs,
  66   * tags may be found with the shortest-matching breadcrumb query. That is,
  67   * `array( 'IMG' )` matches all IMG elements and `array( 'P', 'IMG' )`
  68   * matches all IMG elements directly inside a P element. To rule out
  69   * unintended partial matches, a query can specify the full breadcrumb
  70   * path all the way down from the root HTML element.
  71   *
  72   * Example:
  73   *
  74   *     $html = '<figure><img><figcaption>A <em>lovely</em> day outside</figcaption></figure>';
  75   *     //               ----- Matches here.
  76   *     $processor->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'IMG' ) ) );
  77   *
  78   *     $html = '<figure><img><figcaption>A <em>lovely</em> day outside</figcaption></figure>';
  79   *     //                                  ---- Matches here.
  80   *     $processor->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'FIGCAPTION', 'EM' ) ) );
  81   *
  82   *     $html = '<div><img></div><img>';
  83   *     //                       ----- Matches here, because IMG must be a direct child of the implicit BODY.
  84   *     $processor->next_tag( array( 'breadcrumbs' => array( 'BODY', 'IMG' ) ) );
  85   *
  86   * ## HTML Support
  87   *
  88   * This class implements a small part of the HTML5 specification.
  89   * It's designed to operate within the subset it supports and to abort
  90   * early whenever it encounters markup it can't properly handle. This is
  91   * the principal way in which this class remains as simple as possible
  92   * without cutting corners and breaking compliance.
  93   *
  94   * ### Supported elements
  95   *
  96   * If any unsupported element appears in the HTML input the HTML Processor
  97   * will abort early and stop all processing. This draconian measure ensures
  98   * that the HTML Processor won't break any HTML it doesn't fully understand.
  99   *
 100   * The following list specifies the HTML tags that _are_ supported:
 101   *
 102   *  - Containers: ADDRESS, BLOCKQUOTE, DETAILS, DIALOG, DIV, FOOTER, HEADER, MAIN, MENU, SPAN, SUMMARY.
 103   *  - Custom elements: All custom elements are supported. :)
 104   *  - Form elements: BUTTON, DATALIST, FIELDSET, INPUT, LABEL, LEGEND, METER, PROGRESS, SEARCH.
 105   *  - Formatting elements: B, BIG, CODE, EM, FONT, I, PRE, SMALL, STRIKE, STRONG, TT, U, WBR.
 106   *  - Heading elements: H1, H2, H3, H4, H5, H6, HGROUP.
 107   *  - Links: A.
 108   *  - Lists: DD, DL, DT, LI, OL, UL.
 109   *  - Media elements: AUDIO, CANVAS, EMBED, FIGCAPTION, FIGURE, IMG, MAP, PICTURE, SOURCE, TRACK, VIDEO.
 110   *  - Paragraph: BR, P.
 111   *  - Phrasing elements: ABBR, AREA, BDI, BDO, CITE, DATA, DEL, DFN, INS, MARK, OUTPUT, Q, SAMP, SUB, SUP, TIME, VAR.
 112   *  - Sectioning elements: ARTICLE, ASIDE, HR, NAV, SECTION.
 113   *  - Templating elements: SLOT.
 114   *  - Text decoration: RUBY.
 115   *  - Deprecated elements: ACRONYM, BLINK, CENTER, DIR, ISINDEX, KEYGEN, LISTING, MULTICOL, NEXTID, PARAM, SPACER.
 116   *
 117   * ### Supported markup
 118   *
 119   * Some kinds of non-normative HTML involve reconstruction of formatting elements and
 120   * re-parenting of mis-nested elements. For example, a DIV tag found inside a TABLE
 121   * may in fact belong _before_ the table in the DOM. If the HTML Processor encounters
 122   * such a case it will stop processing.
 123   *
 124   * The following list specifies HTML markup that _is_ supported:
 125   *
 126   *  - Markup involving only those tags listed above.
 127   *  - Fully-balanced and non-overlapping tags.
 128   *  - HTML with unexpected tag closers.
 129   *  - Some unbalanced or overlapping tags.
 130   *  - P tags after unclosed P tags.
 131   *  - BUTTON tags after unclosed BUTTON tags.
 132   *  - A tags after unclosed A tags that don't involve any active formatting elements.
 133   *
 134   * @since 6.4.0
 135   *
 136   * @see WP_HTML_Tag_Processor
 137   * @see https://html.spec.whatwg.org/
 138   */
 139  class WP_HTML_Processor extends WP_HTML_Tag_Processor {
 140      /**
 141       * The maximum number of bookmarks allowed to exist at any given time.
 142       *
 143       * HTML processing requires more bookmarks than basic tag processing,
 144       * so this class constant from the Tag Processor is overridden.
 145       *
 146       * @since 6.4.0
 147       *
 148       * @var int
 149       */
 150      const MAX_BOOKMARKS = 100;
 151  
 152      /**
 153       * Holds the working state of the parser, including the stack of
 154       * open elements and the stack of active formatting elements.
 155       *
 156       * Initialized in the constructor.
 157       *
 158       * @since 6.4.0
 159       *
 160       * @var WP_HTML_Processor_State
 161       */
 162      private $state = null;
 163  
 164      /**
 165       * Used to create unique bookmark names.
 166       *
 167       * This class sets a bookmark for every tag in the HTML document that it encounters.
 168       * The bookmark name is auto-generated and increments, starting with `1`. These are
 169       * internal bookmarks and are automatically released when the referring WP_HTML_Token
 170       * goes out of scope and is garbage-collected.
 171       *
 172       * @since 6.4.0
 173       *
 174       * @see WP_HTML_Processor::$release_internal_bookmark_on_destruct
 175       *
 176       * @var int
 177       */
 178      private $bookmark_counter = 0;
 179  
 180      /**
 181       * Stores an explanation for why something failed, if it did.
 182       *
 183       * @see self::get_last_error
 184       *
 185       * @since 6.4.0
 186       *
 187       * @var string|null
 188       */
 189      private $last_error = null;
 190  
 191      /**
 192       * Releases a bookmark when PHP garbage-collects its wrapping WP_HTML_Token instance.
 193       *
 194       * This function is created inside the class constructor so that it can be passed to
 195       * the stack of open elements and the stack of active formatting elements without
 196       * exposing it as a public method on the class.
 197       *
 198       * @since 6.4.0
 199       *
 200       * @var Closure
 201       */
 202      private $release_internal_bookmark_on_destruct = null;
 203  
 204      /*
 205       * Public Interface Functions
 206       */
 207  
 208      /**
 209       * Creates an HTML processor in the fragment parsing mode.
 210       *
 211       * Use this for cases where you are processing chunks of HTML that
 212       * will be found within a bigger HTML document, such as rendered
 213       * block output that exists within a post, or `the_content` inside a
 214       * rendered site layout.
 215       *
 216       * Fragment parsing occurs within a context, which is an HTML element
 217       * that the document will eventually be placed in. It becomes important
 218       * when special elements have different rules than others, such as inside
 219       * a TEXTAREA or a TITLE tag where things that look like tags are text,
 220       * or inside a SCRIPT tag where things that look like HTML syntax are JS.
 221       *
 222       * The context value should be a representation of the tag in which the
 223       * HTML is found. For most cases this will be the body element. The HTML
 224       * form is provided because a context element may have attributes that
 225       * impact the parse, such as with a SCRIPT tag and its `type` attribute.
 226       *
 227       * ## Current HTML Support
 228       *
 229       *  - The only supported context is `<body>`, which is the default value.
 230       *  - The only supported document encoding is `UTF-8`, which is the default value.
 231       *
 232       * @since 6.4.0
 233       *
 234       * @param string $html     Input HTML fragment to process.
 235       * @param string $context  Context element for the fragment, must be default of `<body>`.
 236       * @param string $encoding Text encoding of the document; must be default of 'UTF-8'.
 237       * @return WP_HTML_Processor|null The created processor if successful, otherwise null.
 238       */
 239  	public static function create_fragment( $html, $context = '<body>', $encoding = 'UTF-8' ) {
 240          if ( '<body>' !== $context || 'UTF-8' !== $encoding ) {
 241              return null;
 242          }
 243  
 244          $processor                        = new self( $html, self::CONSTRUCTOR_UNLOCK_CODE );
 245          $processor->state->context_node   = array( 'BODY', array() );
 246          $processor->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
 247  
 248          // @todo Create "fake" bookmarks for non-existent but implied nodes.
 249          $processor->bookmarks['root-node']    = new WP_HTML_Span( 0, 0 );
 250          $processor->bookmarks['context-node'] = new WP_HTML_Span( 0, 0 );
 251  
 252          $processor->state->stack_of_open_elements->push(
 253              new WP_HTML_Token(
 254                  'root-node',
 255                  'HTML',
 256                  false
 257              )
 258          );
 259  
 260          $processor->state->stack_of_open_elements->push(
 261              new WP_HTML_Token(
 262                  'context-node',
 263                  $processor->state->context_node[0],
 264                  false
 265              )
 266          );
 267  
 268          return $processor;
 269      }
 270  
 271      /**
 272       * Constructor.
 273       *
 274       * Do not use this method. Use the static creator methods instead.
 275       *
 276       * @access private
 277       *
 278       * @since 6.4.0
 279       *
 280       * @see WP_HTML_Processor::create_fragment()
 281       *
 282       * @param string      $html                                  HTML to process.
 283       * @param string|null $use_the_static_create_methods_instead This constructor should not be called manually.
 284       */
 285  	public function __construct( $html, $use_the_static_create_methods_instead = null ) {
 286          parent::__construct( $html );
 287  
 288          if ( self::CONSTRUCTOR_UNLOCK_CODE !== $use_the_static_create_methods_instead ) {
 289              _doing_it_wrong(
 290                  __METHOD__,
 291                  sprintf(
 292                      /* translators: %s: WP_HTML_Processor::create_fragment(). */
 293                      __( 'Call %s to create an HTML Processor instead of calling the constructor directly.' ),
 294                      '<code>WP_HTML_Processor::create_fragment()</code>'
 295                  ),
 296                  '6.4.0'
 297              );
 298          }
 299  
 300          $this->state = new WP_HTML_Processor_State();
 301  
 302          /*
 303           * Create this wrapper so that it's possible to pass
 304           * a private method into WP_HTML_Token classes without
 305           * exposing it to any public API.
 306           */
 307          $this->release_internal_bookmark_on_destruct = function ( $name ) {
 308              parent::release_bookmark( $name );
 309          };
 310      }
 311  
 312      /**
 313       * Returns the last error, if any.
 314       *
 315       * Various situations lead to parsing failure but this class will
 316       * return `false` in all those cases. To determine why something
 317       * failed it's possible to request the last error. This helps
 318       * distinguish a tag that simply couldn't be found from content
 319       * in the document that caused the processor to give up and
 320       * abort processing.
 321       *
 322       * Example:
 323       *
 324       *     $processor = WP_HTML_Processor::create_fragment( '<template><strong><button><em><p><em>' );
 325       *     false === $processor->next_tag();
 326       *     WP_HTML_Processor::ERROR_UNSUPPORTED === $processor->get_last_error();
 327       *
 328       * @since 6.4.0
 329       *
 330       * @see self::ERROR_UNSUPPORTED
 331       * @see self::ERROR_EXCEEDED_MAX_BOOKMARKS
 332       *
 333       * @return string|null The last error, if one exists, otherwise null.
 334       */
 335  	public function get_last_error() {
 336          return $this->last_error;
 337      }
 338  
 339      /**
 340       * Finds the next tag matching the $query.
 341       *
 342       * @todo Support matching the class name and tag name.
 343       *
 344       * @since 6.4.0
 345       *
 346       * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
 347       *
 348       * @param array|string|null $query {
 349       *     Optional. Which tag name to find, having which class, etc. Default is to find any tag.
 350       *
 351       *     @type string|null $tag_name     Which tag to find, or `null` for "any tag."
 352       *     @type int|null    $match_offset Find the Nth tag matching all search criteria.
 353       *                                     1 for "first" tag, 3 for "third," etc.
 354       *                                     Defaults to first tag.
 355       *     @type string|null $class_name   Tag must contain this whole class name to match.
 356       *     @type string[]    $breadcrumbs  DOM sub-path at which element is found, e.g. `array( 'FIGURE', 'IMG' )`.
 357       *                                     May also contain the wildcard `*` which matches a single element, e.g. `array( 'SECTION', '*' )`.
 358       * }
 359       * @return bool Whether a tag was matched.
 360       */
 361  	public function next_tag( $query = null ) {
 362          if ( null === $query ) {
 363              while ( $this->step() ) {
 364                  if ( '#tag' !== $this->get_token_type() ) {
 365                      continue;
 366                  }
 367  
 368                  if ( ! $this->is_tag_closer() ) {
 369                      return true;
 370                  }
 371              }
 372  
 373              return false;
 374          }
 375  
 376          if ( is_string( $query ) ) {
 377              $query = array( 'breadcrumbs' => array( $query ) );
 378          }
 379  
 380          if ( ! is_array( $query ) ) {
 381              _doing_it_wrong(
 382                  __METHOD__,
 383                  __( 'Please pass a query array to this function.' ),
 384                  '6.4.0'
 385              );
 386              return false;
 387          }
 388  
 389          if ( ! ( array_key_exists( 'breadcrumbs', $query ) && is_array( $query['breadcrumbs'] ) ) ) {
 390              while ( $this->step() ) {
 391                  if ( '#tag' !== $this->get_token_type() ) {
 392                      continue;
 393                  }
 394  
 395                  if ( ! $this->is_tag_closer() ) {
 396                      return true;
 397                  }
 398              }
 399  
 400              return false;
 401          }
 402  
 403          if ( isset( $query['tag_closers'] ) && 'visit' === $query['tag_closers'] ) {
 404              _doing_it_wrong(
 405                  __METHOD__,
 406                  __( 'Cannot visit tag closers in HTML Processor.' ),
 407                  '6.4.0'
 408              );
 409              return false;
 410          }
 411  
 412          $breadcrumbs  = $query['breadcrumbs'];
 413          $match_offset = isset( $query['match_offset'] ) ? (int) $query['match_offset'] : 1;
 414  
 415          while ( $match_offset > 0 && $this->step() ) {
 416              if ( '#tag' !== $this->get_token_type() ) {
 417                  continue;
 418              }
 419  
 420              if ( $this->matches_breadcrumbs( $breadcrumbs ) && 0 === --$match_offset ) {
 421                  return true;
 422              }
 423          }
 424  
 425          return false;
 426      }
 427  
 428      /**
 429       * Ensures internal accounting is maintained for HTML semantic rules while
 430       * the underlying Tag Processor class is seeking to a bookmark.
 431       *
 432       * This doesn't currently have a way to represent non-tags and doesn't process
 433       * semantic rules for text nodes. For access to the raw tokens consider using
 434       * WP_HTML_Tag_Processor instead.
 435       *
 436       * @since 6.5.0 Added for internal support; do not use.
 437       *
 438       * @access private
 439       *
 440       * @return bool
 441       */
 442  	public function next_token() {
 443          return $this->step();
 444      }
 445  
 446      /**
 447       * Indicates if the currently-matched tag matches the given breadcrumbs.
 448       *
 449       * A "*" is a single-tag wildcard: it matches any one tag, but never the absence of a tag.
 450       *
 451       * At some point this function _may_ support a `**` syntax for matching any number
 452       * of unspecified tags in the breadcrumb stack. This has been intentionally left
 453       * out, however, to keep this function simple and to avoid introducing backtracking,
 454       * which could open up surprising performance breakdowns.
 455       *
 456       * Example:
 457       *
 458       *     $processor = WP_HTML_Processor::create_fragment( '<div><span><figure><img></figure></span></div>' );
 459       *     $processor->next_tag( 'img' );
 460       *     true  === $processor->matches_breadcrumbs( array( 'figure', 'img' ) );
 461       *     true  === $processor->matches_breadcrumbs( array( 'span', 'figure', 'img' ) );
 462       *     false === $processor->matches_breadcrumbs( array( 'span', 'img' ) );
 463       *     true  === $processor->matches_breadcrumbs( array( 'span', '*', 'img' ) );
 464       *
 465       * @since 6.4.0
 466       *
 467       * @param string[] $breadcrumbs DOM sub-path at which element is found, e.g. `array( 'FIGURE', 'IMG' )`.
 468       *                              May also contain the wildcard `*` which matches a single element, e.g. `array( 'SECTION', '*' )`.
 469       * @return bool Whether the currently-matched tag is found at the given nested structure.
 470       */
 471  	public function matches_breadcrumbs( $breadcrumbs ) {
 472          // Everything matches when there are zero constraints.
 473          if ( 0 === count( $breadcrumbs ) ) {
 474              return true;
 475          }
 476  
 477          // Start at the last crumb.
 478          $crumb = end( $breadcrumbs );
 479  
 480          if ( '*' !== $crumb && $this->get_tag() !== strtoupper( $crumb ) ) {
 481              return false;
 482          }
 483  
 484          foreach ( $this->state->stack_of_open_elements->walk_up() as $node ) {
 485              $crumb = strtoupper( current( $breadcrumbs ) );
 486  
 487              if ( '*' !== $crumb && $node->node_name !== $crumb ) {
 488                  return false;
 489              }
 490  
 491              if ( false === prev( $breadcrumbs ) ) {
 492                  return true;
 493              }
 494          }
 495  
 496          return false;
 497      }
 498  
 499      /**
 500       * Steps through the HTML document and stops at the next tag, if any.
 501       *
 502       * @since 6.4.0
 503       *
 504       * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
 505       *
 506       * @see self::PROCESS_NEXT_NODE
 507       * @see self::REPROCESS_CURRENT_NODE
 508       *
 509       * @param string $node_to_process Whether to parse the next node or reprocess the current node.
 510       * @return bool Whether a tag was matched.
 511       */
 512  	public function step( $node_to_process = self::PROCESS_NEXT_NODE ) {
 513          // Refuse to proceed if there was a previous error.
 514          if ( null !== $this->last_error ) {
 515              return false;
 516          }
 517  
 518          if ( self::REPROCESS_CURRENT_NODE !== $node_to_process ) {
 519              /*
 520               * Void elements still hop onto the stack of open elements even though
 521               * there's no corresponding closing tag. This is important for managing
 522               * stack-based operations such as "navigate to parent node" or checking
 523               * on an element's breadcrumbs.
 524               *
 525               * When moving on to the next node, therefore, if the bottom-most element
 526               * on the stack is a void element, it must be closed.
 527               *
 528               * @todo Once self-closing foreign elements and BGSOUND are supported,
 529               *        they must also be implicitly closed here too. BGSOUND is
 530               *        special since it's only self-closing if the self-closing flag
 531               *        is provided in the opening tag, otherwise it expects a tag closer.
 532               */
 533              $top_node = $this->state->stack_of_open_elements->current_node();
 534              if (
 535                  $top_node && (
 536                      // Void elements.
 537                      self::is_void( $top_node->node_name ) ||
 538                      // Comments, text nodes, and other atomic tokens.
 539                      '#' === $top_node->node_name[0] ||
 540                      // Doctype declarations.
 541                      'html' === $top_node->node_name
 542                  )
 543              ) {
 544                  $this->state->stack_of_open_elements->pop();
 545              }
 546          }
 547  
 548          if ( self::PROCESS_NEXT_NODE === $node_to_process ) {
 549              parent::next_token();
 550          }
 551  
 552          // Finish stepping when there are no more tokens in the document.
 553          if (
 554              WP_HTML_Tag_Processor::STATE_INCOMPLETE_INPUT === $this->parser_state ||
 555              WP_HTML_Tag_Processor::STATE_COMPLETE === $this->parser_state
 556          ) {
 557              return false;
 558          }
 559  
 560          $this->state->current_token = new WP_HTML_Token(
 561              $this->bookmark_token(),
 562              $this->get_token_name(),
 563              $this->has_self_closing_flag(),
 564              $this->release_internal_bookmark_on_destruct
 565          );
 566  
 567          try {
 568              switch ( $this->state->insertion_mode ) {
 569                  case WP_HTML_Processor_State::INSERTION_MODE_IN_BODY:
 570                      return $this->step_in_body();
 571  
 572                  default:
 573                      $this->last_error = self::ERROR_UNSUPPORTED;
 574                      throw new WP_HTML_Unsupported_Exception( "No support for parsing in the '{$this->state->insertion_mode}' state." );
 575              }
 576          } catch ( WP_HTML_Unsupported_Exception $e ) {
 577              /*
 578               * Exceptions are used in this class to escape deep call stacks that
 579               * otherwise might involve messier calling and return conventions.
 580               */
 581              return false;
 582          }
 583      }
 584  
 585      /**
 586       * Computes the HTML breadcrumbs for the currently-matched node, if matched.
 587       *
 588       * Breadcrumbs start at the outermost parent and descend toward the matched element.
 589       * They always include the entire path from the root HTML node to the matched element.
 590       *
 591       * @todo It could be more efficient to expose a generator-based version of this function
 592       *       to avoid creating the array copy on tag iteration. If this is done, it would likely
 593       *       be more useful to walk up the stack when yielding instead of starting at the top.
 594       *
 595       * Example:
 596       *
 597       *     $processor = WP_HTML_Processor::create_fragment( '<p><strong><em><img></em></strong></p>' );
 598       *     $processor->next_tag( 'IMG' );
 599       *     $processor->get_breadcrumbs() === array( 'HTML', 'BODY', 'P', 'STRONG', 'EM', 'IMG' );
 600       *
 601       * @since 6.4.0
 602       *
 603       * @return string[]|null Array of tag names representing path to matched node, if matched, otherwise NULL.
 604       */
 605  	public function get_breadcrumbs() {
 606          $breadcrumbs = array();
 607          foreach ( $this->state->stack_of_open_elements->walk_down() as $stack_item ) {
 608              $breadcrumbs[] = $stack_item->node_name;
 609          }
 610  
 611          return $breadcrumbs;
 612      }
 613  
 614      /**
 615       * Parses next element in the 'in body' insertion mode.
 616       *
 617       * This internal function performs the 'in body' insertion mode
 618       * logic for the generalized WP_HTML_Processor::step() function.
 619       *
 620       * @since 6.4.0
 621       *
 622       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
 623       *
 624       * @see https://html.spec.whatwg.org/#parsing-main-inbody
 625       * @see WP_HTML_Processor::step
 626       *
 627       * @return bool Whether an element was found.
 628       */
 629  	private function step_in_body() {
 630          $token_name = $this->get_token_name();
 631          $token_type = $this->get_token_type();
 632          $op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
 633          $op         = "{$op_sigil}{$token_name}";
 634  
 635          switch ( $op ) {
 636              case '#comment':
 637              case '#funky-comment':
 638              case '#presumptuous-tag':
 639                  $this->insert_html_element( $this->state->current_token );
 640                  return true;
 641  
 642              case '#text':
 643                  $current_token = $this->bookmarks[ $this->state->current_token->bookmark_name ];
 644  
 645                  /*
 646                   * > A character token that is U+0000 NULL
 647                   *
 648                   * Any successive sequence of NULL bytes is ignored and won't
 649                   * trigger active format reconstruction. Therefore, if the text
 650                   * only comprises NULL bytes then the token should be ignored
 651                   * here, but if there are any other characters in the stream
 652                   * the active formats should be reconstructed.
 653                   */
 654                  if (
 655                      1 <= $current_token->length &&
 656                      "\x00" === $this->html[ $current_token->start ] &&
 657                      strspn( $this->html, "\x00", $current_token->start, $current_token->length ) === $current_token->length
 658                  ) {
 659                      // Parse error: ignore the token.
 660                      return $this->step();
 661                  }
 662  
 663                  $this->reconstruct_active_formatting_elements();
 664  
 665                  /*
 666                   * Whitespace-only text does not affect the frameset-ok flag.
 667                   * It is probably inter-element whitespace, but it may also
 668                   * contain character references which decode only to whitespace.
 669                   */
 670                  $text = $this->get_modifiable_text();
 671                  if ( strlen( $text ) !== strspn( $text, " \t\n\f\r" ) ) {
 672                      $this->state->frameset_ok = false;
 673                  }
 674  
 675                  $this->insert_html_element( $this->state->current_token );
 676                  return true;
 677  
 678              case 'html':
 679                  /*
 680                   * > A DOCTYPE token
 681                   * > Parse error. Ignore the token.
 682                   */
 683                  return $this->step();
 684  
 685              /*
 686               * > A start tag whose tag name is "button"
 687               */
 688              case '+BUTTON':
 689                  if ( $this->state->stack_of_open_elements->has_element_in_scope( 'BUTTON' ) ) {
 690                      // @todo Indicate a parse error once it's possible. This error does not impact the logic here.
 691                      $this->generate_implied_end_tags();
 692                      $this->state->stack_of_open_elements->pop_until( 'BUTTON' );
 693                  }
 694  
 695                  $this->reconstruct_active_formatting_elements();
 696                  $this->insert_html_element( $this->state->current_token );
 697                  $this->state->frameset_ok = false;
 698  
 699                  return true;
 700  
 701              /*
 702               * > A start tag whose tag name is one of: "address", "article", "aside",
 703               * > "blockquote", "center", "details", "dialog", "dir", "div", "dl",
 704               * > "fieldset", "figcaption", "figure", "footer", "header", "hgroup",
 705               * > "main", "menu", "nav", "ol", "p", "search", "section", "summary", "ul"
 706               */
 707              case '+ADDRESS':
 708              case '+ARTICLE':
 709              case '+ASIDE':
 710              case '+BLOCKQUOTE':
 711              case '+CENTER':
 712              case '+DETAILS':
 713              case '+DIALOG':
 714              case '+DIR':
 715              case '+DIV':
 716              case '+DL':
 717              case '+FIELDSET':
 718              case '+FIGCAPTION':
 719              case '+FIGURE':
 720              case '+FOOTER':
 721              case '+HEADER':
 722              case '+HGROUP':
 723              case '+MAIN':
 724              case '+MENU':
 725              case '+NAV':
 726              case '+OL':
 727              case '+P':
 728              case '+SEARCH':
 729              case '+SECTION':
 730              case '+SUMMARY':
 731              case '+UL':
 732                  if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
 733                      $this->close_a_p_element();
 734                  }
 735  
 736                  $this->insert_html_element( $this->state->current_token );
 737                  return true;
 738  
 739              /*
 740               * > An end tag whose tag name is one of: "address", "article", "aside", "blockquote",
 741               * > "button", "center", "details", "dialog", "dir", "div", "dl", "fieldset",
 742               * > "figcaption", "figure", "footer", "header", "hgroup", "listing", "main",
 743               * > "menu", "nav", "ol", "pre", "search", "section", "summary", "ul"
 744               */
 745              case '-ADDRESS':
 746              case '-ARTICLE':
 747              case '-ASIDE':
 748              case '-BLOCKQUOTE':
 749              case '-BUTTON':
 750              case '-CENTER':
 751              case '-DETAILS':
 752              case '-DIALOG':
 753              case '-DIR':
 754              case '-DIV':
 755              case '-DL':
 756              case '-FIELDSET':
 757              case '-FIGCAPTION':
 758              case '-FIGURE':
 759              case '-FOOTER':
 760              case '-HEADER':
 761              case '-HGROUP':
 762              case '-LISTING':
 763              case '-MAIN':
 764              case '-MENU':
 765              case '-NAV':
 766              case '-OL':
 767              case '-PRE':
 768              case '-SEARCH':
 769              case '-SECTION':
 770              case '-SUMMARY':
 771              case '-UL':
 772                  if ( ! $this->state->stack_of_open_elements->has_element_in_scope( $token_name ) ) {
 773                      // @todo Report parse error.
 774                      // Ignore the token.
 775                      return $this->step();
 776                  }
 777  
 778                  $this->generate_implied_end_tags();
 779                  if ( $this->state->stack_of_open_elements->current_node()->node_name !== $token_name ) {
 780                      // @todo Record parse error: this error doesn't impact parsing.
 781                  }
 782                  $this->state->stack_of_open_elements->pop_until( $token_name );
 783                  return true;
 784  
 785              /*
 786               * > A start tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6"
 787               */
 788              case '+H1':
 789              case '+H2':
 790              case '+H3':
 791              case '+H4':
 792              case '+H5':
 793              case '+H6':
 794                  if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
 795                      $this->close_a_p_element();
 796                  }
 797  
 798                  if (
 799                      in_array(
 800                          $this->state->stack_of_open_elements->current_node()->node_name,
 801                          array( 'H1', 'H2', 'H3', 'H4', 'H5', 'H6' ),
 802                          true
 803                      )
 804                  ) {
 805                      // @todo Indicate a parse error once it's possible.
 806                      $this->state->stack_of_open_elements->pop();
 807                  }
 808  
 809                  $this->insert_html_element( $this->state->current_token );
 810                  return true;
 811  
 812              /*
 813               * > A start tag whose tag name is one of: "pre", "listing"
 814               */
 815              case '+PRE':
 816              case '+LISTING':
 817                  if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
 818                      $this->close_a_p_element();
 819                  }
 820                  $this->insert_html_element( $this->state->current_token );
 821                  $this->state->frameset_ok = false;
 822                  return true;
 823  
 824              /*
 825               * > An end tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6"
 826               */
 827              case '-H1':
 828              case '-H2':
 829              case '-H3':
 830              case '-H4':
 831              case '-H5':
 832              case '-H6':
 833                  if ( ! $this->state->stack_of_open_elements->has_element_in_scope( '(internal: H1 through H6 - do not use)' ) ) {
 834                      /*
 835                       * This is a parse error; ignore the token.
 836                       *
 837                       * @todo Indicate a parse error once it's possible.
 838                       */
 839                      return $this->step();
 840                  }
 841  
 842                  $this->generate_implied_end_tags();
 843  
 844                  if ( $this->state->stack_of_open_elements->current_node()->node_name !== $token_name ) {
 845                      // @todo Record parse error: this error doesn't impact parsing.
 846                  }
 847  
 848                  $this->state->stack_of_open_elements->pop_until( '(internal: H1 through H6 - do not use)' );
 849                  return true;
 850  
 851              /*
 852               * > A start tag whose tag name is "li"
 853               * > A start tag whose tag name is one of: "dd", "dt"
 854               */
 855              case '+DD':
 856              case '+DT':
 857              case '+LI':
 858                  $this->state->frameset_ok = false;
 859                  $node                     = $this->state->stack_of_open_elements->current_node();
 860                  $is_li                    = 'LI' === $token_name;
 861  
 862                  in_body_list_loop:
 863                  /*
 864                   * The logic for LI and DT/DD is the same except for one point: LI elements _only_
 865                   * close other LI elements, but a DT or DD element closes _any_ open DT or DD element.
 866                   */
 867                  if ( $is_li ? 'LI' === $node->node_name : ( 'DD' === $node->node_name || 'DT' === $node->node_name ) ) {
 868                      $node_name = $is_li ? 'LI' : $node->node_name;
 869                      $this->generate_implied_end_tags( $node_name );
 870                      if ( $node_name !== $this->state->stack_of_open_elements->current_node()->node_name ) {
 871                          // @todo Indicate a parse error once it's possible. This error does not impact the logic here.
 872                      }
 873  
 874                      $this->state->stack_of_open_elements->pop_until( $node_name );
 875                      goto in_body_list_done;
 876                  }
 877  
 878                  if (
 879                      'ADDRESS' !== $node->node_name &&
 880                      'DIV' !== $node->node_name &&
 881                      'P' !== $node->node_name &&
 882                      self::is_special( $node->node_name )
 883                  ) {
 884                      /*
 885                       * > If node is in the special category, but is not an address, div,
 886                       * > or p element, then jump to the step labeled done below.
 887                       */
 888                      goto in_body_list_done;
 889                  } else {
 890                      /*
 891                       * > Otherwise, set node to the previous entry in the stack of open elements
 892                       * > and return to the step labeled loop.
 893                       */
 894                      foreach ( $this->state->stack_of_open_elements->walk_up( $node ) as $item ) {
 895                          $node = $item;
 896                          break;
 897                      }
 898                      goto in_body_list_loop;
 899                  }
 900  
 901                  in_body_list_done:
 902                  if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
 903                      $this->close_a_p_element();
 904                  }
 905  
 906                  $this->insert_html_element( $this->state->current_token );
 907                  return true;
 908  
 909              /*
 910               * > An end tag whose tag name is "li"
 911               * > An end tag whose tag name is one of: "dd", "dt"
 912               */
 913              case '-DD':
 914              case '-DT':
 915              case '-LI':
 916                  if (
 917                      /*
 918                       * An end tag whose tag name is "li":
 919                       * If the stack of open elements does not have an li element in list item scope,
 920                       * then this is a parse error; ignore the token.
 921                       */
 922                      (
 923                          'LI' === $token_name &&
 924                          ! $this->state->stack_of_open_elements->has_element_in_list_item_scope( 'LI' )
 925                      ) ||
 926                      /*
 927                       * An end tag whose tag name is one of: "dd", "dt":
 928                       * If the stack of open elements does not have an element in scope that is an
 929                       * HTML element with the same tag name as that of the token, then this is a
 930                       * parse error; ignore the token.
 931                       */
 932                      (
 933                          'LI' !== $token_name &&
 934                          ! $this->state->stack_of_open_elements->has_element_in_scope( $token_name )
 935                      )
 936                  ) {
 937                      /*
 938                       * This is a parse error; ignore the token.
 939                       *
 940                       * @todo Indicate a parse error once it's possible.
 941                       */
 942                      return $this->step();
 943                  }
 944  
 945                  $this->generate_implied_end_tags( $token_name );
 946  
 947                  if ( $token_name !== $this->state->stack_of_open_elements->current_node()->node_name ) {
 948                      // @todo Indicate a parse error once it's possible. This error does not impact the logic here.
 949                  }
 950  
 951                  $this->state->stack_of_open_elements->pop_until( $token_name );
 952                  return true;
 953  
 954              /*
 955               * > An end tag whose tag name is "p"
 956               */
 957              case '-P':
 958                  if ( ! $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
 959                      $this->insert_html_element( $this->state->current_token );
 960                  }
 961  
 962                  $this->close_a_p_element();
 963                  return true;
 964  
 965              // > A start tag whose tag name is "a"
 966              case '+A':
 967                  foreach ( $this->state->active_formatting_elements->walk_up() as $item ) {
 968                      switch ( $item->node_name ) {
 969                          case 'marker':
 970                              break 2; // Only search back as far as the last marker.
 971
 972                          case 'A':
 973                              $this->run_adoption_agency_algorithm();
 974                              $this->state->active_formatting_elements->remove_node( $item );
 975                              $this->state->stack_of_open_elements->remove_node( $item );
 976                              break 2; // Only the nearest open A element is handled.
 977                      }
 978                  }
 979  
 980                  $this->reconstruct_active_formatting_elements();
 981                  $this->insert_html_element( $this->state->current_token );
 982                  $this->state->active_formatting_elements->push( $this->state->current_token );
 983                  return true;
 984  
 985              /*
 986               * > A start tag whose tag name is one of: "b", "big", "code", "em", "font", "i",
 987               * > "s", "small", "strike", "strong", "tt", "u"
 988               */
 989              case '+B':
 990              case '+BIG':
 991              case '+CODE':
 992              case '+EM':
 993              case '+FONT':
 994              case '+I':
 995              case '+S':
 996              case '+SMALL':
 997              case '+STRIKE':
 998              case '+STRONG':
 999              case '+TT':
1000              case '+U':
1001                  $this->reconstruct_active_formatting_elements();
1002                  $this->insert_html_element( $this->state->current_token );
1003                  $this->state->active_formatting_elements->push( $this->state->current_token );
1004                  return true;
1005  
1006              /*
1007               * > An end tag whose tag name is one of: "a", "b", "big", "code", "em", "font", "i",
1008               * > "nobr", "s", "small", "strike", "strong", "tt", "u"
1009               */
1010              case '-A':
1011              case '-B':
1012              case '-BIG':
1013              case '-CODE':
1014              case '-EM':
1015              case '-FONT':
1016              case '-I':
1017              case '-S':
1018              case '-SMALL':
1019              case '-STRIKE':
1020              case '-STRONG':
1021              case '-TT':
1022              case '-U':
1023                  $this->run_adoption_agency_algorithm();
1024                  return true;
1025  
1026              /*
1027               * > An end tag whose tag name is "br"
1028               * >   Parse error. Drop the attributes from the token, and act as described in the next
1029               * >   entry; i.e. act as if this was a "br" start tag token with no attributes, rather
1030               * >   than the end tag token that it actually is.
1031               */
1032              case '-BR':
1033                  $this->last_error = self::ERROR_UNSUPPORTED;
1034                  throw new WP_HTML_Unsupported_Exception( 'Closing BR tags require unimplemented special handling.' );
1035  
1036              /*
1037               * > A start tag whose tag name is one of: "area", "br", "embed", "img", "keygen", "wbr"
1038               */
1039              case '+AREA':
1040              case '+BR':
1041              case '+EMBED':
1042              case '+IMG':
1043              case '+KEYGEN':
1044              case '+WBR':
1045                  $this->reconstruct_active_formatting_elements();
1046                  $this->insert_html_element( $this->state->current_token );
1047                  $this->state->frameset_ok = false;
1048                  return true;
1049  
1050              /*
1051               * > A start tag whose tag name is "input"
1052               */
1053              case '+INPUT':
1054                  $this->reconstruct_active_formatting_elements();
1055                  $this->insert_html_element( $this->state->current_token );
1056                  $type_attribute = $this->get_attribute( 'type' );
1057                  /*
1058                   * > If the token does not have an attribute with the name "type", or if it does,
1059                   * > but that attribute's value is not an ASCII case-insensitive match for the
1060                   * > string "hidden", then: set the frameset-ok flag to "not ok".
1061                   */
1062                  if ( ! is_string( $type_attribute ) || 'hidden' !== strtolower( $type_attribute ) ) {
1063                      $this->state->frameset_ok = false;
1064                  }
1065                  return true;
1066  
1067              /*
1068               * > A start tag whose tag name is "hr"
1069               */
1070              case '+HR':
1071                  if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
1072                      $this->close_a_p_element();
1073                  }
1074                  $this->insert_html_element( $this->state->current_token );
1075                  $this->state->frameset_ok = false;
1076                  return true;
1077  
1078              /*
1079               * > A start tag whose tag name is one of: "param", "source", "track"
1080               */
1081              case '+PARAM':
1082              case '+SOURCE':
1083              case '+TRACK':
1084                  $this->insert_html_element( $this->state->current_token );
1085                  return true;
1086          }
1087  
1088          /*
1089           * These tags require special handling in the 'in body' insertion mode
1090           * but that handling hasn't yet been implemented.
1091           *
1092           * As the rules for each tag are implemented, the corresponding tag
1093           * name should be removed from this list. An accompanying test should
1094           * help ensure this list is maintained.
1095           *
1096           * @see Tests_HtmlApi_WpHtmlProcessor::test_step_in_body_fails_on_unsupported_tags
1097           *
1098           * Since this switch structure throws a WP_HTML_Unsupported_Exception, it's
1099           * possible to handle "any other start tag" and "any other end tag" below,
1100           * as that guarantees execution doesn't proceed for the unimplemented tags.
1101           *
1102           * @see https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inbody
1103           */
1104          switch ( $token_name ) {
1105              case 'APPLET':
1106              case 'BASE':
1107              case 'BASEFONT':
1108              case 'BGSOUND':
1109              case 'BODY':
1110              case 'CAPTION':
1111              case 'COL':
1112              case 'COLGROUP':
1113              case 'FORM':
1114              case 'FRAME':
1115              case 'FRAMESET':
1116              case 'HEAD':
1117              case 'HTML':
1118              case 'IFRAME':
1119              case 'LINK':
1120              case 'MARQUEE':
1121              case 'MATH':
1122              case 'META':
1123              case 'NOBR':
1124              case 'NOEMBED':
1125              case 'NOFRAMES':
1126              case 'NOSCRIPT':
1127              case 'OBJECT':
1128              case 'OPTGROUP':
1129              case 'OPTION':
1130              case 'PLAINTEXT':
1131              case 'RB':
1132              case 'RP':
1133              case 'RT':
1134              case 'RTC':
1135              case 'SARCASM':
1136              case 'SCRIPT':
1137              case 'SELECT':
1138              case 'STYLE':
1139              case 'SVG':
1140              case 'TABLE':
1141              case 'TBODY':
1142              case 'TD':
1143              case 'TEMPLATE':
1144              case 'TEXTAREA':
1145              case 'TFOOT':
1146              case 'TH':
1147              case 'THEAD':
1148              case 'TITLE':
1149              case 'TR':
1150              case 'XMP':
1151                  $this->last_error = self::ERROR_UNSUPPORTED;
1152                  throw new WP_HTML_Unsupported_Exception( "Cannot process {$token_name} element." );
1153          }
1154  
1155          if ( ! $this->is_tag_closer() ) {
1156              /*
1157               * > Any other start tag
1158               */
1159              $this->reconstruct_active_formatting_elements();
1160              $this->insert_html_element( $this->state->current_token );
1161              return true;
1162          } else {
1163              /*
1164               * > Any other end tag
1165               */
1166  
1167              /*
1168               * Find the corresponding tag opener in the stack of open elements, if
1169               * it exists before reaching a special element, which provides a kind
1170               * of boundary in the stack. For example, a `</custom-tag>` should not
1171               * close anything beyond its containing `P` or `DIV` element.
1172               */
1173              foreach ( $this->state->stack_of_open_elements->walk_up() as $node ) {
1174                  if ( $token_name === $node->node_name ) {
1175                      break;
1176                  }
1177  
1178                  if ( self::is_special( $node->node_name ) ) {
1179                      // This is a parse error; ignore the token.
1180                      return $this->step();
1181                  }
1182              }
1183  
1184              $this->generate_implied_end_tags( $token_name );
1185              if ( $node !== $this->state->stack_of_open_elements->current_node() ) {
1186                  // @todo Record parse error: this error doesn't impact parsing.
1187              }
1188  
1189              foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
1190                  $this->state->stack_of_open_elements->pop();
1191                  if ( $node === $item ) {
1192                      return true;
1193                  }
1194              }
1195          }
1196      }
1197  
1198      /*
1199       * Internal helpers
1200       */
1201  
1202      /**
1203       * Creates a new bookmark for the currently-matched token and returns the generated name.
1204       *
1205       * @since 6.4.0
1206       * @since 6.5.0 Renamed from bookmark_tag() to bookmark_token().
1207       *
1208       * @throws Exception When unable to allocate requested bookmark.
1209       *
1210       * @return string Name of the created bookmark.
1211       */
1212      private function bookmark_token() {
1213          if ( ! parent::set_bookmark( ++$this->bookmark_counter ) ) {
1214              $this->last_error = self::ERROR_EXCEEDED_MAX_BOOKMARKS;
1215              throw new Exception( 'could not allocate bookmark' );
1216          }
1217  
1218          return "{$this->bookmark_counter}";
1219      }
1220  
1221      /*
1222       * HTML semantic overrides for Tag Processor
1223       */
1224  
1225      /**
1226       * Returns the uppercase name of the matched tag.
1227       *
1228       * The semantic rules for HTML specify that certain tags be reprocessed
1229       * with a different tag name. Because of this, the tag name presented
1230       * by the HTML Processor may differ from the one reported by the HTML
1231       * Tag Processor, which doesn't apply these semantic rules.
1232       *
1233       * Example:
1234       *
1235       *     $processor = WP_HTML_Processor::create_fragment( '<div class="test">Test</div>' );
1236       *     $processor->next_tag() === true;
1237       *     $processor->get_tag() === 'DIV';
1238       *
1239       *     $processor->next_tag() === false;
1240       *     $processor->get_tag() === null;
1241       *
1242       * @since 6.4.0
1243       *
1244       * @return string|null Name of currently matched tag in input HTML, or `null` if none found.
1245       */
1246      public function get_tag() {
1247          if ( null !== $this->last_error ) {
1248              return null;
1249          }
1250  
1251          $tag_name = parent::get_tag();
1252  
1253          switch ( $tag_name ) {
1254              case 'IMAGE':
1255                  /*
1256                   * > A start tag whose tag name is "image"
1257                   * > Change the token's tag name to "img" and reprocess it. (Don't ask.)
1258                   */
1259                  return 'IMG';
1260  
1261              default:
1262                  return $tag_name;
1263          }
1264      }
1265  
1266      /**
1267       * Removes a bookmark that is no longer needed.
1268       *
1269       * Releasing a bookmark frees the memory and the small
1270       * per-update processing overhead it requires.
1271       *
1272       * @since 6.4.0
1273       *
1274       * @param string $bookmark_name Name of the bookmark to remove.
1275       * @return bool Whether the bookmark already existed before removal.
1276       */
1277      public function release_bookmark( $bookmark_name ) {
1278          return parent::release_bookmark( "_{$bookmark_name}" );
1279      }
1280  
1281      /**
1282       * Moves the internal cursor in the HTML Processor to a given bookmark's location.
1283       *
1284       * Be careful! Seeking backwards to a previous location resets the parser to the
1285       * start of the document and reparses the entire contents up until it finds the
1286       * sought-after bookmarked location.
1287       *
1288       * To prevent accidental infinite loops, there's a limit
1289       * on the number of times seek() can be called.
1290       *
1291       * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
1292       *
1293       * @since 6.4.0
1294       *
1295       * @param string $bookmark_name Jump to the place in the document identified by this bookmark name.
1296       * @return bool Whether the internal cursor was successfully moved to the bookmark's location.
1297       */
1298      public function seek( $bookmark_name ) {
1299          // Flush any pending updates to the document before beginning.
1300          $this->get_updated_html();
1301  
1302          $actual_bookmark_name = "_{$bookmark_name}";
1303          $processor_started_at = $this->state->current_token
1304              ? $this->bookmarks[ $this->state->current_token->bookmark_name ]->start
1305              : 0;
1306          $bookmark_starts_at   = $this->bookmarks[ $actual_bookmark_name ]->start;
1307          $direction            = $bookmark_starts_at > $processor_started_at ? 'forward' : 'backward';
1308  
1309          /*
1310           * If seeking backwards, it's possible that the sought-after bookmark exists within an element
1311           * which has been closed before the current cursor; in other words, it has already been removed
1312           * from the stack of open elements. This means that it's insufficient to simply pop off elements
1313           * from the stack of open elements which appear after the bookmarked location and then jump to
1314           * that location, as the elements which were open before won't be re-opened.
1315           *
1316           * In order to maintain consistency, the HTML Processor rewinds to the start of the document
1317           * and reparses everything until it finds the sought-after bookmark.
1318           *
1319           * There are potentially better ways to do this: cache the parser state for each bookmark and
1320           * restore it when seeking; store an immutable and idempotent register of where elements open
1321           * and close.
1322           *
1323           * If caching the parser state it will be essential to properly maintain the cached stack of
1324           * open elements and active formatting elements when modifying the document. This could be a
1325           * tedious and time-consuming process as well, and so for now will not be performed.
1326           *
1327           * It may be possible to track bookmarks for where elements open and close, and in doing so
1328           * be able to quickly recalculate breadcrumbs for any element in the document. It may even
1329           * be possible to remove the stack of open elements and compute it on the fly this way.
1330           * If doing this, the parser would need to track the opening and closing locations for all
1331           * tokens in the breadcrumb path for any and all bookmarks. By utilizing bookmarks themselves
1332           * this list could be automatically maintained while modifying the document. Finding the
1333           * breadcrumbs would then amount to traversing that list from the start until the token
1334           * being inspected. Once an element closes, if there are no bookmarks pointing to locations
1335           * within that element, then all of these locations may be forgotten to save on memory use
1336           * and computation time.
1337           */
1338          if ( 'backward' === $direction ) {
1339              /*
1340               * Instead of clearing the parser state and starting fresh, calling the stack methods
1341               * maintains the proper flags in the parser.
1342               */
1343              foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
1344                  if ( 'context-node' === $item->bookmark_name ) {
1345                      break;
1346                  }
1347  
1348                  $this->state->stack_of_open_elements->remove_node( $item );
1349              }
1350  
1351              foreach ( $this->state->active_formatting_elements->walk_up() as $item ) {
1352                  if ( 'context-node' === $item->bookmark_name ) {
1353                      break;
1354                  }
1355  
1356                  $this->state->active_formatting_elements->remove_node( $item );
1357              }
1358  
1359              parent::seek( 'context-node' );
1360              $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
1361              $this->state->frameset_ok    = true;
1362          }
1363  
1364          // When moving forwards, reparse the document until reaching the same location as the original bookmark.
1365          if ( $bookmark_starts_at === $this->bookmarks[ $this->state->current_token->bookmark_name ]->start ) {
1366              return true;
1367          }
1368  
1369          while ( $this->step() ) {
1370              if ( $bookmark_starts_at === $this->bookmarks[ $this->state->current_token->bookmark_name ]->start ) {
1371                  return true;
1372              }
1373          }
1374  
1375          return false;
1376      }
1377  
1378      /**
1379       * Sets a bookmark in the HTML document.
1380       *
1381       * Bookmarks represent specific places or tokens in the HTML
1382       * document, such as a tag opener or closer. When applying
1383       * edits to a document, such as setting an attribute, the
1384       * text offsets of that token may shift; the bookmark is
1385       * kept updated with those shifts and remains stable unless
1386       * the entire span of text in which the token sits is removed.
1387       *
1388       * Release bookmarks when they are no longer needed.
1389       *
1390       * Example:
1391       *
1392       *     <main><h2>Surprising fact you may not know!</h2></main>
1393       *           ^  ^
1394       *            \-|-- this `H2` opener bookmark tracks the token
1395       *
1396       *     <main class="clickbait"><h2>Surprising fact you may no…
1397       *                             ^  ^
1398       *                              \-|-- it shifts with edits
1399       *
1400       * Bookmarks provide the ability to seek to a previously-scanned
1401       * place in the HTML document. In the Tag Processor this avoids
1402       * re-scanning; the HTML Processor reparses when seeking backward.
1403       *
1404       * Example:
1405       *
1406       *     <ul><li>One</li><li>Two</li><li>Three</li></ul>
1407       *                                 ^^^^
1408       *                                 want to note this last item
1409       *
1410       *     $p = new WP_HTML_Tag_Processor( $html );
1411       *     $in_list = false;
1412       *     while ( $p->next_tag( array( 'tag_closers' => $in_list ? 'visit' : 'skip' ) ) ) {
1413       *         if ( 'UL' === $p->get_tag() ) {
1414       *             if ( $p->is_tag_closer() ) {
1415       *                 $in_list = false;
1416       *                 $p->set_bookmark( 'resume' );
1417       *                 if ( $p->seek( 'last-li' ) ) {
1418       *                     $p->add_class( 'last-li' );
1419       *                 }
1420       *                 $p->seek( 'resume' );
1421       *                 $p->release_bookmark( 'last-li' );
1422       *                 $p->release_bookmark( 'resume' );
1423       *             } else {
1424       *                 $in_list = true;
1425       *             }
1426       *         }
1427       *
1428       *         if ( 'LI' === $p->get_tag() ) {
1429       *             $p->set_bookmark( 'last-li' );
1430       *         }
1431       *     }
1432       *
1433       * Bookmarks intentionally hide the internal string offsets
1434       * to which they refer. They are maintained internally as
1435       * updates are applied to the HTML document and therefore
1436       * retain their "position" - the location to which they
1437       * originally pointed. The inability to use bookmarks with
1438       * functions like `substr` is therefore intentional to guard
1439       * against accidentally breaking the HTML.
1440       *
1441       * Because bookmarks allocate memory and require processing
1442       * for every applied update, they are limited and require
1443       * a name. They should not be created with programmatically-made
1444       * names, such as "li_{$index}" with some loop. As a general
1445       * rule they should only be created with string-literal names
1446       * like "start-of-section" or "last-paragraph".
1447       *
1448       * Bookmarks are a powerful tool to enable complicated behavior.
1449       * Consider double-checking that you need this tool if you are
1450       * reaching for it, as inappropriate use could lead to broken
1451       * HTML structure or unwanted processing overhead.
1452       *
1453       * @since 6.4.0
1454       *
1455       * @param string $bookmark_name Identifies this particular bookmark.
1456       * @return bool Whether the bookmark was successfully created.
1457       */
1458  	public function set_bookmark( $bookmark_name ) {
1459          return parent::set_bookmark( "_{$bookmark_name}" );
1460      }
1461  
1462      /**
1463       * Checks whether a bookmark with the given name exists.
1464       *
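           * Example (an illustrative sketch; the bookmark names are arbitrary):
           *
           *     $processor = WP_HTML_Processor::create_fragment( '<p>Text</p>' );
           *     if ( $processor->next_tag( 'P' ) && $processor->set_bookmark( 'first-p' ) ) {
           *         $processor->has_bookmark( 'first-p' ); // true
           *         $processor->has_bookmark( 'missing' ); // false
           *     }
           *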
1465       * @since 6.5.0
1466       *
1467       * @param string $bookmark_name Name to identify a bookmark that potentially exists.
1468       * @return bool Whether that bookmark exists.
1469       */
1470  	public function has_bookmark( $bookmark_name ) {
1471          return parent::has_bookmark( "_{$bookmark_name}" );
1472      }
1473  
1474      /*
1475       * HTML Parsing Algorithms
1476       */
1477  
1478      /**
1479       * Closes a P element.
1480       *
1481       * @since 6.4.0
1482       *
1483       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
1484       *
1485       * @see https://html.spec.whatwg.org/#close-a-p-element
1486       */
1487  	private function close_a_p_element() {
1488          $this->generate_implied_end_tags( 'P' );
1489          $this->state->stack_of_open_elements->pop_until( 'P' );
1490      }
1491  
1492      /**
1493       * Closes elements that have implied end tags.
1494       *
1495       * @since 6.4.0
1496       *
1497       * @see https://html.spec.whatwg.org/#generate-implied-end-tags
1498       *
1499       * @param string|null $except_for_this_element Perform as if this element doesn't exist in the stack of open elements.
1500       */
1501  	private function generate_implied_end_tags( $except_for_this_element = null ) {
1502          $elements_with_implied_end_tags = array(
1503              'DD',
1504              'DT',
1505              'LI',
1506              'P',
1507          );
1508  
1509          // Compare by node name, re-reading the current node after every pop.
1510          while (
1511              ( $current_node = $this->state->stack_of_open_elements->current_node() ) && $current_node->node_name !== $except_for_this_element &&
1512              in_array( $current_node->node_name, $elements_with_implied_end_tags, true )
1513          ) {
1514              $this->state->stack_of_open_elements->pop();
1515          }
1516      }
1517  
1518      /**
1519       * Closes elements that have implied end tags, thoroughly.
1520       *
1521       * See the HTML specification for an explanation of why this
1522       * differs from generating implied end tags in the normal sense.
1523       *
1524       * @since 6.4.0
1525       *
1526       * @see WP_HTML_Processor::generate_implied_end_tags
1527       * @see https://html.spec.whatwg.org/#generate-implied-end-tags
1528       */
1529  	private function generate_implied_end_tags_thoroughly() {
1530          $elements_with_implied_end_tags = array(
1531              'DD',
1532              'DT',
1533              'LI',
1534              'P',
1535          );
1536  
1537          while ( ( $current_node = $this->state->stack_of_open_elements->current_node() ) && in_array( $current_node->node_name, $elements_with_implied_end_tags, true ) ) {
1538              $this->state->stack_of_open_elements->pop();
1539          }
1540      }
1541  
1542      /**
1543       * Reconstructs the active formatting elements.
1544       *
1545       * > This has the effect of reopening all the formatting elements that were opened
1546       * > in the current body, cell, or caption (whichever is youngest) that haven't
1547       * > been explicitly closed.
1548       *
1549       * @since 6.4.0
1550       *
1551       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
1552       *
1553       * @see https://html.spec.whatwg.org/#reconstruct-the-active-formatting-elements
1554       *
1555       * @return bool Whether any formatting elements needed to be reconstructed.
1556       */
1557  	private function reconstruct_active_formatting_elements() {
1558          /*
1559           * > If there are no entries in the list of active formatting elements, then there is nothing
1560           * > to reconstruct; stop this algorithm.
1561           */
1562          if ( 0 === $this->state->active_formatting_elements->count() ) {
1563              return false;
1564          }
1565  
1566          $last_entry = $this->state->active_formatting_elements->current_node();
1567          if (
1568  
1569              /*
1570               * > If the last (most recently added) entry in the list of active formatting elements is a marker;
1571               * > stop this algorithm.
1572               */
1573              'marker' === $last_entry->node_name ||
1574  
1575              /*
1576               * > If the last (most recently added) entry in the list of active formatting elements is an
1577               * > element that is in the stack of open elements, then there is nothing to reconstruct;
1578               * > stop this algorithm.
1579               */
1580              $this->state->stack_of_open_elements->contains_node( $last_entry )
1581          ) {
1582              return false;
1583          }
1584  
1585          $this->last_error = self::ERROR_UNSUPPORTED;
1586          throw new WP_HTML_Unsupported_Exception( 'Cannot reconstruct active formatting elements when advancing and rewinding is required.' );
1587      }
1588  
1589      /**
1590       * Runs the adoption agency algorithm.
1591       *
1592       * @since 6.4.0
1593       *
1594       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
1595       *
1596       * @see https://html.spec.whatwg.org/#adoption-agency-algorithm
1597       */
1598  	private function run_adoption_agency_algorithm() {
1599          $budget       = 1000;
1600          $subject      = $this->get_tag();
1601          $current_node = $this->state->stack_of_open_elements->current_node();
1602  
1603          if (
1604              // > If the current node is an HTML element whose tag name is subject
1605              $current_node && $subject === $current_node->node_name &&
1606              // > the current node is not in the list of active formatting elements
1607              ! $this->state->active_formatting_elements->contains_node( $current_node )
1608          ) {
1609              $this->state->stack_of_open_elements->pop();
1610              return;
1611          }
1612  
1613          $outer_loop_counter = 0;
1614          while ( $budget-- > 0 ) {
1615              if ( $outer_loop_counter++ >= 8 ) {
1616                  return;
1617              }
1618  
1619              /*
1620               * > Let formatting element be the last element in the list of active formatting elements that:
1621               * >   - is between the end of the list and the last marker in the list,
1622               * >     if any, or the start of the list otherwise,
1623               * >   - and has the tag name subject.
1624               */
1625              $formatting_element = null;
1626              foreach ( $this->state->active_formatting_elements->walk_up() as $item ) {
1627                  if ( 'marker' === $item->node_name ) {
1628                      break;
1629                  }
1630  
1631                  if ( $subject === $item->node_name ) {
1632                      $formatting_element = $item;
1633                      break;
1634                  }
1635              }
1636  
1637              // > If there is no such element, then return and instead act as described in the "any other end tag" entry above.
1638              if ( null === $formatting_element ) {
1639                  $this->last_error = self::ERROR_UNSUPPORTED;
1640                  throw new WP_HTML_Unsupported_Exception( 'Cannot run adoption agency when "any other end tag" is required.' );
1641              }
1642  
1643              // > If formatting element is not in the stack of open elements, then this is a parse error; remove the element from the list, and return.
1644              if ( ! $this->state->stack_of_open_elements->contains_node( $formatting_element ) ) {
1645                  $this->state->active_formatting_elements->remove_node( $formatting_element );
1646                  return;
1647              }
1648  
1649              // > If formatting element is in the stack of open elements, but the element is not in scope, then this is a parse error; return.
1650              if ( ! $this->state->stack_of_open_elements->has_element_in_scope( $formatting_element->node_name ) ) {
1651                  return;
1652              }
1653  
1654              /*
1655               * > Let furthest block be the topmost node in the stack of open elements that is lower in the stack
1656               * > than formatting element, and is an element in the special category. There might not be one.
1657               */
1658              $is_above_formatting_element = true;
1659              $furthest_block              = null;
1660              foreach ( $this->state->stack_of_open_elements->walk_down() as $item ) {
1661                  if ( $is_above_formatting_element && $formatting_element->bookmark_name !== $item->bookmark_name ) {
1662                      continue;
1663                  }
1664  
1665                  if ( $is_above_formatting_element ) {
1666                      $is_above_formatting_element = false;
1667                      continue;
1668                  }
1669  
1670                  if ( self::is_special( $item->node_name ) ) {
1671                      $furthest_block = $item;
1672                      break;
1673                  }
1674              }
1675  
1676              /*
1677               * > If there is no furthest block, then the UA must first pop all the nodes from the bottom of the
1678               * > stack of open elements, from the current node up to and including formatting element, then
1679               * > remove formatting element from the list of active formatting elements, and finally return.
1680               */
1681              if ( null === $furthest_block ) {
1682                  foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
1683                      $this->state->stack_of_open_elements->pop();
1684  
1685                      if ( $formatting_element->bookmark_name === $item->bookmark_name ) {
1686                          $this->state->active_formatting_elements->remove_node( $formatting_element );
1687                          return;
1688                      }
1689                  }
1690              }
1691  
1692              $this->last_error = self::ERROR_UNSUPPORTED;
1693              throw new WP_HTML_Unsupported_Exception( 'Cannot extract common ancestor in adoption agency algorithm.' );
1694          }
1695  
1696          $this->last_error = self::ERROR_UNSUPPORTED;
1697          throw new WP_HTML_Unsupported_Exception( 'Cannot run adoption agency when looping required.' );
1698      }
1699  
1700      /**
1701       * Inserts an HTML element on the stack of open elements.
1702       *
1703       * @since 6.4.0
1704       *
1705       * @see https://html.spec.whatwg.org/#insert-an-html-element
1706       *
1707       * @param WP_HTML_Token $token Token for the element, carrying the name of the bookmark that points to it in the original input HTML.
1708       */
1709  	private function insert_html_element( $token ) {
1710          $this->state->stack_of_open_elements->push( $token );
1711      }
1712  
1713      /*
1714       * HTML Specification Helpers
1715       */
1716  
1717      /**
1718       * Returns whether an element of a given name is in the HTML special category.
1719       *
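           * Example (an illustrative sketch; names are uppercased before comparison):
           *
           *     WP_HTML_Processor::is_special( 'div' );  // true
           *     WP_HTML_Processor::is_special( 'SPAN' ); // false
           *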
1720       * @since 6.4.0
1721       *
1722       * @see https://html.spec.whatwg.org/#special
1723       *
1724       * @param string $tag_name Name of element to check.
1725       * @return bool Whether the element of the given name is in the special category.
1726       */
1727  	public static function is_special( $tag_name ) {
1728          $tag_name = strtoupper( $tag_name );
1729  
1730          return (
1731              'ADDRESS' === $tag_name ||
1732              'APPLET' === $tag_name ||
1733              'AREA' === $tag_name ||
1734              'ARTICLE' === $tag_name ||
1735              'ASIDE' === $tag_name ||
1736              'BASE' === $tag_name ||
1737              'BASEFONT' === $tag_name ||
1738              'BGSOUND' === $tag_name ||
1739              'BLOCKQUOTE' === $tag_name ||
1740              'BODY' === $tag_name ||
1741              'BR' === $tag_name ||
1742              'BUTTON' === $tag_name ||
1743              'CAPTION' === $tag_name ||
1744              'CENTER' === $tag_name ||
1745              'COL' === $tag_name ||
1746              'COLGROUP' === $tag_name ||
1747              'DD' === $tag_name ||
1748              'DETAILS' === $tag_name ||
1749              'DIR' === $tag_name ||
1750              'DIV' === $tag_name ||
1751              'DL' === $tag_name ||
1752              'DT' === $tag_name ||
1753              'EMBED' === $tag_name ||
1754              'FIELDSET' === $tag_name ||
1755              'FIGCAPTION' === $tag_name ||
1756              'FIGURE' === $tag_name ||
1757              'FOOTER' === $tag_name ||
1758              'FORM' === $tag_name ||
1759              'FRAME' === $tag_name ||
1760              'FRAMESET' === $tag_name ||
1761              'H1' === $tag_name ||
1762              'H2' === $tag_name ||
1763              'H3' === $tag_name ||
1764              'H4' === $tag_name ||
1765              'H5' === $tag_name ||
1766              'H6' === $tag_name ||
1767              'HEAD' === $tag_name ||
1768              'HEADER' === $tag_name ||
1769              'HGROUP' === $tag_name ||
1770              'HR' === $tag_name ||
1771              'HTML' === $tag_name ||
1772              'IFRAME' === $tag_name ||
1773              'IMG' === $tag_name ||
1774              'INPUT' === $tag_name ||
1775              'KEYGEN' === $tag_name ||
1776              'LI' === $tag_name ||
1777              'LINK' === $tag_name ||
1778              'LISTING' === $tag_name ||
1779              'MAIN' === $tag_name ||
1780              'MARQUEE' === $tag_name ||
1781              'MENU' === $tag_name ||
1782              'META' === $tag_name ||
1783              'NAV' === $tag_name ||
1784              'NOEMBED' === $tag_name ||
1785              'NOFRAMES' === $tag_name ||
1786              'NOSCRIPT' === $tag_name ||
1787              'OBJECT' === $tag_name ||
1788              'OL' === $tag_name ||
1789              'P' === $tag_name ||
1790              'PARAM' === $tag_name ||
1791              'PLAINTEXT' === $tag_name ||
1792              'PRE' === $tag_name ||
1793              'SCRIPT' === $tag_name ||
1794              'SEARCH' === $tag_name ||
1795              'SECTION' === $tag_name ||
1796              'SELECT' === $tag_name ||
1797              'SOURCE' === $tag_name ||
1798              'STYLE' === $tag_name ||
1799              'SUMMARY' === $tag_name ||
1800              'TABLE' === $tag_name ||
1801              'TBODY' === $tag_name ||
1802              'TD' === $tag_name ||
1803              'TEMPLATE' === $tag_name ||
1804              'TEXTAREA' === $tag_name ||
1805              'TFOOT' === $tag_name ||
1806              'TH' === $tag_name ||
1807              'THEAD' === $tag_name ||
1808              'TITLE' === $tag_name ||
1809              'TR' === $tag_name ||
1810              'TRACK' === $tag_name ||
1811              'UL' === $tag_name ||
1812              'WBR' === $tag_name ||
1813              'XMP' === $tag_name ||
1814  
1815              // MathML.
1816              'MI' === $tag_name ||
1817              'MO' === $tag_name ||
1818              'MN' === $tag_name ||
1819              'MS' === $tag_name ||
1820              'MTEXT' === $tag_name ||
1821              'ANNOTATION-XML' === $tag_name ||
1822  
1823              // SVG.
1824              'FOREIGNOBJECT' === $tag_name ||
1825              'DESC' === $tag_name ||
1826              'TITLE' === $tag_name
1827          );
1828      }
1829  
1830      /**
1831       * Returns whether a given element is an HTML Void Element.
1832       *
1833       * > area, base, br, col, embed, hr, img, input, link, meta, source, track, wbr
1834       *
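           * Example (an illustrative sketch; names are uppercased before comparison):
           *
           *     WP_HTML_Processor::is_void( 'img' ); // true
           *     WP_HTML_Processor::is_void( 'DIV' ); // false
           *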
1835       * @since 6.4.0
1836       *
1837       * @see https://html.spec.whatwg.org/#void-elements
1838       *
1839       * @param string $tag_name Name of HTML tag to check.
1840       * @return bool Whether the given tag is an HTML Void Element.
1841       */
1842  	public static function is_void( $tag_name ) {
1843          $tag_name = strtoupper( $tag_name );
1844  
1845          return (
1846              'AREA' === $tag_name ||
1847              'BASE' === $tag_name ||
1848              'BASEFONT' === $tag_name || // Obsolete but still treated as void.
1849              'BGSOUND' === $tag_name || // Obsolete but still treated as void.
1850              'BR' === $tag_name ||
1851              'COL' === $tag_name ||
1852              'EMBED' === $tag_name ||
1853              'FRAME' === $tag_name ||
1854              'HR' === $tag_name ||
1855              'IMG' === $tag_name ||
1856              'INPUT' === $tag_name ||
1857              'KEYGEN' === $tag_name || // Obsolete but still treated as void.
1858              'LINK' === $tag_name ||
1859              'META' === $tag_name ||
1860              'PARAM' === $tag_name || // Obsolete but still treated as void.
1861              'SOURCE' === $tag_name ||
1862              'TRACK' === $tag_name ||
1863              'WBR' === $tag_name
1864          );
1865      }
1866  
1867      /*
1868       * Constants that would pollute the top of the class if they were found there.
1869       */
1870  
1871      /**
1872       * Indicates that the next HTML token should be parsed and processed.
1873       *
1874       * @since 6.4.0
1875       *
1876       * @var string
1877       */
1878      const PROCESS_NEXT_NODE = 'process-next-node';
1879  
1880      /**
1881       * Indicates that the current HTML token should be reprocessed in the newly-selected insertion mode.
1882       *
1883       * @since 6.4.0
1884       *
1885       * @var string
1886       */
1887      const REPROCESS_CURRENT_NODE = 'reprocess-current-node';
1888  
1889      /**
1890       * Indicates that the current HTML token should be processed without advancing the parser.
1891       *
1892       * @since 6.5.0
1893       *
1894       * @var string
1895       */
1896      const PROCESS_CURRENT_NODE = 'process-current-node';
1897  
1898      /**
1899       * Indicates that the parser encountered unsupported markup and has bailed.
1900       *
1901       * @since 6.4.0
1902       *
1903       * @var string
1904       */
1905      const ERROR_UNSUPPORTED = 'unsupported';
1906  
1907      /**
1908       * Indicates that the parser encountered more HTML tokens than it
1909       * was able to process and has bailed.
1910       *
1911       * @since 6.4.0
1912       *
1913       * @var string
1914       */
1915      const ERROR_EXCEEDED_MAX_BOOKMARKS = 'exceeded-max-bookmarks';
1916  
1917      /**
1918       * Unlock code that must be passed into the constructor to create this class.
1919       *
1920       * This class extends the WP_HTML_Tag_Processor, which has a public class
1921       * constructor. Therefore, it's not possible to have a private constructor here.
1922       *
1923       * This unlock code is used to ensure that anyone calling the constructor is
1924       * doing so with a full understanding that it's intended to be a private API.
1925       *
1926       * @access private
1927       */
1928      const CONSTRUCTOR_UNLOCK_CODE = 'Use WP_HTML_Processor::create_fragment() instead of calling the class constructor directly.';
1929  }


Generated : Sat Apr 27 08:20:02 2024 Cross-referenced by PHPXref