/wp-includes/html-api/ -> class-wp-html-processor.php (source)

   1  <?php
   2  /**
   3   * HTML API: WP_HTML_Processor class
   4   *
   5   * @package WordPress
   6   * @subpackage HTML-API
   7   * @since 6.4.0
   8   */
  10  /**
  11   * Core class used to safely parse and modify an HTML document.
  12   *
  13   * The HTML Processor class properly parses and modifies HTML5 documents.
  14   *
  15   * It supports a subset of the HTML5 specification, and when it encounters
  16   * unsupported markup, it aborts early to avoid unintentionally breaking
  17   * the document. The HTML Processor should never break an HTML document.
  18   *
  19   * While the `WP_HTML_Tag_Processor` is a valuable tool for modifying
  20   * attributes on individual HTML tags, the HTML Processor is more capable
  21   * and useful for the following operations:
  22   *
  23   *  - Querying based on nested HTML structure.
  24   *
  25   * Eventually the HTML Processor will also support:
  26   *  - Wrapping a tag in surrounding HTML.
  27   *  - Unwrapping a tag by removing its parent.
  28   *  - Inserting and removing nodes.
  29   *  - Reading and changing inner content.
  30   *  - Navigating up or around HTML structure.
  31   *
  32   * ## Usage
  33   *
  34   * Use of this class requires three steps:
  35   *
  36   *   1. Call a static creator method with your input HTML document.
  37   *   2. Find the location in the document you are looking for.
  38   *   3. Request changes to the document at that location.
  39   *
  40   * Example:
  41   *
  42   *     $processor = WP_HTML_Processor::create_fragment( $html );
  43   *     if ( $processor->next_tag( array( 'breadcrumbs' => array( 'DIV', 'FIGURE', 'IMG' ) ) ) ) {
  44   *         $processor->add_class( 'responsive-image' );
  45   *     }
  46   *
  47   * #### Breadcrumbs
  48   *
  49   * Breadcrumbs represent the stack of open elements from the root
  50   * of the document or fragment down to the currently-matched node,
  51   * if one is currently selected. Call WP_HTML_Processor::get_breadcrumbs()
  52   * to inspect the breadcrumbs for a matched tag.
  53   *
  54   * Breadcrumbs can specify nested HTML structure and are equivalent
  55   * to a CSS selector comprising tag names separated by the child
  56   * combinator, such as "DIV > FIGURE > IMG".
  57   *
  58   * Since all elements find themselves inside a full HTML document
  59   * when parsed, the return value from `get_breadcrumbs()` will always
  60   * contain any implicit outermost elements. For example, when parsing
  61   * with `create_fragment()` in the `BODY` context (the default), any
  62   * tag in the given HTML document will contain `array( 'HTML', 'BODY', … )`
  63   * in its breadcrumbs.
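       *
       * As a minimal sketch of the above (the input HTML is only an
       * illustration), the implied outermost elements appear in the result:
       *
       *     $processor = WP_HTML_Processor::create_fragment( '<p><strong><em><img></em></strong></p>' );
       *     $processor->next_tag( 'IMG' );
       *     $processor->get_breadcrumbs() === array( 'HTML', 'BODY', 'P', 'STRONG', 'EM', 'IMG' );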
  64   *
  65   * Despite containing the implied outermost elements in their breadcrumbs,
  66   * tags may be found with the shortest-matching breadcrumb query. That is,
  67   * `array( 'IMG' )` matches all IMG elements and `array( 'P', 'IMG' )`
  68   * matches all IMG elements directly inside a P element. To rule out
  69   * unintended partial matches, specify in the query the full breadcrumb
  70   * path all the way down from the root HTML element.
  71   *
  72   * Example:
  73   *
  74   *     $html = '<figure><img><figcaption>A <em>lovely</em> day outside</figcaption></figure>';
  75   *     //               ----- Matches here.
  76   *     $processor->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'IMG' ) ) );
  77   *
  78   *     $html = '<figure><img><figcaption>A <em>lovely</em> day outside</figcaption></figure>';
  79   *     //                                  ---- Matches here.
  80   *     $processor->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'FIGCAPTION', 'EM' ) ) );
  81   *
  82   *     $html = '<div><img></div><img>';
  83   *     //                       ----- Matches here, because IMG must be a direct child of the implicit BODY.
  84   *     $processor->next_tag( array( 'breadcrumbs' => array( 'BODY', 'IMG' ) ) );
  85   *
  86   * ## HTML Support
  87   *
  88   * This class implements a small part of the HTML5 specification.
  89   * It's designed to operate within its support and abort early whenever
  90   * encountering circumstances it can't properly handle. This is
  91   * the principal way in which this class remains as simple as possible
  92   * without cutting corners and breaking compliance.
  93   *
  94   * ### Supported elements
  95   *
  96   * If any unsupported element appears in the HTML input the HTML Processor
  97   * will abort early and stop all processing. This draconian measure ensures
  98   * that the HTML Processor won't break any HTML it doesn't fully understand.
  99   *
 100   * The HTML Processor supports all elements other than a specific set:
 101   *
 102   *  - Any element inside a TABLE.
 103   *  - Any element inside foreign content, including SVG and MATH.
 104   *  - Any element outside the IN BODY insertion mode, e.g. doctype declarations, meta, links.
 105   *
 106   * ### Supported markup
 107   *
 108   * Some kinds of non-normative HTML involve reconstruction of formatting elements and
 109   * re-parenting of mis-nested elements. For example, a DIV tag found inside a TABLE
 110   * may in fact belong _before_ the table in the DOM. If the HTML Processor encounters
 111   * such a case it will stop processing.
 112   *
 113   * The following list illustrates some common examples of unexpected HTML inputs that
 114   * the HTML Processor properly parses and represents:
 115   *
 116   *  - HTML with optional tags omitted, e.g. `<p>one<p>two`.
 117   *  - HTML with unexpected tag closers, e.g. `<p>one </span> more</p>`.
 118   *  - Non-void tags with self-closing flag, e.g. `<div/>the DIV is still open.</div>`.
 119   *  - Heading elements which close open heading elements of another level, e.g. `<h1>Closed by </h2>`.
 120   *  - Elements containing text that looks like other tags but isn't, e.g. `<title>The <img> is plaintext</title>`.
 121   *  - SCRIPT and STYLE tags containing text that looks like HTML but isn't, e.g. `<script>document.write('<p>Hi</p>');</script>`.
 122   *  - SCRIPT content which has been escaped, e.g. `<script><!-- document.write('<script>console.log("hi")</script>') --></script>`.
 123   *
 124   * ### Unsupported Features
 125   *
 126   * This parser does not report parse errors.
 127   *
 128   * Normally, when additional HTML or BODY tags are encountered in a document, if there
 129   * are any additional attributes on them that aren't found on the previous elements,
 130   * the existing HTML and BODY elements adopt those missing attribute values. This
 131   * parser does not add those additional attributes.
 132   *
 133   * In certain situations, elements are moved to a different part of the document in
 134   * processes called "adoption" and "fostering." Because the nodes move to a location
 135   * in the document that the parser had already processed, this parser does not support
 136   * these situations and will bail.
 137   *
 138   * @since 6.4.0
 139   *
 140   * @see WP_HTML_Tag_Processor
 141   * @see https://html.spec.whatwg.org/
 142   */
 143  class WP_HTML_Processor extends WP_HTML_Tag_Processor {
 144      /**
 145       * The maximum number of bookmarks allowed to exist at any given time.
 146       *
 147       * HTML processing requires more bookmarks than basic tag processing,
 148       * so this class constant from the Tag Processor is overridden here.
 149       *
 150       * @since 6.4.0
 151       *
 152       * @var int
 153       */
 154      const MAX_BOOKMARKS = 100;
 156      /**
 157       * Holds the working state of the parser, including the stack of
 158       * open elements and the stack of active formatting elements.
 159       *
 160       * Initialized in the constructor.
 161       *
 162       * @since 6.4.0
 163       *
 164       * @var WP_HTML_Processor_State
 165       */
 166      private $state;
 168      /**
 169       * Used to create unique bookmark names.
 170       *
 171       * This class sets a bookmark for every tag in the HTML document that it encounters.
 172       * The bookmark name is auto-generated and increments, starting with `1`. These are
 173       * internal bookmarks and are automatically released when the referring WP_HTML_Token
 174       * goes out of scope and is garbage-collected.
 175       *
 176       * @since 6.4.0
 177       *
 178       * @see WP_HTML_Processor::$release_internal_bookmark_on_destruct
 179       *
 180       * @var int
 181       */
 182      private $bookmark_counter = 0;
 184      /**
 185       * Stores an explanation for why something failed, if it did.
 186       *
 187       * @see self::get_last_error
 188       *
 189       * @since 6.4.0
 190       *
 191       * @var string|null
 192       */
 193      private $last_error = null;
 195      /**
 196       * Stores context for why the parser bailed on unsupported HTML, if it did.
 197       *
 198       * @see self::get_unsupported_exception
 199       *
 200       * @since 6.7.0
 201       *
 202       * @var WP_HTML_Unsupported_Exception|null
 203       */
 204      private $unsupported_exception = null;
 206      /**
 207       * Releases a bookmark when PHP garbage-collects its wrapping WP_HTML_Token instance.
 208       *
 209       * This function is created inside the class constructor so that it can be passed to
 210       * the stack of open elements and the stack of active formatting elements without
 211       * exposing it as a public method on the class.
 212       *
 213       * @since 6.4.0
 214       *
 215       * @var Closure|null
 216       */
 217      private $release_internal_bookmark_on_destruct = null;
 219      /**
 220       * Stores stack events which arise during parsing of the
 221       * HTML document, which will then supply the "match" events.
 222       *
 223       * @since 6.6.0
 224       *
 225       * @var WP_HTML_Stack_Event[]
 226       */
 227      private $element_queue = array();
 229      /**
 230       * Stores the current breadcrumbs.
 231       *
 232       * @since 6.7.0
 233       *
 234       * @var string[]
 235       */
 236      private $breadcrumbs = array();
 238      /**
 239       * Current stack event, if set, representing a matched token.
 240       *
 241       * Because the parser may internally point to a place further along in a document
 242       * than the nodes which have already been processed (some "virtual" nodes may have
 243       * appeared while scanning the HTML document), this will point at the "current" node
 244       * being processed. It comes from the front of the element queue.
 245       *
 246       * @since 6.6.0
 247       *
 248       * @var WP_HTML_Stack_Event|null
 249       */
 250      private $current_element = null;
 252      /**
 253       * Context node if created as a fragment parser.
 254       *
 255       * @var WP_HTML_Token|null
 256       */
 257      private $context_node = null;
 259      /*
 260       * Public Interface Functions
 261       */
 263      /**
 264       * Creates an HTML processor in the fragment parsing mode.
 265       *
 266       * Use this for cases where you are processing chunks of HTML that
 267       * will be found within a bigger HTML document, such as rendered
 268       * block output that exists within a post, or `the_content` inside a
 269       * rendered site layout.
 270       *
 271       * Fragment parsing occurs within a context, which is an HTML element
 272       * that the document will eventually be placed in. It becomes important
 273       * when special elements have different rules than others, such as inside
 274       * a TEXTAREA or a TITLE tag where things that look like tags are text,
 275       * or inside a SCRIPT tag where things that look like HTML syntax are JS.
 276       *
 277       * The context value should be a representation of the tag in which the
 278       * HTML is found. For most cases this will be the body element. The HTML
 279       * form is provided because a context element may have attributes that
 280       * impact the parse, such as with a SCRIPT tag and its `type` attribute.
 281       *
 282       * ## Current HTML Support
 283       *
 284       *  - The only supported context is `<body>`, which is the default value.
 285       *  - The only supported document encoding is `UTF-8`, which is the default value.
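           *
           * A minimal sketch of these limits, assuming `$html` holds the fragment
           * to process: any other context or encoding makes the creator return
           * `null`, so check the result before using it.
           *
           *     $processor = WP_HTML_Processor::create_fragment( $html, '<textarea>' );
           *     null === $processor; // Only the `<body>` context is supported.
           *
           *     $processor = WP_HTML_Processor::create_fragment( $html );
           *     if ( null !== $processor && $processor->next_tag( 'IMG' ) ) {
           *         $processor->set_attribute( 'loading', 'lazy' );
           *     }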
 286       *
 287       * @since 6.4.0
 288       * @since 6.6.0 Returns `static` instead of `self` so it can create subclass instances.
 289       *
 290       * @param string $html     Input HTML fragment to process.
 291       * @param string $context  Context element for the fragment, must be default of `<body>`.
 292       * @param string $encoding Text encoding of the document; must be default of 'UTF-8'.
 293       * @return static|null The created processor if successful, otherwise null.
 294       */
 295  	public static function create_fragment( $html, $context = '<body>', $encoding = 'UTF-8' ) {
 296          if ( '<body>' !== $context || 'UTF-8' !== $encoding ) {
 297              return null;
 298          }
 300          $processor                             = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
 301          $processor->state->context_node        = array( 'BODY', array() );
 302          $processor->state->insertion_mode      = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
 303          $processor->state->encoding            = $encoding;
 304          $processor->state->encoding_confidence = 'certain';
 306          // @todo Create "fake" bookmarks for non-existent but implied nodes.
 307          $processor->bookmarks['root-node']    = new WP_HTML_Span( 0, 0 );
 308          $processor->bookmarks['context-node'] = new WP_HTML_Span( 0, 0 );
 310          $root_node = new WP_HTML_Token(
 311              'root-node',
 312              'HTML',
 313              false
 314          );
 316          $processor->state->stack_of_open_elements->push( $root_node );
 318          $context_node = new WP_HTML_Token(
 319              'context-node',
 320              $processor->state->context_node[0],
 321              false
 322          );
 324          $processor->context_node = $context_node;
 325          $processor->breadcrumbs  = array( 'HTML', $context_node->node_name );
 327          return $processor;
 328      }
 330      /**
 331       * Creates an HTML processor in the full parsing mode.
 332       *
 333       * It's likely that a fragment parser is more appropriate, unless sending an
 334       * entire HTML document from start to finish. Consider a fragment parser with
 335       * a context node of `<body>`.
 336       *
 337       * Since UTF-8 is the only currently-accepted charset, if working with a
 338       * document that isn't UTF-8, it's important to convert the document before
 339       * creating the processor: pass in the converted HTML.
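           *
           * A minimal sketch of that conversion, assuming the input is known to be
           * ISO-8859-1 and that the mbstring extension is available; `$latin1_html`
           * is a hypothetical variable holding the original bytes:
           *
           *     $utf8_html = mb_convert_encoding( $latin1_html, 'UTF-8', 'ISO-8859-1' );
           *     $processor = WP_HTML_Processor::create_full_parser( $utf8_html );
           *     if ( null === $processor ) {
           *         // The processor could not be created.
           *     }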
 340       *
 341       * @param string      $html                    Input HTML document to process.
 342       * @param string|null $known_definite_encoding Optional. If provided, specifies the charset used
 343       *                                             in the input byte stream. Currently must be UTF-8.
 344       * @return static|null The created processor if successful, otherwise null.
 345       */
 346  	public static function create_full_parser( $html, $known_definite_encoding = 'UTF-8' ) {
 347          if ( 'UTF-8' !== $known_definite_encoding ) {
 348              return null;
 349          }
 351          $processor                             = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
 352          $processor->state->encoding            = $known_definite_encoding;
 353          $processor->state->encoding_confidence = 'certain';
 355          return $processor;
 356      }
 358      /**
 359       * Constructor.
 360       *
 361       * Do not use this method. Use the static creator methods instead.
 362       *
 363       * @access private
 364       *
 365       * @since 6.4.0
 366       *
 367       * @see WP_HTML_Processor::create_fragment()
 368       *
 369       * @param string      $html                                  HTML to process.
 370       * @param string|null $use_the_static_create_methods_instead This constructor should not be called manually.
 371       */
 372  	public function __construct( $html, $use_the_static_create_methods_instead = null ) {
 373          parent::__construct( $html );
 375          if ( self::CONSTRUCTOR_UNLOCK_CODE !== $use_the_static_create_methods_instead ) {
 376              _doing_it_wrong(
 377                  __METHOD__,
 378                  sprintf(
 379                      /* translators: %s: WP_HTML_Processor::create_fragment(). */
 380                      __( 'Call %s to create an HTML Processor instead of calling the constructor directly.' ),
 381                      '<code>WP_HTML_Processor::create_fragment()</code>'
 382                  ),
 383                  '6.4.0'
 384              );
 385          }
 387          $this->state = new WP_HTML_Processor_State();
 389          $this->state->stack_of_open_elements->set_push_handler(
 390              function ( WP_HTML_Token $token ): void {
 391                  $is_virtual            = ! isset( $this->state->current_token ) || $this->is_tag_closer();
 392                  $same_node             = isset( $this->state->current_token ) && $token->node_name === $this->state->current_token->node_name;
 393                  $provenance            = ( ! $same_node || $is_virtual ) ? 'virtual' : 'real';
 394                  $this->element_queue[] = new WP_HTML_Stack_Event( $token, WP_HTML_Stack_Event::PUSH, $provenance );
 396                  $this->change_parsing_namespace( $token->namespace );
 397              }
 398          );
 400          $this->state->stack_of_open_elements->set_pop_handler(
 401              function ( WP_HTML_Token $token ): void {
 402                  $is_virtual            = ! isset( $this->state->current_token ) || ! $this->is_tag_closer();
 403                  $same_node             = isset( $this->state->current_token ) && $token->node_name === $this->state->current_token->node_name;
 404                  $provenance            = ( ! $same_node || $is_virtual ) ? 'virtual' : 'real';
 405                  $this->element_queue[] = new WP_HTML_Stack_Event( $token, WP_HTML_Stack_Event::POP, $provenance );
 406                  $adjusted_current_node = $this->get_adjusted_current_node();
 407                  $this->change_parsing_namespace(
 408                      $adjusted_current_node
 409                          ? $adjusted_current_node->namespace
 410                          : 'html'
 411                  );
 412              }
 413          );
 415          /*
 416           * Create this wrapper so that it's possible to pass
 417           * a private method into WP_HTML_Token classes without
 418           * exposing it to any public API.
 419           */
 420          $this->release_internal_bookmark_on_destruct = function ( string $name ): void {
 421              parent::release_bookmark( $name );
 422          };
 423      }
 425      /**
 426       * Stops the parser and terminates its execution when encountering unsupported markup.
 427       *
 428       * @throws WP_HTML_Unsupported_Exception Halts execution of the parser.
 429       *
 430       * @since 6.7.0
 431       *
 432       * @param string $message Explains what support is missing in order to parse the current node.
 433       */
 434  	private function bail( string $message ) {
 435          $here  = $this->bookmarks[ $this->state->current_token->bookmark_name ];
 436          $token = substr( $this->html, $here->start, $here->length );
 438          $open_elements = array();
 439          foreach ( $this->state->stack_of_open_elements->stack as $item ) {
 440              $open_elements[] = $item->node_name;
 441          }
 443          $active_formats = array();
 444          foreach ( $this->state->active_formatting_elements->walk_down() as $item ) {
 445              $active_formats[] = $item->node_name;
 446          }
 448          $this->last_error = self::ERROR_UNSUPPORTED;
 450          $this->unsupported_exception = new WP_HTML_Unsupported_Exception(
 451              $message,
 452              $this->state->current_token->node_name,
 453              $here->start,
 454              $token,
 455              $open_elements,
 456              $active_formats
 457          );
 459          throw $this->unsupported_exception;
 460      }
 462      /**
 463       * Returns the last error, if any.
 464       *
 465       * Various situations lead to parsing failure but this class will
 466       * return `false` in all those cases. To determine why something
 467       * failed it's possible to request the last error. This helps
 468       * distinguish whether a given tag couldn't be found or whether
 469       * content in the document caused the processor to give up and
 470       * abort processing.
 471       *
 472       * Example:
 473       *
 474       *     $processor = WP_HTML_Processor::create_fragment( '<template><strong><button><em><p><em>' );
 475       *     false === $processor->next_tag();
 476       *     WP_HTML_Processor::ERROR_UNSUPPORTED === $processor->get_last_error();
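           *
           * A sketch of telling the two cases apart, assuming `$html` holds the input:
           *
           *     $processor = WP_HTML_Processor::create_fragment( $html );
           *     if ( ! $processor->next_tag( 'IMG' ) ) {
           *         if ( null === $processor->get_last_error() ) {
           *             // No IMG was found in the document.
           *         } else {
           *             // The processor gave up; leave the HTML untouched.
           *         }
           *     }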
 477       *
 478       * @since 6.4.0
 479       *
 480       * @see self::ERROR_UNSUPPORTED
 481       * @see self::ERROR_EXCEEDED_MAX_BOOKMARKS
 482       *
 483       * @return string|null The last error, if one exists, otherwise null.
 484       */
 485  	public function get_last_error(): ?string {
 486          return $this->last_error;
 487      }
 489      /**
 490       * Returns context for why the parser aborted due to unsupported HTML, if it did.
 491       *
 492       * This is meant for debugging purposes, not for production use.
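           *
           * A minimal debugging sketch, assuming `$html` holds the input and relying
           * only on the standard `getMessage()` method inherited from `Exception`:
           *
           *     $processor = WP_HTML_Processor::create_fragment( $html );
           *     while ( $processor->next_tag() ) {
           *         // Visit every tag until the end or until the parser bails.
           *     }
           *
           *     $exception = $processor->get_unsupported_exception();
           *     if ( null !== $exception ) {
           *         error_log( 'HTML Processor bailed: ' . $exception->getMessage() );
           *     }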
 493       *
 494       * @since 6.7.0
 495       *
 496       * @see self::$unsupported_exception
 497       *
 498       * @return WP_HTML_Unsupported_Exception|null
 499       */
 500  	public function get_unsupported_exception() {
 501          return $this->unsupported_exception;
 502      }
 504      /**
 505       * Finds the next tag matching the $query.
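           *
           * A few query forms, sketched under the assumption that `$processor` was
           * created with `create_fragment()`. Tag names are written in upper case
           * because `tag_name` is compared against the upper-case token name:
           *
           *     // Any tag opener.
           *     $processor->next_tag();
           *
           *     // A string is shorthand for a single-element breadcrumbs query.
           *     $processor->next_tag( 'IMG' );
           *
           *     // The second IMG that is a direct child of a FIGURE.
           *     $processor->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'IMG' ), 'match_offset' => 2 ) );
           *
           *     // Tag closers are visited only when there is no breadcrumbs query.
           *     $processor->next_tag( array( 'tag_name' => 'DIV', 'tag_closers' => 'visit' ) );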
 506       *
 507       * @todo Support matching the class name and tag name.
 508       *
 509       * @since 6.4.0
 510       * @since 6.6.0 Visits all tokens, including virtual ones.
 511       *
 512       * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
 513       *
 514       * @param array|string|null $query {
 515       *     Optional. Which tag name to find, having which class, etc. Default is to find any tag.
 516       *
 517       *     @type string|null $tag_name     Which tag to find, or `null` for "any tag."
 518       *     @type string      $tag_closers  'visit' to pause at tag closers, 'skip' or unset to only visit openers.
 519       *     @type int|null    $match_offset Find the Nth tag matching all search criteria.
 520       *                                     1 for "first" tag, 3 for "third," etc.
 521       *                                     Defaults to first tag.
 522       *     @type string|null $class_name   Tag must contain this whole class name to match.
 523       *     @type string[]    $breadcrumbs  DOM sub-path at which element is found, e.g. `array( 'FIGURE', 'IMG' )`.
 524       *                                     May also contain the wildcard `*` which matches a single element, e.g. `array( 'SECTION', '*' )`.
 525       * }
 526       * @return bool Whether a tag was matched.
 527       */
 528  	public function next_tag( $query = null ): bool {
 529          $visit_closers = isset( $query['tag_closers'] ) && 'visit' === $query['tag_closers'];
 531          if ( null === $query ) {
 532              while ( $this->next_token() ) {
 533                  if ( '#tag' !== $this->get_token_type() ) {
 534                      continue;
 535                  }
 537                  if ( ! $this->is_tag_closer() || $visit_closers ) {
 538                      return true;
 539                  }
 540              }
 542              return false;
 543          }
 545          if ( is_string( $query ) ) {
 546              $query = array( 'breadcrumbs' => array( $query ) );
 547          }
 549          if ( ! is_array( $query ) ) {
 550              _doing_it_wrong(
 551                  __METHOD__,
 552                  __( 'Please pass a query array to this function.' ),
 553                  '6.4.0'
 554              );
 555              return false;
 556          }
 558          $needs_class = ( isset( $query['class_name'] ) && is_string( $query['class_name'] ) )
 559              ? $query['class_name']
 560              : null;
 562          if ( ! ( array_key_exists( 'breadcrumbs', $query ) && is_array( $query['breadcrumbs'] ) ) ) {
 563              while ( $this->next_token() ) {
 564                  if ( '#tag' !== $this->get_token_type() ) {
 565                      continue;
 566                  }
 568                  if ( isset( $query['tag_name'] ) && $query['tag_name'] !== $this->get_token_name() ) {
 569                      continue;
 570                  }
 572                  if ( isset( $needs_class ) && ! $this->has_class( $needs_class ) ) {
 573                      continue;
 574                  }
 576                  if ( ! $this->is_tag_closer() || $visit_closers ) {
 577                      return true;
 578                  }
 579              }
 581              return false;
 582          }
 584          $breadcrumbs  = $query['breadcrumbs'];
 585          $match_offset = isset( $query['match_offset'] ) ? (int) $query['match_offset'] : 1;
 587          while ( $match_offset > 0 && $this->next_token() ) {
 588              if ( '#tag' !== $this->get_token_type() || $this->is_tag_closer() ) {
 589                  continue;
 590              }
 592              if ( isset( $needs_class ) && ! $this->has_class( $needs_class ) ) {
 593                  continue;
 594              }
 596              if ( $this->matches_breadcrumbs( $breadcrumbs ) && 0 === --$match_offset ) {
 597                  return true;
 598              }
 599          }
 601          return false;
 602      }
 604      /**
 605       * Ensures internal accounting is maintained for HTML semantic rules while
 606       * the underlying Tag Processor class is seeking to a bookmark.
 607       *
 608       * This doesn't currently have a way to represent non-tags and doesn't process
 609       * semantic rules for text nodes. For access to the raw tokens consider using
 610       * WP_HTML_Tag_Processor instead.
 611       *
 612       * @since 6.5.0 Added for internal support; do not use.
 613       *
 614       * @access private
 615       *
 616       * @return bool
 617       */
 618  	public function next_token(): bool {
 619          $this->current_element = null;
 621          if ( isset( $this->last_error ) ) {
 622              return false;
 623          }
 625          /*
 626           * Prime the events if there are none.
 627           *
 628           * @todo In some cases, probably related to the adoption agency
 629           *       algorithm, this call to step() doesn't create any new
 630           *       events. Calling it again creates them. Figure out why
 631           *       this is and if it's inherent or if it's a bug. Looping
 632           *       until there are events or until there are no more
 633           *       tokens works in the meantime and isn't obviously wrong.
 634           */
 635          if ( empty( $this->element_queue ) && $this->step() ) {
 636              return $this->next_token();
 637          }
 639          // Process the next event on the queue.
 640          $this->current_element = array_shift( $this->element_queue );
 641          if ( ! isset( $this->current_element ) ) {
 642              // There are no tokens left, so close all remaining open elements.
 643              while ( $this->state->stack_of_open_elements->pop() ) {
 644                  continue;
 645              }
 647              return empty( $this->element_queue ) ? false : $this->next_token();
 648          }
 650          $is_pop = WP_HTML_Stack_Event::POP === $this->current_element->operation;
 652          /*
 653           * The root node only exists in the fragment parser, and closing it
 654           * indicates that the parse is complete. Stop before popping it from
 655           * the breadcrumbs.
 656           */
 657          if ( 'root-node' === $this->current_element->token->bookmark_name ) {
 658              return $this->next_token();
 659          }
 661          // Adjust the breadcrumbs for this event.
 662          if ( $is_pop ) {
 663              array_pop( $this->breadcrumbs );
 664          } else {
 665              $this->breadcrumbs[] = $this->current_element->token->node_name;
 666          }
 668          // Avoid sending close events for elements which don't expect a closing.
 669          if ( $is_pop && ! $this->expects_closer( $this->current_element->token ) ) {
 670              return $this->next_token();
 671          }
 673          return true;
 674      }
 676      /**
 677       * Indicates if the current tag token is a tag closer.
 678       *
 679       * Example:
 680       *
 681       *     $p = WP_HTML_Processor::create_fragment( '<div></div>' );
 682       *     $p->next_tag( array( 'tag_name' => 'DIV', 'tag_closers' => 'visit' ) );
 683       *     $p->is_tag_closer() === false;
 684       *
 685       *     $p->next_tag( array( 'tag_name' => 'DIV', 'tag_closers' => 'visit' ) );
 686       *     $p->is_tag_closer() === true;
 687       *
 688       * @since 6.6.0 Subclassed for HTML Processor.
 689       *
 690       * @return bool Whether the current tag is a tag closer.
 691       */
 692  	public function is_tag_closer(): bool {
 693          return $this->is_virtual()
 694              ? ( WP_HTML_Stack_Event::POP === $this->current_element->operation && '#tag' === $this->get_token_type() )
 695              : parent::is_tag_closer();
 696      }
 698      /**
 699       * Indicates if the currently-matched token is virtual, created by a stack operation
 700       * while processing HTML, rather than a token found in the HTML text itself.
 701       *
 702       * @since 6.6.0
 703       *
 704       * @return bool Whether the current token is virtual.
 705       */
 706  	private function is_virtual(): bool {
 707          return (
 708              isset( $this->current_element->provenance ) &&
 709              'virtual' === $this->current_element->provenance
 710          );
 711      }
 713      /**
 714       * Indicates if the currently-matched tag matches the given breadcrumbs.
 715       *
 716       * A "*" is a single-tag wildcard: it matches exactly one tag of any name, never zero tags.
 717       *
 718       * At some point this function _may_ support a `**` syntax for matching any number
 719       * of unspecified tags in the breadcrumb stack. This has been intentionally left
 720       * out, however, to keep this function simple and to avoid introducing backtracking,
 721       * which could open up surprising performance breakdowns.
 722       *
 723       * Example:
 724       *
 725       *     $processor = WP_HTML_Processor::create_fragment( '<div><span><figure><img></figure></span></div>' );
 726       *     $processor->next_tag( 'img' );
 727       *     true  === $processor->matches_breadcrumbs( array( 'figure', 'img' ) );
 728       *     true  === $processor->matches_breadcrumbs( array( 'span', 'figure', 'img' ) );
 729       *     false === $processor->matches_breadcrumbs( array( 'span', 'img' ) );
 730       *     true  === $processor->matches_breadcrumbs( array( 'span', '*', 'img' ) );
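     *
     * A few further cases, sketched from reading the implementation below rather
     * than from any documented guarantee, continuing with the same `$processor`:
     *
     *     // An empty list places no constraints, so it always matches.
     *     true  === $processor->matches_breadcrumbs( array() );
     *     // Crumbs are upper-cased before comparison, so letter case is ignored.
     *     true  === $processor->matches_breadcrumbs( array( 'FIGURE', 'img' ) );
     *     // Asking for more ancestors than the element has cannot match.
     *     false === $processor->matches_breadcrumbs( array( 'main', 'html', 'body', 'div', 'span', 'figure', 'img' ) );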
     *
     * @since 6.4.0
     *
     * @param string[] $breadcrumbs DOM sub-path at which element is found, e.g. `array( 'FIGURE', 'IMG' )`.
     *                              May also contain the wildcard `*` which matches a single element, e.g. `array( 'SECTION', '*' )`.
     * @return bool Whether the currently-matched tag is found at the given nested structure.
     */
    public function matches_breadcrumbs( $breadcrumbs ): bool {
        // Everything matches when there are zero constraints.
        if ( 0 === count( $breadcrumbs ) ) {
            return true;
        }

        // Start at the last crumb.
        $crumb = end( $breadcrumbs );

        if ( '*' !== $crumb && $this->get_tag() !== strtoupper( $crumb ) ) {
            return false;
        }

        for ( $i = count( $this->breadcrumbs ) - 1; $i >= 0; $i-- ) {
            $node  = $this->breadcrumbs[ $i ];
            $crumb = strtoupper( current( $breadcrumbs ) );

            if ( '*' !== $crumb && $node !== $crumb ) {
                return false;
            }

            if ( false === prev( $breadcrumbs ) ) {
                return true;
            }
        }

        return false;
    }

    /**
     * Indicates if the currently-matched node expects a closing
     * token, or if it will self-close on the next step.
     *
     * Most HTML elements expect a closer, such as a P element or
     * a DIV element. Others, like an IMG element, are void and don't
     * have a closing tag. Special elements, such as SCRIPT and STYLE,
     * are treated just like void tags. Text nodes and self-closing
     * foreign content will also act just like a void tag, immediately
     * closing as soon as the processor advances to the next token.
     *
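     * Example (a sketch; the expected values are inferred from the rules
     * described above and from the implementation below):
     *
     *     $processor = WP_HTML_Processor::create_fragment( '<div><img>text</div>' );
     *     // Nothing is matched yet, so there is no answer to give.
     *     null  === $processor->expects_closer();
     *
     *     // DIV is an ordinary element and waits for its closing tag.
     *     $processor->next_token();
     *     true  === $processor->expects_closer();
     *
     *     // IMG is void and closes as soon as the processor advances.
     *     $processor->next_token();
     *     false === $processor->expects_closer();
     *
     *     // Text nodes are atomic and behave like void elements here.
     *     $processor->next_token();
     *     false === $processor->expects_closer();
     *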
     * @since 6.6.0
     *
     * @param WP_HTML_Token|null $node Optional. Node to examine, if provided.
     *                                 Default is to examine current node.
     * @return bool|null Whether to expect a closer for the currently-matched node,
     *                   or `null` if not matched on any token.
     */
    public function expects_closer( ?WP_HTML_Token $node = null ): ?bool {
        $token_name = $node->node_name ?? $this->get_token_name();

        if ( ! isset( $token_name ) ) {
            return null;
        }

        $token_namespace        = $node->namespace ?? $this->get_namespace();
        $token_has_self_closing = $node->has_self_closing_flag ?? $this->has_self_closing_flag();

        return ! (
            // Comments, text nodes, and other atomic tokens.
            '#' === $token_name[0] ||
            // Doctype declarations.
            'html' === $token_name ||
            // Void elements.
            self::is_void( $token_name ) ||
            // Special atomic elements.
            ( 'html' === $token_namespace && in_array( $token_name, array( 'IFRAME', 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'XMP' ), true ) ) ||
            // Self-closing elements in foreign content.
            ( 'html' !== $token_namespace && $token_has_self_closing )
        );
    }

    /**
     * Steps through the HTML document and stops at the next tag, if any.
     *
     * @since 6.4.0
     *
     * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
     *
     * @see self::PROCESS_NEXT_NODE
     * @see self::REPROCESS_CURRENT_NODE
     *
     * @param string $node_to_process Whether to parse the next node or reprocess the current node.
     * @return bool Whether a tag was matched.
     */
    public function step( $node_to_process = self::PROCESS_NEXT_NODE ): bool {
        // Refuse to proceed if there was a previous error.
        if ( null !== $this->last_error ) {
            return false;
        }

        if ( self::REPROCESS_CURRENT_NODE !== $node_to_process ) {
            /*
             * Void elements still hop onto the stack of open elements even though
             * there's no corresponding closing tag. This is important for managing
             * stack-based operations such as "navigate to parent node" or checking
             * on an element's breadcrumbs.
             *
             * When moving on to the next node, therefore, if the bottom-most element
             * on the stack is a void element, it must be closed.
             */
            $top_node = $this->state->stack_of_open_elements->current_node();
            if ( isset( $top_node ) && ! $this->expects_closer( $top_node ) ) {
                $this->state->stack_of_open_elements->pop();
            }
        }

        if ( self::PROCESS_NEXT_NODE === $node_to_process ) {
            parent::next_token();
            if ( WP_HTML_Tag_Processor::STATE_TEXT_NODE === $this->parser_state ) {
                parent::subdivide_text_appropriately();
            }
        }

        // Finish stepping when there are no more tokens in the document.
        if (
            WP_HTML_Tag_Processor::STATE_INCOMPLETE_INPUT === $this->parser_state ||
            WP_HTML_Tag_Processor::STATE_COMPLETE === $this->parser_state
        ) {
            return false;
        }

        $adjusted_current_node = $this->get_adjusted_current_node();
        $is_closer             = $this->is_tag_closer();
        $is_start_tag          = WP_HTML_Tag_Processor::STATE_MATCHED_TAG === $this->parser_state && ! $is_closer;
        $token_name            = $this->get_token_name();

        if ( self::REPROCESS_CURRENT_NODE !== $node_to_process ) {
            $this->state->current_token = new WP_HTML_Token(
                $this->bookmark_token(),
                $token_name,
                $this->has_self_closing_flag(),
                $this->release_internal_bookmark_on_destruct
            );
        }

        $parse_in_current_insertion_mode = (
            0 === $this->state->stack_of_open_elements->count() ||
            'html' === $adjusted_current_node->namespace ||
            (
                'math' === $adjusted_current_node->integration_node_type &&
                (
                    ( $is_start_tag && ! in_array( $token_name, array( 'MGLYPH', 'MALIGNMARK' ), true ) ) ||
                    '#text' === $token_name
                )
            ) ||
            (
                'math' === $adjusted_current_node->namespace &&
                'ANNOTATION-XML' === $adjusted_current_node->node_name &&
                $is_start_tag && 'SVG' === $token_name
            ) ||
            (
                'html' === $adjusted_current_node->integration_node_type &&
                ( $is_start_tag || '#text' === $token_name )
            )
        );

        try {
            if ( ! $parse_in_current_insertion_mode ) {
                return $this->step_in_foreign_content();
            }

            switch ( $this->state->insertion_mode ) {
                case WP_HTML_Processor_State::INSERTION_MODE_INITIAL:
                    return $this->step_initial();

                case WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML:
                    return $this->step_before_html();

                case WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD:
                    return $this->step_before_head();

                case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD:
                    return $this->step_in_head();

                case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD_NOSCRIPT:
                    return $this->step_in_head_noscript();

                case WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD:
                    return $this->step_after_head();

                case WP_HTML_Processor_State::INSERTION_MODE_IN_BODY:
                    return $this->step_in_body();

                case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE:
                    return $this->step_in_table();

                case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_TEXT:
                    return $this->step_in_table_text();

                case WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION:
                    return $this->step_in_caption();

                case WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP:
                    return $this->step_in_column_group();

                case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY:
                    return $this->step_in_table_body();

                case WP_HTML_Processor_State::INSERTION_MODE_IN_ROW:
                    return $this->step_in_row();

                case WP_HTML_Processor_State::INSERTION_MODE_IN_CELL:
                    return $this->step_in_cell();

                case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT:
                    return $this->step_in_select();

                case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE:
                    return $this->step_in_select_in_table();

                case WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE:
                    return $this->step_in_template();

                case WP_HTML_Processor_State::INSERTION_MODE_AFTER_BODY:
                    return $this->step_after_body();

                case WP_HTML_Processor_State::INSERTION_MODE_IN_FRAMESET:
                    return $this->step_in_frameset();

                case WP_HTML_Processor_State::INSERTION_MODE_AFTER_FRAMESET:
                    return $this->step_after_frameset();

                case WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_BODY:
                    return $this->step_after_after_body();

                case WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_FRAMESET:
                    return $this->step_after_after_frameset();

                // This should be unreachable but PHP doesn't have total type checking on switch.
                default:
                    $this->bail( "Unaware of the requested parsing mode: '{$this->state->insertion_mode}'." );
            }
        } catch ( WP_HTML_Unsupported_Exception $e ) {
            /*
             * Exceptions are used in this class to escape deep call stacks that
             * otherwise might involve messier calling and return conventions.
             */
            return false;
        }
    }

    /**
     * Computes the HTML breadcrumbs for the currently-matched node, if matched.
     *
     * Breadcrumbs start at the outermost parent and descend toward the matched element.
     * They always include the entire path from the root HTML node to the matched element.
     *
     * @todo It could be more efficient to expose a generator-based version of this function
     *       to avoid creating the array copy on tag iteration. If this is done, it would likely
     *       be more useful to walk up the stack when yielding instead of starting at the top.
     *
     * Example:
     *
     *     $processor = WP_HTML_Processor::create_fragment( '<p><strong><em><img></em></strong></p>' );
     *     $processor->next_tag( 'IMG' );
     *     $processor->get_breadcrumbs() === array( 'HTML', 'BODY', 'P', 'STRONG', 'EM', 'IMG' );
     *
     * @since 6.4.0
     *
     * @return string[]|null Array of tag names representing path to matched node, if matched, otherwise NULL.
     */
    public function get_breadcrumbs(): ?array {
        return $this->breadcrumbs;
    }

    /**
     * Returns the nesting depth of the current location in the document.
     *
     * Example:
     *
     *     $processor = WP_HTML_Processor::create_fragment( '<div><p></p></div>' );
     *     // The processor starts in the BODY context, meaning it has depth from the start: HTML > BODY.
     *     2 === $processor->get_current_depth();
     *
     *     // Opening the DIV element increases the depth.
     *     $processor->next_token();
     *     3 === $processor->get_current_depth();
     *
     *     // Opening the P element increases the depth.
     *     $processor->next_token();
     *     4 === $processor->get_current_depth();
     *
     *     // The P element is closed during `next_token()` so the depth is decreased to reflect that.
     *     $processor->next_token();
     *     3 === $processor->get_current_depth();
     *
     * @since 6.6.0
     *
     * @return int Nesting-depth of current location in the document.
     */
    public function get_current_depth(): int {
        return count( $this->breadcrumbs );
    }

    /**
     * Parses next element in the 'initial' insertion mode.
     *
     * This internal function performs the 'initial' insertion mode
     * logic for the generalized WP_HTML_Processor::step() function.
     *
     * @since 6.7.0
     *
     * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
     *
     * @see https://html.spec.whatwg.org/#the-initial-insertion-mode
     * @see WP_HTML_Processor::step
     *
     * @return bool Whether an element was found.
     */
    private function step_initial(): bool {
        $token_name = $this->get_token_name();
        $token_type = $this->get_token_type();
        $op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
        $op         = "{$op_sigil}{$token_name}";

        switch ( $op ) {
            /*
             * > A character token that is one of U+0009 CHARACTER TABULATION,
             * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
             * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
             *
             * Parse error: ignore the token.
             */
            case '#text':
                if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
                    return $this->step();
                }
                goto initial_anything_else;
                break;

            /*
             * > A comment token
             */
            case '#comment':
            case '#funky-comment':
            case '#presumptuous-tag':
                $this->insert_html_element( $this->state->current_token );
                return true;

            /*
             * > A DOCTYPE token
             */
            case 'html':
                $doctype = $this->get_doctype_info();
                if ( null !== $doctype && 'quirks' === $doctype->indicated_compatability_mode ) {
                    $this->compat_mode = WP_HTML_Tag_Processor::QUIRKS_MODE;
                }

                /*
                 * > Then, switch the insertion mode to "before html".
                 */
                $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML;
                $this->insert_html_element( $this->state->current_token );
                return true;
        }

        /*
         * > Anything else
         */
        initial_anything_else:
        $this->compat_mode           = WP_HTML_Tag_Processor::QUIRKS_MODE;
        $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML;
        return $this->step( self::REPROCESS_CURRENT_NODE );
    }

    /**
     * Parses next element in the 'before html' insertion mode.
     *
     * This internal function performs the 'before html' insertion mode
     * logic for the generalized WP_HTML_Processor::step() function.
     *
     * @since 6.7.0
     *
     * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
     *
     * @see https://html.spec.whatwg.org/#the-before-html-insertion-mode
     * @see WP_HTML_Processor::step
     *
     * @return bool Whether an element was found.
     */
    private function step_before_html(): bool {
        $token_name = $this->get_token_name();
        $token_type = $this->get_token_type();
        $is_closer  = parent::is_tag_closer();
        $op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
        $op         = "{$op_sigil}{$token_name}";

        switch ( $op ) {
            /*
             * > A DOCTYPE token
             */
            case 'html':
                // Parse error: ignore the token.
                return $this->step();

            /*
             * > A comment token
             */
            case '#comment':
            case '#funky-comment':
            case '#presumptuous-tag':
                $this->insert_html_element( $this->state->current_token );
                return true;

            /*
             * > A character token that is one of U+0009 CHARACTER TABULATION,
             * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
             * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
             *
             * Parse error: ignore the token.
             */
            case '#text':
                if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
                    return $this->step();
                }
                goto before_html_anything_else;
                break;

            /*
             * > A start tag whose tag name is "html"
             */
            case '+HTML':
                $this->insert_html_element( $this->state->current_token );
                $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD;
                return true;

            /*
             * > An end tag whose tag name is one of: "head", "body", "html", "br"
             *
             * Closing BR tags are always reported by the Tag Processor as opening tags.
             */
            case '-HEAD':
            case '-BODY':
            case '-HTML':
                /*
                 * > Act as described in the "anything else" entry below.
                 */
                goto before_html_anything_else;
                break;
        }

        /*
         * > Any other end tag
         */
        if ( $is_closer ) {
            // Parse error: ignore the token.
            return $this->step();
        }

        /*
         * > Anything else.
         *
         * > Create an html element whose node document is the Document object.
         * > Append it to the Document object. Put this element in the stack of open elements.
         * > Switch the insertion mode to "before head", then reprocess the token.
         */
        before_html_anything_else:
        $this->insert_virtual_node( 'HTML' );
        $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD;
        return $this->step( self::REPROCESS_CURRENT_NODE );
    }

    /**
     * Parses next element in the 'before head' insertion mode.
     *
     * This internal function performs the 'before head' insertion mode
     * logic for the generalized WP_HTML_Processor::step() function.
     *
     * @since 6.7.0 Stub implementation.
     *
     * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
     *
     * @see https://html.spec.whatwg.org/#the-before-head-insertion-mode
     * @see WP_HTML_Processor::step
     *
     * @return bool Whether an element was found.
     */
    private function step_before_head(): bool {
        $token_name = $this->get_token_name();
        $token_type = $this->get_token_type();
        $is_closer  = parent::is_tag_closer();
        $op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
        $op         = "{$op_sigil}{$token_name}";

        switch ( $op ) {
            /*
             * > A character token that is one of U+0009 CHARACTER TABULATION,
             * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
             * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
             *
             * Parse error: ignore the token.
             */
            case '#text':
                if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
                    return $this->step();
                }
                goto before_head_anything_else;
                break;

            /*
             * > A comment token
             */
            case '#comment':
            case '#funky-comment':
            case '#presumptuous-tag':
                $this->insert_html_element( $this->state->current_token );
                return true;

            /*
             * > A DOCTYPE token
             */
            case 'html':
                // Parse error: ignore the token.
                return $this->step();

            /*
             * > A start tag whose tag name is "html"
             */
            case '+HTML':
                return $this->step_in_body();

            /*
             * > A start tag whose tag name is "head"
             */
            case '+HEAD':
                $this->insert_html_element( $this->state->current_token );
                $this->state->head_element   = $this->state->current_token;
                $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
                return true;

            /*
             * > An end tag whose tag name is one of: "head", "body", "html", "br"
             * > Act as described in the "anything else" entry below.
             *
             * Closing BR tags are always reported by the Tag Processor as opening tags.
             */
            case '-HEAD':
            case '-BODY':
            case '-HTML':
                goto before_head_anything_else;
                break;
        }

        if ( $is_closer ) {
            // Parse error: ignore the token.
            return $this->step();
        }

        /*
         * > Anything else
         *
         * > Insert an HTML element for a "head" start tag token with no attributes.
         */
        before_head_anything_else:
        $this->state->head_element   = $this->insert_virtual_node( 'HEAD' );
        $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
        return $this->step( self::REPROCESS_CURRENT_NODE );
    }

    /**
     * Parses next element in the 'in head' insertion mode.
     *
     * This internal function performs the 'in head' insertion mode
     * logic for the generalized WP_HTML_Processor::step() function.
     *
     * @since 6.7.0
     *
     * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
     *
     * @see https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inhead
     * @see WP_HTML_Processor::step
     *
     * @return bool Whether an element was found.
     */
    private function step_in_head(): bool {
        $token_name = $this->get_token_name();
        $token_type = $this->get_token_type();
        $is_closer  = parent::is_tag_closer();
        $op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
        $op         = "{$op_sigil}{$token_name}";

        switch ( $op ) {
            case '#text':
                /*
                 * > A character token that is one of U+0009 CHARACTER TABULATION,
                 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
                 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
                 */
                if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
                    // Insert the character.
                    $this->insert_html_element( $this->state->current_token );
                    return true;
                }

                goto in_head_anything_else;
                break;

            /*
             * > A comment token
             */
            case '#comment':
            case '#funky-comment':
            case '#presumptuous-tag':
                $this->insert_html_element( $this->state->current_token );
                return true;

            /*
             * > A DOCTYPE token
             */
            case 'html':
                // Parse error: ignore the token.
                return $this->step();

            /*
             * > A start tag whose tag name is "html"
             */
            case '+HTML':
                return $this->step_in_body();

            /*
             * > A start tag whose tag name is one of: "base", "basefont", "bgsound", "link"
             */
            case '+BASE':
            case '+BASEFONT':
            case '+BGSOUND':
            case '+LINK':
                $this->insert_html_element( $this->state->current_token );
                return true;

            /*
             * > A start tag whose tag name is "meta"
             */
            case '+META':
                $this->insert_html_element( $this->state->current_token );

                /*
                 * > If the active speculative HTML parser is null, then:
                 * >   - If the element has a charset attribute, and getting an encoding from
                 * >     its value results in an encoding, and the confidence is currently
                 * >     tentative, then change the encoding to the resulting encoding.
                 */
                $charset = $this->get_attribute( 'charset' );
                if ( is_string( $charset ) && 'tentative' === $this->state->encoding_confidence ) {
                    $this->bail( 'Cannot yet process META tags with charset to determine encoding.' );
                }

                /*
                 * >   - Otherwise, if the element has an http-equiv attribute whose value is
                 * >     an ASCII case-insensitive match for the string "Content-Type", and
                 * >     the element has a content attribute, and applying the algorithm for
                 * >     extracting a character encoding from a meta element to that attribute's
                 * >     value returns an encoding, and the confidence is currently tentative,
                 * >     then change the encoding to the extracted encoding.
                 */
                $http_equiv = $this->get_attribute( 'http-equiv' );
                $content    = $this->get_attribute( 'content' );
                if (
                    is_string( $http_equiv ) &&
                    is_string( $content ) &&
                    0 === strcasecmp( $http_equiv, 'Content-Type' ) &&
                    'tentative' === $this->state->encoding_confidence
                ) {
                    $this->bail( 'Cannot yet process META tags with http-equiv Content-Type to determine encoding.' );
                }

                return true;

            /*
             * > A start tag whose tag name is "title"
             */
            case '+TITLE':
                $this->insert_html_element( $this->state->current_token );
                return true;

            /*
             * > A start tag whose tag name is "noscript", if the scripting flag is enabled
             * > A start tag whose tag name is one of: "noframes", "style"
             *
             * The scripting flag is never enabled in this parser.
             */
            case '+NOFRAMES':
            case '+STYLE':
                $this->insert_html_element( $this->state->current_token );
                return true;

            /*
             * > A start tag whose tag name is "noscript", if the scripting flag is disabled
             */
            case '+NOSCRIPT':
                $this->insert_html_element( $this->state->current_token );
                $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD_NOSCRIPT;
                return true;

            /*
             * > A start tag whose tag name is "script"
             *
             * @todo Could the adjusted insertion location be anything other than the current location?
             */
            case '+SCRIPT':
                $this->insert_html_element( $this->state->current_token );
                return true;

            /*
             * > An end tag whose tag name is "head"
             */
            case '-HEAD':
                $this->state->stack_of_open_elements->pop();
                $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD;
                return true;

            /*
             * > An end tag whose tag name is one of: "body", "html", "br"
             *
             * Closing BR tags are always reported by the Tag Processor as opening tags.
             */
            case '-BODY':
            case '-HTML':
                /*
                 * > Act as described in the "anything else" entry below.
                 */
                goto in_head_anything_else;
                break;

1461              /*
1462               * > A start tag whose tag name is "template"
1463               *
1464               * @todo Could the adjusted insertion location be anything other than the current location?
1465               */
1466              case '+TEMPLATE':
1467                  $this->state->active_formatting_elements->insert_marker();
1468                  $this->state->frameset_ok = false;
1470                  $this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE;
1471                  $this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE;
1473                  $this->insert_html_element( $this->state->current_token );
1474                  return true;
1476              /*
1477               * > An end tag whose tag name is "template"
1478               */
1479              case '-TEMPLATE':
1480                  if ( ! $this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) {
1481                      // @todo Indicate a parse error once it's possible.
1482                      return $this->step();
1483                  }
1485                  $this->generate_implied_end_tags_thoroughly();
1486                  if ( ! $this->state->stack_of_open_elements->current_node_is( 'TEMPLATE' ) ) {
1487                      // @todo Indicate a parse error once it's possible.
1488                  }
1490                  $this->state->stack_of_open_elements->pop_until( 'TEMPLATE' );
1491                  $this->state->active_formatting_elements->clear_up_to_last_marker();
1492                  array_pop( $this->state->stack_of_template_insertion_modes );
1493                  $this->reset_insertion_mode_appropriately();
1494                  return true;
1495          }
1497          /*
1498           * > A start tag whose tag name is "head"
1499           * > Any other end tag
1500           */
1501          if ( '+HEAD' === $op || $is_closer ) {
1502              // Parse error: ignore the token.
1503              return $this->step();
1504          }
1506          /*
1507           * > Anything else
1508           */
1509          in_head_anything_else:
1510          $this->state->stack_of_open_elements->pop();
1511          $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD;
1512          return $this->step( self::REPROCESS_CURRENT_NODE );
1513      }
1515      /**
1516       * Parses next element in the 'in head noscript' insertion mode.
1517       *
1518       * This internal function performs the 'in head noscript' insertion mode
1519       * logic for the generalized WP_HTML_Processor::step() function.
1520       *
1521       * @since 6.7.0
1522       *
1523       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
1524       *
1525       * @see https://html.spec.whatwg.org/#parsing-main-inheadnoscript
1526       * @see WP_HTML_Processor::step
1527       *
1528       * @return bool Whether an element was found.
1529       */
1530  	private function step_in_head_noscript(): bool {
1531          $token_name = $this->get_token_name();
1532          $token_type = $this->get_token_type();
1533          $is_closer  = parent::is_tag_closer();
1534          $op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
1535          $op         = "{$op_sigil}{$token_name}";
1537          switch ( $op ) {
1538              /*
1539               * > A character token that is one of U+0009 CHARACTER TABULATION,
1540               * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
1541               * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
1542               *
1543               * Processed using the rules for the "in head" insertion mode.
1544               */
1545              case '#text':
1546                  if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
1547                      return $this->step_in_head();
1548                  }
1550                  goto in_head_noscript_anything_else;
1551                  break;
1553              /*
1554               * > A DOCTYPE token
1555               */
1556              case 'html':
1557                  // Parse error: ignore the token.
1558                  return $this->step();
1560              /*
1561               * > A start tag whose tag name is "html"
1562               */
1563              case '+HTML':
1564                  return $this->step_in_body();
1566              /*
1567               * > An end tag whose tag name is "noscript"
1568               */
1569              case '-NOSCRIPT':
1570                  $this->state->stack_of_open_elements->pop();
1571                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
1572                  return true;
1574              /*
1575               * > A comment token
1576               * >
1577               * > A start tag whose tag name is one of: "basefont", "bgsound",
1578               * > "link", "meta", "noframes", "style"
1579               */
1580              case '#comment':
1581              case '#funky-comment':
1582              case '#presumptuous-tag':
1583              case '+BASEFONT':
1584              case '+BGSOUND':
1585              case '+LINK':
1586              case '+META':
1587              case '+NOFRAMES':
1588              case '+STYLE':
1589                  return $this->step_in_head();
1591              /*
1592               * > An end tag whose tag name is "br"
1593               *
1594               * This should never happen: the Tag Processor always reports closing BR tags as opening tags.
1595               */
1596          }
1598          /*
1599           * > A start tag whose tag name is one of: "head", "noscript"
1600           * > Any other end tag
1601           */
1602          if ( '+HEAD' === $op || '+NOSCRIPT' === $op || $is_closer ) {
1603              // Parse error: ignore the token.
1604              return $this->step();
1605          }
1607          /*
1608           * > Anything else
1609           *
1610           * Anything here is a parse error.
1611           */
1612          in_head_noscript_anything_else:
1613          $this->state->stack_of_open_elements->pop();
1614          $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
1615          return $this->step( self::REPROCESS_CURRENT_NODE );
1616      }
1618      /**
1619       * Parses next element in the 'after head' insertion mode.
1620       *
1621       * This internal function performs the 'after head' insertion mode
1622       * logic for the generalized WP_HTML_Processor::step() function.
1623       *
1624       * @since 6.7.0
1625       *
1626       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
1627       *
1628       * @see https://html.spec.whatwg.org/#the-after-head-insertion-mode
1629       * @see WP_HTML_Processor::step
1630       *
1631       * @return bool Whether an element was found.
1632       */
1633  	private function step_after_head(): bool {
1634          $token_name = $this->get_token_name();
1635          $token_type = $this->get_token_type();
1636          $is_closer  = parent::is_tag_closer();
1637          $op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
1638          $op         = "{$op_sigil}{$token_name}";
1640          switch ( $op ) {
1641              /*
1642               * > A character token that is one of U+0009 CHARACTER TABULATION,
1643               * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
1644               * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
1645               */
1646              case '#text':
1647                  if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
1648                      // Insert the character.
1649                      $this->insert_html_element( $this->state->current_token );
1650                      return true;
1651                  }
1652                  goto after_head_anything_else;
1653                  break;
1655              /*
1656               * > A comment token
1657               */
1658              case '#comment':
1659              case '#funky-comment':
1660              case '#presumptuous-tag':
1661                  $this->insert_html_element( $this->state->current_token );
1662                  return true;
1664              /*
1665               * > A DOCTYPE token
1666               */
1667              case 'html':
1668                  // Parse error: ignore the token.
1669                  return $this->step();
1671              /*
1672               * > A start tag whose tag name is "html"
1673               */
1674              case '+HTML':
1675                  return $this->step_in_body();
1677              /*
1678               * > A start tag whose tag name is "body"
1679               */
1680              case '+BODY':
1681                  $this->insert_html_element( $this->state->current_token );
1682                  $this->state->frameset_ok    = false;
1683                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
1684                  return true;
1686              /*
1687               * > A start tag whose tag name is "frameset"
1688               */
1689              case '+FRAMESET':
1690                  $this->insert_html_element( $this->state->current_token );
1691                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_FRAMESET;
1692                  return true;
1694              /*
1695               * > A start tag whose tag name is one of: "base", "basefont", "bgsound",
1696               * > "link", "meta", "noframes", "script", "style", "template", "title"
1697               *
1698               * Anything here is a parse error.
1699               */
1700              case '+BASE':
1701              case '+BASEFONT':
1702              case '+BGSOUND':
1703              case '+LINK':
1704              case '+META':
1705              case '+NOFRAMES':
1706              case '+SCRIPT':
1707              case '+STYLE':
1708              case '+TEMPLATE':
1709              case '+TITLE':
1710                  /*
1711                   * > Push the node pointed to by the head element pointer onto the stack of open elements.
1712                   * > Process the token using the rules for the "in head" insertion mode.
1713                   * > Remove the node pointed to by the head element pointer from the stack of open elements. (It might not be the current node at this point.)
1714                   */
1715                  $this->bail( 'Cannot process elements after HEAD which reopen the HEAD element.' );
1716                  /*
1717                   * Do not leave this break in when adding support; it's here to prevent
1718                   * WPCS from getting confused at the switch structure without a return,
1719                   * because it doesn't know that `bail()` always throws.
1720                   */
1721                  break;
1723              /*
1724               * > An end tag whose tag name is "template"
1725               */
1726              case '-TEMPLATE':
1727                  return $this->step_in_head();
1729              /*
1730               * > An end tag whose tag name is one of: "body", "html", "br"
1731               *
1732               * Closing BR tags are always reported by the Tag Processor as opening tags.
1733               */
1734              case '-BODY':
1735              case '-HTML':
1736                  /*
1737                   * > Act as described in the "anything else" entry below.
1738                   */
1739                  goto after_head_anything_else;
1740                  break;
1741          }
1743          /*
1744           * > A start tag whose tag name is "head"
1745           * > Any other end tag
1746           */
1747          if ( '+HEAD' === $op || $is_closer ) {
1748              // Parse error: ignore the token.
1749              return $this->step();
1750          }
1752          /*
1753           * > Anything else
1754           * > Insert an HTML element for a "body" start tag token with no attributes.
1755           */
1756          after_head_anything_else:
1757          $this->insert_virtual_node( 'BODY' );
1758          $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
1759          return $this->step( self::REPROCESS_CURRENT_NODE );
1760      }
1762      /**
1763       * Parses next element in the 'in body' insertion mode.
1764       *
1765       * This internal function performs the 'in body' insertion mode
1766       * logic for the generalized WP_HTML_Processor::step() function.
1767       *
1768       * @since 6.4.0
1769       *
1770       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
1771       *
1772       * @see https://html.spec.whatwg.org/#parsing-main-inbody
1773       * @see WP_HTML_Processor::step
1774       *
1775       * @return bool Whether an element was found.
1776       */
1777  	private function step_in_body(): bool {
1778          $token_name = $this->get_token_name();
1779          $token_type = $this->get_token_type();
1780          $op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
1781          $op         = "{$op_sigil}{$token_name}";
1783          switch ( $op ) {
1784              case '#text':
1785                  /*
1786                   * > A character token that is U+0000 NULL
1787                   *
1788                   * Any successive sequence of NULL bytes is ignored and won't
1789                   * trigger active format reconstruction. Therefore, if the text
1790                   * only comprises NULL bytes then the token should be ignored
1791                   * here, but if there are any other characters in the stream
1792                   * the active formats should be reconstructed.
1793                   */
1794                  if ( parent::TEXT_IS_NULL_SEQUENCE === $this->text_node_classification ) {
1795                      // Parse error: ignore the token.
1796                      return $this->step();
1797                  }
1799                  $this->reconstruct_active_formatting_elements();
1801                  /*
1802                   * Whitespace-only text does not affect the frameset-ok flag.
1803                   * It is probably inter-element whitespace, but it may also
1804                   * contain character references which decode only to whitespace.
1805                   */
1806                  if ( parent::TEXT_IS_GENERIC === $this->text_node_classification ) {
1807                      $this->state->frameset_ok = false;
1808                  }
1810                  $this->insert_html_element( $this->state->current_token );
1811                  return true;
1813              case '#comment':
1814              case '#funky-comment':
1815              case '#presumptuous-tag':
1816                  $this->insert_html_element( $this->state->current_token );
1817                  return true;
1819              /*
1820               * > A DOCTYPE token
1821               * > Parse error. Ignore the token.
1822               */
1823              case 'html':
1824                  return $this->step();
1826              /*
1827               * > A start tag whose tag name is "html"
1828               */
1829              case '+HTML':
1830                  if ( ! $this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) {
1831                      /*
1832                       * > Otherwise, for each attribute on the token, check to see if the attribute
1833                       * > is already present on the top element of the stack of open elements. If
1834                       * > it is not, add the attribute and its corresponding value to that element.
1835                       *
1836                       * This parser does not currently support this behavior: ignore the token.
1837                       */
1838                  }
1840                  // Ignore the token.
1841                  return $this->step();
1843              /*
1844               * > A start tag whose tag name is one of: "base", "basefont", "bgsound", "link",
1845               * > "meta", "noframes", "script", "style", "template", "title"
1846               * >
1847               * > An end tag whose tag name is "template"
1848               */
1849              case '+BASE':
1850              case '+BASEFONT':
1851              case '+BGSOUND':
1852              case '+LINK':
1853              case '+META':
1854              case '+NOFRAMES':
1855              case '+SCRIPT':
1856              case '+STYLE':
1857              case '+TEMPLATE':
1858              case '+TITLE':
1859              case '-TEMPLATE':
1860                  return $this->step_in_head();
1862              /*
1863               * > A start tag whose tag name is "body"
1864               *
1865               * This tag in the IN BODY insertion mode is a parse error.
1866               */
1867              case '+BODY':
1868                  if (
1869                      1 === $this->state->stack_of_open_elements->count() ||
1870                      'BODY' !== ( $this->state->stack_of_open_elements->at( 2 )->node_name ?? null ) ||
1871                      $this->state->stack_of_open_elements->contains( 'TEMPLATE' )
1872                  ) {
1873                      // Ignore the token.
1874                      return $this->step();
1875                  }
1877                  /*
1878                   * > Otherwise, set the frameset-ok flag to "not ok"; then, for each attribute
1879                   * > on the token, check to see if the attribute is already present on the body
1880                   * > element (the second element) on the stack of open elements, and if it is
1881                   * > not, add the attribute and its corresponding value to that element.
1882                   *
1883                   * This parser does not currently support this behavior: ignore the token.
1884                   */
1885                  $this->state->frameset_ok = false;
1886                  return $this->step();
1888              /*
1889               * > A start tag whose tag name is "frameset"
1890               *
1891               * This tag in the IN BODY insertion mode is a parse error.
1892               */
1893              case '+FRAMESET':
1894                  if (
1895                      1 === $this->state->stack_of_open_elements->count() ||
1896                      'BODY' !== ( $this->state->stack_of_open_elements->at( 2 )->node_name ?? null ) ||
1897                      false === $this->state->frameset_ok
1898                  ) {
1899                      // Ignore the token.
1900                      return $this->step();
1901                  }
1903                  /*
1904                   * > Otherwise, run the following steps:
1905                   */
1906                  $this->bail( 'Cannot process non-ignored FRAMESET tags.' );
1907                  break;
1909              /*
1910               * > An end tag whose tag name is "body"
1911               */
1912              case '-BODY':
1913                  if ( ! $this->state->stack_of_open_elements->has_element_in_scope( 'BODY' ) ) {
1914                      // Parse error: ignore the token.
1915                      return $this->step();
1916                  }
1918                  /*
1919                   * > Otherwise, if there is a node in the stack of open elements that is not either a
1920                   * > dd element, a dt element, an li element, an optgroup element, an option element,
1921                   * > a p element, an rb element, an rp element, an rt element, an rtc element, a tbody
1922                   * > element, a td element, a tfoot element, a th element, a thead element, a tr
1923                   * > element, the body element, or the html element, then this is a parse error.
1924                   *
1925                   * There is nothing to do for this parse error, so don't check for it.
1926                   */
1928                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_BODY;
1929                  return true;
1931              /*
1932               * > An end tag whose tag name is "html"
1933               */
1934              case '-HTML':
1935                  if ( ! $this->state->stack_of_open_elements->has_element_in_scope( 'BODY' ) ) {
1936                      // Parse error: ignore the token.
1937                      return $this->step();
1938                  }
1940                  /*
1941                   * > Otherwise, if there is a node in the stack of open elements that is not either a
1942                   * > dd element, a dt element, an li element, an optgroup element, an option element,
1943                   * > a p element, an rb element, an rp element, an rt element, an rtc element, a tbody
1944                   * > element, a td element, a tfoot element, a th element, a thead element, a tr
1945                   * > element, the body element, or the html element, then this is a parse error.
1946                   *
1947                   * There is nothing to do for this parse error, so don't check for it.
1948                   */
1950                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_BODY;
1951                  return $this->step( self::REPROCESS_CURRENT_NODE );
1953              /*
1954               * > A start tag whose tag name is one of: "address", "article", "aside",
1955               * > "blockquote", "center", "details", "dialog", "dir", "div", "dl",
1956               * > "fieldset", "figcaption", "figure", "footer", "header", "hgroup",
1957               * > "main", "menu", "nav", "ol", "p", "search", "section", "summary", "ul"
1958               */
1959              case '+ADDRESS':
1960              case '+ARTICLE':
1961              case '+ASIDE':
1962              case '+BLOCKQUOTE':
1963              case '+CENTER':
1964              case '+DETAILS':
1965              case '+DIALOG':
1966              case '+DIR':
1967              case '+DIV':
1968              case '+DL':
1969              case '+FIELDSET':
1970              case '+FIGCAPTION':
1971              case '+FIGURE':
1972              case '+FOOTER':
1973              case '+HEADER':
1974              case '+HGROUP':
1975              case '+MAIN':
1976              case '+MENU':
1977              case '+NAV':
1978              case '+OL':
1979              case '+P':
1980              case '+SEARCH':
1981              case '+SECTION':
1982              case '+SUMMARY':
1983              case '+UL':
1984                  if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
1985                      $this->close_a_p_element();
1986                  }
1988                  $this->insert_html_element( $this->state->current_token );
1989                  return true;
1991              /*
1992               * > A start tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6"
1993               */
1994              case '+H1':
1995              case '+H2':
1996              case '+H3':
1997              case '+H4':
1998              case '+H5':
1999              case '+H6':
2000                  if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
2001                      $this->close_a_p_element();
2002                  }
2004                  if (
2005                      in_array(
2006                          $this->state->stack_of_open_elements->current_node()->node_name,
2007                          array( 'H1', 'H2', 'H3', 'H4', 'H5', 'H6' ),
2008                          true
2009                      )
2010                  ) {
2011                      // @todo Indicate a parse error once it's possible.
2012                      $this->state->stack_of_open_elements->pop();
2013                  }
2015                  $this->insert_html_element( $this->state->current_token );
2016                  return true;
2018              /*
2019               * > A start tag whose tag name is one of: "pre", "listing"
2020               */
2021              case '+PRE':
2022              case '+LISTING':
2023                  if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
2024                      $this->close_a_p_element();
2025                  }
2027                  /*
2028                   * > If the next token is a U+000A LINE FEED (LF) character token,
2029                   * > then ignore that token and move on to the next one. (Newlines
2030                   * > at the start of pre blocks are ignored as an authoring convenience.)
2031                   *
2032                   * This is handled in `get_modifiable_text()`.
2033                   */
2035                  $this->insert_html_element( $this->state->current_token );
2036                  $this->state->frameset_ok = false;
2037                  return true;
2039              /*
2040               * > A start tag whose tag name is "form"
2041               */
2042              case '+FORM':
2043                  $stack_contains_template = $this->state->stack_of_open_elements->contains( 'TEMPLATE' );
2045                  if ( isset( $this->state->form_element ) && ! $stack_contains_template ) {
2046                      // Parse error: ignore the token.
2047                      return $this->step();
2048                  }
2050                  if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
2051                      $this->close_a_p_element();
2052                  }
2054                  $this->insert_html_element( $this->state->current_token );
2055                  if ( ! $stack_contains_template ) {
2056                      $this->state->form_element = $this->state->current_token;
2057                  }
2059                  return true;
2061              /*
2062               * > A start tag whose tag name is "li"
2063               * > A start tag whose tag name is one of: "dd", "dt"
2064               */
2065              case '+DD':
2066              case '+DT':
2067              case '+LI':
2068                  $this->state->frameset_ok = false;
2069                  $node                     = $this->state->stack_of_open_elements->current_node();
2070                  $is_li                    = 'LI' === $token_name;
2072                  in_body_list_loop:
2073                  /*
2074                   * The logic for LI and DT/DD is the same except for one point: LI elements _only_
2075                   * close other LI elements, but a DT or DD element closes _any_ open DT or DD element.
2076                   */
2077                  if ( $is_li ? 'LI' === $node->node_name : ( 'DD' === $node->node_name || 'DT' === $node->node_name ) ) {
2078                      $node_name = $is_li ? 'LI' : $node->node_name;
2079                      $this->generate_implied_end_tags( $node_name );
2080                      if ( ! $this->state->stack_of_open_elements->current_node_is( $node_name ) ) {
2081                          // @todo Indicate a parse error once it's possible. This error does not impact the logic here.
2082                      }
2084                      $this->state->stack_of_open_elements->pop_until( $node_name );
2085                      goto in_body_list_done;
2086                  }
2088                  if (
2089                      'ADDRESS' !== $node->node_name &&
2090                      'DIV' !== $node->node_name &&
2091                      'P' !== $node->node_name &&
2092                      self::is_special( $node )
2093                  ) {
2094                      /*
2095                       * > If node is in the special category, but is not an address, div,
2096                       * > or p element, then jump to the step labeled done below.
2097                       */
2098                      goto in_body_list_done;
2099                  } else {
2100                      /*
2101                       * > Otherwise, set node to the previous entry in the stack of open elements
2102                       * > and return to the step labeled loop.
2103                       */
2104                      foreach ( $this->state->stack_of_open_elements->walk_up( $node ) as $item ) {
2105                          $node = $item;
2106                          break;
2107                      }
2108                      goto in_body_list_loop;
2109                  }
2111                  in_body_list_done:
2112                  if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
2113                      $this->close_a_p_element();
2114                  }
2116                  $this->insert_html_element( $this->state->current_token );
2117                  return true;
2119              case '+PLAINTEXT':
2120                  if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
2121                      $this->close_a_p_element();
2122                  }
2124                  /*
2125                   * @todo This may need to be handled in the Tag Processor and turn into
2126                   *       a single self-contained tag like TEXTAREA, whose modifiable text
2127                   *       is the rest of the input document as plaintext.
2128                   */
2129                  $this->bail( 'Cannot process PLAINTEXT elements.' );
2130                  break;
2132              /*
2133               * > A start tag whose tag name is "button"
2134               */
2135              case '+BUTTON':
2136                  if ( $this->state->stack_of_open_elements->has_element_in_scope( 'BUTTON' ) ) {
2137                      // @todo Indicate a parse error once it's possible. This error does not impact the logic here.
2138                      $this->generate_implied_end_tags();
2139                      $this->state->stack_of_open_elements->pop_until( 'BUTTON' );
2140                  }
2142                  $this->reconstruct_active_formatting_elements();
2143                  $this->insert_html_element( $this->state->current_token );
2144                  $this->state->frameset_ok = false;
2146                  return true;
2148              /*
2149               * > An end tag whose tag name is one of: "address", "article", "aside", "blockquote",
2150               * > "button", "center", "details", "dialog", "dir", "div", "dl", "fieldset",
2151               * > "figcaption", "figure", "footer", "header", "hgroup", "listing", "main",
2152               * > "menu", "nav", "ol", "pre", "search", "section", "summary", "ul"
2153               */
2154              case '-ADDRESS':
2155              case '-ARTICLE':
2156              case '-ASIDE':
2157              case '-BLOCKQUOTE':
2158              case '-BUTTON':
2159              case '-CENTER':
2160              case '-DETAILS':
2161              case '-DIALOG':
2162              case '-DIR':
2163              case '-DIV':
2164              case '-DL':
2165              case '-FIELDSET':
2166              case '-FIGCAPTION':
2167              case '-FIGURE':
2168              case '-FOOTER':
2169              case '-HEADER':
2170              case '-HGROUP':
2171              case '-LISTING':
2172              case '-MAIN':
2173              case '-MENU':
2174              case '-NAV':
2175              case '-OL':
2176              case '-PRE':
2177              case '-SEARCH':
2178              case '-SECTION':
2179              case '-SUMMARY':
2180              case '-UL':
2181                  if ( ! $this->state->stack_of_open_elements->has_element_in_scope( $token_name ) ) {
2182                      // @todo Report parse error.
2183                      // Ignore the token.
2184                      return $this->step();
2185                  }
2187                  $this->generate_implied_end_tags();
2188                  if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
2189                      // @todo Record parse error: this error doesn't impact parsing.
2190                  }
2191                  $this->state->stack_of_open_elements->pop_until( $token_name );
2192                  return true;
2194              /*
2195               * > An end tag whose tag name is "form"
2196               */
2197              case '-FORM':
2198                  if ( ! $this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) {
2199                      $node                      = $this->state->form_element;
2200                      $this->state->form_element = null;
2202                      /*
2203                       * > If node is null or if the stack of open elements does not have node
2204                       * > in scope, then this is a parse error; return and ignore the token.
2205                       *
2206                       * @todo It's necessary to check if the form token itself is in scope, not
2207                       *       simply whether any FORM is in scope.
2208                       */
2209                      if (
2210                          null === $node ||
2211                          ! $this->state->stack_of_open_elements->has_element_in_scope( 'FORM' )
2212                      ) {
2213                          // Parse error: ignore the token.
2214                          return $this->step();
2215                      }
2217                      $this->generate_implied_end_tags();
2218                      if ( $node !== $this->state->stack_of_open_elements->current_node() ) {
2219                          // @todo Indicate a parse error once it's possible.
2220                          $this->bail( 'Cannot close a FORM when other elements remain open as this would throw off the breadcrumbs for the following tokens.' );
2221                      }
2223                      $this->state->stack_of_open_elements->remove_node( $node );
2224                  } else {
2225                      /*
2226                       * > If the stack of open elements does not have a form element in scope,
2227                       * > then this is a parse error; return and ignore the token.
2228                       *
2229                       * Note that unlike in the clause above, this is checking for any FORM in scope.
2230                       */
2231                      if ( ! $this->state->stack_of_open_elements->has_element_in_scope( 'FORM' ) ) {
2232                          // Parse error: ignore the token.
2233                          return $this->step();
2234                      }
2236                      $this->generate_implied_end_tags();
2238                      if ( ! $this->state->stack_of_open_elements->current_node_is( 'FORM' ) ) {
2239                          // @todo Indicate a parse error once it's possible. This error does not impact the logic here.
2240                      }
2242                      $this->state->stack_of_open_elements->pop_until( 'FORM' );
2243                      return true;
2244                  }
2245                  break;
2247              /*
2248               * > An end tag whose tag name is "p"
2249               */
2250              case '-P':
2251                  if ( ! $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
2252                      $this->insert_html_element( $this->state->current_token );
2253                  }
2255                  $this->close_a_p_element();
2256                  return true;
2258              /*
2259               * > An end tag whose tag name is "li"
2260               * > An end tag whose tag name is one of: "dd", "dt"
2261               */
2262              case '-DD':
2263              case '-DT':
2264              case '-LI':
2265                  if (
2266                      /*
2267                       * An end tag whose tag name is "li":
2268                       * If the stack of open elements does not have an li element in list item scope,
2269                       * then this is a parse error; ignore the token.
2270                       */
2271                      (
2272                          'LI' === $token_name &&
2273                          ! $this->state->stack_of_open_elements->has_element_in_list_item_scope( 'LI' )
2274                      ) ||
2275                      /*
2276                       * An end tag whose tag name is one of: "dd", "dt":
2277                       * If the stack of open elements does not have an element in scope that is an
2278                       * HTML element with the same tag name as that of the token, then this is a
2279                       * parse error; ignore the token.
2280                       */
2281                      (
2282                          'LI' !== $token_name &&
2283                          ! $this->state->stack_of_open_elements->has_element_in_scope( $token_name )
2284                      )
2285                  ) {
2286                      /*
2287                       * This is a parse error, ignore the token.
2288                       *
2289                       * @todo Indicate a parse error once it's possible.
2290                       */
2291                      return $this->step();
2292                  }
2294                  $this->generate_implied_end_tags( $token_name );
2296                  if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
2297                      // @todo Indicate a parse error once it's possible. This error does not impact the logic here.
2298                  }
2300                  $this->state->stack_of_open_elements->pop_until( $token_name );
2301                  return true;
2303              /*
2304               * > An end tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6"
2305               */
2306              case '-H1':
2307              case '-H2':
2308              case '-H3':
2309              case '-H4':
2310              case '-H5':
2311              case '-H6':
2312                  if ( ! $this->state->stack_of_open_elements->has_element_in_scope( '(internal: H1 through H6 - do not use)' ) ) {
2313                      /*
2314                       * This is a parse error; ignore the token.
2315                       *
2316                       * @todo Indicate a parse error once it's possible.
2317                       */
2318                      return $this->step();
2319                  }
2321                  $this->generate_implied_end_tags();
2323                  if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
2324                      // @todo Record parse error: this error doesn't impact parsing.
2325                  }
2327                  $this->state->stack_of_open_elements->pop_until( '(internal: H1 through H6 - do not use)' );
2328                  return true;
2330              /*
2331               * > A start tag whose tag name is "a"
2332               */
2333              case '+A':
2334                  foreach ( $this->state->active_formatting_elements->walk_up() as $item ) {
2335                      switch ( $item->node_name ) {
2336                          case 'marker':
2337                              break 2;
2339                          case 'A':
2340                              $this->run_adoption_agency_algorithm();
2341                              $this->state->active_formatting_elements->remove_node( $item );
2342                              $this->state->stack_of_open_elements->remove_node( $item );
2343                              break 2;
2344                      }
2345                  }
2347                  $this->reconstruct_active_formatting_elements();
2348                  $this->insert_html_element( $this->state->current_token );
2349                  $this->state->active_formatting_elements->push( $this->state->current_token );
2350                  return true;
2352              /*
2353               * > A start tag whose tag name is one of: "b", "big", "code", "em", "font", "i",
2354               * > "s", "small", "strike", "strong", "tt", "u"
2355               */
2356              case '+B':
2357              case '+BIG':
2358              case '+CODE':
2359              case '+EM':
2360              case '+FONT':
2361              case '+I':
2362              case '+S':
2363              case '+SMALL':
2364              case '+STRIKE':
2365              case '+STRONG':
2366              case '+TT':
2367              case '+U':
2368                  $this->reconstruct_active_formatting_elements();
2369                  $this->insert_html_element( $this->state->current_token );
2370                  $this->state->active_formatting_elements->push( $this->state->current_token );
2371                  return true;
2373              /*
2374               * > A start tag whose tag name is "nobr"
2375               */
2376              case '+NOBR':
2377                  $this->reconstruct_active_formatting_elements();
2379                  if ( $this->state->stack_of_open_elements->has_element_in_scope( 'NOBR' ) ) {
2380                      // Parse error.
2381                      $this->run_adoption_agency_algorithm();
2382                      $this->reconstruct_active_formatting_elements();
2383                  }
2385                  $this->insert_html_element( $this->state->current_token );
2386                  $this->state->active_formatting_elements->push( $this->state->current_token );
2387                  return true;
2389              /*
2390               * > An end tag whose tag name is one of: "a", "b", "big", "code", "em", "font", "i",
2391               * > "nobr", "s", "small", "strike", "strong", "tt", "u"
2392               */
2393              case '-A':
2394              case '-B':
2395              case '-BIG':
2396              case '-CODE':
2397              case '-EM':
2398              case '-FONT':
2399              case '-I':
2400              case '-NOBR':
2401              case '-S':
2402              case '-SMALL':
2403              case '-STRIKE':
2404              case '-STRONG':
2405              case '-TT':
2406              case '-U':
2407                  $this->run_adoption_agency_algorithm();
2408                  return true;
2410              /*
2411               * > A start tag whose tag name is one of: "applet", "marquee", "object"
2412               */
2413              case '+APPLET':
2414              case '+MARQUEE':
2415              case '+OBJECT':
2416                  $this->reconstruct_active_formatting_elements();
2417                  $this->insert_html_element( $this->state->current_token );
2418                  $this->state->active_formatting_elements->insert_marker();
2419                  $this->state->frameset_ok = false;
2420                  return true;
2422              /*
2423               * > An end tag token whose tag name is one of: "applet", "marquee", "object"
2424               */
2425              case '-APPLET':
2426              case '-MARQUEE':
2427              case '-OBJECT':
2428                  if ( ! $this->state->stack_of_open_elements->has_element_in_scope( $token_name ) ) {
2429                      // Parse error: ignore the token.
2430                      return $this->step();
2431                  }
2433                  $this->generate_implied_end_tags();
2434                  if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
2435                      // This is a parse error.
2436                  }
2438                  $this->state->stack_of_open_elements->pop_until( $token_name );
2439                  $this->state->active_formatting_elements->clear_up_to_last_marker();
2440                  return true;
2442              /*
2443               * > A start tag whose tag name is "table"
2444               */
2445              case '+TABLE':
2446                  /*
2447                   * > If the Document is not set to quirks mode, and the stack of open elements
2448                   * > has a p element in button scope, then close a p element.
2449                   */
2450                  if (
2451                      WP_HTML_Tag_Processor::QUIRKS_MODE !== $this->compat_mode &&
2452                      $this->state->stack_of_open_elements->has_p_in_button_scope()
2453                  ) {
2454                      $this->close_a_p_element();
2455                  }
2457                  $this->insert_html_element( $this->state->current_token );
2458                  $this->state->frameset_ok    = false;
2459                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
2460                  return true;
2462              /*
2463               * > An end tag whose tag name is "br"
2464               *
2465               * This is prevented from happening because the Tag Processor
2466               * reports all closing BR tags as if they were opening tags.
2467               */
2469              /*
2470               * > A start tag whose tag name is one of: "area", "br", "embed", "img", "keygen", "wbr"
2471               */
2472              case '+AREA':
2473              case '+BR':
2474              case '+EMBED':
2475              case '+IMG':
2476              case '+KEYGEN':
2477              case '+WBR':
2478                  $this->reconstruct_active_formatting_elements();
2479                  $this->insert_html_element( $this->state->current_token );
2480                  $this->state->frameset_ok = false;
2481                  return true;
2483              /*
2484               * > A start tag whose tag name is "input"
2485               */
2486              case '+INPUT':
2487                  $this->reconstruct_active_formatting_elements();
2488                  $this->insert_html_element( $this->state->current_token );
2490                  /*
2491                   * > If the token does not have an attribute with the name "type", or if it does,
2492                   * > but that attribute's value is not an ASCII case-insensitive match for the
2493                   * > string "hidden", then: set the frameset-ok flag to "not ok".
2494                   */
2495                  $type_attribute = $this->get_attribute( 'type' );
2496                  if ( ! is_string( $type_attribute ) || 'hidden' !== strtolower( $type_attribute ) ) {
2497                      $this->state->frameset_ok = false;
2498                  }
2500                  return true;
2502              /*
2503               * > A start tag whose tag name is one of: "param", "source", "track"
2504               */
2505              case '+PARAM':
2506              case '+SOURCE':
2507              case '+TRACK':
2508                  $this->insert_html_element( $this->state->current_token );
2509                  return true;
2511              /*
2512               * > A start tag whose tag name is "hr"
2513               */
2514              case '+HR':
2515                  if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
2516                      $this->close_a_p_element();
2517                  }
2518                  $this->insert_html_element( $this->state->current_token );
2519                  $this->state->frameset_ok = false;
2520                  return true;
2522              /*
2523               * > A start tag whose tag name is "image"
2524               */
2525              case '+IMAGE':
2526                  /*
2527                   * > Parse error. Change the token's tag name to "img" and reprocess it. (Don't ask.)
2528                   *
2529                   * Note that this is handled elsewhere, so it should not be possible to reach this code.
2530                   */
2531                  $this->bail( "Cannot process an IMAGE tag. (Don't ask.)" );
2532                  break;
2534              /*
2535               * > A start tag whose tag name is "textarea"
2536               */
2537              case '+TEXTAREA':
2538                  $this->insert_html_element( $this->state->current_token );
2540                  /*
2541                   * > If the next token is a U+000A LINE FEED (LF) character token, then ignore
2542                   * > that token and move on to the next one. (Newlines at the start of
2543                   * > textarea elements are ignored as an authoring convenience.)
2544                   *
2545                   * This is handled in `get_modifiable_text()`.
2546                   */
2548                  $this->state->frameset_ok = false;
2550                  /*
2551                   * > Switch the insertion mode to "text".
2552                   *
2553                   * As a self-contained node, this behavior is handled in the Tag Processor.
2554                   */
2555                  return true;
2557              /*
2558               * > A start tag whose tag name is "xmp"
2559               */
2560              case '+XMP':
2561                  if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
2562                      $this->close_a_p_element();
2563                  }
2565                  $this->reconstruct_active_formatting_elements();
2566                  $this->state->frameset_ok = false;
2568                  /*
2569                   * > Follow the generic raw text element parsing algorithm.
2570                   *
2571                   * As a self-contained node, this behavior is handled in the Tag Processor.
2572                   */
2573                  $this->insert_html_element( $this->state->current_token );
2574                  return true;
2576              /*
2577               * > A start tag whose tag name is "iframe"
2578               */
2579              case '+IFRAME':
2580                  $this->state->frameset_ok = false;
2582                  /*
2583                   * > Follow the generic raw text element parsing algorithm.
2584                   *
2585                   * As a self-contained node, this behavior is handled in the Tag Processor.
2586                   */
2587                  $this->insert_html_element( $this->state->current_token );
2588                  return true;
2590              /*
2591               * > A start tag whose tag name is "noembed"
2592               * > A start tag whose tag name is "noscript", if the scripting flag is enabled
2593               *
2594               * The scripting flag is never enabled in this parser.
2595               */
2596              case '+NOEMBED':
2597                  $this->insert_html_element( $this->state->current_token );
2598                  return true;
2600              /*
2601               * > A start tag whose tag name is "select"
2602               */
2603              case '+SELECT':
2604                  $this->reconstruct_active_formatting_elements();
2605                  $this->insert_html_element( $this->state->current_token );
2606                  $this->state->frameset_ok = false;
2608                  switch ( $this->state->insertion_mode ) {
2609                      /*
2610                       * > If the insertion mode is one of "in table", "in caption", "in table body", "in row",
2611                       * > or "in cell", then switch the insertion mode to "in select in table".
2612                       */
2613                      case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE:
2614                      case WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION:
2615                      case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY:
2616                      case WP_HTML_Processor_State::INSERTION_MODE_IN_ROW:
2617                      case WP_HTML_Processor_State::INSERTION_MODE_IN_CELL:
2618                          $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE;
2619                          break;
2621                      /*
2622                       * > Otherwise, switch the insertion mode to "in select".
2623                       */
2624                      default:
2625                          $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT;
2626                          break;
2627                  }
2628                  return true;
2630              /*
2631               * > A start tag whose tag name is one of: "optgroup", "option"
2632               */
2633              case '+OPTGROUP':
2634              case '+OPTION':
2635                  if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
2636                      $this->state->stack_of_open_elements->pop();
2637                  }
2638                  $this->reconstruct_active_formatting_elements();
2639                  $this->insert_html_element( $this->state->current_token );
2640                  return true;
2642              /*
2643               * > A start tag whose tag name is one of: "rb", "rtc"
2644               */
2645              case '+RB':
2646              case '+RTC':
2647                  if ( $this->state->stack_of_open_elements->has_element_in_scope( 'RUBY' ) ) {
2648                      $this->generate_implied_end_tags();
2650                      if ( ! $this->state->stack_of_open_elements->current_node_is( 'RUBY' ) ) {
2651                          // @todo Indicate a parse error once it's possible.
2652                      }
2653                  }
2655                  $this->insert_html_element( $this->state->current_token );
2656                  return true;
2658              /*
2659               * > A start tag whose tag name is one of: "rp", "rt"
2660               */
2661              case '+RP':
2662              case '+RT':
2663                  if ( $this->state->stack_of_open_elements->has_element_in_scope( 'RUBY' ) ) {
2664                      $this->generate_implied_end_tags( 'RTC' );
2666                      $current_node_name = $this->state->stack_of_open_elements->current_node()->node_name;
2667                      if ( 'RTC' !== $current_node_name && 'RUBY' !== $current_node_name ) {
2668                          // @todo Indicate a parse error once it's possible.
2669                      }
2670                  }
2672                  $this->insert_html_element( $this->state->current_token );
2673                  return true;
2675              /*
2676               * > A start tag whose tag name is "math"
2677               */
2678              case '+MATH':
2679                  $this->reconstruct_active_formatting_elements();
2681                  /*
2682                   * @todo Adjust MathML attributes for the token. (This fixes the case of MathML attributes that are not all lowercase.)
2683                   * @todo Adjust foreign attributes for the token. (This fixes the use of namespaced attributes, in particular XLink.)
2684                   *
2685                   * These ought to be handled in the attribute methods.
2686                   */
2687                  $this->state->current_token->namespace = 'math';
2688                  $this->insert_html_element( $this->state->current_token );
2689                  if ( $this->state->current_token->has_self_closing_flag ) {
2690                      $this->state->stack_of_open_elements->pop();
2691                  }
2692                  return true;
2694              /*
2695               * > A start tag whose tag name is "svg"
2696               */
2697              case '+SVG':
2698                  $this->reconstruct_active_formatting_elements();
2700                  /*
2701                   * @todo Adjust SVG attributes for the token. (This fixes the case of SVG attributes that are not all lowercase.)
2702                   * @todo Adjust foreign attributes for the token. (This fixes the use of namespaced attributes, in particular XLink in SVG.)
2703                   *
2704                   * These ought to be handled in the attribute methods.
2705                   */
2706                  $this->state->current_token->namespace = 'svg';
2707                  $this->insert_html_element( $this->state->current_token );
2708                  if ( $this->state->current_token->has_self_closing_flag ) {
2709                      $this->state->stack_of_open_elements->pop();
2710                  }
2711                  return true;
2713              /*
2714               * > A start tag whose tag name is one of: "caption", "col", "colgroup",
2715               * > "frame", "head", "tbody", "td", "tfoot", "th", "thead", "tr"
2716               */
2717              case '+CAPTION':
2718              case '+COL':
2719              case '+COLGROUP':
2720              case '+FRAME':
2721              case '+HEAD':
2722              case '+TBODY':
2723              case '+TD':
2724              case '+TFOOT':
2725              case '+TH':
2726              case '+THEAD':
2727              case '+TR':
2728                  // Parse error. Ignore the token.
2729                  return $this->step();
2730          }
2732          if ( ! parent::is_tag_closer() ) {
2733              /*
2734               * > Any other start tag
2735               */
2736              $this->reconstruct_active_formatting_elements();
2737              $this->insert_html_element( $this->state->current_token );
2738              return true;
2739          } else {
2740              /*
2741               * > Any other end tag
2742               */
2744              /*
2745               * Find the corresponding tag opener in the stack of open elements, if
2746               * it exists before reaching a special element, which provides a kind
2747               * of boundary in the stack. For example, a `</custom-tag>` should not
2748               * close anything beyond its containing `P` or `DIV` element.
2749               */
2750              foreach ( $this->state->stack_of_open_elements->walk_up() as $node ) {
2751                  if ( 'html' === $node->namespace && $token_name === $node->node_name ) {
2752                      break;
2753                  }
2755                  if ( self::is_special( $node ) ) {
2756                      // This is a parse error, ignore the token.
2757                      return $this->step();
2758                  }
2759              }
2761              $this->generate_implied_end_tags( $token_name );
2762              if ( $node !== $this->state->stack_of_open_elements->current_node() ) {
2763                  // @todo Record parse error: this error doesn't impact parsing.
2764              }
2766              foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
2767                  $this->state->stack_of_open_elements->pop();
2768                  if ( $node === $item ) {
2769                      return true;
2770                  }
2771              }
2772          }
2774          $this->bail( 'Should not have been able to reach end of IN BODY processing. Check HTML API code.' );
2775          // This unnecessary return prevents tools from inaccurately reporting type errors.
2776          return false;
2777      }
2779      /**
2780       * Parses next element in the 'in table' insertion mode.
2781       *
2782       * This internal function performs the 'in table' insertion mode
2783       * logic for the generalized WP_HTML_Processor::step() function.
2784       *
2785       * @since 6.7.0
2786       *
2787       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
2788       *
2789       * @see https://html.spec.whatwg.org/#parsing-main-intable
2790       * @see WP_HTML_Processor::step
2791       *
2792       * @return bool Whether an element was found.
2793       */
2794      private function step_in_table(): bool {
2795          $token_name = $this->get_token_name();
2796          $token_type = $this->get_token_type();
2797          $op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
2798          $op         = "{$op_sigil}{$token_name}";
2800          switch ( $op ) {
2801              /*
2802               * > A character token, if the current node is table,
2803               * > tbody, template, tfoot, thead, or tr element
2804               */
2805              case '#text':
2806                  $current_node      = $this->state->stack_of_open_elements->current_node();
2807                  $current_node_name = $current_node ? $current_node->node_name : null;
2808                  if (
2809                      $current_node_name && (
2810                          'TABLE' === $current_node_name ||
2811                          'TBODY' === $current_node_name ||
2812                          'TEMPLATE' === $current_node_name ||
2813                          'TFOOT' === $current_node_name ||
2814                          'THEAD' === $current_node_name ||
2815                          'TR' === $current_node_name
2816                      )
2817                  ) {
2818                      /*
2819                       * If the text is empty after processing HTML entities and stripping
2820                       * U+0000 NULL bytes then ignore the token.
2821                       */
2822                      if ( parent::TEXT_IS_NULL_SEQUENCE === $this->text_node_classification ) {
2823                          return $this->step();
2824                      }
2826                      /*
2827                       * This follows the rules for "in table text" insertion mode.
2828                       *
2829                       * Whitespace-only text nodes are inserted in-place. Otherwise
2830                       * foster parenting is enabled and the nodes would be
2831                       * inserted out-of-place.
2832                       *
2833                       * > If any of the tokens in the pending table character tokens
2834                       * > list are character tokens that are not ASCII whitespace,
2835                       * > then this is a parse error: reprocess the character tokens
2836                       * > in the pending table character tokens list using the rules
2837                       * > given in the "anything else" entry in the "in table"
2838                       * > insertion mode.
2839                       * >
2840                       * > Otherwise, insert the characters given by the pending table
2841                       * > character tokens list.
2842                       *
2843                       * @see https://html.spec.whatwg.org/#parsing-main-intabletext
2844                       */
2845                      if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
2846                          $this->insert_html_element( $this->state->current_token );
2847                          return true;
2848                      }
2850                      // Non-whitespace would trigger fostering, unsupported at this time.
2851                      $this->bail( 'Foster parenting is not supported.' );
2852                      break;
2853                  }
2854                  break;
2856              /*
2857               * > A comment token
2858               */
2859              case '#comment':
2860              case '#funky-comment':
2861              case '#presumptuous-tag':
2862                  $this->insert_html_element( $this->state->current_token );
2863                  return true;
2865              /*
2866               * > A DOCTYPE token
2867               */
2868              case 'html':
2869                  // Parse error: ignore the token. A DOCTYPE reports 'html' as its token name.
2870                  return $this->step();
2872              /*
2873               * > A start tag whose tag name is "caption"
2874               */
2875              case '+CAPTION':
2876                  $this->state->stack_of_open_elements->clear_to_table_context();
2877                  $this->state->active_formatting_elements->insert_marker();
2878                  $this->insert_html_element( $this->state->current_token );
2879                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION;
2880                  return true;
2882              /*
2883               * > A start tag whose tag name is "colgroup"
2884               */
2885              case '+COLGROUP':
2886                  $this->state->stack_of_open_elements->clear_to_table_context();
2887                  $this->insert_html_element( $this->state->current_token );
2888                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
2889                  return true;
2891              /*
2892               * > A start tag whose tag name is "col"
2893               */
2894              case '+COL':
2895                  $this->state->stack_of_open_elements->clear_to_table_context();
2897                  /*
2898                   * > Insert an HTML element for a "colgroup" start tag token with no attributes,
2899                   * > then switch the insertion mode to "in column group".
2900                   */
2901                  $this->insert_virtual_node( 'COLGROUP' );
2902                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
2903                  return $this->step( self::REPROCESS_CURRENT_NODE );
2905              /*
2906               * > A start tag whose tag name is one of: "tbody", "tfoot", "thead"
2907               */
2908              case '+TBODY':
2909              case '+TFOOT':
2910              case '+THEAD':
2911                  $this->state->stack_of_open_elements->clear_to_table_context();
2912                  $this->insert_html_element( $this->state->current_token );
2913                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
2914                  return true;
2916              /*
2917               * > A start tag whose tag name is one of: "td", "th", "tr"
2918               */
2919              case '+TD':
2920              case '+TH':
2921              case '+TR':
2922                  $this->state->stack_of_open_elements->clear_to_table_context();
2923                  /*
2924                   * > Insert an HTML element for a "tbody" start tag token with no attributes,
2925                   * > then switch the insertion mode to "in table body".
2926                   */
2927                  $this->insert_virtual_node( 'TBODY' );
2928                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
2929                  return $this->step( self::REPROCESS_CURRENT_NODE );
2931              /*
2932               * > A start tag whose tag name is "table"
2933               *
2934               * This tag in the IN TABLE insertion mode is a parse error.
2935               */
2936              case '+TABLE':
2937                  if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TABLE' ) ) {
2938                      return $this->step();
2939                  }
2941                  $this->state->stack_of_open_elements->pop_until( 'TABLE' );
2942                  $this->reset_insertion_mode_appropriately();
2943                  return $this->step( self::REPROCESS_CURRENT_NODE );
2945              /*
2946               * > An end tag whose tag name is "table"
2947               */
2948              case '-TABLE':
2949                  if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TABLE' ) ) {
2950                      // @todo Indicate a parse error once it's possible.
2951                      return $this->step();
2952                  }
2954                  $this->state->stack_of_open_elements->pop_until( 'TABLE' );
2955                  $this->reset_insertion_mode_appropriately();
2956                  return true;
2958              /*
2959               * > An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "tbody", "td", "tfoot", "th", "thead", "tr"
2960               */
2961              case '-BODY':
2962              case '-CAPTION':
2963              case '-COL':
2964              case '-COLGROUP':
2965              case '-HTML':
2966              case '-TBODY':
2967              case '-TD':
2968              case '-TFOOT':
2969              case '-TH':
2970              case '-THEAD':
2971              case '-TR':
2972                  // Parse error: ignore the token.
2973                  return $this->step();
2975              /*
2976               * > A start tag whose tag name is one of: "style", "script", "template"
2977               * > An end tag whose tag name is "template"
2978               */
2979              case '+STYLE':
2980              case '+SCRIPT':
2981              case '+TEMPLATE':
2982              case '-TEMPLATE':
2983                  /*
2984                   * > Process the token using the rules for the "in head" insertion mode.
2985                   */
2986                  return $this->step_in_head();
2988              /*
2989               * > A start tag whose tag name is "input"
2990               *
2991               * > If the token does not have an attribute with the name "type", or if it does, but
2992               * > that attribute's value is not an ASCII case-insensitive match for the string
2993               * > "hidden", then: act as described in the "anything else" entry below.
2994               */
2995              case '+INPUT':
2996                  $type_attribute = $this->get_attribute( 'type' );
2997                  if ( ! is_string( $type_attribute ) || 'hidden' !== strtolower( $type_attribute ) ) {
2998                      goto anything_else;
2999                  }
3000                  // @todo Indicate a parse error once it's possible.
3001                  $this->insert_html_element( $this->state->current_token );
3002                  return true;
3004              /*
3005               * > A start tag whose tag name is "form"
3006               *
3007               * This tag in the IN TABLE insertion mode is a parse error.
3008               */
3009              case '+FORM':
3010                  if (
3011                      $this->state->stack_of_open_elements->has_element_in_scope( 'TEMPLATE' ) ||
3012                      isset( $this->state->form_element )
3013                  ) {
3014                      return $this->step();
3015                  }
3017                  // This FORM is special because it immediately closes and cannot have other children.
3018                  $this->insert_html_element( $this->state->current_token );
3019                  $this->state->form_element = $this->state->current_token;
3020                  $this->state->stack_of_open_elements->pop();
3021                  return true;
3022          }
3024          /*
3025           * > Anything else
3026           * > Parse error. Enable foster parenting, process the token using the rules for the
3027           * > "in body" insertion mode, and then disable foster parenting.
3028           *
3029           * @todo Indicate a parse error once it's possible.
3030           */
3031          anything_else:
3032          $this->bail( 'Foster parenting is not supported.' );
3033      }
3035      /**
3036       * Parses next element in the 'in table text' insertion mode.
3037       *
3038       * This internal function performs the 'in table text' insertion mode
3039       * logic for the generalized WP_HTML_Processor::step() function.
3040       *
3041       * @since 6.7.0 Stub implementation.
3042       *
3043       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
3044       *
3045       * @see https://html.spec.whatwg.org/#parsing-main-intabletext
3046       * @see WP_HTML_Processor::step
3047       *
3048       * @return bool Whether an element was found.
3049       */
3050      private function step_in_table_text(): bool {
3051          $this->bail( 'No support for parsing in the ' . WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_TEXT . ' state.' );
3052      }
3054      /**
3055       * Parses next element in the 'in caption' insertion mode.
3056       *
3057       * This internal function performs the 'in caption' insertion mode
3058       * logic for the generalized WP_HTML_Processor::step() function.
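           *
           * Example (a sketch, assuming the public `create_fragment()`, `next_tag()`,
           * and `get_breadcrumbs()` methods; the markup is illustrative):
           *
           *     $processor = WP_HTML_Processor::create_fragment( '<table><caption>Title<tr><td>Cell' );
           *     if ( $processor->next_tag( 'TD' ) ) {
           *         // The TR start tag closes the open CAPTION before it is reprocessed
           *         // in the IN TABLE mode, so CAPTION is not an ancestor of this TD.
           *         $breadcrumbs = $processor->get_breadcrumbs();
           *     }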
3059       *
3060       * @since 6.7.0
3061       *
3062       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
3063       *
3064       * @see https://html.spec.whatwg.org/#parsing-main-incaption
3065       * @see WP_HTML_Processor::step
3066       *
3067       * @return bool Whether an element was found.
3068       */
3069      private function step_in_caption(): bool {
3070          $tag_name = $this->get_tag();
3071          $op_sigil = $this->is_tag_closer() ? '-' : '+';
3072          $op       = "{$op_sigil}{$tag_name}";
3074          switch ( $op ) {
3075              /*
3076               * > An end tag whose tag name is "caption"
3077               * > A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "td", "tfoot", "th", "thead", "tr"
3078               * > An end tag whose tag name is "table"
3079               *
3080               * These tag handling rules are identical except for the final instruction.
3081               * Handle them in a single block.
3082               */
3083              case '-CAPTION':
3084              case '+CAPTION':
3085              case '+COL':
3086              case '+COLGROUP':
3087              case '+TBODY':
3088              case '+TD':
3089              case '+TFOOT':
3090              case '+TH':
3091              case '+THEAD':
3092              case '+TR':
3093              case '-TABLE':
3094                  if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'CAPTION' ) ) {
3095                      // Parse error: ignore the token.
3096                      return $this->step();
3097                  }
3099                  $this->generate_implied_end_tags();
3100                  if ( ! $this->state->stack_of_open_elements->current_node_is( 'CAPTION' ) ) {
3101                      // @todo Indicate a parse error once it's possible.
3102                  }
3104                  $this->state->stack_of_open_elements->pop_until( 'CAPTION' );
3105                  $this->state->active_formatting_elements->clear_up_to_last_marker();
3106                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
3108                  // A CAPTION end tag is consumed here; any other token is reprocessed in the IN TABLE mode.
3109                  if ( '-CAPTION' === $op ) {
3110                      return true;
3111                  }
3112                  return $this->step( self::REPROCESS_CURRENT_NODE );
3114              /*
3115               * > An end tag whose tag name is one of: "body", "col", "colgroup", "html", "tbody", "td", "tfoot", "th", "thead", "tr"
3116               */
3117              case '-BODY':
3118              case '-COL':
3119              case '-COLGROUP':
3120              case '-HTML':
3121              case '-TBODY':
3122              case '-TD':
3123              case '-TFOOT':
3124              case '-TH':
3125              case '-THEAD':
3126              case '-TR':
3127                  // Parse error: ignore the token.
3128                  return $this->step();
3129          }
3131          /*
3132           * > Anything else
3133           * >   Process the token using the rules for the "in body" insertion mode.
3134           */
3135          return $this->step_in_body();
3136      }
3138      /**
3139       * Parses next element in the 'in column group' insertion mode.
3140       *
3141       * This internal function performs the 'in column group' insertion mode
3142       * logic for the generalized WP_HTML_Processor::step() function.
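           *
           * Example (a sketch, assuming the public `create_fragment()`, `next_tag()`,
           * and `get_breadcrumbs()` methods; the markup is illustrative):
           *
           *     $processor = WP_HTML_Processor::create_fragment( '<table><col><tr><td>Cell' );
           *     if ( $processor->next_tag( 'COL' ) ) {
           *         // A COL start tag directly inside TABLE implies a COLGROUP,
           *         // so COLGROUP sits between TABLE and COL in the breadcrumbs.
           *         $breadcrumbs = $processor->get_breadcrumbs();
           *     }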
3143       *
3144       * @since 6.7.0
3145       *
3146       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
3147       *
3148       * @see https://html.spec.whatwg.org/#parsing-main-incolgroup
3149       * @see WP_HTML_Processor::step
3150       *
3151       * @return bool Whether an element was found.
3152       */
3153      private function step_in_column_group(): bool {
3154          $token_name = $this->get_token_name();
3155          $token_type = $this->get_token_type();
3156          $op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
3157          $op         = "{$op_sigil}{$token_name}";
3159          switch ( $op ) {
3160              /*
3161               * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
3162               * > U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
3163               */
3164              case '#text':
3165                  if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
3166                      // Insert the character.
3167                      $this->insert_html_element( $this->state->current_token );
3168                      return true;
3169                  }
3171                  goto in_column_group_anything_else;
3172                  break;
3174              /*
3175               * > A comment token
3176               */
3177              case '#comment':
3178              case '#funky-comment':
3179              case '#presumptuous-tag':
3180                  $this->insert_html_element( $this->state->current_token );
3181                  return true;
3183              /*
3184               * > A DOCTYPE token
3185               */
3186              case 'html':
3187                  // @todo Indicate a parse error once it's possible.
3188                  return $this->step();
3190              /*
3191               * > A start tag whose tag name is "html"
3192               */
3193              case '+HTML':
3194                  return $this->step_in_body();
3196              /*
3197               * > A start tag whose tag name is "col"
3198               */
3199              case '+COL':
3200                  $this->insert_html_element( $this->state->current_token );
3201                  $this->state->stack_of_open_elements->pop();
3202                  return true;
3204              /*
3205               * > An end tag whose tag name is "colgroup"
3206               */
3207              case '-COLGROUP':
3208                  if ( ! $this->state->stack_of_open_elements->current_node_is( 'COLGROUP' ) ) {
3209                      // @todo Indicate a parse error once it's possible.
3210                      return $this->step();
3211                  }
3212                  $this->state->stack_of_open_elements->pop();
3213                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
3214                  return true;
3216              /*
3217               * > An end tag whose tag name is "col"
3218               */
3219              case '-COL':
3220                  // Parse error: ignore the token.
3221                  return $this->step();
3223              /*
3224               * > A start tag whose tag name is "template"
3225               * > An end tag whose tag name is "template"
3226               */
3227              case '+TEMPLATE':
3228              case '-TEMPLATE':
3229                  return $this->step_in_head();
3230          }
3232          in_column_group_anything_else:
3233          /*
3234           * > Anything else
3235           */
3236          if ( ! $this->state->stack_of_open_elements->current_node_is( 'COLGROUP' ) ) {
3237              // @todo Indicate a parse error once it's possible.
3238              return $this->step();
3239          }
3240          $this->state->stack_of_open_elements->pop();
3241          $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
3242          return $this->step( self::REPROCESS_CURRENT_NODE );
3243      }
3245      /**
3246       * Parses next element in the 'in table body' insertion mode.
3247       *
3248       * This internal function performs the 'in table body' insertion mode
3249       * logic for the generalized WP_HTML_Processor::step() function.
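           *
           * Example (a sketch, assuming the public `create_fragment()`, `next_tag()`,
           * and `get_breadcrumbs()` methods; the markup is illustrative):
           *
           *     $processor = WP_HTML_Processor::create_fragment( '<table><td>Cell' );
           *     if ( $processor->next_tag( 'TD' ) ) {
           *         // A TD start tag directly inside a table body implies a TR,
           *         // so TR sits between TBODY and TD in the breadcrumbs.
           *         $breadcrumbs = $processor->get_breadcrumbs();
           *     }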
3250       *
3251       * @since 6.7.0
3252       *
3253       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
3254       *
3255       * @see https://html.spec.whatwg.org/#parsing-main-intbody
3256       * @see WP_HTML_Processor::step
3257       *
3258       * @return bool Whether an element was found.
3259       */
3260      private function step_in_table_body(): bool {
3261          $tag_name = $this->get_tag();
3262          $op_sigil = $this->is_tag_closer() ? '-' : '+';
3263          $op       = "{$op_sigil}{$tag_name}";
3265          switch ( $op ) {
3266              /*
3267               * > A start tag whose tag name is "tr"
3268               */
3269              case '+TR':
3270                  $this->state->stack_of_open_elements->clear_to_table_body_context();
3271                  $this->insert_html_element( $this->state->current_token );
3272                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
3273                  return true;
3275              /*
3276               * > A start tag whose tag name is one of: "th", "td"
3277               */
3278              case '+TH':
3279              case '+TD':
3280                  // @todo Indicate a parse error once it's possible.
3281                  $this->state->stack_of_open_elements->clear_to_table_body_context();
3282                  $this->insert_virtual_node( 'TR' );
3283                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
3284                  return $this->step( self::REPROCESS_CURRENT_NODE );
3286              /*
3287               * > An end tag whose tag name is one of: "tbody", "tfoot", "thead"
3288               */
3289              case '-TBODY':
3290              case '-TFOOT':
3291              case '-THEAD':
3292                  if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $tag_name ) ) {
3293                      // Parse error: ignore the token.
3294                      return $this->step();
3295                  }
3297                  $this->state->stack_of_open_elements->clear_to_table_body_context();
3298                  $this->state->stack_of_open_elements->pop();
3299                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
3300                  return true;
3302              /*
3303               * > A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "tfoot", "thead"
3304               * > An end tag whose tag name is "table"
3305               */
3306              case '+CAPTION':
3307              case '+COL':
3308              case '+COLGROUP':
3309              case '+TBODY':
3310              case '+TFOOT':
3311              case '+THEAD':
3312              case '-TABLE':
3313                  if (
3314                      ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TBODY' ) &&
3315                      ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'THEAD' ) &&
3316                      ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TFOOT' )
3317                  ) {
3318                      // Parse error: ignore the token.
3319                      return $this->step();
3320                  }
3321                  $this->state->stack_of_open_elements->clear_to_table_body_context();
3322                  $this->state->stack_of_open_elements->pop();
3323                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
3324                  return $this->step( self::REPROCESS_CURRENT_NODE );
3326              /*
3327               * > An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "td", "th", "tr"
3328               */
3329              case '-BODY':
3330              case '-CAPTION':
3331              case '-COL':
3332              case '-COLGROUP':
3333              case '-HTML':
3334              case '-TD':
3335              case '-TH':
3336              case '-TR':
3337                  // Parse error: ignore the token.
3338                  return $this->step();
3339          }
3341          /*
3342           * > Anything else
3343           * > Process the token using the rules for the "in table" insertion mode.
3344           */
3345          return $this->step_in_table();
3346      }
3348      /**
3349       * Parses next element in the 'in row' insertion mode.
3350       *
3351       * This internal function performs the 'in row' insertion mode
3352       * logic for the generalized WP_HTML_Processor::step() function.
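           *
           * Example (a sketch, assuming the public `create_fragment()` and `next_tag()`
           * methods; the markup and the `$rows` counter are illustrative):
           *
           *     $processor = WP_HTML_Processor::create_fragment( '<table><tr><td>A<tr><td>B' );
           *     $rows      = 0;
           *     // The second TR start tag closes the open row before it is reprocessed,
           *     // so both rows are children of the same implied TBODY.
           *     while ( $processor->next_tag( array( 'breadcrumbs' => array( 'TBODY', 'TR' ) ) ) ) {
           *         ++$rows;
           *     }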
3353       *
3354       * @since 6.7.0
3355       *
3356       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
3357       *
3358       * @see https://html.spec.whatwg.org/#parsing-main-intr
3359       * @see WP_HTML_Processor::step
3360       *
3361       * @return bool Whether an element was found.
3362       */
3363      private function step_in_row(): bool {
3364          $tag_name = $this->get_tag();
3365          $op_sigil = $this->is_tag_closer() ? '-' : '+';
3366          $op       = "{$op_sigil}{$tag_name}";
3368          switch ( $op ) {
3369              /*
3370               * > A start tag whose tag name is one of: "th", "td"
3371               */
3372              case '+TH':
3373              case '+TD':
3374                  $this->state->stack_of_open_elements->clear_to_table_row_context();
3375                  $this->insert_html_element( $this->state->current_token );
3376                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_CELL;
3377                  $this->state->active_formatting_elements->insert_marker();
3378                  return true;
3380              /*
3381               * > An end tag whose tag name is "tr"
3382               */
3383              case '-TR':
3384                  if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TR' ) ) {
3385                      // Parse error: ignore the token.
3386                      return $this->step();
3387                  }
3389                  $this->state->stack_of_open_elements->clear_to_table_row_context();
3390                  $this->state->stack_of_open_elements->pop();
3391                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
3392                  return true;
3394              /*
3395               * > A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "tfoot", "thead", "tr"
3396               * > An end tag whose tag name is "table"
3397               */
3398              case '+CAPTION':
3399              case '+COL':
3400              case '+COLGROUP':
3401              case '+TBODY':
3402              case '+TFOOT':
3403              case '+THEAD':
3404              case '+TR':
3405              case '-TABLE':
3406                  if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TR' ) ) {
3407                      // Parse error: ignore the token.
3408                      return $this->step();
3409                  }
3411                  $this->state->stack_of_open_elements->clear_to_table_row_context();
3412                  $this->state->stack_of_open_elements->pop();
3413                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
3414                  return $this->step( self::REPROCESS_CURRENT_NODE );
3416              /*
3417               * > An end tag whose tag name is one of: "tbody", "tfoot", "thead"
3418               */
3419              case '-TBODY':
3420              case '-TFOOT':
3421              case '-THEAD':
3422                  if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $tag_name ) ) {
3423                      // Parse error: ignore the token.
3424                      return $this->step();
3425                  }
3427                  if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TR' ) ) {
3428                      // Ignore the token.
3429                      return $this->step();
3430                  }
3432                  $this->state->stack_of_open_elements->clear_to_table_row_context();
3433                  $this->state->stack_of_open_elements->pop();
3434                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
3435                  return $this->step( self::REPROCESS_CURRENT_NODE );
3437              /*
3438               * > An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "td", "th"
3439               */
3440              case '-BODY':
3441              case '-CAPTION':
3442              case '-COL':
3443              case '-COLGROUP':
3444              case '-HTML':
3445              case '-TD':
3446              case '-TH':
3447                  // Parse error: ignore the token.
3448                  return $this->step();
3449          }
3451          /*
3452           * > Anything else
3453           * >   Process the token using the rules for the "in table" insertion mode.
3454           */
3455          return $this->step_in_table();
3456      }
3458      /**
3459       * Parses next element in the 'in cell' insertion mode.
3460       *
3461       * This internal function performs the 'in cell' insertion mode
3462       * logic for the generalized WP_HTML_Processor::step() function.
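           *
           * Example (a sketch, assuming the public `create_fragment()` and `next_tag()`
           * methods; the markup and the `$cells` counter are illustrative):
           *
           *     $processor = WP_HTML_Processor::create_fragment( '<table><tr><td>A<td>B' );
           *     $cells     = 0;
           *     // The second TD start tag closes the open cell before it is reprocessed,
           *     // so both cells are children of the same TR.
           *     while ( $processor->next_tag( array( 'breadcrumbs' => array( 'TR', 'TD' ) ) ) ) {
           *         ++$cells;
           *     }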
3463       *
3464       * @since 6.7.0
3465       *
3466       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
3467       *
3468       * @see https://html.spec.whatwg.org/#parsing-main-intd
3469       * @see WP_HTML_Processor::step
3470       *
3471       * @return bool Whether an element was found.
3472       */
3473      private function step_in_cell(): bool {
3474          $tag_name = $this->get_tag();
3475          $op_sigil = $this->is_tag_closer() ? '-' : '+';
3476          $op       = "{$op_sigil}{$tag_name}";
3478          switch ( $op ) {
3479              /*
3480               * > An end tag whose tag name is one of: "td", "th"
3481               */
3482              case '-TD':
3483              case '-TH':
3484                  if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $tag_name ) ) {
3485                      // Parse error: ignore the token.
3486                      return $this->step();
3487                  }
3489                  $this->generate_implied_end_tags();
3491                  /*
3492                   * @todo This needs to check if the current node is an HTML element, meaning that
3493                   *       when SVG and MathML support is added, this needs to differentiate between an
3494                   *       HTML element of the given name, such as `<center>`, and a foreign element of
3495                   *       the same given name.
3496                   */
3497                  if ( ! $this->state->stack_of_open_elements->current_node_is( $tag_name ) ) {
3498                      // @todo Indicate a parse error once it's possible.
3499                  }
3501                  $this->state->stack_of_open_elements->pop_until( $tag_name );
3502                  $this->state->active_formatting_elements->clear_up_to_last_marker();
3503                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
3504                  return true;
3506              /*
3507               * > A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "td",
3508               * > "tfoot", "th", "thead", "tr"
3509               */
3510              case '+CAPTION':
3511              case '+COL':
3512              case '+COLGROUP':
3513              case '+TBODY':
3514              case '+TD':
3515              case '+TFOOT':
3516              case '+TH':
3517              case '+THEAD':
3518              case '+TR':
3519                  /*
3520                   * > Assert: The stack of open elements has a td or th element in table scope.
3521                   *
3522                   * Nothing to do here except to verify in tests that this assertion always holds.
3523                   */
3525                  $this->close_cell();
3526                  return $this->step( self::REPROCESS_CURRENT_NODE );
3528              /*
3529               * > An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html"
3530               */
3531              case '-BODY':
3532              case '-CAPTION':
3533              case '-COL':
3534              case '-COLGROUP':
3535              case '-HTML':
3536                  // Parse error: ignore the token.
3537                  return $this->step();
3539              /*
3540               * > An end tag whose tag name is one of: "table", "tbody", "tfoot", "thead", "tr"
3541               */
3542              case '-TABLE':
3543              case '-TBODY':
3544              case '-TFOOT':
3545              case '-THEAD':
3546              case '-TR':
3547                  if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $tag_name ) ) {
3548                      // Parse error: ignore the token.
3549                      return $this->step();
3550                  }
3551                  $this->close_cell();
3552                  return $this->step( self::REPROCESS_CURRENT_NODE );
3553          }
3555          /*
3556           * > Anything else
3557           * >   Process the token using the rules for the "in body" insertion mode.
3558           */
3559          return $this->step_in_body();
3560      }
3562      /**
3563       * Parses next element in the 'in select' insertion mode.
3564       *
3565       * This internal function performs the 'in select' insertion mode
3566       * logic for the generalized WP_HTML_Processor::step() function.
3567       *
3568       * @since 6.7.0
3569       *
3570       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
3571       *
3572       * @see https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inselect
3573       * @see WP_HTML_Processor::step
3574       *
3575       * @return bool Whether an element was found.
3576       */
3577      private function step_in_select(): bool {
3578          $token_name = $this->get_token_name();
3579          $token_type = $this->get_token_type();
3580          $op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
3581          $op         = "{$op_sigil}{$token_name}";
3583          switch ( $op ) {
3584              /*
3585               * > Any other character token
3586               */
3587              case '#text':
3588                  /*
3589                   * > A character token that is U+0000 NULL
3590                   *
3591                   * If a text node only comprises null bytes then it should be
3592                   * entirely ignored and should not return to calling code.
3593                   */
3594                  if ( parent::TEXT_IS_NULL_SEQUENCE === $this->text_node_classification ) {
3595                      // Parse error: ignore the token.
3596                      return $this->step();
3597                  }
3599                  $this->insert_html_element( $this->state->current_token );
3600                  return true;
3602              /*
3603               * > A comment token
3604               */
3605              case '#comment':
3606              case '#funky-comment':
3607              case '#presumptuous-tag':
3608                  $this->insert_html_element( $this->state->current_token );
3609                  return true;
3611              /*
3612               * > A DOCTYPE token
3613               */
3614              case 'html':
3615                  // Parse error: ignore the token.
3616                  return $this->step();
3618              /*
3619               * > A start tag whose tag name is "html"
3620               */
3621              case '+HTML':
3622                  return $this->step_in_body();
3624              /*
3625               * > A start tag whose tag name is "option"
3626               */
3627              case '+OPTION':
3628                  if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
3629                      $this->state->stack_of_open_elements->pop();
3630                  }
3631                  $this->insert_html_element( $this->state->current_token );
3632                  return true;
3634              /*
3635               * > A start tag whose tag name is "optgroup"
3636               * > A start tag whose tag name is "hr"
3637               *
3638               * These rules are identical except for the treatment of the self-closing flag and
3639               * the subsequent pop of the HR void element, both of which are handled elsewhere in the processor.
3640               */
3641              case '+OPTGROUP':
3642              case '+HR':
3643                  if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
3644                      $this->state->stack_of_open_elements->pop();
3645                  }
3647                  if ( $this->state->stack_of_open_elements->current_node_is( 'OPTGROUP' ) ) {
3648                      $this->state->stack_of_open_elements->pop();
3649                  }
3651                  $this->insert_html_element( $this->state->current_token );
3652                  return true;
3654              /*
3655               * > An end tag whose tag name is "optgroup"
3656               */
3657              case '-OPTGROUP':
3658                  $current_node = $this->state->stack_of_open_elements->current_node();
3659                  if ( $current_node && 'OPTION' === $current_node->node_name ) {
3660                      foreach ( $this->state->stack_of_open_elements->walk_up( $current_node ) as $parent ) {
3661                          break;
3662                      }
3663                      if ( isset( $parent ) && 'OPTGROUP' === $parent->node_name ) {
3664                          $this->state->stack_of_open_elements->pop();
3665                      }
3666                  }
3668                  if ( $this->state->stack_of_open_elements->current_node_is( 'OPTGROUP' ) ) {
3669                      $this->state->stack_of_open_elements->pop();
3670                      return true;
3671                  }
3673                  // Parse error: ignore the token.
3674                  return $this->step();
3676              /*
3677               * > An end tag whose tag name is "option"
3678               */
3679              case '-OPTION':
3680                  if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
3681                      $this->state->stack_of_open_elements->pop();
3682                      return true;
3683                  }
3685                  // Parse error: ignore the token.
3686                  return $this->step();
3688              /*
3689               * > An end tag whose tag name is "select"
3690               * > A start tag whose tag name is "select"
3691               *
3692               * > It just gets treated like an end tag.
3693               */
3694              case '-SELECT':
3695              case '+SELECT':
3696                  if ( ! $this->state->stack_of_open_elements->has_element_in_select_scope( 'SELECT' ) ) {
3697                      // Parse error: ignore the token.
3698                      return $this->step();
3699                  }
3700                  $this->state->stack_of_open_elements->pop_until( 'SELECT' );
3701                  $this->reset_insertion_mode_appropriately();
3702                  return true;
3704              /*
3705               * > A start tag whose tag name is one of: "input", "keygen", "textarea"
3706               *
3707               * All three of these tags are considered a parse error when found in this insertion mode.
3708               */
3709              case '+INPUT':
3710              case '+KEYGEN':
3711              case '+TEXTAREA':
3712                  if ( ! $this->state->stack_of_open_elements->has_element_in_select_scope( 'SELECT' ) ) {
3713                      // Ignore the token.
3714                      return $this->step();
3715                  }
3716                  $this->state->stack_of_open_elements->pop_until( 'SELECT' );
3717                  $this->reset_insertion_mode_appropriately();
3718                  return $this->step( self::REPROCESS_CURRENT_NODE );
3720              /*
3721               * > A start tag whose tag name is one of: "script", "template"
3722               * > An end tag whose tag name is "template"
3723               */
3724              case '+SCRIPT':
3725              case '+TEMPLATE':
3726              case '-TEMPLATE':
3727                  return $this->step_in_head();
3728          }
3730          /*
3731           * > Anything else
3732           * >   Parse error: ignore the token.
3733           */
3734          return $this->step();
3735      }
3737      /**
3738       * Parses next element in the 'in select in table' insertion mode.
3739       *
3740       * This internal function performs the 'in select in table' insertion mode
3741       * logic for the generalized WP_HTML_Processor::step() function.
3742       *
3743       * @since 6.7.0
3744       *
3745       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
3746       *
3747       * @see https://html.spec.whatwg.org/#parsing-main-inselectintable
3748       * @see WP_HTML_Processor::step
3749       *
3750       * @return bool Whether an element was found.
3751       */
3752  	private function step_in_select_in_table(): bool {
3753          $token_name = $this->get_token_name();
3754          $token_type = $this->get_token_type();
3755          $op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
3756          $op         = "{$op_sigil}{$token_name}";
3758          switch ( $op ) {
3759              /*
3760               * > A start tag whose tag name is one of: "caption", "table", "tbody", "tfoot", "thead", "tr", "td", "th"
3761               */
3762              case '+CAPTION':
3763              case '+TABLE':
3764              case '+TBODY':
3765              case '+TFOOT':
3766              case '+THEAD':
3767              case '+TR':
3768              case '+TD':
3769              case '+TH':
3770                  // @todo Indicate a parse error once it's possible.
3771                  $this->state->stack_of_open_elements->pop_until( 'SELECT' );
3772                  $this->reset_insertion_mode_appropriately();
3773                  return $this->step( self::REPROCESS_CURRENT_NODE );
3775              /*
3776               * > An end tag whose tag name is one of: "caption", "table", "tbody", "tfoot", "thead", "tr", "td", "th"
3777               */
3778              case '-CAPTION':
3779              case '-TABLE':
3780              case '-TBODY':
3781              case '-TFOOT':
3782              case '-THEAD':
3783              case '-TR':
3784              case '-TD':
3785              case '-TH':
3786                  // @todo Indicate a parse error once it's possible.
3787                  if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $token_name ) ) {
3788                      return $this->step();
3789                  }
3790                  $this->state->stack_of_open_elements->pop_until( 'SELECT' );
3791                  $this->reset_insertion_mode_appropriately();
3792                  return $this->step( self::REPROCESS_CURRENT_NODE );
3793          }
3795          /*
3796           * > Anything else
3797           */
3798          return $this->step_in_select();
3799      }
3801      /**
3802       * Parses next element in the 'in template' insertion mode.
3803       *
3804       * This internal function performs the 'in template' insertion mode
3805       * logic for the generalized WP_HTML_Processor::step() function.
3806       *
3807       * @since 6.7.0
3808       *
3809       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
3810       *
3811       * @see https://html.spec.whatwg.org/#parsing-main-intemplate
3812       * @see WP_HTML_Processor::step
3813       *
3814       * @return bool Whether an element was found.
3815       */
3816  	private function step_in_template(): bool {
3817          $token_name = $this->get_token_name();
3818          $token_type = $this->get_token_type();
3819          $is_closer  = $this->is_tag_closer();
3820          $op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
3821          $op         = "{$op_sigil}{$token_name}";
3823          switch ( $op ) {
3824              /*
3825               * > A character token
3826               * > A comment token
3827               * > A DOCTYPE token
3828               */
3829              case '#text':
3830              case '#comment':
3831              case '#funky-comment':
3832              case '#presumptuous-tag':
3833              case 'html':
3834                  return $this->step_in_body();
3836              /*
3837               * > A start tag whose tag name is one of: "base", "basefont", "bgsound", "link",
3838               * > "meta", "noframes", "script", "style", "template", "title"
3839               * > An end tag whose tag name is "template"
3840               */
3841              case '+BASE':
3842              case '+BASEFONT':
3843              case '+BGSOUND':
3844              case '+LINK':
3845              case '+META':
3846              case '+NOFRAMES':
3847              case '+SCRIPT':
3848              case '+STYLE':
3849              case '+TEMPLATE':
3850              case '+TITLE':
3851              case '-TEMPLATE':
3852                  return $this->step_in_head();
3854              /*
3855               * > A start tag whose tag name is one of: "caption", "colgroup", "tbody", "tfoot", "thead"
3856               */
3857              case '+CAPTION':
3858              case '+COLGROUP':
3859              case '+TBODY':
3860              case '+TFOOT':
3861              case '+THEAD':
3862                  array_pop( $this->state->stack_of_template_insertion_modes );
3863                  $this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
3864                  $this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
3865                  return $this->step( self::REPROCESS_CURRENT_NODE );
3867              /*
3868               * > A start tag whose tag name is "col"
3869               */
3870              case '+COL':
3871                  array_pop( $this->state->stack_of_template_insertion_modes );
3872                  $this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
3873                  $this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
3874                  return $this->step( self::REPROCESS_CURRENT_NODE );
3876              /*
3877               * > A start tag whose tag name is "tr"
3878               */
3879              case '+TR':
3880                  array_pop( $this->state->stack_of_template_insertion_modes );
3881                  $this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
3882                  $this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
3883                  return $this->step( self::REPROCESS_CURRENT_NODE );
3885              /*
3886               * > A start tag whose tag name is one of: "td", "th"
3887               */
3888              case '+TD':
3889              case '+TH':
3890                  array_pop( $this->state->stack_of_template_insertion_modes );
3891                  $this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
3892                  $this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
3893                  return $this->step( self::REPROCESS_CURRENT_NODE );
3894          }
3896          /*
3897           * > Any other start tag
3898           */
3899          if ( ! $is_closer ) {
3900              array_pop( $this->state->stack_of_template_insertion_modes );
3901              $this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
3902              $this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
3903              return $this->step( self::REPROCESS_CURRENT_NODE );
3904          }
3906          /*
3907           * > Any other end tag
3908           */
3909          if ( $is_closer ) {
3910              // Parse error: ignore the token.
3911              return $this->step();
3912          }
3914          /*
3915           * > An end-of-file token
3916           */
3917          if ( ! $this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) {
3918              // Stop parsing.
3919              return false;
3920          }
3922          // @todo Indicate a parse error once it's possible.
3923          $this->state->stack_of_open_elements->pop_until( 'TEMPLATE' );
3924          $this->state->active_formatting_elements->clear_up_to_last_marker();
3925          array_pop( $this->state->stack_of_template_insertion_modes );
3926          $this->reset_insertion_mode_appropriately();
3927          return $this->step( self::REPROCESS_CURRENT_NODE );
3928      }
3930      /**
3931       * Parses next element in the 'after body' insertion mode.
3932       *
3933       * This internal function performs the 'after body' insertion mode
3934       * logic for the generalized WP_HTML_Processor::step() function.
3935       *
3936       * @since 6.7.0
3937       *
3938       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
3939       *
3940       * @see https://html.spec.whatwg.org/#parsing-main-afterbody
3941       * @see WP_HTML_Processor::step
3942       *
3943       * @return bool Whether an element was found.
3944       */
3945  	private function step_after_body(): bool {
3946          $tag_name   = $this->get_token_name();
3947          $token_type = $this->get_token_type();
3948          $op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
3949          $op         = "{$op_sigil}{$tag_name}";
3951          switch ( $op ) {
3952              /*
3953               * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
3954               * >   U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
3955               *
3956               * > Process the token using the rules for the "in body" insertion mode.
3957               */
3958              case '#text':
3959                  if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
3960                      return $this->step_in_body();
3961                  }
3962                  goto after_body_anything_else;
3963                  break;
3965              /*
3966               * > A comment token
3967               */
3968              case '#comment':
3969              case '#funky-comment':
3970              case '#presumptuous-tag':
3971                  $this->bail( 'Content outside of BODY is unsupported.' );
3972                  break;
3974              /*
3975               * > A DOCTYPE token
3976               */
3977              case 'html':
3978                  // Parse error: ignore the token.
3979                  return $this->step();
3981              /*
3982               * > A start tag whose tag name is "html"
3983               */
3984              case '+HTML':
3985                  return $this->step_in_body();
3987              /*
3988               * > An end tag whose tag name is "html"
3989               *
3990               * > If the parser was created as part of the HTML fragment parsing algorithm,
3991               * > this is a parse error; ignore the token. (fragment case)
3992               * >
3993               * > Otherwise, switch the insertion mode to "after after body".
3994               */
3995              case '-HTML':
3996                  if ( isset( $this->context_node ) ) {
3997                      return $this->step();
3998                  }
4000                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_BODY;
4001                  return true;
4002          }
4004          /*
4005           * > Parse error. Switch the insertion mode to "in body" and reprocess the token.
4006           */
4007          after_body_anything_else:
4008          $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
4009          return $this->step( self::REPROCESS_CURRENT_NODE );
4010      }
4012      /**
4013       * Parses next element in the 'in frameset' insertion mode.
4014       *
4015       * This internal function performs the 'in frameset' insertion mode
4016       * logic for the generalized WP_HTML_Processor::step() function.
4017       *
4018       * @since 6.7.0
4019       *
4020       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
4021       *
4022       * @see https://html.spec.whatwg.org/#parsing-main-inframeset
4023       * @see WP_HTML_Processor::step
4024       *
4025       * @return bool Whether an element was found.
4026       */
4027  	private function step_in_frameset(): bool {
4028          $tag_name   = $this->get_token_name();
4029          $token_type = $this->get_token_type();
4030          $op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
4031          $op         = "{$op_sigil}{$tag_name}";
4033          switch ( $op ) {
4034              /*
4035               * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
4036               * >   U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
4037               * >
4038               * > Insert the character.
4039               *
4040               * Applied per character, this rule and the "anything else" rule strip non-whitespace
4041               * characters from text and insert only the whitespace. This is not supported at this time.
4042               */
4043              case '#text':
4044                  if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
4045                      return $this->step_in_body();
4046                  }
4047                  $this->bail( 'Non-whitespace characters cannot be handled in frameset.' );
4048                  break;
4050              /*
4051               * > A comment token
4052               */
4053              case '#comment':
4054              case '#funky-comment':
4055              case '#presumptuous-tag':
4056                  $this->insert_html_element( $this->state->current_token );
4057                  return true;
4059              /*
4060               * > A DOCTYPE token
4061               */
4062              case 'html':
4063                  // Parse error: ignore the token.
4064                  return $this->step();
4066              /*
4067               * > A start tag whose tag name is "html"
4068               */
4069              case '+HTML':
4070                  return $this->step_in_body();
4072              /*
4073               * > A start tag whose tag name is "frameset"
4074               */
4075              case '+FRAMESET':
4076                  $this->insert_html_element( $this->state->current_token );
4077                  return true;
4079              /*
4080               * > An end tag whose tag name is "frameset"
4081               */
4082              case '-FRAMESET':
4083                  /*
4084                   * > If the current node is the root html element, then this is a parse error;
4085                   * > ignore the token. (fragment case)
4086                   */
4087                  if ( $this->state->stack_of_open_elements->current_node_is( 'HTML' ) ) {
4088                      return $this->step();
4089                  }
4091                  /*
4092                   * > Otherwise, pop the current node from the stack of open elements.
4093                   */
4094                  $this->state->stack_of_open_elements->pop();
4096                  /*
4097                   * > If the parser was not created as part of the HTML fragment parsing algorithm
4098                   * > (fragment case), and the current node is no longer a frameset element, then
4099                   * > switch the insertion mode to "after frameset".
4100                   */
4101                  if ( ! isset( $this->context_node ) && ! $this->state->stack_of_open_elements->current_node_is( 'FRAMESET' ) ) {
4102                      $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_FRAMESET;
4103                  }
4105                  return true;
4107              /*
4108               * > A start tag whose tag name is "frame"
4109               *
4110               * > Insert an HTML element for the token. Immediately pop the
4111               * > current node off the stack of open elements.
4112               * >
4113               * > Acknowledge the token's self-closing flag, if it is set.
4114               */
4115              case '+FRAME':
4116                  $this->insert_html_element( $this->state->current_token );
4117                  $this->state->stack_of_open_elements->pop();
4118                  return true;
4120              /*
4121               * > A start tag whose tag name is "noframes"
4122               */
4123              case '+NOFRAMES':
4124                  return $this->step_in_head();
4125          }
4127          // Parse error: ignore the token.
4128          return $this->step();
4129      }
4131      /**
4132       * Parses next element in the 'after frameset' insertion mode.
4133       *
4134       * This internal function performs the 'after frameset' insertion mode
4135       * logic for the generalized WP_HTML_Processor::step() function.
4136       *
4137       * @since 6.7.0
4138       *
4139       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
4140       *
4141       * @see https://html.spec.whatwg.org/#parsing-main-afterframeset
4142       * @see WP_HTML_Processor::step
4143       *
4144       * @return bool Whether an element was found.
4145       */
4146  	private function step_after_frameset(): bool {
4147          $tag_name   = $this->get_token_name();
4148          $token_type = $this->get_token_type();
4149          $op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
4150          $op         = "{$op_sigil}{$tag_name}";
4152          switch ( $op ) {
4153              /*
4154               * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
4155               * >   U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
4156               * >
4157               * > Insert the character.
4158               *
4159               * Applied per character, this rule and the "anything else" rule strip non-whitespace
4160               * characters from text and insert only the whitespace. This is not supported at this time.
4161               */
4162              case '#text':
4163                  if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
4164                      return $this->step_in_body();
4165                  }
4166                  $this->bail( 'Non-whitespace characters cannot be handled in after frameset.' );
4167                  break;
4169              /*
4170               * > A comment token
4171               */
4172              case '#comment':
4173              case '#funky-comment':
4174              case '#presumptuous-tag':
4175                  $this->insert_html_element( $this->state->current_token );
4176                  return true;
4178              /*
4179               * > A DOCTYPE token
4180               */
4181              case 'html':
4182                  // Parse error: ignore the token.
4183                  return $this->step();
4185              /*
4186               * > A start tag whose tag name is "html"
4187               */
4188              case '+HTML':
4189                  return $this->step_in_body();
4191              /*
4192               * > An end tag whose tag name is "html"
4193               */
4194              case '-HTML':
4195                  $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_FRAMESET;
4196                  return true;
4198              /*
4199               * > A start tag whose tag name is "noframes"
4200               */
4201              case '+NOFRAMES':
4202                  return $this->step_in_head();
4203          }
4205          // Parse error: ignore the token.
4206          return $this->step();
4207      }
4209      /**
4210       * Parses next element in the 'after after body' insertion mode.
4211       *
4212       * This internal function performs the 'after after body' insertion mode
4213       * logic for the generalized WP_HTML_Processor::step() function.
4214       *
4215       * @since 6.7.0
4216       *
4217       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
4218       *
4219       * @see https://html.spec.whatwg.org/#the-after-after-body-insertion-mode
4220       * @see WP_HTML_Processor::step
4221       *
4222       * @return bool Whether an element was found.
4223       */
4224  	private function step_after_after_body(): bool {
4225          $tag_name   = $this->get_token_name();
4226          $token_type = $this->get_token_type();
4227          $op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
4228          $op         = "{$op_sigil}{$tag_name}";
4230          switch ( $op ) {
4231              /*
4232               * > A comment token
4233               */
4234              case '#comment':
4235              case '#funky-comment':
4236              case '#presumptuous-tag':
4237                  $this->bail( 'Content outside of HTML is unsupported.' );
4238                  break;
4240              /*
4241               * > A DOCTYPE token
4242               * > A start tag whose tag name is "html"
4243               *
4244               * > Process the token using the rules for the "in body" insertion mode.
4245               */
4246              case 'html':
4247              case '+HTML':
4248                  return $this->step_in_body();
4250              /*
4251               * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
4252               * >   U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
4253               * >
4254               * > Process the token using the rules for the "in body" insertion mode.
4255               */
4256              case '#text':
4257                  if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
4258                      return $this->step_in_body();
4259                  }
4260                  goto after_after_body_anything_else;
4261                  break;
4262          }
4264          /*
4265           * > Parse error. Switch the insertion mode to "in body" and reprocess the token.
4266           */
4267          after_after_body_anything_else:
4268          $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
4269          return $this->step( self::REPROCESS_CURRENT_NODE );
4270      }
4272      /**
4273       * Parses next element in the 'after after frameset' insertion mode.
4274       *
4275       * This internal function performs the 'after after frameset' insertion mode
4276       * logic for the generalized WP_HTML_Processor::step() function.
4277       *
4278       * @since 6.7.0
4279       *
4280       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
4281       *
4282       * @see https://html.spec.whatwg.org/#the-after-after-frameset-insertion-mode
4283       * @see WP_HTML_Processor::step
4284       *
4285       * @return bool Whether an element was found.
4286       */
4287  	private function step_after_after_frameset(): bool {
4288          $tag_name   = $this->get_token_name();
4289          $token_type = $this->get_token_type();
4290          $op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
4291          $op         = "{$op_sigil}{$tag_name}";
4293          switch ( $op ) {
4294              /*
4295               * > A comment token
4296               */
4297              case '#comment':
4298              case '#funky-comment':
4299              case '#presumptuous-tag':
4300                  $this->bail( 'Content outside of HTML is unsupported.' );
4301                  break;
4303              /*
4304               * > A DOCTYPE token
4305               * > A start tag whose tag name is "html"
4306               *
4307               * > Process the token using the rules for the "in body" insertion mode.
4308               */
4309              case 'html':
4310              case '+HTML':
4311                  return $this->step_in_body();
4313              /*
4314               * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
4315               * >   U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
4316               * >
4317               * > Process the token using the rules for the "in body" insertion mode.
4318               *
4319               * Applied per character, this rule and the "anything else" rule strip non-whitespace
4320               * characters from text and insert only the whitespace. This is not supported at this time.
4321               */
4322              case '#text':
4323                  if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
4324                      return $this->step_in_body();
4325                  }
4326                  $this->bail( 'Non-whitespace characters cannot be handled in after after frameset.' );
4327                  break;
4329              /*
4330               * > A start tag whose tag name is "noframes"
4331               */
4332              case '+NOFRAMES':
4333                  return $this->step_in_head();
4334          }
4336          // Parse error: ignore the token.
4337          return $this->step();
4338      }
4340      /**
4341       * Parses next element in the 'in foreign content' insertion mode.
4342       *
4343       * This internal function performs the 'in foreign content' insertion mode
4344       * logic for the generalized WP_HTML_Processor::step() function.
4345       *
4346       * @since 6.7.0 Stub implementation.
4347       *
4348       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
4349       *
4350       * @see https://html.spec.whatwg.org/#parsing-main-inforeign
4351       * @see WP_HTML_Processor::step
4352       *
4353       * @return bool Whether an element was found.
4354       */
4355  	private function step_in_foreign_content(): bool {
4356          $tag_name   = $this->get_token_name();
4357          $token_type = $this->get_token_type();
4358          $op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
4359          $op         = "{$op_sigil}{$tag_name}";
4361          /*
4362           * > A start tag whose name is "font", if the token has any attributes named "color", "face", or "size"
4363           *
4364           * This section drawn out above the switch to more easily incorporate
4365           * the additional rules based on the presence of the attributes.
4366           */
4367          if (
4368              '+FONT' === $op &&
4369              (
4370                  null !== $this->get_attribute( 'color' ) ||
4371                  null !== $this->get_attribute( 'face' ) ||
4372                  null !== $this->get_attribute( 'size' )
4373              )
4374          ) {
4375              $op = '+FONT with attributes';
4376          }
4378          switch ( $op ) {
4379              case '#text':
4380                  /*
4381                   * > A character token that is U+0000 NULL
4382                   *
4383                   * This is handled by `get_modifiable_text()`.
4384                   */
4386                  /*
4387                   * Whitespace-only text does not affect the frameset-ok flag.
4388                   * It is probably inter-element whitespace, but it may also
4389                   * contain character references which decode only to whitespace.
4390                   */
4391                  if ( parent::TEXT_IS_GENERIC === $this->text_node_classification ) {
4392                      $this->state->frameset_ok = false;
4393                  }
4395                  $this->insert_foreign_element( $this->state->current_token, false );
4396                  return true;
4398              /*
4399               * CDATA sections are alternate wrappers for text content and therefore
4400               * ought to follow the same rules as text nodes.
4401               */
4402              case '#cdata-section':
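4403                  /*
4404                   * NULL bytes and whitespace do not change the frameset-ok flag.
                       *
                       * The offsets below skip the 9-byte `<![CDATA[` opener and drop
                       * it, together with the 3-byte `]]>` closer, from the token
                       * length (12 bytes in total), leaving only the section contents.
4405                   */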
4406                  $current_token        = $this->bookmarks[ $this->state->current_token->bookmark_name ];
4407                  $cdata_content_start  = $current_token->start + 9;
4408                  $cdata_content_length = $current_token->length - 12;
4409                  if ( strspn( $this->html, "\0 \t\n\f\r", $cdata_content_start, $cdata_content_length ) !== $cdata_content_length ) {
4410                      $this->state->frameset_ok = false;
4411                  }
4413                  $this->insert_foreign_element( $this->state->current_token, false );
4414                  return true;
4416              /*
4417               * > A comment token
4418               */
4419              case '#comment':
4420              case '#funky-comment':
4421              case '#presumptuous-tag':
4422                  $this->insert_foreign_element( $this->state->current_token, false );
4423                  return true;
4425              /*
4426               * > A DOCTYPE token
4427               */
4428              case 'html':
4429                  // Parse error: ignore the token.
4430                  return $this->step();
4432              /*
4433               * > A start tag whose tag name is "b", "big", "blockquote", "body", "br", "center",
4434               * > "code", "dd", "div", "dl", "dt", "em", "embed", "h1", "h2", "h3", "h4", "h5",
4435               * > "h6", "head", "hr", "i", "img", "li", "listing", "menu", "meta", "nobr", "ol",
4436               * > "p", "pre", "ruby", "s", "small", "span", "strong", "strike", "sub", "sup",
4437               * > "table", "tt", "u", "ul", "var"
4438               *
4439               * > A start tag whose name is "font", if the token has any attributes named "color", "face", or "size"
4440               *
4441               * > An end tag whose tag name is "br", "p"
4442               *
4443               * Closing BR tags are always reported by the Tag Processor as opening tags.
4444               */
4445              case '+B':
4446              case '+BIG':
4447              case '+BLOCKQUOTE':
4448              case '+BODY':
4449              case '+BR':
4450              case '+CENTER':
4451              case '+CODE':
4452              case '+DD':
4453              case '+DIV':
4454              case '+DL':
4455              case '+DT':
4456              case '+EM':
4457              case '+EMBED':
4458              case '+H1':
4459              case '+H2':
4460              case '+H3':
4461              case '+H4':
4462              case '+H5':
4463              case '+H6':
4464              case '+HEAD':
4465              case '+HR':
4466              case '+I':
4467              case '+IMG':
4468              case '+LI':
4469              case '+LISTING':
4470              case '+MENU':
4471              case '+META':
4472              case '+NOBR':
4473              case '+OL':
4474              case '+P':
4475              case '+PRE':
4476              case '+RUBY':
4477              case '+S':
4478              case '+SMALL':
4479              case '+SPAN':
4480              case '+STRONG':
4481              case '+STRIKE':
4482              case '+SUB':
4483              case '+SUP':
4484              case '+TABLE':
4485              case '+TT':
4486              case '+U':
4487              case '+UL':
4488              case '+VAR':
4489              case '+FONT with attributes':
4490              case '-BR':
4491              case '-P':
4492                  // @todo Indicate a parse error once it's possible.
4493                  foreach ( $this->state->stack_of_open_elements->walk_up() as $current_node ) {
4494                      if (
4495                          'math' === $current_node->integration_node_type ||
4496                          'html' === $current_node->integration_node_type ||
4497                          'html' === $current_node->namespace
4498                      ) {
4499                          break;
4500                      }
4502                      $this->state->stack_of_open_elements->pop();
4503                  }
4504                  return $this->step( self::REPROCESS_CURRENT_NODE );
4505          }
4507          /*
4508           * > Any other start tag
4509           */
4510          if ( ! $this->is_tag_closer() ) {
4511              $this->insert_foreign_element( $this->state->current_token, false );
4513              /*
4514               * > If the token has its self-closing flag set, then run
4515               * > the appropriate steps from the following list:
4516               * >
4517               * >   ↪ the token's tag name is "script", and the new current node is in the SVG namespace
4518               * >         Acknowledge the token's self-closing flag, and then act as
4519               * >         described in the steps for a "script" end tag below.
4520               * >
4521               * >   ↪ Otherwise
4522               * >         Pop the current node off the stack of open elements and
4523               * >         acknowledge the token's self-closing flag.
4524               *
4525               * Since the rules for SCRIPT below indicate to pop the element off of the stack of
4526               * open elements, which is the same for the Otherwise condition, there's no need to
4527               * separate these checks. The difference comes when a parser operates with the scripting
4528               * flag enabled, and executes the script, which this parser does not support.
4529               */
4530              if ( $this->state->current_token->has_self_closing_flag ) {
4531                  $this->state->stack_of_open_elements->pop();
4532              }
4533              return true;
4534          }
4536          /*
4537           * > An end tag whose name is "script", if the current node is an SVG script element.
4538           */
4539          if ( $this->is_tag_closer() && 'SCRIPT' === $this->state->current_token->node_name && 'svg' === $this->state->current_token->namespace ) {
4540              $this->state->stack_of_open_elements->pop();
4541              return true;
4542          }
4544          /*
4545           * > Any other end tag
4546           */
4547          if ( $this->is_tag_closer() ) {
4548              $node = $this->state->stack_of_open_elements->current_node();
4549              if ( $tag_name !== $node->node_name ) {
4550                  // @todo Indicate a parse error once it's possible.
4551              }
4552              in_foreign_content_end_tag_loop:
4553              if ( $node === $this->state->stack_of_open_elements->at( 1 ) ) {
4554                  return true;
4555              }
4557              /*
4558               * > If node's tag name, converted to ASCII lowercase, is the same as the tag name
4559               * > of the token, pop elements from the stack of open elements until node has
4560               * > been popped from the stack, and then return.
4561               */
4562              if ( 0 === strcasecmp( $node->node_name, $tag_name ) ) {
4563                  foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
4564                      $this->state->stack_of_open_elements->pop();
4565                      if ( $node === $item ) {
4566                          return true;
4567                      }
4568                  }
4569              }
4571              foreach ( $this->state->stack_of_open_elements->walk_up( $node ) as $item ) {
4572                  $node = $item;
4573                  break;
4574              }
4576              if ( 'html' !== $node->namespace ) {
4577                  goto in_foreign_content_end_tag_loop;
4578              }
4580              switch ( $this->state->insertion_mode ) {
4581                  case WP_HTML_Processor_State::INSERTION_MODE_INITIAL:
4582                      return $this->step_initial();
4584                  case WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML:
4585                      return $this->step_before_html();
4587                  case WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD:
4588                      return $this->step_before_head();
4590                  case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD:
4591                      return $this->step_in_head();
4593                  case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD_NOSCRIPT:
4594                      return $this->step_in_head_noscript();
4596                  case WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD:
4597                      return $this->step_after_head();
4599                  case WP_HTML_Processor_State::INSERTION_MODE_IN_BODY:
4600                      return $this->step_in_body();
4602                  case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE:
4603                      return $this->step_in_table();
4605                  case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_TEXT:
4606                      return $this->step_in_table_text();
4608                  case WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION:
4609                      return $this->step_in_caption();
4611                  case WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP:
4612                      return $this->step_in_column_group();
4614                  case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY:
4615                      return $this->step_in_table_body();
4617                  case WP_HTML_Processor_State::INSERTION_MODE_IN_ROW:
4618                      return $this->step_in_row();
4620                  case WP_HTML_Processor_State::INSERTION_MODE_IN_CELL:
4621                      return $this->step_in_cell();
4623                  case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT:
4624                      return $this->step_in_select();
4626                  case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE:
4627                      return $this->step_in_select_in_table();
4629                  case WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE:
4630                      return $this->step_in_template();
4632                  case WP_HTML_Processor_State::INSERTION_MODE_AFTER_BODY:
4633                      return $this->step_after_body();
4635                  case WP_HTML_Processor_State::INSERTION_MODE_IN_FRAMESET:
4636                      return $this->step_in_frameset();
4638                  case WP_HTML_Processor_State::INSERTION_MODE_AFTER_FRAMESET:
4639                      return $this->step_after_frameset();
4641                  case WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_BODY:
4642                      return $this->step_after_after_body();
4644                  case WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_FRAMESET:
4645                      return $this->step_after_after_frameset();
4647                  // This should be unreachable but PHP doesn't have total type checking on switch.
4648                  default:
4649                      $this->bail( "Unaware of the requested parsing mode: '{$this->state->insertion_mode}'." );
4650              }
4651          }
4653          $this->bail( 'Should not have been able to reach end of IN FOREIGN CONTENT processing. Check HTML API code.' );
4654          // This unnecessary return prevents tools from inaccurately reporting type errors.
4655          return false;
4656      }
4658      /*
4659       * Internal helpers
4660       */
4662      /**
4663       * Creates a new bookmark for the currently-matched token and returns the generated name.
4664       *
4665       * @since 6.4.0
4666       * @since 6.5.0 Renamed from bookmark_tag() to bookmark_token().
4667       *
4668       * @throws Exception When unable to allocate requested bookmark.
4669       *
4670       * @return string|false Name of created bookmark, or false if unable to create.
4671       */
4672  	private function bookmark_token() {
4673          if ( ! parent::set_bookmark( ++$this->bookmark_counter ) ) {
4674              $this->last_error = self::ERROR_EXCEEDED_MAX_BOOKMARKS;
4675              throw new Exception( 'could not allocate bookmark' );
4676          }
4678          return "{$this->bookmark_counter}";
4679      }
4681      /*
4682       * HTML semantic overrides for Tag Processor
4683       */
4685      /**
4686       * Indicates the namespace of the current token, or "html" if there is none.
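           *
           * Example (a sketch, assuming a fragment parsed in the default BODY context;
           * the input HTML is hypothetical):
           *
           *     $processor = WP_HTML_Processor::create_fragment( '<p><svg></svg></p>' );
           *     $processor->next_tag(); // Matches the P element.
           *     $processor->get_namespace() === 'html';
           *
           *     $processor->next_tag(); // Matches the SVG element.
           *     $processor->get_namespace() === 'svg';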
4687       *
4688       * @return string One of "html", "math", or "svg".
4689       */
4690  	public function get_namespace(): string {
4691          if ( ! isset( $this->current_element ) ) {
4692              return parent::get_namespace();
4693          }
4695          return $this->current_element->token->namespace;
4696      }
4698      /**
4699       * Returns the uppercase name of the matched tag.
4700       *
4701       * The semantic rules for HTML specify that certain tags be reprocessed
4702       * with a different tag name. Because of this, the tag name presented
4703       * by the HTML Processor may differ from the one reported by the HTML
4704       * Tag Processor, which doesn't apply these semantic rules.
4705       *
4706       * Example:
4707       *
4708       *     $processor = WP_HTML_Processor::create_fragment( '<div class="test">Test</div>' );
4709       *     $processor->next_tag() === true;
4710       *     $processor->get_tag() === 'DIV';
4711       *
4712       *     $processor->next_tag() === false;
4713       *     $processor->get_tag() === null;
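           *
           * A sketch of the renaming described above, assuming an IMAGE tag that
           * appears in the HTML namespace (the input HTML is hypothetical):
           *
           *     $processor = WP_HTML_Processor::create_fragment( '<image>' );
           *     $processor->next_tag() === true;
           *     $processor->get_tag() === 'IMG'; // The Tag Processor would report 'IMAGE'.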
4714       *
4715       * @since 6.4.0
4716       *
4717       * @return string|null Name of currently matched tag in input HTML, or `null` if none found.
4718       */
4719  	public function get_tag(): ?string {
4720          if ( null !== $this->last_error ) {
4721              return null;
4722          }
4724          if ( $this->is_virtual() ) {
4725              return $this->current_element->token->node_name;
4726          }
4728          $tag_name = parent::get_tag();
4730          /*
4731           * > A start tag whose tag name is "image"
4732           * > Change the token's tag name to "img" and reprocess it. (Don't ask.)
4733           */
4734          return ( 'IMAGE' === $tag_name && 'html' === $this->get_namespace() )
4735              ? 'IMG'
4736              : $tag_name;
4737      }
4739      /**
4740       * Indicates if the currently matched tag contains the self-closing flag.
4741       *
4742       * No HTML elements ought to have the self-closing flag and for those, the self-closing
4743       * flag will be ignored. For void elements this is benign because they "self close"
4744       * automatically. For non-void HTML elements though problems will appear if someone
4745       * intends to use a self-closing element in place of that element with an empty body.
4746       * For HTML foreign elements and custom elements the self-closing flag determines if
4747       * they self-close or not.
4748       *
4749       * This function does not determine if a tag is self-closing,
4750       * but only if the self-closing flag is present in the syntax.
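           *
           * Example (an illustrative sketch; the input HTML is hypothetical):
           *
           *     $processor = WP_HTML_Processor::create_fragment( '<br><img />' );
           *     $processor->next_tag(); // Matches BR, a void element written without the flag.
           *     $processor->has_self_closing_flag() === false;
           *
           *     $processor->next_tag(); // Matches IMG, written with a trailing slash.
           *     $processor->has_self_closing_flag() === true;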
4751       *
4752       * @since 6.6.0 Subclassed for the HTML Processor.
4753       *
4754       * @return bool Whether the currently matched tag contains the self-closing flag.
4755       */
4756  	public function has_self_closing_flag(): bool {
4757          return $this->is_virtual() ? false : parent::has_self_closing_flag();
4758      }
4760      /**
4761       * Returns the node name represented by the token.
4762       *
4763       * This matches the DOM API value `nodeName`. Some values
4764       * are static, such as `#text` for a text node, while others
4765       * are dynamically generated from the token itself.
4766       *
4767       * Dynamic names:
4768       *  - Uppercase tag name for tag matches.
4769       *  - `html` for DOCTYPE declarations.
4770       *
4771       * Note that if the Tag Processor is not matched on a token
4772       * then this function will return `null`, either because it
4773       * hasn't yet found a token or because it reached the end
4774       * of the document without matching a token.
4775       *
4776       * @since 6.6.0 Subclassed for the HTML Processor.
4777       *
4778       * @return string|null Name of the matched token.
4779       */
4780  	public function get_token_name(): ?string {
4781          return $this->is_virtual()
4782              ? $this->current_element->token->node_name
4783              : parent::get_token_name();
4784      }
4786      /**
4787       * Indicates the kind of matched token, if any.
4788       *
4789       * This differs from `get_token_name()` in that it always
4790       * returns a static string indicating the type, whereas
4791       * `get_token_name()` may return values derived from the
4792       * token itself, such as a tag name or processing
4793       * instruction tag.
4794       *
4795       * Possible values:
4796       *  - `#tag` when matched on a tag.
4797       *  - `#text` when matched on a text node.
4798       *  - `#cdata-section` when matched on a CDATA node.
4799       *  - `#comment` when matched on a comment.
4800       *  - `#doctype` when matched on a DOCTYPE declaration.
4801       *  - `#presumptuous-tag` when matched on an empty tag closer.
4802       *  - `#funky-comment` when matched on a funky comment.
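           *
           * Example (a sketch; the input HTML is hypothetical):
           *
           *     $processor = WP_HTML_Processor::create_fragment( '<p>Hi<!-- note --></p>' );
           *     $processor->next_token(); // The opening P tag.
           *     $processor->get_token_type() === '#tag';
           *
           *     $processor->next_token(); // The text "Hi".
           *     $processor->get_token_type() === '#text';
           *
           *     $processor->next_token(); // The comment.
           *     $processor->get_token_type() === '#comment';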
4803       *
4804       * @since 6.6.0 Subclassed for the HTML Processor.
4805       *
4806       * @return string|null What kind of token is matched, or null.
4807       */
4808  	public function get_token_type(): ?string {
4809          if ( $this->is_virtual() ) {
4810              /*
4811               * This logic comes from the Tag Processor.
4812               *
4813               * @todo It would be ideal not to repeat this here, but it's not clearly
4814               *       better to allow passing a token name to `get_token_type()`.
4815               */
4816              $node_name     = $this->current_element->token->node_name;
4817              $starting_char = $node_name[0];
4818              if ( 'A' <= $starting_char && 'Z' >= $starting_char ) {
4819                  return '#tag';
4820              }
4822              if ( 'html' === $node_name ) {
4823                  return '#doctype';
4824              }
4826              return $node_name;
4827          }
4829          return parent::get_token_type();
4830      }
4832      /**
4833       * Returns the value of a requested attribute from a matched tag opener if that attribute exists.
4834       *
4835       * Example:
4836       *
4837       *     $p = WP_HTML_Processor::create_fragment( '<div enabled class="test" data-test-id="14">Test</div>' );
4838       *     $p->next_token() === true;
4839       *     $p->get_attribute( 'data-test-id' ) === '14';
4840       *     $p->get_attribute( 'enabled' ) === true;
4841       *     $p->get_attribute( 'aria-label' ) === null;
4842       *
4843       *     $p->next_tag() === false;
4844       *     $p->get_attribute( 'class' ) === null;
4845       *
4846       * @since 6.6.0 Subclassed for HTML Processor.
4847       *
4848       * @param string $name Name of attribute whose value is requested.
4849       * @return string|true|null Value of attribute or `null` if not available. Boolean attributes return `true`.
4850       */
4851  	public function get_attribute( $name ) {
4852          return $this->is_virtual() ? null : parent::get_attribute( $name );
4853      }
4855      /**
4856       * Updates or creates a new attribute on the currently matched tag with the passed value.
4857       *
4858       * For boolean attributes special handling is provided:
4859       *  - When `true` is passed as the value, then only the attribute name is added to the tag.
4860       *  - When `false` is passed, the attribute gets removed if it existed before.
4861       *
4862       * For string attributes, the value is escaped using the `esc_attr` function.
4863       *
4864       * @since 6.6.0 Subclassed for the HTML Processor.
4865       *
4866       * @param string      $name  The attribute name to target.
4867       * @param string|bool $value The new attribute value.
4868       * @return bool Whether an attribute value was set.
4869       */
4870  	public function set_attribute( $name, $value ): bool {
4871          return $this->is_virtual() ? false : parent::set_attribute( $name, $value );
4872      }
4874      /**
4875       * Removes an attribute from the currently-matched tag.
4876       *
4877       * @since 6.6.0 Subclassed for HTML Processor.
4878       *
4879       * @param string $name The attribute name to remove.
4880       * @return bool Whether an attribute was removed.
4881       */
4882  	public function remove_attribute( $name ): bool {
4883          return $this->is_virtual() ? false : parent::remove_attribute( $name );
4884      }
4886      /**
4887       * Gets lowercase names of all attributes matching a given prefix in the current tag.
4888       *
4889       * Note that matching is case-insensitive. This is in accordance with the spec:
4890       *
4891       * > There must never be two or more attributes on
4892       * > the same start tag whose names are an ASCII
4893       * > case-insensitive match for each other.
4894       *     - HTML 5 spec
4895       *
4896       * Example:
4897       *
4898       *     $p = WP_HTML_Processor::create_fragment( '<div data-ENABLED class="test" DATA-test-id="14">Test</div>' );
4899       *     $p->next_tag( array( 'class_name' => 'test' ) ) === true;
4900       *     $p->get_attribute_names_with_prefix( 'data-' ) === array( 'data-enabled', 'data-test-id' );
4901       *
4902       *     $p->next_tag() === false;
4903       *     $p->get_attribute_names_with_prefix( 'data-' ) === null;
4904       *
4905       * @since 6.6.0 Subclassed for the HTML Processor.
4906       *
4907       * @see https://html.spec.whatwg.org/multipage/syntax.html#attributes-2:ascii-case-insensitive
4908       *
4909       * @param string $prefix Prefix of requested attribute names.
4910       * @return array|null List of attribute names, or `null` when no tag opener is matched.
4911       */
4912  	public function get_attribute_names_with_prefix( $prefix ): ?array {
4913          return $this->is_virtual() ? null : parent::get_attribute_names_with_prefix( $prefix );
4914      }
4916      /**
4917       * Adds a new class name to the currently matched tag.
4918       *
4919       * @since 6.6.0 Subclassed for the HTML Processor.
4920       *
4921       * @param string $class_name The class name to add.
4922       * @return bool Whether the class was set to be added.
4923       */
4924  	public function add_class( $class_name ): bool {
4925          return $this->is_virtual() ? false : parent::add_class( $class_name );
4926      }
4928      /**
4929       * Removes a class name from the currently matched tag.
4930       *
4931       * @since 6.6.0 Subclassed for the HTML Processor.
4932       *
4933       * @param string $class_name The class name to remove.
4934       * @return bool Whether the class was set to be removed.
4935       */
4936  	public function remove_class( $class_name ): bool {
4937          return $this->is_virtual() ? false : parent::remove_class( $class_name );
4938      }
4940      /**
4941       * Returns whether the matched tag contains the given ASCII case-insensitive class name.
4942       *
4943       * @since 6.6.0 Subclassed for the HTML Processor.
4944       *
4945       * @todo When reconstructing active formatting elements with attributes, find a way
4946       *       to indicate if the virtually-reconstructed formatting elements contain the
4947       *       wanted class name.
4948       *
4949       * @param string $wanted_class Look for this CSS class name, ASCII case-insensitive.
4950       * @return bool|null Whether the matched tag contains the given class name, or null if not matched.
4951       */
4952  	public function has_class( $wanted_class ): ?bool {
4953          return $this->is_virtual() ? null : parent::has_class( $wanted_class );
4954      }
4956      /**
4957       * Generator for a foreach loop to step through each class name for the matched tag.
4958       *
4959       * This generator function is designed to be used inside a "foreach" loop.
4960       *
4961       * Example:
4962       *
4963       *     $p = WP_HTML_Processor::create_fragment( "<div class='free &lt;egg&gt;\tlang-en'>" );
4964       *     $p->next_tag();
4965       *     foreach ( $p->class_list() as $class_name ) {
4966       *         echo "{$class_name} ";
4967       *     }
4968       *     // Outputs: "free <egg> lang-en "
4969       *
4970       * @since 6.6.0 Subclassed for the HTML Processor.
4971       */
4972  	public function class_list() {
4973          return $this->is_virtual() ? null : parent::class_list();
4974      }
4976      /**
4977       * Returns the modifiable text for a matched token, or an empty string.
4978       *
4979       * Modifiable text is text content that may be read and changed without
4980       * changing the HTML structure of the document around it. This includes
4981       * the contents of `#text` nodes in the HTML as well as the inner
4982       * contents of HTML comments, Processing Instructions, and others, even
4983       * though these nodes aren't part of a parsed DOM tree. They also contain
4984       * the contents of SCRIPT and STYLE tags, of TEXTAREA tags, and of any
4985       * other section in an HTML document which cannot contain HTML markup (DATA).
4986       *
4987       * If a token has no modifiable text then an empty string is returned to
4988       * avoid needless crashing or type errors. An empty string does not mean
4989       * that a token has modifiable text, and a token with modifiable text may
4990       * have an empty string (e.g. a comment with no contents).
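           *
           * Example (a sketch; the input HTML is hypothetical):
           *
           *     $processor = WP_HTML_Processor::create_fragment( '<p>Hello</p>' );
           *     $processor->next_token(); // The opening P tag, which has no modifiable text.
           *     $processor->get_modifiable_text() === '';
           *
           *     $processor->next_token(); // The text node inside the P element.
           *     $processor->get_modifiable_text() === 'Hello';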
4991       *
4992       * @since 6.6.0 Subclassed for the HTML Processor.
4993       *
4994       * @return string Modifiable text of the matched token, or an empty string if it has none.
4995       */
4996  	public function get_modifiable_text(): string {
4997          return $this->is_virtual() ? '' : parent::get_modifiable_text();
4998      }
5000      /**
5001       * Indicates what kind of comment produced the comment node.
5002       *
5003       * Because there are different kinds of HTML syntax which produce
5004       * comments, the Tag Processor tracks and exposes this as a type
5005       * for the comment. Nominally only regular HTML comments exist as
5006       * they are commonly known, but a number of unrelated syntax errors
5007       * also produce comments.
5008       *
5010       * @see self::COMMENT_AS_CDATA_LOOKALIKE
5011       * @see self::COMMENT_AS_INVALID_HTML
5012       * @see self::COMMENT_AS_HTML_COMMENT
5013       * @see self::COMMENT_AS_PI_NODE_LOOKALIKE
5014       *
5015       * @since 6.6.0 Subclassed for the HTML Processor.
5016       *
5017       * @return string|null Comment type of the matched token, or `null` if not matched on a comment.
5018       */
5019  	public function get_comment_type(): ?string {
5020          return $this->is_virtual() ? null : parent::get_comment_type();
5021      }
5023      /**
5024       * Removes a bookmark that is no longer needed.
5025       *
5026       * Releasing a bookmark frees up the small
5027       * performance overhead it requires.
5028       *
5029       * @since 6.4.0
5030       *
5031       * @param string $bookmark_name Name of the bookmark to remove.
5032       * @return bool Whether the bookmark already existed before removal.
5033       */
5034  	public function release_bookmark( $bookmark_name ): bool {
5035          return parent::release_bookmark( "_{$bookmark_name}" );
5036      }
5038      /**
5039       * Moves the internal cursor in the HTML Processor to a given bookmark's location.
5040       *
5041       * Be careful! Seeking backwards to a previous location resets the parser to the
5042       * start of the document and reparses the entire contents up until it finds the
5043       * sought-after bookmarked location.
5044       *
5045       * In order to prevent accidental infinite loops, there's a
5046       * maximum limit on the number of times seek() can be called.
5047       *
5048       * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
5049       *
5050       * @since 6.4.0
5051       *
5052       * @param string $bookmark_name Jump to the place in the document identified by this bookmark name.
5053       * @return bool Whether the internal cursor was successfully moved to the bookmark's location.
5054       */
5055  	public function seek( $bookmark_name ): bool {
5056          // Flush any pending updates to the document before beginning.
5057          $this->get_updated_html();
5059          $actual_bookmark_name = "_{$bookmark_name}";
5060          $processor_started_at = $this->state->current_token
5061              ? $this->bookmarks[ $this->state->current_token->bookmark_name ]->start
5062              : 0;
5063          $bookmark_starts_at   = $this->bookmarks[ $actual_bookmark_name ]->start;
5064          $direction            = $bookmark_starts_at > $processor_started_at ? 'forward' : 'backward';
5066          /*
5067           * If seeking backwards, it's possible that the sought-after bookmark exists within an element
5068           * which has been closed before the current cursor; in other words, it has already been removed
5069           * from the stack of open elements. This means that it's insufficient to simply pop off elements
5070           * from the stack of open elements which appear after the bookmarked location and then jump to
5071           * that location, as the elements which were open before won't be re-opened.
5072           *
5073           * In order to maintain consistency, the HTML Processor rewinds to the start of the document
5074           * and reparses everything until it finds the sought-after bookmark.
5075           *
5076           * There are potentially better ways to do this: cache the parser state for each bookmark and
5077           * restore it when seeking; store an immutable and idempotent register of where elements open
5078           * and close.
5079           *
5080           * If caching the parser state, it will be essential to properly maintain the cached stack of
5081           * open elements and active formatting elements when modifying the document. This could be a
5082           * tedious and time-consuming process as well, and so for now will not be performed.
5083           *
5084           * It may be possible to track bookmarks for where elements open and close, and in doing so
5085           * be able to quickly recalculate breadcrumbs for any element in the document. It may even
5086           * be possible to remove the stack of open elements and compute it on the fly this way.
5087           * If doing this, the parser would need to track the opening and closing locations for all
5088           * tokens in the breadcrumb path for any and all bookmarks. By utilizing bookmarks themselves
5089           * this list could be automatically maintained while modifying the document. Finding the
5090           * breadcrumbs would then amount to traversing that list from the start until the token
5091           * being inspected. Once an element closes, if there are no bookmarks pointing to locations
5092           * within that element, then all of these locations may be forgotten to save on memory use
5093           * and computation time.
5094           */
5095          if ( 'backward' === $direction ) {
5096              /*
5097               * Instead of clearing the parser state and starting fresh, calling the stack methods
5098               * maintains the proper flags in the parser.
5099               */
5100              foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
5101                  if ( 'context-node' === $item->bookmark_name ) {
5102                      break;
5103                  }
5105                  $this->state->stack_of_open_elements->remove_node( $item );
5106              }
5108              foreach ( $this->state->active_formatting_elements->walk_up() as $item ) {
5109                  if ( 'context-node' === $item->bookmark_name ) {
5110                      break;
5111                  }
5113                  $this->state->active_formatting_elements->remove_node( $item );
5114              }
5116              parent::seek( 'context-node' );
5117              $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
5118              $this->state->frameset_ok    = true;
5119              $this->element_queue         = array();
5120              $this->current_element       = null;
5122              if ( isset( $this->context_node ) ) {
5123                  $this->breadcrumbs = array_slice( $this->breadcrumbs, 0, 2 );
5124              } else {
5125                  $this->breadcrumbs = array();
5126              }
5127          }
5129          // Having rewound if necessary, scan forwards until reaching the location of the sought-after bookmark.
5130          if ( $bookmark_starts_at === $this->bookmarks[ $this->state->current_token->bookmark_name ]->start ) {
5131              return true;
5132          }
5134          while ( $this->next_token() ) {
5135              if ( $bookmark_starts_at === $this->bookmarks[ $this->state->current_token->bookmark_name ]->start ) {
5136                  while ( isset( $this->current_element ) && WP_HTML_Stack_Event::POP === $this->current_element->operation ) {
5137                      $this->current_element = array_shift( $this->element_queue );
5138                  }
5139                  return true;
5140              }
5141          }
5143          return false;
5144      }
5146      /**
5147       * Sets a bookmark in the HTML document.
5148       *
5149       * Bookmarks represent specific places or tokens in the HTML
5150       * document, such as a tag opener or closer. When applying
5151       * edits to a document, such as setting an attribute, the
5152       * text offsets of that token may shift; the bookmark is
5153       * kept updated with those shifts and remains stable unless
5154       * the entire span of text in which the token sits is removed.
5155       *
5156       * Release bookmarks when they are no longer needed.
5157       *
5158       * Example:
5159       *
5160       *     <main><h2>Surprising fact you may not know!</h2></main>
5161       *           ^  ^
5162       *            \-|-- this `H2` opener bookmark tracks the token
5163       *
5164       *     <main class="clickbait"><h2>Surprising fact you may no…
5165       *                             ^  ^
5166       *                              \-|-- it shifts with edits
5167       *
5168       * Bookmarks provide the ability to seek to a previously-scanned
5169       * place in the HTML document without re-scanning it by hand, though
5170       * seeking backwards makes the parser reparse it internally (see `seek()`).
5171       *
5172       * Example, written against the Tag Processor whose bookmarks this class wraps:
5173       *
5174       *     <ul><li>One</li><li>Two</li><li>Three</li></ul>
5175       *                                 ^^^^
5176       *                                 want to note this last item
5177       *
5178       *     $p = new WP_HTML_Tag_Processor( $html );
5179       *     $in_list = false;
5180       *     while ( $p->next_tag( array( 'tag_closers' => $in_list ? 'visit' : 'skip' ) ) ) {
5181       *         if ( 'UL' === $p->get_tag() ) {
5182       *             if ( $p->is_tag_closer() ) {
5183       *                 $in_list = false;
5184       *                 $p->set_bookmark( 'resume' );
5185       *                 if ( $p->seek( 'last-li' ) ) {
5186       *                     $p->add_class( 'last-li' );
5187       *                 }
5188       *                 $p->seek( 'resume' );
5189       *                 $p->release_bookmark( 'last-li' );
5190       *                 $p->release_bookmark( 'resume' );
5191       *             } else {
5192       *                 $in_list = true;
5193       *             }
5194       *         }
5195       *
5196       *         if ( 'LI' === $p->get_tag() ) {
5197       *             $p->set_bookmark( 'last-li' );
5198       *         }
5199       *     }
5200       *
5201       * Bookmarks intentionally hide the internal string offsets
5202       * to which they refer. They are maintained internally as
5203       * updates are applied to the HTML document and therefore
5204       * retain their "position" - the location to which they
5205       * originally pointed. The inability to use bookmarks with
5206       * functions like `substr` is therefore intentional to guard
5207       * against accidentally breaking the HTML.
5208       *
5209       * Because bookmarks allocate memory and require processing
5210       * for every applied update, they are limited and require
5211       * a name. They should not be created with programmatically-made
5212       * names, such as "li_{$index}" with some loop. As a general
5213       * rule they should only be created with string-literal names
5214       * like "start-of-section" or "last-paragraph".
5215       *
5216       * Bookmarks are a powerful tool to enable complicated behavior.
5217       * Consider double-checking that you need this tool if you are
5218       * reaching for it, as inappropriate use could lead to broken
5219       * HTML structure or unwanted processing overhead.
5220       *
5221       * @since 6.4.0
5222       *
5223       * @param string $bookmark_name Identifies this particular bookmark.
5224       * @return bool Whether the bookmark was successfully created.
5225       */
5226      public function set_bookmark( $bookmark_name ): bool {
5227          return parent::set_bookmark( "_{$bookmark_name}" );
5228      }
5230      /**
5231       * Checks whether a bookmark with the given name exists.
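           *
           * Example (a minimal sketch, assuming `$processor` is an existing
           * instance; the bookmark name `last-li` is illustrative):
           *
           *     // Only seek if the bookmark was actually set earlier.
           *     if ( $processor->has_bookmark( 'last-li' ) && $processor->seek( 'last-li' ) ) {
           *         $processor->add_class( 'last-li' );
           *     }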
5232       *
5233       * @since 6.5.0
5234       *
5235       * @param string $bookmark_name Name to identify a bookmark that potentially exists.
5236       * @return bool Whether that bookmark exists.
5237       */
5238      public function has_bookmark( $bookmark_name ): bool {
5239          return parent::has_bookmark( "_{$bookmark_name}" );
5240      }
5242      /*
5243       * HTML Parsing Algorithms
5244       */
5246      /**
5247       * Closes a P element.
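           *
           * Example (an illustrative sketch; the markup is an assumption):
           * a `DIV` opening tag closes an open `P` element in button scope,
           * so in this fragment the `DIV` becomes a sibling of the `P`
           * instead of its child.
           *
           *     $processor = WP_HTML_Processor::create_fragment( '<p>One<div>Two</div>' );
           *     $processor->next_tag( array( 'breadcrumbs' => array( 'DIV' ) ) );
           *     $breadcrumbs = $processor->get_breadcrumbs(); // Does not contain 'P'.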
5248       *
5249       * @since 6.4.0
5250       *
5251       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
5252       *
5253       * @see https://html.spec.whatwg.org/#close-a-p-element
5254       */
5255      private function close_a_p_element(): void {
5256          $this->generate_implied_end_tags( 'P' );
5257          $this->state->stack_of_open_elements->pop_until( 'P' );
5258      }
5260      /**
5261       * Closes elements that have implied end tags.
5262       *
5263       * @since 6.4.0
5264       * @since 6.7.0 Full spec support.
5265       *
5266       * @see https://html.spec.whatwg.org/#generate-implied-end-tags
5267       *
5268       * @param string|null $except_for_this_element Perform as if this element doesn't exist in the stack of open elements.
5269       */
5270      private function generate_implied_end_tags( ?string $except_for_this_element = null ): void {
5271          $elements_with_implied_end_tags = array(
5272              'DD',
5273              'DT',
5274              'LI',
5275              'OPTGROUP',
5276              'OPTION',
5277              'P',
5278              'RB',
5279              'RP',
5280              'RT',
5281              'RTC',
5282          );
5284          $no_exclusions = ! isset( $except_for_this_element );
5286          while (
5287              ( $no_exclusions || ! $this->state->stack_of_open_elements->current_node_is( $except_for_this_element ) ) &&
5288              in_array( $this->state->stack_of_open_elements->current_node()->node_name, $elements_with_implied_end_tags, true )
5289          ) {
5290              $this->state->stack_of_open_elements->pop();
5291          }
5292      }
5294      /**
5295       * Closes elements that have implied end tags, thoroughly.
5296       *
5297       * See the HTML specification for an explanation of why this is
5298       * different from generating end tags in the normal sense.
5299       *
5300       * @since 6.4.0
5301       * @since 6.7.0 Full spec support.
5302       *
5303       * @see WP_HTML_Processor::generate_implied_end_tags
5304       * @see https://html.spec.whatwg.org/#generate-implied-end-tags
5305       */
5306      private function generate_implied_end_tags_thoroughly(): void {
5307          $elements_with_implied_end_tags = array(
5308              'CAPTION',
5309              'COLGROUP',
5310              'DD',
5311              'DT',
5312              'LI',
5313              'OPTGROUP',
5314              'OPTION',
5315              'P',
5316              'RB',
5317              'RP',
5318              'RT',
5319              'RTC',
5320              'TBODY',
5321              'TD',
5322              'TFOOT',
5323              'TH',
5324              'THEAD',
5325              'TR',
5326          );
5328          while ( in_array( $this->state->stack_of_open_elements->current_node()->node_name, $elements_with_implied_end_tags, true ) ) {
5329              $this->state->stack_of_open_elements->pop();
5330          }
5331      }
5333      /**
5334       * Returns the adjusted current node.
5335       *
5336       * > The adjusted current node is the context element if the parser was created as
5337       * > part of the HTML fragment parsing algorithm and the stack of open elements
5338       * > has only one element in it (fragment case); otherwise, the adjusted current
5339       * > node is the current node.
5340       *
5341       * @see https://html.spec.whatwg.org/#adjusted-current-node
5342       *
5343       * @since 6.7.0
5344       *
5345       * @return WP_HTML_Token|null The adjusted current node.
5346       */
5347      private function get_adjusted_current_node(): ?WP_HTML_Token {
5348          if ( isset( $this->context_node ) && 1 === $this->state->stack_of_open_elements->count() ) {
5349              return $this->context_node;
5350          }
5352          return $this->state->stack_of_open_elements->current_node();
5353      }
5355      /**
5356       * Reconstructs the active formatting elements.
5357       *
5358       * > This has the effect of reopening all the formatting elements that were opened
5359       * > in the current body, cell, or caption (whichever is youngest) that haven't
5360       * > been explicitly closed.
5361       *
5362       * @since 6.4.0
5363       *
5364       * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
5365       *
5366       * @see https://html.spec.whatwg.org/#reconstruct-the-active-formatting-elements
5367       *
5368       * @return bool Whether any formatting elements needed to be reconstructed.
5369       */
5370      private function reconstruct_active_formatting_elements(): bool {
5371          /*
5372           * > If there are no entries in the list of active formatting elements, then there is nothing
5373           * > to reconstruct; stop this algorithm.
5374           */
5375          if ( 0 === $this->state->active_formatting_elements->count() ) {
5376              return false;
5377          }
5379          $last_entry = $this->state->active_formatting_elements->current_node();
5380          if (
5382              /*
5383               * > If the last (most recently added) entry in the list of active formatting elements is a marker;
5384               * > stop this algorithm.
5385               */
5386              'marker' === $last_entry->node_name ||
5388              /*
5389               * > If the last (most recently added) entry in the list of active formatting elements is an
5390               * > element that is in the stack of open elements, then there is nothing to reconstruct;
5391               * > stop this algorithm.
5392               */
5393              $this->state->stack_of_open_elements->contains_node( $last_entry )
5394          ) {
5395              return false;
5396          }
5398          $this->bail( 'Cannot reconstruct active formatting elements when advancing and rewinding is required.' );
5399      }
5401      /**
5402       * Runs the "reset the insertion mode appropriately" algorithm.
5403       *
5404       * @since 6.7.0
5405       *
5406       * @see https://html.spec.whatwg.org/multipage/parsing.html#reset-the-insertion-mode-appropriately
5407       */
5408      private function reset_insertion_mode_appropriately(): void {
5409          // Set the first node.
5410          $first_node = null;
5411          foreach ( $this->state->stack_of_open_elements->walk_down() as $first_node ) {
5412              break;
5413          }
5415          /*
5416           * > 1. Let _last_ be false.
5417           */
5418          $last = false;
5419          foreach ( $this->state->stack_of_open_elements->walk_up() as $node ) {
5420              /*
5421               * > 2. Let _node_ be the last node in the stack of open elements.
5422               * > 3. _Loop_: If _node_ is the first node in the stack of open elements, then set _last_
5423               * >            to true, and, if the parser was created as part of the HTML fragment parsing
5424               * >            algorithm (fragment case), set node to the context element passed to
5425               * >            that algorithm.
5426               * > …
5427               */
5428              if ( $node === $first_node ) {
5429                  $last = true;
5430                  if ( isset( $this->context_node ) ) {
5431                      $node = $this->context_node;
5432                  }
5433              }
5435              // All of the following rules are for matching HTML elements.
5436              if ( 'html' !== $node->namespace ) {
5437                  continue;
5438              }
5440              switch ( $node->node_name ) {
5441                  /*
5442                   * > 4. If node is a `select` element, run these substeps:
5443                   * >   1. If _last_ is true, jump to the step below labeled done.
5444                   * >   2. Let _ancestor_ be _node_.
5445                   * >   3. _Loop_: If _ancestor_ is the first node in the stack of open elements,
5446                   * >      jump to the step below labeled done.
5447                   * >   4. Let ancestor be the node before ancestor in the stack of open elements.
5448                   * >   …
5449                   * >   7. Jump back to the step labeled _loop_.
5450                   * >   8. _Done_: Switch the insertion mode to "in select" and return.
5451                   */
5452                  case 'SELECT':
5453                      if ( ! $last ) {
5454                          foreach ( $this->state->stack_of_open_elements->walk_up( $node ) as $ancestor ) {
5455                              if ( 'html' !== $ancestor->namespace ) {
5456                                  continue;
5457                              }
5459                              switch ( $ancestor->node_name ) {
5460                                  /*
5461                                   * > 5. If _ancestor_ is a `template` node, jump to the step below
5462                                   * >    labeled _done_.
5463                                   */
5464                                  case 'TEMPLATE':
5465                                      break 2;
5467                                  /*
5468                                   * > 6. If _ancestor_ is a `table` node, switch the insertion mode to
5469                                   * >    "in select in table" and return.
5470                                   */
5471                                  case 'TABLE':
5472                                      $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE;
5473                                      return;
5474                              }
5475                          }
5476                      }
5477                      $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT;
5478                      return;
5480                  /*
5481                   * > 5. If _node_ is a `td` or `th` element and _last_ is false, then switch the
5482                   * >    insertion mode to "in cell" and return.
5483                   */
5484                  case 'TD':
5485                  case 'TH':
5486                      if ( ! $last ) {
5487                          $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_CELL;
5488                          return;
5489                      }
5490                      break;
5492                  /*
5493                   * > 6. If _node_ is a `tr` element, then switch the insertion mode to "in row"
5494                   * >    and return.
5495                   */
5496                  case 'TR':
5497                      $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
5498                      return;
5500                  /*
5501                   * > 7. If _node_ is a `tbody`, `thead`, or `tfoot` element, then switch the
5502                   * >    insertion mode to "in table body" and return.
5503                   */
5504                  case 'TBODY':
5505                  case 'THEAD':
5506                  case 'TFOOT':
5507                      $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
5508                      return;
5510                  /*
5511                   * > 8. If _node_ is a `caption` element, then switch the insertion mode to
5512                   * >    "in caption" and return.
5513                   */
5514                  case 'CAPTION':
5515                      $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION;
5516                      return;
5518                  /*
5519                   * > 9. If _node_ is a `colgroup` element, then switch the insertion mode to
5520                   * >    "in column group" and return.
5521                   */
5522                  case 'COLGROUP':
5523                      $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
5524                      return;
5526                  /*
5527                   * > 10. If _node_ is a `table` element, then switch the insertion mode to
5528                   * >     "in table" and return.
5529                   */
5530                  case 'TABLE':
5531                      $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
5532                      return;
5534                  /*
5535                   * > 11. If _node_ is a `template` element, then switch the insertion mode to the
5536                   * >     current template insertion mode and return.
5537                   */
5538                  case 'TEMPLATE':
5539                      $this->state->insertion_mode = end( $this->state->stack_of_template_insertion_modes );
5540                      return;
5542                  /*
5543                   * > 12. If _node_ is a `head` element and _last_ is false, then switch the
5544                   * >     insertion mode to "in head" and return.
5545                   */
5546                  case 'HEAD':
5547                      if ( ! $last ) {
5548                          $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
5549                          return;
5550                      }
5551                      break;
5553                  /*
5554                   * > 13. If _node_ is a `body` element, then switch the insertion mode to "in body"
5555                   * >     and return.
5556                   */