PHP Cross Reference of WordPress Trunk (Updated Daily)
<?php
/**
 * HTML API: WP_HTML_Processor class
 *
 * @package WordPress
 * @subpackage HTML-API
 * @since 6.4.0
 */

/**
 * Core class used to safely parse and modify an HTML document.
 *
 * The HTML Processor class properly parses and modifies HTML5 documents.
 *
 * It supports a subset of the HTML5 specification, and when it encounters
 * unsupported markup, it aborts early to avoid unintentionally breaking
 * the document. The HTML Processor should never break an HTML document.
 *
 * While the `WP_HTML_Tag_Processor` is a valuable tool for modifying
 * attributes on individual HTML tags, the HTML Processor is more capable
 * and useful for the following operations:
 *
 *  - Querying based on nested HTML structure.
 *
 * Eventually the HTML Processor will also support:
 *  - Wrapping a tag in surrounding HTML.
 *  - Unwrapping a tag by removing its parent.
 *  - Inserting and removing nodes.
 *  - Reading and changing inner content.
 *  - Navigating up or around HTML structure.
 *
 * ## Usage
 *
 * Use of this class requires three steps:
 *
 *  1. Call a static creator method with your input HTML document.
 *  2. Find the location in the document you are looking for.
 *  3. Request changes to the document at that location.
 *
 * Example:
 *
 *     $processor = WP_HTML_Processor::create_fragment( $html );
 *     if ( $processor->next_tag( array( 'breadcrumbs' => array( 'DIV', 'FIGURE', 'IMG' ) ) ) ) {
 *         $processor->add_class( 'responsive-image' );
 *     }
 *
 * #### Breadcrumbs
 *
 * Breadcrumbs represent the stack of open elements from the root
 * of the document or fragment down to the currently-matched node,
 * if one is currently selected. Call WP_HTML_Processor::get_breadcrumbs()
 * to inspect the breadcrumbs for a matched tag.
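 *
 * As a minimal sketch of that call (the input string here is only an
 * illustrative assumption), using the default `BODY` context:
 *
 *     $processor = WP_HTML_Processor::create_fragment( '<p><strong><img></strong></p>' );
 *     if ( null !== $processor && $processor->next_tag( 'IMG' ) ) {
 *         // Expected to be array( 'HTML', 'BODY', 'P', 'STRONG', 'IMG' ).
 *         $breadcrumbs = $processor->get_breadcrumbs();
 *     }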
 *
 * Breadcrumbs can specify nested HTML structure and are equivalent
 * to a CSS selector comprising tag names separated by the child
 * combinator, such as "DIV > FIGURE > IMG".
 *
 * Since all elements find themselves inside a full HTML document
 * when parsed, the return value from `get_breadcrumbs()` will always
 * contain any implicit outermost elements. For example, when parsing
 * with `create_fragment()` in the `BODY` context (the default), any
 * tag in the given HTML document will contain `array( 'HTML', 'BODY', … )`
 * in its breadcrumbs.
 *
 * Despite containing the implied outermost elements in their breadcrumbs,
 * tags may be found with the shortest-matching breadcrumb query. That is,
 * `array( 'IMG' )` matches all IMG elements and `array( 'P', 'IMG' )`
 * matches all IMG elements directly inside a P element. To ensure that no
 * partial matches erroneously match, it's possible to specify in a query
 * the full breadcrumb match all the way down from the root HTML element.
 *
 * Example:
 *
 *     $html = '<figure><img><figcaption>A <em>lovely</em> day outside</figcaption></figure>';
 *     //               ----- Matches here.
 *     $processor->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'IMG' ) ) );
 *
 *     $html = '<figure><img><figcaption>A <em>lovely</em> day outside</figcaption></figure>';
 *     //                                  ---- Matches here.
 *     $processor->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'FIGCAPTION', 'EM' ) ) );
 *
 *     $html = '<div><img></div><img>';
 *     //                       ----- Matches here, because IMG must be a direct child of the implicit BODY.
 *     $processor->next_tag( array( 'breadcrumbs' => array( 'BODY', 'IMG' ) ) );
 *
 * ## HTML Support
 *
 * This class implements a small part of the HTML5 specification.
 * It's designed to operate within its support and abort early whenever
 * encountering circumstances it can't properly handle. This is
 * the principal way in which this class remains as simple as possible
 * without cutting corners and breaking compliance.
 *
 * ### Supported elements
 *
 * If any unsupported element appears in the HTML input the HTML Processor
 * will abort early and stop all processing. This draconian measure ensures
 * that the HTML Processor won't break any HTML it doesn't fully understand.
 *
 * The HTML Processor supports all elements other than a specific set:
 *
 *  - Any element inside a TABLE.
 *  - Any element inside foreign content, including SVG and MATH.
 *  - Any element outside the IN BODY insertion mode, e.g. doctype declarations, meta, links.
 *
 * ### Supported markup
 *
 * Some kinds of non-normative HTML involve reconstruction of formatting elements and
 * re-parenting of mis-nested elements. For example, a DIV tag found inside a TABLE
 * may in fact belong _before_ the table in the DOM. If the HTML Processor encounters
 * such a case it will stop processing.
 *
 * The following list illustrates some common examples of unexpected HTML inputs that
 * the HTML Processor properly parses and represents:
 *
 *  - HTML with optional tags omitted, e.g. `<p>one<p>two`.
 *  - HTML with unexpected tag closers, e.g. `<p>one </span> more</p>`.
 *  - Non-void tags with self-closing flag, e.g. `<div/>the DIV is still open.</div>`.
 *  - Heading elements which close open heading elements of another level, e.g. `<h1>Closed by </h2>`.
 *  - Elements containing text that looks like other tags but isn't, e.g. `<title>The <img> is plaintext</title>`.
 *  - SCRIPT and STYLE tags containing text that looks like HTML but isn't, e.g. `<script>document.write('<p>Hi</p>');</script>`.
 *  - SCRIPT content which has been escaped, e.g. `<script><!-- document.write('<script>console.log("hi")</script>') --></script>`.
 *
 * ### Unsupported Features
 *
 * This parser does not report parse errors.
 *
 * Normally, when additional HTML or BODY tags are encountered in a document, if there
 * are any additional attributes on them that aren't found on the previous elements,
 * the existing HTML and BODY elements adopt those missing attribute values. This
 * parser does not add those additional attributes.
 *
 * In certain situations, elements are moved to a different part of the document in
 * processes called "adoption" and "fostering." Because the nodes move to a location
 * in the document that the parser had already processed, this parser does not support
 * these situations and will bail.
 *
 * @since 6.4.0
 *
 * @see WP_HTML_Tag_Processor
 * @see https://html.spec.whatwg.org/
 */
class WP_HTML_Processor extends WP_HTML_Tag_Processor {
	/**
	 * The maximum number of bookmarks allowed to exist at any given time.
	 *
	 * HTML processing requires more bookmarks than basic tag processing,
	 * so this class constant from the Tag Processor is overridden.
	 *
	 * @since 6.4.0
	 *
	 * @var int
	 */
	const MAX_BOOKMARKS = 100;

	/**
	 * Holds the working state of the parser, including the stack of
	 * open elements and the stack of active formatting elements.
	 *
	 * Initialized in the constructor.
	 *
	 * @since 6.4.0
	 *
	 * @var WP_HTML_Processor_State
	 */
	private $state;

	/**
	 * Used to create unique bookmark names.
	 *
	 * This class sets a bookmark for every tag in the HTML document that it encounters.
	 * The bookmark name is auto-generated and increments, starting with `1`. These are
	 * internal bookmarks and are automatically released when the referring WP_HTML_Token
	 * goes out of scope and is garbage-collected.
	 *
	 * @since 6.4.0
	 *
	 * @see WP_HTML_Processor::$release_internal_bookmark_on_destruct
	 *
	 * @var int
	 */
	private $bookmark_counter = 0;

	/**
	 * Stores an explanation for why something failed, if it did.
	 *
	 * @see self::get_last_error
	 *
	 * @since 6.4.0
	 *
	 * @var string|null
	 */
	private $last_error = null;

	/**
	 * Stores context for why the parser bailed on unsupported HTML, if it did.
	 *
	 * @see self::get_unsupported_exception
	 *
	 * @since 6.7.0
	 *
	 * @var WP_HTML_Unsupported_Exception|null
	 */
	private $unsupported_exception = null;

	/**
	 * Releases a bookmark when PHP garbage-collects its wrapping WP_HTML_Token instance.
	 *
	 * This function is created inside the class constructor so that it can be passed to
	 * the stack of open elements and the stack of active formatting elements without
	 * exposing it as a public method on the class.
	 *
	 * @since 6.4.0
	 *
	 * @var Closure|null
	 */
	private $release_internal_bookmark_on_destruct = null;

	/**
	 * Stores stack events which arise during parsing of the
	 * HTML document, which will then supply the "match" events.
	 *
	 * @since 6.6.0
	 *
	 * @var WP_HTML_Stack_Event[]
	 */
	private $element_queue = array();

	/**
	 * Stores the current breadcrumbs.
	 *
	 * @since 6.7.0
	 *
	 * @var string[]
	 */
	private $breadcrumbs = array();

	/**
	 * Current stack event, if set, representing a matched token.
	 *
	 * Because the parser may internally point to a place further along in a document
	 * than the nodes which have already been processed (some "virtual" nodes may have
	 * appeared while scanning the HTML document), this will point at the "current" node
	 * being processed. It comes from the front of the element queue.
	 *
	 * @since 6.6.0
	 *
	 * @var WP_HTML_Stack_Event|null
	 */
	private $current_element = null;

	/**
	 * Context node if created as a fragment parser.
	 *
	 * @var WP_HTML_Token|null
	 */
	private $context_node = null;

	/*
	 * Public Interface Functions
	 */

	/**
	 * Creates an HTML processor in the fragment parsing mode.
	 *
	 * Use this for cases where you are processing chunks of HTML that
	 * will be found within a bigger HTML document, such as rendered
	 * block output that exists within a post, or `the_content` inside a
	 * rendered site layout.
	 *
	 * Fragment parsing occurs within a context, which is an HTML element
	 * that the document will eventually be placed in. It becomes important
	 * when special elements have different rules than others, such as inside
	 * a TEXTAREA or a TITLE tag where things that look like tags are text,
	 * or inside a SCRIPT tag where things that look like HTML syntax are JS.
	 *
	 * The context value should be a representation of the tag in which the
	 * HTML is found. For most cases this will be the body element. The HTML
	 * form is provided because a context element may have attributes that
	 * impact the parse, such as with a SCRIPT tag and its `type` attribute.
	 *
	 * ## Current HTML Support
	 *
	 *  - The only supported context is `<body>`, which is the default value.
	 *  - The only supported document encoding is `UTF-8`, which is the default value.
	 *
	 * @since 6.4.0
	 * @since 6.6.0 Returns `static` instead of `self` so it can create subclass instances.
	 *
	 * @param string $html     Input HTML fragment to process.
	 * @param string $context  Context element for the fragment, must be default of `<body>`.
	 * @param string $encoding Text encoding of the document; must be default of 'UTF-8'.
	 * @return static|null The created processor if successful, otherwise null.
	 */
	public static function create_fragment( $html, $context = '<body>', $encoding = 'UTF-8' ) {
		if ( '<body>' !== $context || 'UTF-8' !== $encoding ) {
			return null;
		}

		$processor                             = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
		$processor->state->context_node        = array( 'BODY', array() );
		$processor->state->insertion_mode      = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
		$processor->state->encoding            = $encoding;
		$processor->state->encoding_confidence = 'certain';

		// @todo Create "fake" bookmarks for non-existent but implied nodes.
		$processor->bookmarks['root-node']    = new WP_HTML_Span( 0, 0 );
		$processor->bookmarks['context-node'] = new WP_HTML_Span( 0, 0 );

		$root_node = new WP_HTML_Token(
			'root-node',
			'HTML',
			false
		);

		$processor->state->stack_of_open_elements->push( $root_node );

		$context_node = new WP_HTML_Token(
			'context-node',
			$processor->state->context_node[0],
			false
		);

		$processor->context_node = $context_node;
		$processor->breadcrumbs  = array( 'HTML', $context_node->node_name );

		return $processor;
	}

	/**
	 * Creates an HTML processor in the full parsing mode.
	 *
	 * It's likely that a fragment parser is more appropriate, unless sending an
	 * entire HTML document from start to finish. Consider a fragment parser with
	 * a context node of `<body>`.
	 *
	 * Since UTF-8 is the only currently-accepted charset, if working with a
	 * document that isn't UTF-8, it's important to convert the document before
	 * creating the processor: pass in the converted HTML.
	 *
	 * @param string      $html                    Input HTML document to process.
	 * @param string|null $known_definite_encoding Optional. If provided, specifies the charset used
	 *                                             in the input byte stream. Currently must be UTF-8.
	 * @return static|null The created processor if successful, otherwise null.
	 */
	public static function create_full_parser( $html, $known_definite_encoding = 'UTF-8' ) {
		if ( 'UTF-8' !== $known_definite_encoding ) {
			return null;
		}

		$processor                             = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
		$processor->state->encoding            = $known_definite_encoding;
		$processor->state->encoding_confidence = 'certain';

		return $processor;
	}

	/**
	 * Constructor.
	 *
	 * Do not use this method. Use the static creator methods instead.
	 *
	 * @access private
	 *
	 * @since 6.4.0
	 *
	 * @see WP_HTML_Processor::create_fragment()
	 *
	 * @param string      $html                                  HTML to process.
	 * @param string|null $use_the_static_create_methods_instead This constructor should not be called manually.
	 */
	public function __construct( $html, $use_the_static_create_methods_instead = null ) {
		parent::__construct( $html );

		if ( self::CONSTRUCTOR_UNLOCK_CODE !== $use_the_static_create_methods_instead ) {
			_doing_it_wrong(
				__METHOD__,
				sprintf(
					/* translators: %s: WP_HTML_Processor::create_fragment(). */
					__( 'Call %s to create an HTML Processor instead of calling the constructor directly.' ),
					'<code>WP_HTML_Processor::create_fragment()</code>'
				),
				'6.4.0'
			);
		}

		$this->state = new WP_HTML_Processor_State();

		$this->state->stack_of_open_elements->set_push_handler(
			function ( WP_HTML_Token $token ): void {
				$is_virtual            = ! isset( $this->state->current_token ) || $this->is_tag_closer();
				$same_node             = isset( $this->state->current_token ) && $token->node_name === $this->state->current_token->node_name;
				$provenance            = ( ! $same_node || $is_virtual ) ? 'virtual' : 'real';
				$this->element_queue[] = new WP_HTML_Stack_Event( $token, WP_HTML_Stack_Event::PUSH, $provenance );

				$this->change_parsing_namespace( $token->namespace );
			}
		);

		$this->state->stack_of_open_elements->set_pop_handler(
			function ( WP_HTML_Token $token ): void {
				$is_virtual            = ! isset( $this->state->current_token ) || ! $this->is_tag_closer();
				$same_node             = isset( $this->state->current_token ) && $token->node_name === $this->state->current_token->node_name;
				$provenance            = ( ! $same_node || $is_virtual ) ? 'virtual' : 'real';
				$this->element_queue[] = new WP_HTML_Stack_Event( $token, WP_HTML_Stack_Event::POP, $provenance );
				$adjusted_current_node = $this->get_adjusted_current_node();
				$this->change_parsing_namespace(
					$adjusted_current_node
						? $adjusted_current_node->namespace
						: 'html'
				);
			}
		);

		/*
		 * Create this wrapper so that it's possible to pass
		 * a private method into WP_HTML_Token classes without
		 * exposing it to any public API.
		 */
		$this->release_internal_bookmark_on_destruct = function ( string $name ): void {
			parent::release_bookmark( $name );
		};
	}

	/**
	 * Stops the parser and terminates its execution when encountering unsupported markup.
	 *
	 * @throws WP_HTML_Unsupported_Exception Halts execution of the parser.
	 *
	 * @since 6.7.0
	 *
	 * @param string $message Explains support is missing in order to parse the current node.
	 */
	private function bail( string $message ) {
		$here  = $this->bookmarks[ $this->state->current_token->bookmark_name ];
		$token = substr( $this->html, $here->start, $here->length );

		$open_elements = array();
		foreach ( $this->state->stack_of_open_elements->stack as $item ) {
			$open_elements[] = $item->node_name;
		}

		$active_formats = array();
		foreach ( $this->state->active_formatting_elements->walk_down() as $item ) {
			$active_formats[] = $item->node_name;
		}

		$this->last_error = self::ERROR_UNSUPPORTED;

		$this->unsupported_exception = new WP_HTML_Unsupported_Exception(
			$message,
			$this->state->current_token->node_name,
			$here->start,
			$token,
			$open_elements,
			$active_formats
		);

		throw $this->unsupported_exception;
	}

	/**
	 * Returns the last error, if any.
	 *
	 * Various situations lead to parsing failure but this class will
	 * return `false` in all those cases. To determine why something
	 * failed it's possible to request the last error. This can be
	 * helpful to know to distinguish whether a given tag couldn't
	 * be found or if content in the document caused the processor
	 * to give up and abort processing.
	 *
	 * Example
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<template><strong><button><em><p><em>' );
	 *     false === $processor->next_tag();
	 *     WP_HTML_Processor::ERROR_UNSUPPORTED === $processor->get_last_error();
	 *
	 * @since 6.4.0
	 *
	 * @see self::ERROR_UNSUPPORTED
	 * @see self::ERROR_EXCEEDED_MAX_BOOKMARKS
	 *
	 * @return string|null The last error, if one exists, otherwise null.
	 */
	public function get_last_error(): ?string {
		return $this->last_error;
	}

	/**
	 * Returns context for why the parser aborted due to unsupported HTML, if it did.
	 *
	 * This is meant for debugging purposes, not for production use.
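	 *
	 * A hedged sketch of such a debugging check, assuming `$html` holds the
	 * fragment under test and that `error_log()` is an acceptable place to
	 * report the message (both are illustrative choices, not requirements):
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( $html );
	 *     while ( null !== $processor && $processor->next_tag() ) {
	 *         continue;
	 *     }
	 *
	 *     $exception = null === $processor ? null : $processor->get_unsupported_exception();
	 *     if ( null !== $exception ) {
	 *         error_log( $exception->getMessage() );
	 *     }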
	 *
	 * @since 6.7.0
	 *
	 * @see self::$unsupported_exception
	 *
	 * @return WP_HTML_Unsupported_Exception|null
	 */
	public function get_unsupported_exception() {
		return $this->unsupported_exception;
	}

	/**
	 * Finds the next tag matching the $query.
	 *
	 * @todo Support matching the class name and tag name.
	 *
	 * @since 6.4.0
	 * @since 6.6.0 Visits all tokens, including virtual ones.
	 *
	 * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
	 *
	 * @param array|string|null $query {
	 *     Optional. Which tag name to find, having which class, etc. Default is to find any tag.
	 *
	 *     @type string|null $tag_name     Which tag to find, or `null` for "any tag."
	 *     @type string      $tag_closers  'visit' to pause at tag closers, 'skip' or unset to only visit openers.
	 *     @type int|null    $match_offset Find the Nth tag matching all search criteria.
	 *                                     1 for "first" tag, 3 for "third," etc.
	 *                                     Defaults to first tag.
	 *     @type string|null $class_name   Tag must contain this whole class name to match.
	 *     @type string[]    $breadcrumbs  DOM sub-path at which element is found, e.g. `array( 'FIGURE', 'IMG' )`.
	 *                                     May also contain the wildcard `*` which matches a single element, e.g. `array( 'SECTION', '*' )`.
	 * }
	 * @return bool Whether a tag was matched.
	 */
	public function next_tag( $query = null ): bool {
		$visit_closers = isset( $query['tag_closers'] ) && 'visit' === $query['tag_closers'];

		if ( null === $query ) {
			while ( $this->next_token() ) {
				if ( '#tag' !== $this->get_token_type() ) {
					continue;
				}

				if ( ! $this->is_tag_closer() || $visit_closers ) {
					return true;
				}
			}

			return false;
		}

		if ( is_string( $query ) ) {
			$query = array( 'breadcrumbs' => array( $query ) );
		}

		if ( ! is_array( $query ) ) {
			_doing_it_wrong(
				__METHOD__,
				__( 'Please pass a query array to this function.' ),
				'6.4.0'
			);
			return false;
		}

		$needs_class = ( isset( $query['class_name'] ) && is_string( $query['class_name'] ) )
			? $query['class_name']
			: null;

		if ( ! ( array_key_exists( 'breadcrumbs', $query ) && is_array( $query['breadcrumbs'] ) ) ) {
			while ( $this->next_token() ) {
				if ( '#tag' !== $this->get_token_type() ) {
					continue;
				}

				if ( isset( $query['tag_name'] ) && $query['tag_name'] !== $this->get_token_name() ) {
					continue;
				}

				if ( isset( $needs_class ) && ! $this->has_class( $needs_class ) ) {
					continue;
				}

				if ( ! $this->is_tag_closer() || $visit_closers ) {
					return true;
				}
			}

			return false;
		}

		$breadcrumbs  = $query['breadcrumbs'];
		$match_offset = isset( $query['match_offset'] ) ? (int) $query['match_offset'] : 1;

		while ( $match_offset > 0 && $this->next_token() ) {
			if ( '#tag' !== $this->get_token_type() || $this->is_tag_closer() ) {
				continue;
			}

			if ( isset( $needs_class ) && ! $this->has_class( $needs_class ) ) {
				continue;
			}

			if ( $this->matches_breadcrumbs( $breadcrumbs ) && 0 === --$match_offset ) {
				return true;
			}
		}

		return false;
	}

	/**
	 * Ensures internal accounting is maintained for HTML semantic rules while
	 * the underlying Tag Processor class is seeking to a bookmark.
	 *
	 * This doesn't currently have a way to represent non-tags and doesn't process
	 * semantic rules for text nodes. For access to the raw tokens consider using
	 * WP_HTML_Tag_Processor instead.
	 *
	 * @since 6.5.0 Added for internal support; do not use.
	 *
	 * @access private
	 *
	 * @return bool
	 */
	public function next_token(): bool {
		$this->current_element = null;

		if ( isset( $this->last_error ) ) {
			return false;
		}

		/*
		 * Prime the events if there are none.
		 *
		 * @todo In some cases, probably related to the adoption agency
		 *       algorithm, this call to step() doesn't create any new
		 *       events. Calling it again creates them. Figure out why
		 *       this is and if it's inherent or if it's a bug. Looping
		 *       until there are events or until there are no more
		 *       tokens works in the meantime and isn't obviously wrong.
		 */
		if ( empty( $this->element_queue ) && $this->step() ) {
			return $this->next_token();
		}

		// Process the next event on the queue.
		$this->current_element = array_shift( $this->element_queue );
		if ( ! isset( $this->current_element ) ) {
			// There are no tokens left, so close all remaining open elements.
			while ( $this->state->stack_of_open_elements->pop() ) {
				continue;
			}

			return empty( $this->element_queue ) ? false : $this->next_token();
		}

		$is_pop = WP_HTML_Stack_Event::POP === $this->current_element->operation;

		/*
		 * The root node only exists in the fragment parser, and closing it
		 * indicates that the parse is complete. Stop before popping it from
		 * the breadcrumbs.
		 */
		if ( 'root-node' === $this->current_element->token->bookmark_name ) {
			return $this->next_token();
		}

		// Adjust the breadcrumbs for this event.
		if ( $is_pop ) {
			array_pop( $this->breadcrumbs );
		} else {
			$this->breadcrumbs[] = $this->current_element->token->node_name;
		}

		// Avoid sending close events for elements which don't expect a closing.
		if ( $is_pop && ! $this->expects_closer( $this->current_element->token ) ) {
			return $this->next_token();
		}

		return true;
	}

	/**
	 * Indicates if the current tag token is a tag closer.
	 *
	 * Example:
	 *
	 *     $p = WP_HTML_Processor::create_fragment( '<div></div>' );
	 *     $p->next_tag( array( 'tag_name' => 'DIV', 'tag_closers' => 'visit' ) );
	 *     $p->is_tag_closer() === false;
	 *
	 *     $p->next_tag( array( 'tag_name' => 'DIV', 'tag_closers' => 'visit' ) );
	 *     $p->is_tag_closer() === true;
	 *
	 * @since 6.6.0 Subclassed for HTML Processor.
	 *
	 * @return bool Whether the current tag is a tag closer.
	 */
	public function is_tag_closer(): bool {
		return $this->is_virtual()
			? ( WP_HTML_Stack_Event::POP === $this->current_element->operation && '#tag' === $this->get_token_type() )
			: parent::is_tag_closer();
	}

	/**
	 * Indicates if the currently-matched token is virtual, created by a stack operation
	 * while processing HTML, rather than a token found in the HTML text itself.
	 *
	 * @since 6.6.0
	 *
	 * @return bool Whether the current token is virtual.
	 */
	private function is_virtual(): bool {
		return (
			isset( $this->current_element->provenance ) &&
			'virtual' === $this->current_element->provenance
		);
	}

	/**
	 * Indicates if the currently-matched tag matches the given breadcrumbs.
	 *
	 * A "*" represents a single tag wildcard, where any tag matches, but not no tags.
	 *
	 * At some point this function _may_ support a `**` syntax for matching any number
	 * of unspecified tags in the breadcrumb stack. This has been intentionally left
	 * out, however, to keep this function simple and to avoid introducing backtracking,
	 * which could open up surprising performance breakdowns.
	 *
	 * Example:
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<div><span><figure><img></figure></span></div>' );
	 *     $processor->next_tag( 'img' );
	 *     true  === $processor->matches_breadcrumbs( array( 'figure', 'img' ) );
	 *     true  === $processor->matches_breadcrumbs( array( 'span', 'figure', 'img' ) );
	 *     false === $processor->matches_breadcrumbs( array( 'span', 'img' ) );
	 *     true  === $processor->matches_breadcrumbs( array( 'span', '*', 'img' ) );
	 *
	 * @since 6.4.0
	 *
	 * @param string[] $breadcrumbs DOM sub-path at which element is found, e.g. `array( 'FIGURE', 'IMG' )`.
	 *                              May also contain the wildcard `*` which matches a single element, e.g. `array( 'SECTION', '*' )`.
	 * @return bool Whether the currently-matched tag is found at the given nested structure.
	 */
	public function matches_breadcrumbs( $breadcrumbs ): bool {
		// Everything matches when there are zero constraints.
		if ( 0 === count( $breadcrumbs ) ) {
			return true;
		}

		// Start at the last crumb.
		$crumb = end( $breadcrumbs );

		if ( '*' !== $crumb && $this->get_tag() !== strtoupper( $crumb ) ) {
			return false;
		}

		for ( $i = count( $this->breadcrumbs ) - 1; $i >= 0; $i-- ) {
			$node  = $this->breadcrumbs[ $i ];
			$crumb = strtoupper( current( $breadcrumbs ) );

			if ( '*' !== $crumb && $node !== $crumb ) {
				return false;
			}

			if ( false === prev( $breadcrumbs ) ) {
				return true;
			}
		}

		return false;
	}

	/**
	 * Indicates if the currently-matched node expects a closing
	 * token, or if it will self-close on the next step.
	 *
	 * Most HTML elements expect a closer, such as a P element or
	 * a DIV element. Others, like an IMG element are void and don't
	 * have a closing tag. Special elements, such as SCRIPT and STYLE,
	 * are treated just like void tags. Text nodes and self-closing
	 * foreign content will also act just like a void tag, immediately
	 * closing as soon as the processor advances to the next token.
	 *
	 * @since 6.6.0
	 *
	 * @param WP_HTML_Token|null $node Optional. Node to examine, if provided.
	 *                                 Default is to examine current node.
	 * @return bool|null Whether to expect a closer for the currently-matched node,
	 *                   or `null` if not matched on any token.
	 */
	public function expects_closer( ?WP_HTML_Token $node = null ): ?bool {
		$token_name = $node->node_name ?? $this->get_token_name();

		if ( ! isset( $token_name ) ) {
			return null;
		}

		$token_namespace        = $node->namespace ?? $this->get_namespace();
		$token_has_self_closing = $node->has_self_closing_flag ?? $this->has_self_closing_flag();

		return ! (
			// Comments, text nodes, and other atomic tokens.
			'#' === $token_name[0] ||
			// Doctype declarations.
			'html' === $token_name ||
			// Void elements.
			self::is_void( $token_name ) ||
			// Special atomic elements.
			( 'html' === $token_namespace && in_array( $token_name, array( 'IFRAME', 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'XMP' ), true ) ) ||
			// Self-closing elements in foreign content.
			( 'html' !== $token_namespace && $token_has_self_closing )
		);
	}

	/**
	 * Steps through the HTML document and stops at the next tag, if any.
	 *
	 * @since 6.4.0
	 *
	 * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
	 *
	 * @see self::PROCESS_NEXT_NODE
	 * @see self::REPROCESS_CURRENT_NODE
	 *
	 * @param string $node_to_process Whether to parse the next node or reprocess the current node.
	 * @return bool Whether a tag was matched.
	 */
	public function step( $node_to_process = self::PROCESS_NEXT_NODE ): bool {
		// Refuse to proceed if there was a previous error.
		if ( null !== $this->last_error ) {
			return false;
		}

		if ( self::REPROCESS_CURRENT_NODE !== $node_to_process ) {
			/*
			 * Void elements still hop onto the stack of open elements even though
			 * there's no corresponding closing tag. This is important for managing
			 * stack-based operations such as "navigate to parent node" or checking
			 * on an element's breadcrumbs.
			 *
			 * When moving on to the next node, therefore, if the bottom-most element
			 * on the stack is a void element, it must be closed.
			 */
			$top_node = $this->state->stack_of_open_elements->current_node();
			if ( isset( $top_node ) && ! $this->expects_closer( $top_node ) ) {
				$this->state->stack_of_open_elements->pop();
			}
		}

		if ( self::PROCESS_NEXT_NODE === $node_to_process ) {
			parent::next_token();
			if ( WP_HTML_Tag_Processor::STATE_TEXT_NODE === $this->parser_state ) {
				parent::subdivide_text_appropriately();
			}
		}

		// Finish stepping when there are no more tokens in the document.
		if (
			WP_HTML_Tag_Processor::STATE_INCOMPLETE_INPUT === $this->parser_state ||
			WP_HTML_Tag_Processor::STATE_COMPLETE === $this->parser_state
		) {
			return false;
		}

		$adjusted_current_node = $this->get_adjusted_current_node();
		$is_closer             = $this->is_tag_closer();
		$is_start_tag          = WP_HTML_Tag_Processor::STATE_MATCHED_TAG === $this->parser_state && ! $is_closer;
		$token_name            = $this->get_token_name();

		if ( self::REPROCESS_CURRENT_NODE !== $node_to_process ) {
			$this->state->current_token = new WP_HTML_Token(
				$this->bookmark_token(),
				$token_name,
				$this->has_self_closing_flag(),
				$this->release_internal_bookmark_on_destruct
			);
		}

		$parse_in_current_insertion_mode = (
			0 === $this->state->stack_of_open_elements->count() ||
			'html' === $adjusted_current_node->namespace ||
			(
				'math' === $adjusted_current_node->integration_node_type &&
				(
					( $is_start_tag && ! in_array( $token_name, array( 'MGLYPH', 'MALIGNMARK' ), true ) ) ||
					'#text' === $token_name
				)
			) ||
			(
				'math' === $adjusted_current_node->namespace &&
				'ANNOTATION-XML' === $adjusted_current_node->node_name &&
				$is_start_tag && 'SVG' === $token_name
			) ||
			(
				'html' === $adjusted_current_node->integration_node_type &&
				( $is_start_tag || '#text' === $token_name )
			)
		);

		try {
			if ( ! $parse_in_current_insertion_mode ) {
				return $this->step_in_foreign_content();
			}

			switch ( $this->state->insertion_mode ) {
				case WP_HTML_Processor_State::INSERTION_MODE_INITIAL:
					return $this->step_initial();

				case WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML:
					return $this->step_before_html();

				case WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD:
					return $this->step_before_head();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD:
					return $this->step_in_head();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD_NOSCRIPT:
					return $this->step_in_head_noscript();

				case WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD:
					return $this->step_after_head();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_BODY:
					return $this->step_in_body();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE:
					return $this->step_in_table();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_TEXT:
					return $this->step_in_table_text();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION:
					return $this->step_in_caption();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP:
					return $this->step_in_column_group();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY:
					return $this->step_in_table_body();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_ROW:
					return $this->step_in_row();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_CELL:
					return $this->step_in_cell();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT:
					return $this->step_in_select();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE:
					return $this->step_in_select_in_table();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE:
					return $this->step_in_template();

				case WP_HTML_Processor_State::INSERTION_MODE_AFTER_BODY:
					return $this->step_after_body();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_FRAMESET:
					return $this->step_in_frameset();

				case WP_HTML_Processor_State::INSERTION_MODE_AFTER_FRAMESET:
					return $this->step_after_frameset();

				case WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_BODY:
					return $this->step_after_after_body();

				case WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_FRAMESET:
					return $this->step_after_after_frameset();

				// This should be unreachable but PHP doesn't have total type checking on switch.
				default:
					$this->bail( "Unaware of the requested parsing mode: '{$this->state->insertion_mode}'." );
			}
		} catch ( WP_HTML_Unsupported_Exception $e ) {
			/*
			 * Exceptions are used in this class to escape deep call stacks that
			 * otherwise might involve messier calling and return conventions.
			 */
			return false;
		}
	}

	/**
	 * Computes the HTML breadcrumbs for the currently-matched node, if matched.
	 *
	 * Breadcrumbs start at the outermost parent and descend toward the matched element.
	 * They always include the entire path from the root HTML node to the matched element.
	 *
	 * @todo It could be more efficient to expose a generator-based version of this function
	 *       to avoid creating the array copy on tag iteration. If this is done, it would likely
	 *       be more useful to walk up the stack when yielding instead of starting at the top.
	 *
	 * Example
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<p><strong><em><img></em></strong></p>' );
	 *     $processor->next_tag( 'IMG' );
	 *     $processor->get_breadcrumbs() === array( 'HTML', 'BODY', 'P', 'STRONG', 'EM', 'IMG' );
	 *
	 * @since 6.4.0
	 *
	 * @return string[]|null Array of tag names representing path to matched node, if matched, otherwise NULL.
	 */
	public function get_breadcrumbs(): ?array {
		return $this->breadcrumbs;
	}

	/**
	 * Returns the nesting depth of the current location in the document.
	 *
	 * Example:
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<div><p></p></div>' );
	 *     // The processor starts in the BODY context, meaning it has depth from the start: HTML > BODY.
	 *     2 === $processor->get_current_depth();
	 *
	 *     // Opening the DIV element increases the depth.
	 *     $processor->next_token();
	 *     3 === $processor->get_current_depth();
	 *
	 *     // Opening the P element increases the depth.
	 *     $processor->next_token();
	 *     4 === $processor->get_current_depth();
	 *
	 *     // The P element is closed during `next_token()` so the depth is decreased to reflect that.
	 *     $processor->next_token();
	 *     3 === $processor->get_current_depth();
	 *
	 * @since 6.6.0
	 *
	 * @return int Nesting-depth of current location in the document.
	 */
	public function get_current_depth(): int {
		return count( $this->breadcrumbs );
	}

	/**
	 * Parses next element in the 'initial' insertion mode.
	 *
	 * This internal function performs the 'initial' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#the-initial-insertion-mode
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_initial(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION,
			 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
			 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 *
			 * Parse error: ignore the token.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step();
				}
				goto initial_anything_else;
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				$doctype = $this->get_doctype_info();
				if ( null !== $doctype && 'quirks' === $doctype->indicated_compatability_mode ) {
					$this->compat_mode = WP_HTML_Tag_Processor::QUIRKS_MODE;
				}

				/*
				 * > Then, switch the insertion mode to "before html".
				 */
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML;
				$this->insert_html_element( $this->state->current_token );
				return true;
		}

		/*
		 * > Anything else
		 */
		initial_anything_else:
		$this->compat_mode           = WP_HTML_Tag_Processor::QUIRKS_MODE;
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'before html' insertion mode.
	 *
	 * This internal function performs the 'before html' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#the-before-html-insertion-mode
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_before_html(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$is_closer  = parent::is_tag_closer();
		$op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION,
			 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
			 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 *
			 * Parse error: ignore the token.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step();
				}
				goto before_html_anything_else;
				break;

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD;
				return true;

			/*
			 * > An end tag whose tag name is one of: "head", "body", "html", "br"
			 *
			 * Closing BR tags are always reported by the Tag Processor as opening tags.
			 */
			case '-HEAD':
			case '-BODY':
			case '-HTML':
				/*
				 * > Act as described in the "anything else" entry below.
				 */
				goto before_html_anything_else;
				break;
		}

		/*
		 * > Any other end tag
		 */
		if ( $is_closer ) {
			// Parse error: ignore the token.
			return $this->step();
		}

		/*
		 * > Anything else.
		 *
		 * > Create an html element whose node document is the Document object.
		 * > Append it to the Document object. Put this element in the stack of open elements.
		 * > Switch the insertion mode to "before head", then reprocess the token.
		 */
		before_html_anything_else:
		$this->insert_virtual_node( 'HTML' );
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'before head' insertion mode.
	 *
	 * This internal function performs the 'before head' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#the-before-head-insertion-mode
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_before_head(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$is_closer  = parent::is_tag_closer();
		$op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION,
			 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
			 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 *
			 * Parse error: ignore the token.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step();
				}
				goto before_head_anything_else;
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A start tag whose tag name is "head"
			 */
			case '+HEAD':
				$this->insert_html_element( $this->state->current_token );
				$this->state->head_element   = $this->state->current_token;
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
				return true;

			/*
			 * > An end tag whose tag name is one of: "head", "body", "html", "br"
			 * > Act as described in the "anything else" entry below.
			 *
			 * Closing BR tags are always reported by the Tag Processor as opening tags.
			 */
			case '-HEAD':
			case '-BODY':
			case '-HTML':
				goto before_head_anything_else;
				break;
		}

		if ( $is_closer ) {
			// Parse error: ignore the token.
			return $this->step();
		}

		/*
		 * > Anything else
		 *
		 * > Insert an HTML element for a "head" start tag token with no attributes.
		 */
		before_head_anything_else:
		$this->state->head_element   = $this->insert_virtual_node( 'HEAD' );
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'in head' insertion mode.
	 *
	 * This internal function performs the 'in head' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inhead
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_head(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$is_closer  = parent::is_tag_closer();
		$op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			case '#text':
				/*
				 * > A character token that is one of U+0009 CHARACTER TABULATION,
				 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
				 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
				 */
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					// Insert the character.
					$this->insert_html_element( $this->state->current_token );
					return true;
				}

				goto in_head_anything_else;
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A start tag whose tag name is one of: "base", "basefont", "bgsound", "link"
			 */
			case '+BASE':
			case '+BASEFONT':
			case '+BGSOUND':
			case '+LINK':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "meta"
			 */
			case '+META':
				$this->insert_html_element( $this->state->current_token );

				/*
				 * > If the active speculative HTML parser is null, then:
				 * >   - If the element has a charset attribute, and getting an encoding from
				 * >     its value results in an encoding, and the confidence is currently
				 * >     tentative, then change the encoding to the resulting encoding.
				 */
				$charset = $this->get_attribute( 'charset' );
				if ( is_string( $charset ) && 'tentative' === $this->state->encoding_confidence ) {
					$this->bail( 'Cannot yet process META tags with charset to determine encoding.' );
				}

				/*
				 * >   - Otherwise, if the element has an http-equiv attribute whose value is
				 * >     an ASCII case-insensitive match for the string "Content-Type", and
				 * >     the element has a content attribute, and applying the algorithm for
				 * >     extracting a character encoding from a meta element to that attribute's
				 * >     value returns an encoding, and the confidence is currently tentative,
				 * >     then change the encoding to the extracted encoding.
				 */
				$http_equiv = $this->get_attribute( 'http-equiv' );
				$content    = $this->get_attribute( 'content' );
				if (
					is_string( $http_equiv ) &&
					is_string( $content ) &&
					0 === strcasecmp( $http_equiv, 'Content-Type' ) &&
					'tentative' === $this->state->encoding_confidence
				) {
					$this->bail( 'Cannot yet process META tags with http-equiv Content-Type to determine encoding.' );
				}

				return true;

			/*
			 * > A start tag whose tag name is "title"
			 */
			case '+TITLE':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "noscript", if the scripting flag is enabled
			 * > A start tag whose tag name is one of: "noframes", "style"
			 *
			 * The scripting flag is never enabled in this parser.
			 */
			case '+NOFRAMES':
			case '+STYLE':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "noscript", if the scripting flag is disabled
			 */
			case '+NOSCRIPT':
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD_NOSCRIPT;
				return true;

			/*
			 * > A start tag whose tag name is "script"
			 *
			 * @todo Could the adjusted insertion location be anything other than the current location?
			 */
			case '+SCRIPT':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > An end tag whose tag name is "head"
			 */
			case '-HEAD':
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD;
				return true;

			/*
			 * > An end tag whose tag name is one of: "body", "html", "br"
			 *
			 * BR tags are always reported by the Tag Processor as opening tags.
			 */
			case '-BODY':
			case '-HTML':
				/*
				 * > Act as described in the "anything else" entry below.
				 */
				goto in_head_anything_else;
				break;

			/*
			 * > A start tag whose tag name is "template"
			 *
			 * @todo Could the adjusted insertion location be anything other than the current location?
			 */
			case '+TEMPLATE':
				$this->state->active_formatting_elements->insert_marker();
				$this->state->frameset_ok = false;

				$this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE;
				$this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE;

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > An end tag whose tag name is "template"
			 */
			case '-TEMPLATE':
				if ( ! $this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) {
					// @todo Indicate a parse error once it's possible.
					return $this->step();
				}

				$this->generate_implied_end_tags_thoroughly();
				if ( ! $this->state->stack_of_open_elements->current_node_is( 'TEMPLATE' ) ) {
					// @todo Indicate a parse error once it's possible.
				}

				$this->state->stack_of_open_elements->pop_until( 'TEMPLATE' );
				$this->state->active_formatting_elements->clear_up_to_last_marker();
				array_pop( $this->state->stack_of_template_insertion_modes );
				$this->reset_insertion_mode_appropriately();
				return true;
		}

		/*
		 * > A start tag whose tag name is "head"
		 * > Any other end tag
		 */
		if ( '+HEAD' === $op || $is_closer ) {
			// Parse error: ignore the token.
			return $this->step();
		}

		/*
		 * > Anything else
		 */
		in_head_anything_else:
		$this->state->stack_of_open_elements->pop();
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'in head noscript' insertion mode.
	 *
	 * This internal function performs the 'in head noscript' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-inheadnoscript
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_head_noscript(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$is_closer  = parent::is_tag_closer();
		$op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION,
			 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
			 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 *
			 * Whitespace is processed using the rules for the "in head" insertion mode.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step_in_head();
				}

				goto in_head_noscript_anything_else;
				break;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > An end tag whose tag name is "noscript"
			 */
			case '-NOSCRIPT':
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
				return true;

			/*
			 * > A comment token
			 * >
			 * > A start tag whose tag name is one of: "basefont", "bgsound",
			 * > "link", "meta", "noframes", "style"
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
			case '+BASEFONT':
			case '+BGSOUND':
			case '+LINK':
			case '+META':
			case '+NOFRAMES':
			case '+STYLE':
				return $this->step_in_head();

			/*
			 * > An end tag whose tag name is "br"
			 *
			 * This should never happen, as the Tag Processor prevents showing a BR closing tag.
			 */
		}

		/*
		 * > A start tag whose tag name is one of: "head", "noscript"
		 * > Any other end tag
		 */
		if ( '+HEAD' === $op || '+NOSCRIPT' === $op || $is_closer ) {
			// Parse error: ignore the token.
			return $this->step();
		}

		/*
		 * > Anything else
		 *
		 * Anything here is a parse error.
		 */
		in_head_noscript_anything_else:
		$this->state->stack_of_open_elements->pop();
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'after head' insertion mode.
	 *
	 * This internal function performs the 'after head' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#the-after-head-insertion-mode
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_after_head(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$is_closer  = parent::is_tag_closer();
		$op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION,
			 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
			 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					// Insert the character.
					$this->insert_html_element( $this->state->current_token );
					return true;
				}
				goto after_head_anything_else;
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A start tag whose tag name is "body"
			 */
			case '+BODY':
				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok    = false;
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
				return true;

			/*
			 * > A start tag whose tag name is "frameset"
			 */
			case '+FRAMESET':
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_FRAMESET;
				return true;

			/*
			 * > A start tag whose tag name is one of: "base", "basefont", "bgsound",
			 * > "link", "meta", "noframes", "script", "style", "template", "title"
			 *
			 * Anything here is a parse error.
			 */
			case '+BASE':
			case '+BASEFONT':
			case '+BGSOUND':
			case '+LINK':
			case '+META':
			case '+NOFRAMES':
			case '+SCRIPT':
			case '+STYLE':
			case '+TEMPLATE':
			case '+TITLE':
				/*
				 * > Push the node pointed to by the head element pointer onto the stack of open elements.
				 * > Process the token using the rules for the "in head" insertion mode.
				 * > Remove the node pointed to by the head element pointer from the stack of open elements. (It might not be the current node at this point.)
				 */
				$this->bail( 'Cannot process elements after HEAD which reopen the HEAD element.' );
				/*
				 * Do not leave this break in when adding support; it's here to prevent
				 * WPCS from getting confused at the switch structure without a return,
				 * because it doesn't know that `bail()` always throws.
				 */
				break;

			/*
			 * > An end tag whose tag name is "template"
			 */
			case '-TEMPLATE':
				return $this->step_in_head();

			/*
			 * > An end tag whose tag name is one of: "body", "html", "br"
			 *
			 * Closing BR tags are always reported by the Tag Processor as opening tags.
			 */
			case '-BODY':
			case '-HTML':
				/*
				 * > Act as described in the "anything else" entry below.
				 */
				goto after_head_anything_else;
				break;
		}

		/*
		 * > A start tag whose tag name is "head"
		 * > Any other end tag
		 */
		if ( '+HEAD' === $op || $is_closer ) {
			// Parse error: ignore the token.
			return $this->step();
		}

		/*
		 * > Anything else
		 * > Insert an HTML element for a "body" start tag token with no attributes.
		 */
		after_head_anything_else:
		$this->insert_virtual_node( 'BODY' );
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'in body' insertion mode.
	 *
	 * This internal function performs the 'in body' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.4.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-inbody
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_body(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			case '#text':
				/*
				 * > A character token that is U+0000 NULL
				 *
				 * Any successive sequence of NULL bytes is ignored and won't
				 * trigger active format reconstruction. Therefore, if the text
				 * only comprises NULL bytes then the token should be ignored
				 * here, but if there are any other characters in the stream
				 * the active formats should be reconstructed.
				 */
				if ( parent::TEXT_IS_NULL_SEQUENCE === $this->text_node_classification ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->reconstruct_active_formatting_elements();

				/*
				 * Whitespace-only text does not affect the frameset-ok flag.
				 * It is probably inter-element whitespace, but it may also
				 * contain character references which decode only to whitespace.
				 */
				if ( parent::TEXT_IS_GENERIC === $this->text_node_classification ) {
					$this->state->frameset_ok = false;
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 * > Parse error. Ignore the token.
			 */
			case 'html':
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				if ( ! $this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) {
					/*
					 * > Otherwise, for each attribute on the token, check to see if the attribute
					 * > is already present on the top element of the stack of open elements. If
					 * > it is not, add the attribute and its corresponding value to that element.
					 *
					 * This parser does not currently support this behavior: ignore the token.
					 */
				}

				// Ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is one of: "base", "basefont", "bgsound", "link",
			 * > "meta", "noframes", "script", "style", "template", "title"
			 * >
			 * > An end tag whose tag name is "template"
			 */
			case '+BASE':
			case '+BASEFONT':
			case '+BGSOUND':
			case '+LINK':
			case '+META':
			case '+NOFRAMES':
			case '+SCRIPT':
			case '+STYLE':
			case '+TEMPLATE':
			case '+TITLE':
			case '-TEMPLATE':
				return $this->step_in_head();

			/*
			 * > A start tag whose tag name is "body"
			 *
			 * This tag in the IN BODY insertion mode is a parse error.
			 */
			case '+BODY':
				if (
					1 === $this->state->stack_of_open_elements->count() ||
					'BODY' !== ( $this->state->stack_of_open_elements->at( 2 )->node_name ?? null ) ||
					$this->state->stack_of_open_elements->contains( 'TEMPLATE' )
				) {
					// Ignore the token.
					return $this->step();
				}

				/*
				 * > Otherwise, set the frameset-ok flag to "not ok"; then, for each attribute
				 * > on the token, check to see if the attribute is already present on the body
				 * > element (the second element) on the stack of open elements, and if it is
				 * > not, add the attribute and its corresponding value to that element.
				 *
				 * This parser does not currently support this behavior: ignore the token.
				 */
				$this->state->frameset_ok = false;
				return $this->step();

			/*
			 * > A start tag whose tag name is "frameset"
			 *
			 * This tag in the IN BODY insertion mode is a parse error.
			 */
			case '+FRAMESET':
				if (
					1 === $this->state->stack_of_open_elements->count() ||
					'BODY' !== ( $this->state->stack_of_open_elements->at( 2 )->node_name ?? null ) ||
					false === $this->state->frameset_ok
				) {
					// Ignore the token.
					return $this->step();
				}

				/*
				 * > Otherwise, run the following steps:
				 */
				$this->bail( 'Cannot process non-ignored FRAMESET tags.' );
				break;

			/*
			 * > An end tag whose tag name is "body"
			 */
			case '-BODY':
				if ( ! $this->state->stack_of_open_elements->has_element_in_scope( 'BODY' ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				/*
				 * > Otherwise, if there is a node in the stack of open elements that is not either a
				 * > dd element, a dt element, an li element, an optgroup element, an option element,
				 * > a p element, an rb element, an rp element, an rt element, an rtc element, a tbody
				 * > element, a td element, a tfoot element, a th element, a thead element, a tr
				 * > element, the body element, or the html element, then this is a parse error.
				 *
				 * There is nothing to do for this parse error, so don't check for it.
				 */

				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_BODY;
				return true;

			/*
			 * > An end tag whose tag name is "html"
			 */
			case '-HTML':
				if ( ! $this->state->stack_of_open_elements->has_element_in_scope( 'BODY' ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				/*
				 * > Otherwise, if there is a node in the stack of open elements that is not either a
				 * > dd element, a dt element, an li element, an optgroup element, an option element,
				 * > a p element, an rb element, an rp element, an rt element, an rtc element, a tbody
				 * > element, a td element, a tfoot element, a th element, a thead element, a tr
				 * > element, the body element, or the html element, then this is a parse error.
				 *
				 * There is nothing to do for this parse error, so don't check for it.
				 */

				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_BODY;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > A start tag whose tag name is one of: "address", "article", "aside",
			 * > "blockquote", "center", "details", "dialog", "dir", "div", "dl",
			 * > "fieldset", "figcaption", "figure", "footer", "header", "hgroup",
			 * > "main", "menu", "nav", "ol", "p", "search", "section", "summary", "ul"
			 */
			case '+ADDRESS':
			case '+ARTICLE':
			case '+ASIDE':
			case '+BLOCKQUOTE':
			case '+CENTER':
			case '+DETAILS':
			case '+DIALOG':
			case '+DIR':
			case '+DIV':
			case '+DL':
			case '+FIELDSET':
			case '+FIGCAPTION':
			case '+FIGURE':
			case '+FOOTER':
			case '+HEADER':
			case '+HGROUP':
			case '+MAIN':
			case '+MENU':
			case '+NAV':
			case '+OL':
			case '+P':
			case '+SEARCH':
			case '+SECTION':
			case '+SUMMARY':
			case '+UL':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6"
			 */
			case '+H1':
			case '+H2':
			case '+H3':
			case '+H4':
			case '+H5':
			case '+H6':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				if (
					in_array(
						$this->state->stack_of_open_elements->current_node()->node_name,
						array( 'H1', 'H2', 'H3', 'H4', 'H5', 'H6' ),
						true
					)
				) {
					// @todo Indicate a parse error once it's possible.
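					/*
					 * Illustration only: a sketch with made-up input, not part of the parser.
					 * Because the open heading is popped here, a heading start tag found while
					 * another heading is the current node becomes its sibling, not its child.
					 *
					 * Example:
					 *
					 *     $processor = WP_HTML_Processor::create_fragment( '<h1>One<h2>Two' );
					 *     $processor->next_tag( array( 'breadcrumbs' => array( 'H2' ) ) );
					 *     // Should yield array( 'HTML', 'BODY', 'H2' ), without the H1.
					 *     $processor->get_breadcrumbs();
					 */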
					$this->state->stack_of_open_elements->pop();
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is one of: "pre", "listing"
			 */
			case '+PRE':
			case '+LISTING':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				/*
				 * > If the next token is a U+000A LINE FEED (LF) character token,
				 * > then ignore that token and move on to the next one. (Newlines
				 * > at the start of pre blocks are ignored as an authoring convenience.)
				 *
				 * This is handled in `get_modifiable_text()`.
				 */

				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;
				return true;

			/*
			 * > A start tag whose tag name is "form"
			 */
			case '+FORM':
				$stack_contains_template = $this->state->stack_of_open_elements->contains( 'TEMPLATE' );

				if ( isset( $this->state->form_element ) && ! $stack_contains_template ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				$this->insert_html_element( $this->state->current_token );
				if ( ! $stack_contains_template ) {
					$this->state->form_element = $this->state->current_token;
				}

				return true;

			/*
			 * > A start tag whose tag name is "li"
			 * > A start tag whose tag name is one of: "dd", "dt"
			 */
			case '+DD':
			case '+DT':
			case '+LI':
				$this->state->frameset_ok = false;
				$node = $this->state->stack_of_open_elements->current_node();
				$is_li = 'LI' === $token_name;

				in_body_list_loop:
				/*
				 * The logic for LI and DT/DD is the same except for one point: LI elements _only_
				 * close other LI elements, but a DT or DD element closes _any_ open DT or DD element.
				 */
				if ( $is_li ? 'LI' === $node->node_name : ( 'DD' === $node->node_name || 'DT' === $node->node_name ) ) {
					$node_name = $is_li ? 'LI' : $node->node_name;
					$this->generate_implied_end_tags( $node_name );
					if ( ! $this->state->stack_of_open_elements->current_node_is( $node_name ) ) {
						// @todo Indicate a parse error once it's possible. This error does not impact the logic here.
					}

					$this->state->stack_of_open_elements->pop_until( $node_name );
					goto in_body_list_done;
				}

				if (
					'ADDRESS' !== $node->node_name &&
					'DIV' !== $node->node_name &&
					'P' !== $node->node_name &&
					self::is_special( $node )
				) {
					/*
					 * > If node is in the special category, but is not an address, div,
					 * > or p element, then jump to the step labeled done below.
					 */
					goto in_body_list_done;
				} else {
					/*
					 * > Otherwise, set node to the previous entry in the stack of open elements
					 * > and return to the step labeled loop.
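					 *
					 * Illustration only: a sketch with made-up input, not part of the algorithm.
					 * The net effect of this loop is that a new LI closes an LI that is still
					 * open, so the second LI below should be a sibling of the first.
					 *
					 * Example:
					 *
					 *     $processor = WP_HTML_Processor::create_fragment( '<ul><li>One<li>Two</ul>' );
					 *     $processor->next_tag( array( 'breadcrumbs' => array( 'LI' ) ) );
					 *     $processor->next_tag( array( 'breadcrumbs' => array( 'LI' ) ) );
					 *     // Should yield array( 'HTML', 'BODY', 'UL', 'LI' ) for the second LI.
					 *     $processor->get_breadcrumbs();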
					 */
					foreach ( $this->state->stack_of_open_elements->walk_up( $node ) as $item ) {
						$node = $item;
						break;
					}
					goto in_body_list_loop;
				}

				in_body_list_done:
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			case '+PLAINTEXT':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				/*
				 * @todo This may need to be handled in the Tag Processor and turn into
				 *       a single self-contained tag like TEXTAREA, whose modifiable text
				 *       is the rest of the input document as plaintext.
				 */
				$this->bail( 'Cannot process PLAINTEXT elements.' );
				break;

			/*
			 * > A start tag whose tag name is "button"
			 */
			case '+BUTTON':
				if ( $this->state->stack_of_open_elements->has_element_in_scope( 'BUTTON' ) ) {
					// @todo Indicate a parse error once it's possible. This error does not impact the logic here.
					$this->generate_implied_end_tags();
					$this->state->stack_of_open_elements->pop_until( 'BUTTON' );
				}

				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;

				return true;

			/*
			 * > An end tag whose tag name is one of: "address", "article", "aside", "blockquote",
			 * > "button", "center", "details", "dialog", "dir", "div", "dl", "fieldset",
			 * > "figcaption", "figure", "footer", "header", "hgroup", "listing", "main",
			 * > "menu", "nav", "ol", "pre", "search", "section", "summary", "ul"
			 */
			case '-ADDRESS':
			case '-ARTICLE':
			case '-ASIDE':
			case '-BLOCKQUOTE':
			case '-BUTTON':
			case '-CENTER':
			case '-DETAILS':
			case '-DIALOG':
			case '-DIR':
			case '-DIV':
			case '-DL':
			case '-FIELDSET':
			case '-FIGCAPTION':
			case '-FIGURE':
			case '-FOOTER':
			case '-HEADER':
			case '-HGROUP':
			case '-LISTING':
			case '-MAIN':
			case '-MENU':
			case '-NAV':
			case '-OL':
			case '-PRE':
			case '-SEARCH':
			case '-SECTION':
			case '-SUMMARY':
			case '-UL':
				if ( ! $this->state->stack_of_open_elements->has_element_in_scope( $token_name ) ) {
					// @todo Report parse error.
					// Ignore the token.
					return $this->step();
				}

				$this->generate_implied_end_tags();
				if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
					// @todo Record parse error: this error doesn't impact parsing.
				}
				$this->state->stack_of_open_elements->pop_until( $token_name );
				return true;

			/*
			 * > An end tag whose tag name is "form"
			 */
			case '-FORM':
				if ( ! $this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) {
					$node = $this->state->form_element;
					$this->state->form_element = null;

					/*
					 * > If node is null or if the stack of open elements does not have node
					 * > in scope, then this is a parse error; return and ignore the token.
					 *
					 * @todo It's necessary to check if the form token itself is in scope, not
					 *       simply whether any FORM is in scope.
					 */
					if (
						null === $node ||
						! $this->state->stack_of_open_elements->has_element_in_scope( 'FORM' )
					) {
						// Parse error: ignore the token.
						return $this->step();
					}

					$this->generate_implied_end_tags();
					if ( $node !== $this->state->stack_of_open_elements->current_node() ) {
						// @todo Indicate a parse error once it's possible. This error does not impact the logic here.
						$this->bail( 'Cannot close a FORM when other elements remain open as this would throw off the breadcrumbs for the following tokens.' );
					}

					$this->state->stack_of_open_elements->remove_node( $node );
				} else {
					/*
					 * > If the stack of open elements does not have a form element in scope,
					 * > then this is a parse error; return and ignore the token.
					 *
					 * Note that unlike in the clause above, this is checking for any FORM in scope.
					 */
					if ( ! $this->state->stack_of_open_elements->has_element_in_scope( 'FORM' ) ) {
						// Parse error: ignore the token.
						return $this->step();
					}

					$this->generate_implied_end_tags();

					if ( ! $this->state->stack_of_open_elements->current_node_is( 'FORM' ) ) {
						// @todo Indicate a parse error once it's possible. This error does not impact the logic here.
					}

					$this->state->stack_of_open_elements->pop_until( 'FORM' );
					return true;
				}
				break;

			/*
			 * > An end tag whose tag name is "p"
			 */
			case '-P':
				if ( ! $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->insert_html_element( $this->state->current_token );
				}

				$this->close_a_p_element();
				return true;

			/*
			 * > An end tag whose tag name is "li"
			 * > An end tag whose tag name is one of: "dd", "dt"
			 */
			case '-DD':
			case '-DT':
			case '-LI':
				if (
					/*
					 * An end tag whose tag name is "li":
					 * If the stack of open elements does not have an li element in list item scope,
					 * then this is a parse error; ignore the token.
					 */
					(
						'LI' === $token_name &&
						! $this->state->stack_of_open_elements->has_element_in_list_item_scope( 'LI' )
					) ||
					/*
					 * An end tag whose tag name is one of: "dd", "dt":
					 * If the stack of open elements does not have an element in scope that is an
					 * HTML element with the same tag name as that of the token, then this is a
					 * parse error; ignore the token.
					 */
					(
						'LI' !== $token_name &&
						! $this->state->stack_of_open_elements->has_element_in_scope( $token_name )
					)
				) {
					/*
					 * This is a parse error, ignore the token.
					 *
					 * @todo Indicate a parse error once it's possible.
					 */
					return $this->step();
				}

				$this->generate_implied_end_tags( $token_name );

				if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
					// @todo Indicate a parse error once it's possible. This error does not impact the logic here.
				}

				$this->state->stack_of_open_elements->pop_until( $token_name );
				return true;

			/*
			 * > An end tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6"
			 */
			case '-H1':
			case '-H2':
			case '-H3':
			case '-H4':
			case '-H5':
			case '-H6':
				if ( ! $this->state->stack_of_open_elements->has_element_in_scope( '(internal: H1 through H6 - do not use)' ) ) {
					/*
					 * This is a parse error; ignore the token.
					 *
					 * @todo Indicate a parse error once it's possible.
					 */
					return $this->step();
				}

				$this->generate_implied_end_tags();

				if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
					// @todo Record parse error: this error doesn't impact parsing.
				}

				$this->state->stack_of_open_elements->pop_until( '(internal: H1 through H6 - do not use)' );
				return true;

			/*
			 * > A start tag whose tag name is "a"
			 */
			case '+A':
				foreach ( $this->state->active_formatting_elements->walk_up() as $item ) {
					switch ( $item->node_name ) {
						case 'marker':
							break 2;

						case 'A':
							$this->run_adoption_agency_algorithm();
							$this->state->active_formatting_elements->remove_node( $item );
							$this->state->stack_of_open_elements->remove_node( $item );
							break 2;
					}
				}

				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->active_formatting_elements->push( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is one of: "b", "big", "code", "em", "font", "i",
			 * > "s", "small", "strike", "strong", "tt", "u"
			 */
			case '+B':
			case '+BIG':
			case '+CODE':
			case '+EM':
			case '+FONT':
			case '+I':
			case '+S':
			case '+SMALL':
			case '+STRIKE':
			case '+STRONG':
			case '+TT':
			case '+U':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->active_formatting_elements->push( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "nobr"
			 */
			case '+NOBR':
				$this->reconstruct_active_formatting_elements();

				if ( $this->state->stack_of_open_elements->has_element_in_scope( 'NOBR' ) ) {
					// Parse error.
					$this->run_adoption_agency_algorithm();
					$this->reconstruct_active_formatting_elements();
				}

				$this->insert_html_element( $this->state->current_token );
				$this->state->active_formatting_elements->push( $this->state->current_token );
				return true;

			/*
			 * > An end tag whose tag name is one of: "a", "b", "big", "code", "em", "font", "i",
			 * > "nobr", "s", "small", "strike", "strong", "tt", "u"
			 */
			case '-A':
			case '-B':
			case '-BIG':
			case '-CODE':
			case '-EM':
			case '-FONT':
			case '-I':
			case '-NOBR':
			case '-S':
			case '-SMALL':
			case '-STRIKE':
			case '-STRONG':
			case '-TT':
			case '-U':
				$this->run_adoption_agency_algorithm();
				return true;

			/*
			 * > A start tag whose tag name is one of: "applet", "marquee", "object"
			 */
			case '+APPLET':
			case '+MARQUEE':
			case '+OBJECT':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->active_formatting_elements->insert_marker();
				$this->state->frameset_ok = false;
				return true;

			/*
			 * > An end tag token whose tag name is one of: "applet", "marquee", "object"
			 */
			case '-APPLET':
			case '-MARQUEE':
			case '-OBJECT':
				if ( ! $this->state->stack_of_open_elements->has_element_in_scope( $token_name ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->generate_implied_end_tags();
				if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
					// This is a parse error.
				}

				$this->state->stack_of_open_elements->pop_until( $token_name );
				$this->state->active_formatting_elements->clear_up_to_last_marker();
				return true;

			/*
			 * > A start tag whose tag name is "table"
			 */
			case '+TABLE':
				/*
				 * > If the Document is not set to quirks mode, and the stack of open elements
				 * > has a p element in button scope, then close a p element.
				 */
				if (
					WP_HTML_Tag_Processor::QUIRKS_MODE !== $this->compat_mode &&
					$this->state->stack_of_open_elements->has_p_in_button_scope()
				) {
					$this->close_a_p_element();
				}

				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
				return true;

			/*
			 * > An end tag whose tag name is "br"
			 *
			 * This is prevented from happening because the Tag Processor
			 * reports all closing BR tags as if they were opening tags.
			 */

			/*
			 * > A start tag whose tag name is one of: "area", "br", "embed", "img", "keygen", "wbr"
			 */
			case '+AREA':
			case '+BR':
			case '+EMBED':
			case '+IMG':
			case '+KEYGEN':
			case '+WBR':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;
				return true;

			/*
			 * > A start tag whose tag name is "input"
			 */
			case '+INPUT':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );

				/*
				 * > If the token does not have an attribute with the name "type", or if it does,
				 * > but that attribute's value is not an ASCII case-insensitive match for the
				 * > string "hidden", then: set the frameset-ok flag to "not ok".
				 */
				$type_attribute = $this->get_attribute( 'type' );
				if ( ! is_string( $type_attribute ) || 'hidden' !== strtolower( $type_attribute ) ) {
					$this->state->frameset_ok = false;
				}

				return true;

			/*
			 * > A start tag whose tag name is one of: "param", "source", "track"
			 */
			case '+PARAM':
			case '+SOURCE':
			case '+TRACK':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "hr"
			 */
			case '+HR':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}
				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;
				return true;

			/*
			 * > A start tag whose tag name is "image"
			 */
			case '+IMAGE':
				/*
				 * > Parse error. Change the token's tag name to "img" and reprocess it. (Don't ask.)
				 *
				 * Note that this is handled elsewhere, so it should not be possible to reach this code.
				 */
				$this->bail( "Cannot process an IMAGE tag. (Don't ask.)" );
				break;

			/*
			 * > A start tag whose tag name is "textarea"
			 */
			case '+TEXTAREA':
				$this->insert_html_element( $this->state->current_token );

				/*
				 * > If the next token is a U+000A LINE FEED (LF) character token, then ignore
				 * > that token and move on to the next one. (Newlines at the start of
				 * > textarea elements are ignored as an authoring convenience.)
				 *
				 * This is handled in `get_modifiable_text()`.
				 */

				$this->state->frameset_ok = false;

				/*
				 * > Switch the insertion mode to "text".
				 *
				 * As a self-contained node, this behavior is handled in the Tag Processor.
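				 *
				 * Illustration only: a sketch with made-up input, not part of the parser.
				 * Since the TEXTAREA is matched as one self-contained token, markup inside
				 * it should be exposed as modifiable text and not as nested tags.
				 *
				 * Example:
				 *
				 *     $processor = WP_HTML_Processor::create_fragment( '<textarea>Hello <b>world</b></textarea>' );
				 *     $processor->next_tag( array( 'breadcrumbs' => array( 'TEXTAREA' ) ) );
				 *     // Should yield 'Hello <b>world</b>'.
				 *     $processor->get_modifiable_text();
				 *     // Should find no B element, because the contents are text.
				 *     $processor->next_tag( array( 'breadcrumbs' => array( 'B' ) ) );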
				 */
				return true;

			/*
			 * > A start tag whose tag name is "xmp"
			 */
			case '+XMP':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				$this->reconstruct_active_formatting_elements();
				$this->state->frameset_ok = false;

				/*
				 * > Follow the generic raw text element parsing algorithm.
				 *
				 * As a self-contained node, this behavior is handled in the Tag Processor.
				 */
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "iframe"
			 */
			case '+IFRAME':
				$this->state->frameset_ok = false;

				/*
				 * > Follow the generic raw text element parsing algorithm.
				 *
				 * As a self-contained node, this behavior is handled in the Tag Processor.
				 */
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "noembed"
			 * > A start tag whose tag name is "noscript", if the scripting flag is enabled
			 *
			 * The scripting flag is never enabled in this parser.
			 */
			case '+NOEMBED':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "select"
			 */
			case '+SELECT':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;

				switch ( $this->state->insertion_mode ) {
					/*
					 * > If the insertion mode is one of "in table", "in caption", "in table body", "in row",
					 * > or "in cell", then switch the insertion mode to "in select in table".
					 */
					case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE:
					case WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION:
					case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY:
					case WP_HTML_Processor_State::INSERTION_MODE_IN_ROW:
					case WP_HTML_Processor_State::INSERTION_MODE_IN_CELL:
						$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE;
						break;

					/*
					 * > Otherwise, switch the insertion mode to "in select".
					 */
					default:
						$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT;
						break;
				}
				return true;

			/*
			 * > A start tag whose tag name is one of: "optgroup", "option"
			 */
			case '+OPTGROUP':
			case '+OPTION':
				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
					$this->state->stack_of_open_elements->pop();
				}
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is one of: "rb", "rtc"
			 */
			case '+RB':
			case '+RTC':
				if ( $this->state->stack_of_open_elements->has_element_in_scope( 'RUBY' ) ) {
					$this->generate_implied_end_tags();

					// The parse error applies when the current node is not a RUBY element.
					if ( ! $this->state->stack_of_open_elements->current_node_is( 'RUBY' ) ) {
						// @todo Indicate a parse error once it's possible.
					}
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is one of: "rp", "rt"
			 */
			case '+RP':
			case '+RT':
				if ( $this->state->stack_of_open_elements->has_element_in_scope( 'RUBY' ) ) {
					$this->generate_implied_end_tags( 'RTC' );

					// The parse error applies when the current node is neither an RTC nor a RUBY element.
					$current_node_name = $this->state->stack_of_open_elements->current_node()->node_name;
					if ( 'RTC' !== $current_node_name && 'RUBY' !== $current_node_name ) {
						// @todo Indicate a parse error once it's possible.
					}
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "math"
			 */
			case '+MATH':
				$this->reconstruct_active_formatting_elements();

				/*
				 * @todo Adjust MathML attributes for the token. (This fixes the case of MathML attributes that are not all lowercase.)
				 * @todo Adjust foreign attributes for the token. (This fixes the use of namespaced attributes, in particular XLink.)
				 *
				 * These ought to be handled in the attribute methods.
				 */
				$this->state->current_token->namespace = 'math';
				$this->insert_html_element( $this->state->current_token );
				if ( $this->state->current_token->has_self_closing_flag ) {
					$this->state->stack_of_open_elements->pop();
				}
				return true;

			/*
			 * > A start tag whose tag name is "svg"
			 */
			case '+SVG':
				$this->reconstruct_active_formatting_elements();

				/*
				 * @todo Adjust SVG attributes for the token. (This fixes the case of SVG attributes that are not all lowercase.)
				 * @todo Adjust foreign attributes for the token. (This fixes the use of namespaced attributes, in particular XLink in SVG.)
				 *
				 * These ought to be handled in the attribute methods.
				 */
				$this->state->current_token->namespace = 'svg';
				$this->insert_html_element( $this->state->current_token );
				if ( $this->state->current_token->has_self_closing_flag ) {
					$this->state->stack_of_open_elements->pop();
				}
				return true;

			/*
			 * > A start tag whose tag name is one of: "caption", "col", "colgroup",
			 * > "frame", "head", "tbody", "td", "tfoot", "th", "thead", "tr"
			 */
			case '+CAPTION':
			case '+COL':
			case '+COLGROUP':
			case '+FRAME':
			case '+HEAD':
			case '+TBODY':
			case '+TD':
			case '+TFOOT':
			case '+TH':
			case '+THEAD':
			case '+TR':
				// Parse error. Ignore the token.
				return $this->step();
		}

		if ( ! parent::is_tag_closer() ) {
			/*
			 * > Any other start tag
			 */
			$this->reconstruct_active_formatting_elements();
			$this->insert_html_element( $this->state->current_token );
			return true;
		} else {
			/*
			 * > Any other end tag
			 */

			/*
			 * Find the corresponding tag opener in the stack of open elements, if
			 * it exists before reaching a special element, which provides a kind
			 * of boundary in the stack. For example, a `</custom-tag>` should not
			 * close anything beyond its containing `P` or `DIV` element.
			 */
			foreach ( $this->state->stack_of_open_elements->walk_up() as $node ) {
				if ( 'html' === $node->namespace && $token_name === $node->node_name ) {
					break;
				}

				if ( self::is_special( $node ) ) {
					// This is a parse error, ignore the token.
					return $this->step();
				}
			}

			$this->generate_implied_end_tags( $token_name );
			if ( $node !== $this->state->stack_of_open_elements->current_node() ) {
				// @todo Record parse error: this error doesn't impact parsing.
			}

			foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
				$this->state->stack_of_open_elements->pop();
				if ( $node === $item ) {
					return true;
				}
			}
		}

		$this->bail( 'Should not have been able to reach end of IN BODY processing. Check HTML API code.' );
		// This unnecessary return prevents tools from inaccurately reporting type errors.
		return false;
	}

	/**
	 * Parses next element in the 'in table' insertion mode.
	 *
	 * This internal function performs the 'in table' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-intable
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_table(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
		$op = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A character token, if the current node is table,
			 * > tbody, template, tfoot, thead, or tr element
			 */
			case '#text':
				$current_node = $this->state->stack_of_open_elements->current_node();
				$current_node_name = $current_node ? $current_node->node_name : null;
				if (
					$current_node_name && (
						'TABLE' === $current_node_name ||
						'TBODY' === $current_node_name ||
						'TEMPLATE' === $current_node_name ||
						'TFOOT' === $current_node_name ||
						'THEAD' === $current_node_name ||
						'TR' === $current_node_name
					)
				) {
					/*
					 * If the text is empty after processing HTML entities and stripping
					 * U+0000 NULL bytes then ignore the token.
					 */
					if ( parent::TEXT_IS_NULL_SEQUENCE === $this->text_node_classification ) {
						return $this->step();
					}

					/*
					 * This follows the rules for "in table text" insertion mode.
					 *
					 * Whitespace-only text nodes are inserted in-place. Otherwise
					 * foster parenting is enabled and the nodes would be
					 * inserted out-of-place.
					 *
					 * > If any of the tokens in the pending table character tokens
					 * > list are character tokens that are not ASCII whitespace,
					 * > then this is a parse error: reprocess the character tokens
					 * > in the pending table character tokens list using the rules
					 * > given in the "anything else" entry in the "in table"
					 * > insertion mode.
					 * >
					 * > Otherwise, insert the characters given by the pending table
					 * > character tokens list.
					 *
					 * @see https://html.spec.whatwg.org/#parsing-main-intabletext
					 */
					if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
						$this->insert_html_element( $this->state->current_token );
						return true;
					}

					// Non-whitespace would trigger fostering, unsupported at this time.
					$this->bail( 'Foster parenting is not supported.' );
					break;
				}
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "caption"
			 */
			case '+CAPTION':
				$this->state->stack_of_open_elements->clear_to_table_context();
				$this->state->active_formatting_elements->insert_marker();
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION;
				return true;

			/*
			 * > A start tag whose tag name is "colgroup"
			 */
			case '+COLGROUP':
				$this->state->stack_of_open_elements->clear_to_table_context();
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
				return true;

			/*
			 * > A start tag whose tag name is "col"
			 */
			case '+COL':
				$this->state->stack_of_open_elements->clear_to_table_context();

				/*
				 * > Insert an HTML element for a "colgroup" start tag token with no attributes,
				 * > then switch the insertion mode to "in column group".
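				 *
				 * Illustration only: a sketch with made-up input, not part of the parser.
				 * The COLGROUP never appears in the input, but because a virtual node is
				 * inserted for it, the COL should report it in its breadcrumbs.
				 *
				 * Example:
				 *
				 *     $processor = WP_HTML_Processor::create_fragment( '<table><col>' );
				 *     $processor->next_tag( array( 'breadcrumbs' => array( 'COL' ) ) );
				 *     // Should yield array( 'HTML', 'BODY', 'TABLE', 'COLGROUP', 'COL' ).
				 *     $processor->get_breadcrumbs();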
				 */
				$this->insert_virtual_node( 'COLGROUP' );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > A start tag whose tag name is one of: "tbody", "tfoot", "thead"
			 */
			case '+TBODY':
			case '+TFOOT':
			case '+THEAD':
				$this->state->stack_of_open_elements->clear_to_table_context();
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return true;

			/*
			 * > A start tag whose tag name is one of: "td", "th", "tr"
			 */
			case '+TD':
			case '+TH':
			case '+TR':
				$this->state->stack_of_open_elements->clear_to_table_context();
				/*
				 * > Insert an HTML element for a "tbody" start tag token with no attributes,
				 * > then switch the insertion mode to "in table body".
				 */
				$this->insert_virtual_node( 'TBODY' );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > A start tag whose tag name is "table"
			 *
			 * This tag in the IN TABLE insertion mode is a parse error.
			 */
			case '+TABLE':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TABLE' ) ) {
					return $this->step();
				}

				$this->state->stack_of_open_elements->pop_until( 'TABLE' );
				$this->reset_insertion_mode_appropriately();
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is "table"
			 */
			case '-TABLE':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TABLE' ) ) {
					// @todo Indicate a parse error once it's possible.
					return $this->step();
				}

				$this->state->stack_of_open_elements->pop_until( 'TABLE' );
				$this->reset_insertion_mode_appropriately();
				return true;

			/*
			 * > An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "tbody", "td", "tfoot", "th", "thead", "tr"
			 */
			case '-BODY':
			case '-CAPTION':
			case '-COL':
			case '-COLGROUP':
			case '-HTML':
			case '-TBODY':
			case '-TD':
			case '-TFOOT':
			case '-TH':
			case '-THEAD':
			case '-TR':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is one of: "style", "script", "template"
			 * > An end tag whose tag name is "template"
			 */
			case '+STYLE':
			case '+SCRIPT':
			case '+TEMPLATE':
			case '-TEMPLATE':
				/*
				 * > Process the token using the rules for the "in head" insertion mode.
				 */
				return $this->step_in_head();

			/*
			 * > A start tag whose tag name is "input"
			 *
			 * > If the token does not have an attribute with the name "type", or if it does, but
			 * > that attribute's value is not an ASCII case-insensitive match for the string
			 * > "hidden", then: act as described in the "anything else" entry below.
			 */
			case '+INPUT':
				$type_attribute = $this->get_attribute( 'type' );
				if ( ! is_string( $type_attribute ) || 'hidden' !== strtolower( $type_attribute ) ) {
					goto anything_else;
				}
				// @todo Indicate a parse error once it's possible.
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "form"
			 *
			 * This tag in the IN TABLE insertion mode is a parse error.
			 */
			case '+FORM':
				if (
					$this->state->stack_of_open_elements->has_element_in_scope( 'TEMPLATE' ) ||
					isset( $this->state->form_element )
				) {
					return $this->step();
				}

				// This FORM is special because it immediately closes and cannot have other children.
				$this->insert_html_element( $this->state->current_token );
				$this->state->form_element = $this->state->current_token;
				$this->state->stack_of_open_elements->pop();
				return true;
		}

		/*
		 * > Anything else
		 * > Parse error. Enable foster parenting, process the token using the rules for the
		 * > "in body" insertion mode, and then disable foster parenting.
		 *
		 * @todo Indicate a parse error once it's possible.
		 */
		anything_else:
		$this->bail( 'Foster parenting is not supported.' );
	}

	/**
	 * Parses next element in the 'in table text' insertion mode.
	 *
	 * This internal function performs the 'in table text' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-intabletext
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_table_text(): bool {
		$this->bail( 'No support for parsing in the ' . WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_TEXT . ' state.' );
	}

	/**
	 * Parses next element in the 'in caption' insertion mode.
	 *
	 * This internal function performs the 'in caption' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-incaption
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_caption(): bool {
		$tag_name = $this->get_tag();
		$op_sigil = $this->is_tag_closer() ? '-' : '+';
		$op       = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > An end tag whose tag name is "caption"
			 * > A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "td", "tfoot", "th", "thead", "tr"
			 * > An end tag whose tag name is "table"
			 *
			 * These tag handling rules are identical except for the final instruction.
			 * Handle them in a single block.
			 */
			case '-CAPTION':
			case '+CAPTION':
			case '+COL':
			case '+COLGROUP':
			case '+TBODY':
			case '+TD':
			case '+TFOOT':
			case '+TH':
			case '+THEAD':
			case '+TR':
			case '-TABLE':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'CAPTION' ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->generate_implied_end_tags();
				if ( ! $this->state->stack_of_open_elements->current_node_is( 'CAPTION' ) ) {
					// @todo Indicate a parse error once it's possible.
				}

				$this->state->stack_of_open_elements->pop_until( 'CAPTION' );
				$this->state->active_formatting_elements->clear_up_to_last_marker();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;

				// If this is not a CAPTION end tag, the token should be reprocessed.
				if ( '-CAPTION' === $op ) {
					return true;
				}
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "body", "col", "colgroup", "html", "tbody", "td", "tfoot", "th", "thead", "tr"
			 */
			case '-BODY':
			case '-COL':
			case '-COLGROUP':
			case '-HTML':
			case '-TBODY':
			case '-TD':
			case '-TFOOT':
			case '-TH':
			case '-THEAD':
			case '-TR':
				// Parse error: ignore the token.
				return $this->step();
		}

		/*
		 * > Anything else
		 * > Process the token using the rules for the "in body" insertion mode.
		 */
		return $this->step_in_body();
	}

	/**
	 * Parses next element in the 'in column group' insertion mode.
	 *
	 * This internal function performs the 'in column group' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-incolgroup
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_column_group(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
			 * > U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					// Insert the character.
					$this->insert_html_element( $this->state->current_token );
					return true;
				}

				goto in_column_group_anything_else;
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// @todo Indicate a parse error once it's possible.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A start tag whose tag name is "col"
			 */
			case '+COL':
				$this->insert_html_element( $this->state->current_token );
				$this->state->stack_of_open_elements->pop();
				return true;

			/*
			 * > An end tag whose tag name is "colgroup"
			 */
			case '-COLGROUP':
				if ( ! $this->state->stack_of_open_elements->current_node_is( 'COLGROUP' ) ) {
					// @todo Indicate a parse error once it's possible.
					return $this->step();
				}
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
				return true;

			/*
			 * > An end tag whose tag name is "col"
			 */
			case '-COL':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "template"
			 * > An end tag whose tag name is "template"
			 */
			case '+TEMPLATE':
			case '-TEMPLATE':
				return $this->step_in_head();
		}

		in_column_group_anything_else:
		/*
		 * > Anything else
		 */
		if ( ! $this->state->stack_of_open_elements->current_node_is( 'COLGROUP' ) ) {
			// @todo Indicate a parse error once it's possible.
			return $this->step();
		}
		$this->state->stack_of_open_elements->pop();
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'in table body' insertion mode.
	 *
	 * This internal function performs the 'in table body' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-intbody
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_table_body(): bool {
		$tag_name = $this->get_tag();
		$op_sigil = $this->is_tag_closer() ? '-' : '+';
		$op       = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > A start tag whose tag name is "tr"
			 */
			case '+TR':
				$this->state->stack_of_open_elements->clear_to_table_body_context();
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
				return true;

			/*
			 * > A start tag whose tag name is one of: "th", "td"
			 */
			case '+TH':
			case '+TD':
				// @todo Indicate a parse error once it's possible.
				$this->state->stack_of_open_elements->clear_to_table_body_context();
				$this->insert_virtual_node( 'TR' );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "tbody", "tfoot", "thead"
			 */
			case '-TBODY':
			case '-TFOOT':
			case '-THEAD':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $tag_name ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->state->stack_of_open_elements->clear_to_table_body_context();
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
				return true;

			/*
			 * > A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "tfoot", "thead"
			 * > An end tag whose tag name is "table"
			 */
			case '+CAPTION':
			case '+COL':
			case '+COLGROUP':
			case '+TBODY':
			case '+TFOOT':
			case '+THEAD':
			case '-TABLE':
				if (
					! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TBODY' ) &&
					! $this->state->stack_of_open_elements->has_element_in_table_scope( 'THEAD' ) &&
					! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TFOOT' )
				) {
					// Parse error: ignore the token.
					return $this->step();
				}
				$this->state->stack_of_open_elements->clear_to_table_body_context();
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "td", "th", "tr"
			 */
			case '-BODY':
			case '-CAPTION':
			case '-COL':
			case '-COLGROUP':
			case '-HTML':
			case '-TD':
			case '-TH':
			case '-TR':
				// Parse error: ignore the token.
				return $this->step();
		}

		/*
		 * > Anything else
		 * > Process the token using the rules for the "in table" insertion mode.
		 */
		return $this->step_in_table();
	}

	/**
	 * Parses next element in the 'in row' insertion mode.
	 *
	 * This internal function performs the 'in row' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-intr
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_row(): bool {
		$tag_name = $this->get_tag();
		$op_sigil = $this->is_tag_closer() ? '-' : '+';
		$op       = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > A start tag whose tag name is one of: "th", "td"
			 */
			case '+TH':
			case '+TD':
				$this->state->stack_of_open_elements->clear_to_table_row_context();
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_CELL;
				$this->state->active_formatting_elements->insert_marker();
				return true;

			/*
			 * > An end tag whose tag name is "tr"
			 */
			case '-TR':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TR' ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->state->stack_of_open_elements->clear_to_table_row_context();
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return true;

			/*
			 * > A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "tfoot", "thead", "tr"
			 * > An end tag whose tag name is "table"
			 */
			case '+CAPTION':
			case '+COL':
			case '+COLGROUP':
			case '+TBODY':
			case '+TFOOT':
			case '+THEAD':
			case '+TR':
			case '-TABLE':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TR' ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->state->stack_of_open_elements->clear_to_table_row_context();
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "tbody", "tfoot", "thead"
			 */
			case '-TBODY':
			case '-TFOOT':
			case '-THEAD':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $tag_name ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TR' ) ) {
					// Ignore the token.
					return $this->step();
				}

				$this->state->stack_of_open_elements->clear_to_table_row_context();
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "td", "th"
			 */
			case '-BODY':
			case '-CAPTION':
			case '-COL':
			case '-COLGROUP':
			case '-HTML':
			case '-TD':
			case '-TH':
				// Parse error: ignore the token.
				return $this->step();
		}

		/*
		 * > Anything else
		 * > Process the token using the rules for the "in table" insertion mode.
		 */
		return $this->step_in_table();
	}

	/**
	 * Parses next element in the 'in cell' insertion mode.
	 *
	 * This internal function performs the 'in cell' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-intd
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_cell(): bool {
		$tag_name = $this->get_tag();
		$op_sigil = $this->is_tag_closer() ? '-' : '+';
		$op       = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > An end tag whose tag name is one of: "td", "th"
			 */
			case '-TD':
			case '-TH':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $tag_name ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->generate_implied_end_tags();

				/*
				 * @todo This needs to check if the current node is an HTML element, meaning that
				 *       when SVG and MathML support is added, this needs to differentiate between an
				 *       HTML element of the given name, such as `<center>`, and a foreign element of
				 *       the same given name.
				 */
				if ( ! $this->state->stack_of_open_elements->current_node_is( $tag_name ) ) {
					// @todo Indicate a parse error once it's possible.
				}

				$this->state->stack_of_open_elements->pop_until( $tag_name );
				$this->state->active_formatting_elements->clear_up_to_last_marker();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
				return true;

			/*
			 * > A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "td",
			 * > "tfoot", "th", "thead", "tr"
			 */
			case '+CAPTION':
			case '+COL':
			case '+COLGROUP':
			case '+TBODY':
			case '+TD':
			case '+TFOOT':
			case '+TH':
			case '+THEAD':
			case '+TR':
				/*
				 * > Assert: The stack of open elements has a td or th element in table scope.
				 *
				 * Nothing to do here, except to verify in tests that this never appears.
				 */

				$this->close_cell();
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html"
			 */
			case '-BODY':
			case '-CAPTION':
			case '-COL':
			case '-COLGROUP':
			case '-HTML':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > An end tag whose tag name is one of: "table", "tbody", "tfoot", "thead", "tr"
			 */
			case '-TABLE':
			case '-TBODY':
			case '-TFOOT':
			case '-THEAD':
			case '-TR':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $tag_name ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}
				$this->close_cell();
				return $this->step( self::REPROCESS_CURRENT_NODE );
		}

		/*
		 * > Anything else
		 * > Process the token using the rules for the "in body" insertion mode.
		 */
		return $this->step_in_body();
	}

	/**
	 * Parses next element in the 'in select' insertion mode.
	 *
	 * This internal function performs the 'in select' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inselect
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_select(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > Any other character token
			 */
			case '#text':
				/*
				 * > A character token that is U+0000 NULL
				 *
				 * If a text node only comprises null bytes then it should be
				 * entirely ignored and should not return to calling code.
				 */
				if ( parent::TEXT_IS_NULL_SEQUENCE === $this->text_node_classification ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A start tag whose tag name is "option"
			 */
			case '+OPTION':
				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
					$this->state->stack_of_open_elements->pop();
				}
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "optgroup"
			 * > A start tag whose tag name is "hr"
			 *
			 * These rules are identical except for the treatment of the self-closing flag and
			 * the subsequent pop of the HR void element, all of which is handled elsewhere in the processor.
			 */
			case '+OPTGROUP':
			case '+HR':
				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
					$this->state->stack_of_open_elements->pop();
				}

				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTGROUP' ) ) {
					$this->state->stack_of_open_elements->pop();
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > An end tag whose tag name is "optgroup"
			 */
			case '-OPTGROUP':
				$current_node = $this->state->stack_of_open_elements->current_node();
				if ( $current_node && 'OPTION' === $current_node->node_name ) {
					foreach ( $this->state->stack_of_open_elements->walk_up( $current_node ) as $parent ) {
						break;
					}
					if ( $parent && 'OPTGROUP' === $parent->node_name ) {
						$this->state->stack_of_open_elements->pop();
					}
				}

				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTGROUP' ) ) {
					$this->state->stack_of_open_elements->pop();
					return true;
				}

				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > An end tag whose tag name is "option"
			 */
			case '-OPTION':
				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
					$this->state->stack_of_open_elements->pop();
					return true;
				}

				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > An end tag whose tag name is "select"
			 * > A start tag whose tag name is "select"
			 *
			 * > It just gets treated like an end tag.
			 */
			case '-SELECT':
			case '+SELECT':
				if ( ! $this->state->stack_of_open_elements->has_element_in_select_scope( 'SELECT' ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}
				$this->state->stack_of_open_elements->pop_until( 'SELECT' );
				$this->reset_insertion_mode_appropriately();
				return true;

			/*
			 * > A start tag whose tag name is one of: "input", "keygen", "textarea"
			 *
			 * All three of these tags are considered a parse error when found in this insertion mode.
			 */
			case '+INPUT':
			case '+KEYGEN':
			case '+TEXTAREA':
				if ( ! $this->state->stack_of_open_elements->has_element_in_select_scope( 'SELECT' ) ) {
					// Ignore the token.
					return $this->step();
				}
				$this->state->stack_of_open_elements->pop_until( 'SELECT' );
				$this->reset_insertion_mode_appropriately();
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > A start tag whose tag name is one of: "script", "template"
			 * > An end tag whose tag name is "template"
			 */
			case '+SCRIPT':
			case '+TEMPLATE':
			case '-TEMPLATE':
				return $this->step_in_head();
		}

		/*
		 * > Anything else
		 * > Parse error: ignore the token.
		 */
		return $this->step();
	}

	/**
	 * Parses next element in the 'in select in table' insertion mode.
	 *
	 * This internal function performs the 'in select in table' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-inselectintable
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_select_in_table(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A start tag whose tag name is one of: "caption", "table", "tbody", "tfoot", "thead", "tr", "td", "th"
			 */
			case '+CAPTION':
			case '+TABLE':
			case '+TBODY':
			case '+TFOOT':
			case '+THEAD':
			case '+TR':
			case '+TD':
			case '+TH':
				// @todo Indicate a parse error once it's possible.
				$this->state->stack_of_open_elements->pop_until( 'SELECT' );
				$this->reset_insertion_mode_appropriately();
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "caption", "table", "tbody", "tfoot", "thead", "tr", "td", "th"
			 */
			case '-CAPTION':
			case '-TABLE':
			case '-TBODY':
			case '-TFOOT':
			case '-THEAD':
			case '-TR':
			case '-TD':
			case '-TH':
				// @todo Indicate a parse error once it's possible.
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $token_name ) ) {
					return $this->step();
				}
				$this->state->stack_of_open_elements->pop_until( 'SELECT' );
				$this->reset_insertion_mode_appropriately();
				return $this->step( self::REPROCESS_CURRENT_NODE );
		}

		/*
		 * > Anything else
		 */
		return $this->step_in_select();
	}

	/**
	 * Parses next element in the 'in template' insertion mode.
	 *
	 * This internal function performs the 'in template' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-intemplate
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_template(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$is_closer  = $this->is_tag_closer();
		$op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A character token
			 * > A comment token
			 * > A DOCTYPE token
			 */
			case '#text':
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
			case 'html':
				return $this->step_in_body();

			/*
			 * > A start tag whose tag name is one of: "base", "basefont", "bgsound", "link",
			 * > "meta", "noframes", "script", "style", "template", "title"
			 * > An end tag whose tag name is "template"
			 */
			case '+BASE':
			case '+BASEFONT':
			case '+BGSOUND':
			case '+LINK':
			case '+META':
			case '+NOFRAMES':
			case '+SCRIPT':
			case '+STYLE':
			case '+TEMPLATE':
			case '+TITLE':
			case '-TEMPLATE':
				return $this->step_in_head();

			/*
			 * > A start tag whose tag name is one of: "caption", "colgroup", "tbody", "tfoot", "thead"
			 */
			case '+CAPTION':
			case '+COLGROUP':
			case '+TBODY':
			case '+TFOOT':
			case '+THEAD':
				array_pop( $this->state->stack_of_template_insertion_modes );
				$this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
				$this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > A start tag whose tag name is "col"
			 */
			case '+COL':
				array_pop( $this->state->stack_of_template_insertion_modes );
				$this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
				$this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > A start tag whose tag name is "tr"
			 */
			case '+TR':
				array_pop( $this->state->stack_of_template_insertion_modes );
				$this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				$this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > A start tag whose tag name is one of: "td", "th"
			 */
			case '+TD':
			case '+TH':
				array_pop( $this->state->stack_of_template_insertion_modes );
				$this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
				$this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
				return $this->step( self::REPROCESS_CURRENT_NODE );
		}

		/*
		 * > Any other start tag
		 */
		if ( ! $is_closer ) {
			array_pop( $this->state->stack_of_template_insertion_modes );
			$this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
			$this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
			return $this->step( self::REPROCESS_CURRENT_NODE );
		}

		/*
		 * > Any other end tag
		 */
		if ( $is_closer ) {
			// Parse error: ignore the token.
			return $this->step();
		}

		/*
		 * > An end-of-file token
		 */
		if ( ! $this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) {
			// Stop parsing.
			return false;
		}

		// @todo Indicate a parse error once it's possible.
		$this->state->stack_of_open_elements->pop_until( 'TEMPLATE' );
		$this->state->active_formatting_elements->clear_up_to_last_marker();
		array_pop( $this->state->stack_of_template_insertion_modes );
		$this->reset_insertion_mode_appropriately();
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'after body' insertion mode.
	 *
	 * This internal function performs the 'after body' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-afterbody
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_after_body(): bool {
		$tag_name   = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
			 * >   U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 *
			 * > Process the token using the rules for the "in body" insertion mode.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step_in_body();
				}
				goto after_body_anything_else;
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->bail( 'Content outside of BODY is unsupported.' );
				break;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > An end tag whose tag name is "html"
			 *
			 * > If the parser was created as part of the HTML fragment parsing algorithm,
			 * > this is a parse error; ignore the token. (fragment case)
			 * >
			 * > Otherwise, switch the insertion mode to "after after body".
			 */
			case '-HTML':
				if ( isset( $this->context_node ) ) {
					return $this->step();
				}

				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_BODY;
				return true;
		}

		/*
		 * > Parse error. Switch the insertion mode to "in body" and reprocess the token.
		 */
		after_body_anything_else:
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'in frameset' insertion mode.
	 *
	 * This internal function performs the 'in frameset' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-inframeset
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_frameset(): bool {
		$tag_name   = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
			 * >   U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 * >
			 * > Insert the character.
			 *
			 * This algorithm effectively strips non-whitespace characters from text and inserts
			 * them under HTML. This is not supported at this time.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step_in_body();
				}
				$this->bail( 'Non-whitespace characters cannot be handled in frameset.' );
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A start tag whose tag name is "frameset"
			 */
			case '+FRAMESET':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > An end tag whose tag name is "frameset"
			 */
			case '-FRAMESET':
				/*
				 * > If the current node is the root html element, then this is a parse error;
				 * > ignore the token. (fragment case)
				 */
				if ( $this->state->stack_of_open_elements->current_node_is( 'HTML' ) ) {
					return $this->step();
				}

				/*
				 * > Otherwise, pop the current node from the stack of open elements.
				 */
				$this->state->stack_of_open_elements->pop();

				/*
				 * > If the parser was not created as part of the HTML fragment parsing algorithm
				 * > (fragment case), and the current node is no longer a frameset element, then
				 * > switch the insertion mode to "after frameset".
				 */
				if ( ! isset( $this->context_node ) && ! $this->state->stack_of_open_elements->current_node_is( 'FRAMESET' ) ) {
					$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_FRAMESET;
				}

				return true;

			/*
			 * > A start tag whose tag name is "frame"
			 *
			 * > Insert an HTML element for the token. Immediately pop the
			 * > current node off the stack of open elements.
			 * >
			 * > Acknowledge the token's self-closing flag, if it is set.
			 */
			case '+FRAME':
				$this->insert_html_element( $this->state->current_token );
				$this->state->stack_of_open_elements->pop();
				return true;

			/*
			 * > A start tag whose tag name is "noframes"
			 */
			case '+NOFRAMES':
				return $this->step_in_head();
		}

		// Parse error: ignore the token.
		return $this->step();
	}

	/**
	 * Parses next element in the 'after frameset' insertion mode.
	 *
	 * This internal function performs the 'after frameset' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-afterframeset
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_after_frameset(): bool {
		$tag_name   = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
			 * >   U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 * >
			 * > Insert the character.
			 *
			 * This algorithm effectively strips non-whitespace characters from text and inserts
			 * them under HTML. This is not supported at this time.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step_in_body();
				}
				$this->bail( 'Non-whitespace characters cannot be handled in after frameset' );
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > An end tag whose tag name is "html"
			 */
			case '-HTML':
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_FRAMESET;
				return true;

			/*
			 * > A start tag whose tag name is "noframes"
			 */
			case '+NOFRAMES':
				return $this->step_in_head();
		}

		// Parse error: ignore the token.
		return $this->step();
	}

	/**
	 * Parses next element in the 'after after body' insertion mode.
	 *
	 * This internal function performs the 'after after body' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#the-after-after-body-insertion-mode
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_after_after_body(): bool {
		$tag_name   = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->bail( 'Content outside of HTML is unsupported.' );
				break;

			/*
			 * > A DOCTYPE token
			 * > A start tag whose tag name is "html"
			 *
			 * > Process the token using the rules for the "in body" insertion mode.
			 */
			case 'html':
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
			 * >   U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 * >
			 * > Process the token using the rules for the "in body" insertion mode.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step_in_body();
				}
				goto after_after_body_anything_else;
				break;
		}

		/*
		 * > Parse error. Switch the insertion mode to "in body" and reprocess the token.
		 */
		after_after_body_anything_else:
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'after after frameset' insertion mode.
	 *
	 * This internal function performs the 'after after frameset' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#the-after-after-frameset-insertion-mode
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_after_after_frameset(): bool {
		$tag_name   = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->bail( 'Content outside of HTML is unsupported.' );
				break;

			/*
			 * > A DOCTYPE token
			 * > A start tag whose tag name is "html"
			 *
			 * > Process the token using the rules for the "in body" insertion mode.
			 */
			case 'html':
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
			 * >   U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 * >
			 * > Process the token using the rules for the "in body" insertion mode.
			 *
			 * This algorithm effectively strips non-whitespace characters from text and inserts
			 * them under HTML. This is not supported at this time.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step_in_body();
				}
				$this->bail( 'Non-whitespace characters cannot be handled in after after frameset.' );
				break;

			/*
			 * > A start tag whose tag name is "noframes"
			 */
			case '+NOFRAMES':
				return $this->step_in_head();
		}

		// Parse error: ignore the token.
		return $this->step();
	}

	/**
	 * Parses next element in the 'in foreign content' insertion mode.
	 *
	 * This internal function performs the 'in foreign content' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-inforeign
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_foreign_content(): bool {
		$tag_name   = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$tag_name}";

		/*
		 * > A start tag whose name is "font", if the token has any attributes named "color", "face", or "size"
		 *
		 * This section is drawn out above the switch to more easily incorporate
		 * the additional rules based on the presence of the attributes.
		 */
		if (
			'+FONT' === $op &&
			(
				null !== $this->get_attribute( 'color' ) ||
				null !== $this->get_attribute( 'face' ) ||
				null !== $this->get_attribute( 'size' )
			)
		) {
			$op = '+FONT with attributes';
		}

		switch ( $op ) {
			case '#text':
				/*
				 * > A character token that is U+0000 NULL
				 *
				 * This is handled by `get_modifiable_text()`.
				 */

				/*
				 * Whitespace-only text does not affect the frameset-ok flag.
				 * It is probably inter-element whitespace, but it may also
				 * contain character references which decode only to whitespace.
				 */
				if ( parent::TEXT_IS_GENERIC === $this->text_node_classification ) {
					$this->state->frameset_ok = false;
				}

				$this->insert_foreign_element( $this->state->current_token, false );
				return true;

			/*
			 * CDATA sections are alternate wrappers for text content and therefore
			 * ought to follow the same rules as text nodes.
			 */
			case '#cdata-section':
				/*
				 * NULL bytes and whitespace do not change the frameset-ok flag.
				 */
				$current_token        = $this->bookmarks[ $this->state->current_token->bookmark_name ];
				$cdata_content_start  = $current_token->start + 9;
				$cdata_content_length = $current_token->length - 12;
				if ( strspn( $this->html, "\0 \t\n\f\r", $cdata_content_start, $cdata_content_length ) !== $cdata_content_length ) {
					$this->state->frameset_ok = false;
				}

				$this->insert_foreign_element( $this->state->current_token, false );
				return true;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_foreign_element( $this->state->current_token, false );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "b", "big", "blockquote", "body", "br", "center",
			 * > "code", "dd", "div", "dl", "dt", "em", "embed", "h1", "h2", "h3", "h4", "h5",
			 * > "h6", "head", "hr", "i", "img", "li", "listing", "menu", "meta", "nobr", "ol",
			 * > "p", "pre", "ruby", "s", "small", "span", "strong", "strike", "sub", "sup",
			 * > "table", "tt", "u", "ul", "var"
			 *
			 * > A start tag whose name is "font", if the token has any attributes named "color", "face", or "size"
			 *
			 * > An end tag whose tag name is "br", "p"
			 *
			 * Closing BR tags are always reported by the Tag Processor as opening tags.
			 */
			case '+B':
			case '+BIG':
			case '+BLOCKQUOTE':
			case '+BODY':
			case '+BR':
			case '+CENTER':
			case '+CODE':
			case '+DD':
			case '+DIV':
			case '+DL':
			case '+DT':
			case '+EM':
			case '+EMBED':
			case '+H1':
			case '+H2':
			case '+H3':
			case '+H4':
			case '+H5':
			case '+H6':
			case '+HEAD':
			case '+HR':
			case '+I':
			case '+IMG':
			case '+LI':
			case '+LISTING':
			case '+MENU':
			case '+META':
			case '+NOBR':
			case '+OL':
			case '+P':
			case '+PRE':
			case '+RUBY':
			case '+S':
			case '+SMALL':
			case '+SPAN':
			case '+STRONG':
			case '+STRIKE':
			case '+SUB':
			case '+SUP':
			case '+TABLE':
			case '+TT':
			case '+U':
			case '+UL':
			case '+VAR':
			case '+FONT with attributes':
			case '-BR':
			case '-P':
				// @todo Indicate a parse error once it's possible.
				foreach ( $this->state->stack_of_open_elements->walk_up() as $current_node ) {
					if (
						'math' === $current_node->integration_node_type ||
						'html' === $current_node->integration_node_type ||
						'html' === $current_node->namespace
					) {
						break;
					}

					$this->state->stack_of_open_elements->pop();
				}
				return $this->step( self::REPROCESS_CURRENT_NODE );
		}

		/*
		 * > Any other start tag
		 */
		if ( ! $this->is_tag_closer() ) {
			$this->insert_foreign_element( $this->state->current_token, false );

			/*
			 * > If the token has its self-closing flag set, then run
			 * > the appropriate steps from the following list:
			 * >
			 * >   ↪ the token's tag name is "script", and the new current node is in the SVG namespace
			 * >         Acknowledge the token's self-closing flag, and then act as
			 * >         described in the steps for a "script" end tag below.
			 * >
			 * >   ↪ Otherwise
			 * >         Pop the current node off the stack of open elements and
			 * >         acknowledge the token's self-closing flag.
			 *
			 * Since the rules for SCRIPT below indicate to pop the element off of the stack of
			 * open elements, which is the same for the Otherwise condition, there's no need to
			 * separate these checks. The difference comes when a parser operates with the scripting
			 * flag enabled, and executes the script, which this parser does not support.
			 */
			if ( $this->state->current_token->has_self_closing_flag ) {
				$this->state->stack_of_open_elements->pop();
			}
			return true;
		}

		/*
		 * > An end tag whose name is "script", if the current node is an SVG script element.
		 */
		if ( $this->is_tag_closer() && 'SCRIPT' === $this->state->current_token->node_name && 'svg' === $this->state->current_token->namespace ) {
			$this->state->stack_of_open_elements->pop();
			return true;
		}

		/*
		 * > Any other end tag
		 */
		if ( $this->is_tag_closer() ) {
			$node = $this->state->stack_of_open_elements->current_node();
			if ( $tag_name !== $node->node_name ) {
				// @todo Indicate a parse error once it's possible.
			}
			in_foreign_content_end_tag_loop:
			if ( $node === $this->state->stack_of_open_elements->at( 1 ) ) {
				return true;
			}

			/*
			 * > If node's tag name, converted to ASCII lowercase, is the same as the tag name
			 * > of the token, pop elements from the stack of open elements until node has
			 * > been popped from the stack, and then return.
			 */
			if ( 0 === strcasecmp( $node->node_name, $tag_name ) ) {
				foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
					$this->state->stack_of_open_elements->pop();
					if ( $node === $item ) {
						return true;
					}
				}
			}

			foreach ( $this->state->stack_of_open_elements->walk_up( $node ) as $item ) {
				$node = $item;
				break;
			}

			if ( 'html' !== $node->namespace ) {
				goto in_foreign_content_end_tag_loop;
			}

			switch ( $this->state->insertion_mode ) {
				case WP_HTML_Processor_State::INSERTION_MODE_INITIAL:
					return $this->step_initial();

				case WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML:
					return $this->step_before_html();

				case WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD:
					return $this->step_before_head();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD:
					return $this->step_in_head();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD_NOSCRIPT:
					return $this->step_in_head_noscript();

				case WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD:
					return $this->step_after_head();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_BODY:
					return $this->step_in_body();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE:
					return $this->step_in_table();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_TEXT:
					return $this->step_in_table_text();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION:
					return $this->step_in_caption();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP:
					return $this->step_in_column_group();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY:
					return $this->step_in_table_body();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_ROW:
					return $this->step_in_row();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_CELL:
					return $this->step_in_cell();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT:
					return $this->step_in_select();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE:
					return $this->step_in_select_in_table();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE:
					return $this->step_in_template();

				case WP_HTML_Processor_State::INSERTION_MODE_AFTER_BODY:
					return $this->step_after_body();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_FRAMESET:
					return $this->step_in_frameset();

				case WP_HTML_Processor_State::INSERTION_MODE_AFTER_FRAMESET:
					return $this->step_after_frameset();

				case WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_BODY:
					return $this->step_after_after_body();

				case WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_FRAMESET:
					return $this->step_after_after_frameset();

				// This should be unreachable but PHP doesn't have total type checking on switch.
				default:
					$this->bail( "Unaware of the requested parsing mode: '{$this->state->insertion_mode}'." );
			}
		}

		$this->bail( 'Should not have been able to reach end of IN FOREIGN CONTENT processing. Check HTML API code.' );
		// This unnecessary return prevents tools from inaccurately reporting type errors.
		return false;
	}

	/*
	 * Internal helpers
	 */

	/**
	 * Creates a new bookmark for the currently-matched token and returns the generated name.
	 *
	 * @since 6.4.0
	 * @since 6.5.0 Renamed from bookmark_tag() to bookmark_token().
	 *
	 * @throws Exception When unable to allocate requested bookmark.
	 *
	 * @return string|false Name of created bookmark, or false if unable to create.
	 */
	private function bookmark_token() {
		if ( ! parent::set_bookmark( ++$this->bookmark_counter ) ) {
			$this->last_error = self::ERROR_EXCEEDED_MAX_BOOKMARKS;
			throw new Exception( 'could not allocate bookmark' );
		}

		return "{$this->bookmark_counter}";
	}

	/*
	 * HTML semantic overrides for Tag Processor
	 */

	/**
	 * Indicates the namespace of the current token, or "html" if there is none.
	 *
	 * @return string One of "html", "math", or "svg".
	 */
	public function get_namespace(): string {
		if ( ! isset( $this->current_element ) ) {
			return parent::get_namespace();
		}

		return $this->current_element->token->namespace;
	}

	/**
	 * Returns the uppercase name of the matched tag.
	 *
	 * The semantic rules for HTML specify that certain tags be reprocessed
	 * with a different tag name. Because of this, the tag name presented
	 * by the HTML Processor may differ from the one reported by the HTML
	 * Tag Processor, which doesn't apply these semantic rules.
	 *
	 * Example:
	 *
	 *     $processor = new WP_HTML_Tag_Processor( '<div class="test">Test</div>' );
	 *     $processor->next_tag() === true;
	 *     $processor->get_tag() === 'DIV';
	 *
	 *     $processor->next_tag() === false;
	 *     $processor->get_tag() === null;
	 *
	 * @since 6.4.0
	 *
	 * @return string|null Name of currently matched tag in input HTML, or `null` if none found.
	 */
	public function get_tag(): ?string {
		if ( null !== $this->last_error ) {
			return null;
		}

		if ( $this->is_virtual() ) {
			return $this->current_element->token->node_name;
		}

		$tag_name = parent::get_tag();

		/*
		 * > A start tag whose tag name is "image"
		 * > Change the token's tag name to "img" and reprocess it. (Don't ask.)
		 */
		return ( 'IMAGE' === $tag_name && 'html' === $this->get_namespace() )
			? 'IMG'
			: $tag_name;
	}

	/**
	 * Indicates if the currently matched tag contains the self-closing flag.
	 *
	 * No HTML elements ought to have the self-closing flag and for those, the self-closing
	 * flag will be ignored. For void elements this is benign because they "self close"
	 * automatically. For non-void HTML elements though problems will appear if someone
	 * intends to use a self-closing element in place of that element with an empty body.
	 * For HTML foreign elements and custom elements the self-closing flag determines if
	 * they self-close or not.
	 *
	 * This function does not determine if a tag is self-closing,
	 * but only if the self-closing flag is present in the syntax.
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @return bool Whether the currently matched tag contains the self-closing flag.
	 */
	public function has_self_closing_flag(): bool {
		return $this->is_virtual() ? false : parent::has_self_closing_flag();
	}

	/**
	 * Returns the node name represented by the token.
	 *
	 * This matches the DOM API value `nodeName`. Some values
	 * are static, such as `#text` for a text node, while others
	 * are dynamically generated from the token itself.
	 *
	 * Dynamic names:
	 *  - Uppercase tag name for tag matches.
	 *  - `html` for DOCTYPE declarations.
	 *
	 * Note that if the Tag Processor is not matched on a token
	 * then this function will return `null`, either because it
	 * hasn't yet found a token or because it reached the end
	 * of the document without matching a token.
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @return string|null Name of the matched token.
	 */
	public function get_token_name(): ?string {
		return $this->is_virtual()
			? $this->current_element->token->node_name
			: parent::get_token_name();
	}

	/**
	 * Indicates the kind of matched token, if any.
	 *
	 * This differs from `get_token_name()` in that it always
	 * returns a static string indicating the type, whereas
	 * `get_token_name()` may return values derived from the
	 * token itself, such as a tag name or processing
	 * instruction tag.
	 *
	 * Possible values:
	 *  - `#tag` when matched on a tag.
	 *  - `#text` when matched on a text node.
	 *  - `#cdata-section` when matched on a CDATA node.
	 *  - `#comment` when matched on a comment.
	 *  - `#doctype` when matched on a DOCTYPE declaration.
	 *  - `#presumptuous-tag` when matched on an empty tag closer.
	 *  - `#funky-comment` when matched on a funky comment.
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @return string|null What kind of token is matched, or null.
	 */
	public function get_token_type(): ?string {
		if ( $this->is_virtual() ) {
			/*
			 * This logic comes from the Tag Processor.
			 *
			 * @todo It would be ideal not to repeat this here, but it's not clearly
			 *       better to allow passing a token name to `get_token_type()`.
			 */
			$node_name     = $this->current_element->token->node_name;
			$starting_char = $node_name[0];
			if ( 'A' <= $starting_char && 'Z' >= $starting_char ) {
				return '#tag';
			}

			if ( 'html' === $node_name ) {
				return '#doctype';
			}

			return $node_name;
		}

		return parent::get_token_type();
	}

	/**
	 * Returns the value of a requested attribute from a matched tag opener if that attribute exists.
	 *
	 * Example:
	 *
	 *     $p = WP_HTML_Processor::create_fragment( '<div enabled class="test" data-test-id="14">Test</div>' );
	 *     $p->next_token() === true;
	 *     $p->get_attribute( 'data-test-id' ) === '14';
	 *     $p->get_attribute( 'enabled' ) === true;
	 *     $p->get_attribute( 'aria-label' ) === null;
	 *
	 *     $p->next_tag() === false;
	 *     $p->get_attribute( 'class' ) === null;
	 *
	 * @since 6.6.0 Subclassed for HTML Processor.
	 *
	 * @param string $name Name of attribute whose value is requested.
	 * @return string|true|null Value of attribute or `null` if not available. Boolean attributes return `true`.
	 */
	public function get_attribute( $name ) {
		return $this->is_virtual() ? null : parent::get_attribute( $name );
	}

	/**
	 * Updates or creates a new attribute on the currently matched tag with the passed value.
	 *
	 * For boolean attributes special handling is provided:
	 *  - When `true` is passed as the value, then only the attribute name is added to the tag.
	 *  - When `false` is passed, the attribute gets removed if it existed before.
	 *
	 * For string attributes, the value is escaped using the `esc_attr` function.
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @param string $name The attribute name to target.
	 * @param string|bool $value The new attribute value.
	 * @return bool Whether an attribute value was set.
	 */
	public function set_attribute( $name, $value ): bool {
		return $this->is_virtual() ? false : parent::set_attribute( $name, $value );
	}

	/**
	 * Removes an attribute from the currently-matched tag.
	 *
	 * @since 6.6.0 Subclassed for HTML Processor.
	 *
	 * @param string $name The attribute name to remove.
	 * @return bool Whether an attribute was removed.
	 */
	public function remove_attribute( $name ): bool {
		return $this->is_virtual() ? false : parent::remove_attribute( $name );
	}

	/**
	 * Gets lowercase names of all attributes matching a given prefix in the current tag.
	 *
	 * Note that matching is case-insensitive. This is in accordance with the spec:
	 *
	 * > There must never be two or more attributes on
	 * > the same start tag whose names are an ASCII
	 * > case-insensitive match for each other.
	 *     - HTML 5 spec
	 *
	 * Example:
	 *
	 *     $p = new WP_HTML_Tag_Processor( '<div data-ENABLED class="test" DATA-test-id="14">Test</div>' );
	 *     $p->next_tag( array( 'class_name' => 'test' ) ) === true;
	 *     $p->get_attribute_names_with_prefix( 'data-' ) === array( 'data-enabled', 'data-test-id' );
	 *
	 *     $p->next_tag() === false;
	 *     $p->get_attribute_names_with_prefix( 'data-' ) === null;
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @see https://html.spec.whatwg.org/multipage/syntax.html#attributes-2:ascii-case-insensitive
	 *
	 * @param string $prefix Prefix of requested attribute names.
	 * @return array|null List of attribute names, or `null` when no tag opener is matched.
	 */
	public function get_attribute_names_with_prefix( $prefix ): ?array {
		return $this->is_virtual() ? null : parent::get_attribute_names_with_prefix( $prefix );
	}

	/**
	 * Adds a new class name to the currently matched tag.
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @param string $class_name The class name to add.
	 * @return bool Whether the class was set to be added.
	 */
	public function add_class( $class_name ): bool {
		return $this->is_virtual() ? false : parent::add_class( $class_name );
	}

	/**
	 * Removes a class name from the currently matched tag.
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @param string $class_name The class name to remove.
	 * @return bool Whether the class was set to be removed.
	 */
	public function remove_class( $class_name ): bool {
		return $this->is_virtual() ? false : parent::remove_class( $class_name );
	}

	/**
	 * Returns if a matched tag contains the given ASCII case-insensitive class name.
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @todo When reconstructing active formatting elements with attributes, find a way
	 *       to indicate if the virtually-reconstructed formatting elements contain the
	 *       wanted class name.
	 *
	 * @param string $wanted_class Look for this CSS class name, ASCII case-insensitive.
	 * @return bool|null Whether the matched tag contains the given class name, or null if not matched.
	 */
	public function has_class( $wanted_class ): ?bool {
		return $this->is_virtual() ? null : parent::has_class( $wanted_class );
	}

	/**
	 * Generator for a foreach loop to step through each class name for the matched tag.
	 *
	 * This generator function is designed to be used inside a "foreach" loop.
	 *
	 * Example:
	 *
	 *     $p = WP_HTML_Processor::create_fragment( "<div class='free <egg<\tlang-en'>" );
	 *     $p->next_tag();
	 *     foreach ( $p->class_list() as $class_name ) {
	 *         echo "{$class_name} ";
	 *     }
	 *     // Outputs: "free <egg< lang-en "
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 */
	public function class_list() {
		return $this->is_virtual() ? null : parent::class_list();
	}

	/**
	 * Returns the modifiable text for a matched token, or an empty string.
	 *
	 * Modifiable text is text content that may be read and changed without
	 * changing the HTML structure of the document around it. This includes
	 * the contents of `#text` nodes in the HTML as well as the inner
	 * contents of HTML comments, Processing Instructions, and others, even
	 * though these nodes aren't part of a parsed DOM tree. It also includes
	 * the contents of SCRIPT and STYLE tags, of TEXTAREA tags, and of any
	 * other section in an HTML document which cannot contain HTML markup (DATA).
	 *
	 * If a token has no modifiable text then an empty string is returned to
	 * avoid needless crashing or type errors. An empty string does not mean
	 * that a token has modifiable text, and a token with modifiable text may
	 * have an empty string (e.g. a comment with no contents).
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @return string
	 */
	public function get_modifiable_text(): string {
		return $this->is_virtual() ? '' : parent::get_modifiable_text();
	}

	/**
	 * Indicates what kind of comment produced the comment node.
	 *
	 * Because there are different kinds of HTML syntax which produce
	 * comments, the Tag Processor tracks and exposes this as a type
	 * for the comment. Nominally only regular HTML comments exist as
	 * they are commonly known, but a number of unrelated syntax errors
	 * also produce comments.
	 *
	 * @see self::COMMENT_AS_ABRUPTLY_CLOSED_COMMENT
	 * @see self::COMMENT_AS_CDATA_LOOKALIKE
	 * @see self::COMMENT_AS_INVALID_HTML
	 * @see self::COMMENT_AS_HTML_COMMENT
	 * @see self::COMMENT_AS_PI_NODE_LOOKALIKE
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @return string|null
	 */
	public function get_comment_type(): ?string {
		return $this->is_virtual() ? null : parent::get_comment_type();
	}

	/**
	 * Removes a bookmark that is no longer needed.
	 *
	 * Releasing a bookmark frees up the small
	 * performance overhead it requires.
	 *
	 * @since 6.4.0
	 *
	 * @param string $bookmark_name Name of the bookmark to remove.
	 * @return bool Whether the bookmark already existed before removal.
	 */
	public function release_bookmark( $bookmark_name ): bool {
		return parent::release_bookmark( "_{$bookmark_name}" );
	}

	/**
	 * Moves the internal cursor in the HTML Processor to a given bookmark's location.
	 *
	 * Be careful! Seeking backwards to a previous location resets the parser to the
	 * start of the document and reparses the entire contents up until it finds the
	 * sought-after bookmarked location.
	 *
	 * In order to prevent accidental infinite loops, there's a
	 * maximum limit on the number of times seek() can be called.
	 *
	 * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
	 *
	 * @since 6.4.0
	 *
	 * @param string $bookmark_name Jump to the place in the document identified by this bookmark name.
	 * @return bool Whether the internal cursor was successfully moved to the bookmark's location.
	 */
	public function seek( $bookmark_name ): bool {
		// Flush any pending updates to the document before beginning.
		$this->get_updated_html();

		$actual_bookmark_name = "_{$bookmark_name}";
		$processor_started_at = $this->state->current_token
			? $this->bookmarks[ $this->state->current_token->bookmark_name ]->start
			: 0;
		$bookmark_starts_at   = $this->bookmarks[ $actual_bookmark_name ]->start;
		$direction            = $bookmark_starts_at > $processor_started_at ? 'forward' : 'backward';

		/*
		 * If seeking backwards, it's possible that the sought-after bookmark exists within an element
		 * which has been closed before the current cursor; in other words, it has already been removed
		 * from the stack of open elements. This means that it's insufficient to simply pop off elements
		 * from the stack of open elements which appear after the bookmarked location and then jump to
		 * that location, as the elements which were open before won't be re-opened.
		 *
		 * In order to maintain consistency, the HTML Processor rewinds to the start of the document
		 * and reparses everything until it finds the sought-after bookmark.
		 *
		 * There are potentially better ways to do this: cache the parser state for each bookmark and
		 * restore it when seeking; store an immutable and idempotent register of where elements open
		 * and close.
		 *
		 * If caching the parser state it will be essential to properly maintain the cached stack of
		 * open elements and active formatting elements when modifying the document. This could be a
		 * tedious and time-consuming process as well, and so for now will not be performed.
		 *
		 * It may be possible to track bookmarks for where elements open and close, and in doing so
		 * be able to quickly recalculate breadcrumbs for any element in the document. It may even
		 * be possible to remove the stack of open elements and compute it on the fly this way.
		 * If doing this, the parser would need to track the opening and closing locations for all
		 * tokens in the breadcrumb path for any and all bookmarks. By utilizing bookmarks themselves
		 * this list could be automatically maintained while modifying the document. Finding the
		 * breadcrumbs would then amount to traversing that list from the start until the token
		 * being inspected. Once an element closes, if there are no bookmarks pointing to locations
		 * within that element, then all of these locations may be forgotten to save on memory use
		 * and computation time.
		 */
		if ( 'backward' === $direction ) {
			/*
			 * Instead of clearing the parser state and starting fresh, calling the stack methods
			 * maintains the proper flags in the parser.
			 */
			foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
				if ( 'context-node' === $item->bookmark_name ) {
					break;
				}

				$this->state->stack_of_open_elements->remove_node( $item );
			}

			foreach ( $this->state->active_formatting_elements->walk_up() as $item ) {
				if ( 'context-node' === $item->bookmark_name ) {
					break;
				}

				$this->state->active_formatting_elements->remove_node( $item );
			}

			parent::seek( 'context-node' );
			$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
			$this->state->frameset_ok    = true;
			$this->element_queue         = array();
			$this->current_element       = null;

			if ( isset( $this->context_node ) ) {
				$this->breadcrumbs = array_slice( $this->breadcrumbs, 0, 2 );
			} else {
				$this->breadcrumbs = array();
			}
		}

		// When moving forwards, reparse the document until reaching the same location as the original bookmark.
		if ( $bookmark_starts_at === $this->bookmarks[ $this->state->current_token->bookmark_name ]->start ) {
			return true;
		}

		while ( $this->next_token() ) {
			if ( $bookmark_starts_at === $this->bookmarks[ $this->state->current_token->bookmark_name ]->start ) {
				while ( isset( $this->current_element ) && WP_HTML_Stack_Event::POP === $this->current_element->operation ) {
					$this->current_element = array_shift( $this->element_queue );
				}
				return true;
			}
		}

		return false;
	}

	/**
	 * Sets a bookmark in the HTML document.
	 *
	 * Bookmarks represent specific places or tokens in the HTML
	 * document, such as a tag opener or closer. When applying
	 * edits to a document, such as setting an attribute, the
	 * text offsets of that token may shift; the bookmark is
	 * kept updated with those shifts and remains stable unless
	 * the entire span of text in which the token sits is removed.
	 *
	 * Release bookmarks when they are no longer needed.
	 *
	 * Example:
	 *
	 *     <main><h2>Surprising fact you may not know!</h2></main>
	 *           ^  ^
	 *            \-|-- this `H2` opener bookmark tracks the token
	 *
	 *     <main class="clickbait"><h2>Surprising fact you may no…
	 *                             ^  ^
	 *                              \-|-- it shifts with edits
	 *
	 * Bookmarks provide the ability to seek to a previously-scanned
	 * place in the HTML document. This avoids the need to re-scan
	 * the entire document.
	 *
	 * Example:
	 *
	 *     <ul><li>One</li><li>Two</li><li>Three</li></ul>
	 *                                 ^^^^
	 *                                 want to note this last item
	 *
	 *     $p = new WP_HTML_Tag_Processor( $html );
	 *     $in_list = false;
	 *     while ( $p->next_tag( array( 'tag_closers' => $in_list ? 'visit' : 'skip' ) ) ) {
	 *         if ( 'UL' === $p->get_tag() ) {
	 *             if ( $p->is_tag_closer() ) {
	 *                 $in_list = false;
	 *                 $p->set_bookmark( 'resume' );
	 *                 if ( $p->seek( 'last-li' ) ) {
	 *                     $p->add_class( 'last-li' );
	 *                 }
	 *                 $p->seek( 'resume' );
	 *                 $p->release_bookmark( 'last-li' );
	 *                 $p->release_bookmark( 'resume' );
	 *             } else {
	 *                 $in_list = true;
	 *             }
	 *         }
	 *
	 *         if ( 'LI' === $p->get_tag() ) {
	 *             $p->set_bookmark( 'last-li' );
	 *         }
	 *     }
	 *
	 * Bookmarks intentionally hide the internal string offsets
	 * to which they refer. They are maintained internally as
	 * updates are applied to the HTML document and therefore
	 * retain their "position" - the location to which they
	 * originally pointed. The inability to use bookmarks with
	 * functions like `substr` is therefore intentional to guard
	 * against accidentally breaking the HTML.
	 *
	 * Because bookmarks allocate memory and require processing
	 * for every applied update, they are limited and require
	 * a name. They should not be created with programmatically-made
	 * names, such as "li_{$index}" with some loop. As a general
	 * rule they should only be created with string-literal names
	 * like "start-of-section" or "last-paragraph".
	 *
	 * Bookmarks are a powerful tool to enable complicated behavior.
	 * Consider double-checking that you need this tool if you are
	 * reaching for it, as inappropriate use could lead to broken
	 * HTML structure or unwanted processing overhead.
	 *
	 * @since 6.4.0
	 *
	 * @param string $bookmark_name Identifies this particular bookmark.
	 * @return bool Whether the bookmark was successfully created.
	 */
	public function set_bookmark( $bookmark_name ): bool {
		return parent::set_bookmark( "_{$bookmark_name}" );
	}

	/**
	 * Checks whether a bookmark with the given name exists.
	 *
	 * @since 6.5.0
	 *
	 * @param string $bookmark_name Name to identify a bookmark that potentially exists.
	 * @return bool Whether that bookmark exists.
	 */
	public function has_bookmark( $bookmark_name ): bool {
		return parent::has_bookmark( "_{$bookmark_name}" );
	}

	/*
	 * HTML Parsing Algorithms
	 */

	/**
	 * Closes a P element.
	 *
	 * @since 6.4.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#close-a-p-element
	 */
	private function close_a_p_element(): void {
		$this->generate_implied_end_tags( 'P' );
		$this->state->stack_of_open_elements->pop_until( 'P' );
	}

	/**
	 * Closes elements that have implied end tags.
	 *
	 * @since 6.4.0
	 * @since 6.7.0 Full spec support.
	 *
	 * @see https://html.spec.whatwg.org/#generate-implied-end-tags
	 *
	 * @param string|null $except_for_this_element Perform as if this element doesn't exist in the stack of open elements.
	 */
	private function generate_implied_end_tags( ?string $except_for_this_element = null ): void {
		$elements_with_implied_end_tags = array(
			'DD',
			'DT',
			'LI',
			'OPTGROUP',
			'OPTION',
			'P',
			'RB',
			'RP',
			'RT',
			'RTC',
		);

		$no_exclusions = ! isset( $except_for_this_element );

		while (
			( $no_exclusions || ! $this->state->stack_of_open_elements->current_node_is( $except_for_this_element ) ) &&
			in_array( $this->state->stack_of_open_elements->current_node()->node_name, $elements_with_implied_end_tags, true )
		) {
			$this->state->stack_of_open_elements->pop();
		}
	}

	/**
	 * Closes elements that have implied end tags, thoroughly.
	 *
	 * See the HTML specification for an explanation why this is
	 * different from generating end tags in the normal sense.
	 *
	 * @since 6.4.0
	 * @since 6.7.0 Full spec support.
	 *
	 * @see WP_HTML_Processor::generate_implied_end_tags
	 * @see https://html.spec.whatwg.org/#generate-implied-end-tags
	 */
	private function generate_implied_end_tags_thoroughly(): void {
		$elements_with_implied_end_tags = array(
			'CAPTION',
			'COLGROUP',
			'DD',
			'DT',
			'LI',
			'OPTGROUP',
			'OPTION',
			'P',
			'RB',
			'RP',
			'RT',
			'RTC',
			'TBODY',
			'TD',
			'TFOOT',
			'TH',
			'THEAD',
			'TR',
		);

		while ( in_array( $this->state->stack_of_open_elements->current_node()->node_name, $elements_with_implied_end_tags, true ) ) {
			$this->state->stack_of_open_elements->pop();
		}
	}

	/**
	 * Returns the adjusted current node.
	 *
	 * > The adjusted current node is the context element if the parser was created as
	 * > part of the HTML fragment parsing algorithm and the stack of open elements
	 * > has only one element in it (fragment case); otherwise, the adjusted current
	 * > node is the current node.
	 *
	 * @see https://html.spec.whatwg.org/#adjusted-current-node
	 *
	 * @since 6.7.0
	 *
	 * @return WP_HTML_Token|null The adjusted current node.
	 */
	private function get_adjusted_current_node(): ?WP_HTML_Token {
		if ( isset( $this->context_node ) && 1 === $this->state->stack_of_open_elements->count() ) {
			return $this->context_node;
		}

		return $this->state->stack_of_open_elements->current_node();
	}

	/**
	 * Reconstructs the active formatting elements.
	 *
	 * > This has the effect of reopening all the formatting elements that were opened
	 * > in the current body, cell, or caption (whichever is youngest) that haven't
	 * > been explicitly closed.
	 *
	 * @since 6.4.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#reconstruct-the-active-formatting-elements
	 *
	 * @return bool Whether any formatting elements needed to be reconstructed.
	 */
	private function reconstruct_active_formatting_elements(): bool {
		/*
		 * > If there are no entries in the list of active formatting elements, then there is nothing
		 * > to reconstruct; stop this algorithm.
		 */
		if ( 0 === $this->state->active_formatting_elements->count() ) {
			return false;
		}

		$last_entry = $this->state->active_formatting_elements->current_node();
		if (

			/*
			 * > If the last (most recently added) entry in the list of active formatting elements is a marker;
			 * > stop this algorithm.
			 */
			'marker' === $last_entry->node_name ||

			/*
			 * > If the last (most recently added) entry in the list of active formatting elements is an
			 * > element that is in the stack of open elements, then there is nothing to reconstruct;
			 * > stop this algorithm.
			 */
			$this->state->stack_of_open_elements->contains_node( $last_entry )
		) {
			return false;
		}

		$this->bail( 'Cannot reconstruct active formatting elements when advancing and rewinding is required.' );
	}

	/**
	 * Runs the reset the insertion mode appropriately algorithm.
	 *
	 * @since 6.7.0
	 *
	 * @see https://html.spec.whatwg.org/multipage/parsing.html#reset-the-insertion-mode-appropriately
	 */
	private function reset_insertion_mode_appropriately(): void {
		// Set the first node.
		$first_node = null;
		foreach ( $this->state->stack_of_open_elements->walk_down() as $first_node ) {
			break;
		}

		/*
		 * > 1. Let _last_ be false.
		 */
		$last = false;
		foreach ( $this->state->stack_of_open_elements->walk_up() as $node ) {
			/*
			 * > 2. Let _node_ be the last node in the stack of open elements.
			 * > 3. _Loop_: If _node_ is the first node in the stack of open elements, then set _last_
			 * >            to true, and, if the parser was created as part of the HTML fragment parsing
			 * >            algorithm (fragment case), set node to the context element passed to
			 * >            that algorithm.
			 * > …
			 */
			if ( $node === $first_node ) {
				$last = true;
				if ( isset( $this->context_node ) ) {
					$node = $this->context_node;
				}
			}

			// All of the following rules are for matching HTML elements.
			if ( 'html' !== $node->namespace ) {
				continue;
			}

			switch ( $node->node_name ) {
				/*
				 * > 4. If node is a `select` element, run these substeps:
				 * >   1. If _last_ is true, jump to the step below labeled done.
				 * >   2. Let _ancestor_ be _node_.
				 * >   3. _Loop_: If _ancestor_ is the first node in the stack of open elements,
				 * >      jump to the step below labeled done.
				 * >   4. Let ancestor be the node before ancestor in the stack of open elements.
				 * >   …
				 * >   7. Jump back to the step labeled _loop_.
				 * >   8. _Done_: Switch the insertion mode to "in select" and return.
				 */
				case 'SELECT':
					if ( ! $last ) {
						foreach ( $this->state->stack_of_open_elements->walk_up( $node ) as $ancestor ) {
							if ( 'html' !== $ancestor->namespace ) {
								continue;
							}

							switch ( $ancestor->node_name ) {
								/*
								 * > 5. If _ancestor_ is a `template` node, jump to the step below
								 * >    labeled _done_.
								 */
								case 'TEMPLATE':
									break 2;

								/*
								 * > 6. If _ancestor_ is a `table` node, switch the insertion mode to
								 * >    "in select in table" and return.
								 */
								case 'TABLE':
									$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE;
									return;
							}
						}
					}
					$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT;
					return;

				/*
				 * > 5. If _node_ is a `td` or `th` element and _last_ is false, then switch the
				 * >    insertion mode to "in cell" and return.
				 */
				case 'TD':
				case 'TH':
					if ( ! $last ) {
						$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_CELL;
						return;
					}
					break;

				/*
				 * > 6. If _node_ is a `tr` element, then switch the insertion mode to "in row"
				 * >    and return.
				 */
				case 'TR':
					$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
					return;

				/*
				 * > 7. If _node_ is a `tbody`, `thead`, or `tfoot` element, then switch the
				 * >    insertion mode to "in table body" and return.
				 */
				case 'TBODY':
				case 'THEAD':
				case 'TFOOT':
					$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
					return;

				/*
				 * > 8. If _node_ is a `caption` element, then switch the insertion mode to
				 * >    "in caption" and return.
				 */
				case 'CAPTION':
					$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION;
					return;

				/*
				 * > 9. If _node_ is a `colgroup` element, then switch the insertion mode to
				 * >    "in column group" and return.
				 */
				case 'COLGROUP':
					$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
					return;

				/*
				 * > 10. If _node_ is a `table` element, then switch the insertion mode to
				 * >     "in table" and return.
				 */
				case 'TABLE':
					$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
					return;

				/*
				 * > 11. If _node_ is a `template` element, then switch the insertion mode to the
				 * >     current template insertion mode and return.
				 */
				case 'TEMPLATE':
					$this->state->insertion_mode = end( $this->state->stack_of_template_insertion_modes );
					return;

				/*
				 * > 12. If _node_ is a `head` element and _last_ is false, then switch the
				 * >     insertion mode to "in head" and return.
				 */
				case 'HEAD':
					if ( ! $last ) {
						$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
						return;
					}
					break;

				/*
				 * > 13. If _node_ is a `body` element, then switch the insertion mode to "in body"
				 * >     and return.
				 */