PHP Cross Reference of WordPress Trunk (Updated Daily)
<?php
/**
 * HTML API: WP_HTML_Processor class
 *
 * @package WordPress
 * @subpackage HTML-API
 * @since 6.4.0
 */

/**
 * Core class used to safely parse and modify an HTML document.
 *
 * The HTML Processor class properly parses and modifies HTML5 documents.
 *
 * It supports a subset of the HTML5 specification, and when it encounters
 * unsupported markup, it aborts early to avoid unintentionally breaking
 * the document. The HTML Processor should never break an HTML document.
 *
 * While the `WP_HTML_Tag_Processor` is a valuable tool for modifying
 * attributes on individual HTML tags, the HTML Processor is more capable
 * and useful for the following operations:
 *
 *  - Querying based on nested HTML structure.
 *
 * Eventually the HTML Processor will also support:
 *  - Wrapping a tag in surrounding HTML.
 *  - Unwrapping a tag by removing its parent.
 *  - Inserting and removing nodes.
 *  - Reading and changing inner content.
 *  - Navigating up or around HTML structure.
 *
 * ## Usage
 *
 * Use of this class requires three steps:
 *
 *  1. Call a static creator method with your input HTML document.
 *  2. Find the location in the document you are looking for.
 *  3. Request changes to the document at that location.
 *
 * Example:
 *
 *     $processor = WP_HTML_Processor::create_fragment( $html );
 *     if ( $processor->next_tag( array( 'breadcrumbs' => array( 'DIV', 'FIGURE', 'IMG' ) ) ) ) {
 *         $processor->add_class( 'responsive-image' );
 *     }
 *
 * #### Breadcrumbs
 *
 * Breadcrumbs represent the stack of open elements from the root
 * of the document or fragment down to the currently-matched node,
 * if one is currently selected. Call WP_HTML_Processor::get_breadcrumbs()
 * to inspect the breadcrumbs for a matched tag.
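 *
 * A minimal sketch of inspecting breadcrumbs, assuming the default `BODY`
 * fragment context; the input HTML here is illustrative only:
 *
 *     $processor = WP_HTML_Processor::create_fragment( '<div><figure><img></figure></div>' );
 *     if ( $processor->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'IMG' ) ) ) ) {
 *         // Expected: array( 'HTML', 'BODY', 'DIV', 'FIGURE', 'IMG' ), because
 *         // the implicit HTML and BODY elements are part of the stack.
 *         $breadcrumbs = $processor->get_breadcrumbs();
 *     }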
 *
 * Breadcrumbs can specify nested HTML structure and are equivalent
 * to a CSS selector comprising tag names separated by the child
 * combinator, such as "DIV > FIGURE > IMG".
 *
 * Since all elements find themselves inside a full HTML document
 * when parsed, the return value from `get_breadcrumbs()` will always
 * contain any implicit outermost elements. For example, when parsing
 * with `create_fragment()` in the `BODY` context (the default), any
 * tag in the given HTML document will contain `array( 'HTML', 'BODY', … )`
 * in its breadcrumbs.
 *
 * Despite containing the implied outermost elements in their breadcrumbs,
 * tags may be found with the shortest-matching breadcrumb query. That is,
 * `array( 'IMG' )` matches all IMG elements and `array( 'P', 'IMG' )`
 * matches all IMG elements directly inside a P element. To ensure that no
 * partial matches erroneously match, it's possible to specify in a query
 * the full breadcrumb match all the way down from the root HTML element.
 *
 * Example:
 *
 *     $html = '<figure><img><figcaption>A <em>lovely</em> day outside</figcaption></figure>';
 *     //               ----- Matches here.
 *     $processor->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'IMG' ) ) );
 *
 *     $html = '<figure><img><figcaption>A <em>lovely</em> day outside</figcaption></figure>';
 *     //                                  ---- Matches here.
 *     $processor->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'FIGCAPTION', 'EM' ) ) );
 *
 *     $html = '<div><img></div><img>';
 *     //                       ----- Matches here, because IMG must be a direct child of the implicit BODY.
 *     $processor->next_tag( array( 'breadcrumbs' => array( 'BODY', 'IMG' ) ) );
 *
 * ## HTML Support
 *
 * This class implements a small part of the HTML5 specification.
 * It's designed to operate within its support and abort early whenever
 * encountering circumstances it can't properly handle. This is
 * the principal way in which this class remains as simple as possible
 * without cutting corners and breaking compliance.
 *
 * ### Supported elements
 *
 * If any unsupported element appears in the HTML input the HTML Processor
 * will abort early and stop all processing. This draconian measure ensures
 * that the HTML Processor won't break any HTML it doesn't fully understand.
 *
 * The HTML Processor supports all elements other than a specific set:
 *
 *  - Any element inside a TABLE.
 *  - Any element inside foreign content, including SVG and MATH.
 *  - Any element outside the IN BODY insertion mode, e.g. doctype declarations, meta, links.
 *
 * ### Supported markup
 *
 * Some kinds of non-normative HTML involve reconstruction of formatting elements and
 * re-parenting of mis-nested elements. For example, a DIV tag found inside a TABLE
 * may in fact belong _before_ the table in the DOM. If the HTML Processor encounters
 * such a case it will stop processing.
 *
 * The following list illustrates some common examples of unexpected HTML inputs that
 * the HTML Processor properly parses and represents:
 *
 *  - HTML with optional tags omitted, e.g. `<p>one<p>two`.
 *  - HTML with unexpected tag closers, e.g. `<p>one </span> more</p>`.
 *  - Non-void tags with self-closing flag, e.g. `<div/>the DIV is still open.</div>`.
 *  - Heading elements which close open heading elements of another level, e.g. `<h1>Closed by </h2>`.
 *  - Elements containing text that looks like other tags but isn't, e.g. `<title>The <img> is plaintext</title>`.
 *  - SCRIPT and STYLE tags containing text that looks like HTML but isn't, e.g. `<script>document.write('<p>Hi</p>');</script>`.
 *  - SCRIPT content which has been escaped, e.g. `<script><!-- document.write('<script>console.log("hi")</script>') --></script>`.
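 *
 * As a sketch of the first case in the list, omitted `</p>` closers are
 * inferred while parsing, so both paragraphs are expected to be siblings
 * rather than nested. The input HTML here is illustrative only:
 *
 *     $processor = WP_HTML_Processor::create_fragment( '<p>one<p>two' );
 *     while ( $processor->next_tag( 'P' ) ) {
 *         // Expected on each match: array( 'HTML', 'BODY', 'P' ).
 *         $breadcrumbs = $processor->get_breadcrumbs();
 *     }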
 *
 * ### Unsupported Features
 *
 * This parser does not report parse errors.
 *
 * Normally, when additional HTML or BODY tags are encountered in a document, if there
 * are any additional attributes on them that aren't found on the previous elements,
 * the existing HTML and BODY elements adopt those missing attribute values. This
 * parser does not add those additional attributes.
 *
 * In certain situations, elements are moved to a different part of the document in
 * processes called "adoption" and "fostering." Because the nodes move to a location
 * in the document that the parser had already processed, this parser does not support
 * these situations and will bail.
 *
 * @since 6.4.0
 *
 * @see WP_HTML_Tag_Processor
 * @see https://html.spec.whatwg.org/
 */
class WP_HTML_Processor extends WP_HTML_Tag_Processor {
	/**
	 * The maximum number of bookmarks allowed to exist at any given time.
	 *
	 * HTML processing requires more bookmarks than basic tag processing,
	 * so this class constant from the Tag Processor is overridden.
	 *
	 * @since 6.4.0
	 *
	 * @var int
	 */
	const MAX_BOOKMARKS = 100;

	/**
	 * Holds the working state of the parser, including the stack of
	 * open elements and the stack of active formatting elements.
	 *
	 * Initialized in the constructor.
	 *
	 * @since 6.4.0
	 *
	 * @var WP_HTML_Processor_State
	 */
	private $state;

	/**
	 * Used to create unique bookmark names.
	 *
	 * This class sets a bookmark for every tag in the HTML document that it encounters.
	 * The bookmark name is auto-generated and increments, starting with `1`. These are
	 * internal bookmarks and are automatically released when the referring WP_HTML_Token
	 * goes out of scope and is garbage-collected.
	 *
	 * @since 6.4.0
	 *
	 * @see WP_HTML_Processor::$release_internal_bookmark_on_destruct
	 *
	 * @var int
	 */
	private $bookmark_counter = 0;

	/**
	 * Stores an explanation for why something failed, if it did.
	 *
	 * @see self::get_last_error
	 *
	 * @since 6.4.0
	 *
	 * @var string|null
	 */
	private $last_error = null;

	/**
	 * Stores context for why the parser bailed on unsupported HTML, if it did.
	 *
	 * @see self::get_unsupported_exception
	 *
	 * @since 6.7.0
	 *
	 * @var WP_HTML_Unsupported_Exception|null
	 */
	private $unsupported_exception = null;

	/**
	 * Releases a bookmark when PHP garbage-collects its wrapping WP_HTML_Token instance.
	 *
	 * This function is created inside the class constructor so that it can be passed to
	 * the stack of open elements and the stack of active formatting elements without
	 * exposing it as a public method on the class.
	 *
	 * @since 6.4.0
	 *
	 * @var Closure|null
	 */
	private $release_internal_bookmark_on_destruct = null;

	/**
	 * Stores stack events which arise during parsing of the
	 * HTML document, which will then supply the "match" events.
	 *
	 * @since 6.6.0
	 *
	 * @var WP_HTML_Stack_Event[]
	 */
	private $element_queue = array();

	/**
	 * Stores the current breadcrumbs.
	 *
	 * @since 6.7.0
	 *
	 * @var string[]
	 */
	private $breadcrumbs = array();

	/**
	 * Current stack event, if set, representing a matched token.
	 *
	 * Because the parser may internally point to a place further along in a document
	 * than the nodes which have already been processed (some "virtual" nodes may have
	 * appeared while scanning the HTML document), this will point at the "current" node
	 * being processed. It comes from the front of the element queue.
	 *
	 * @since 6.6.0
	 *
	 * @var WP_HTML_Stack_Event|null
	 */
	private $current_element = null;

	/**
	 * Context node if created as a fragment parser.
	 *
	 * @var WP_HTML_Token|null
	 */
	private $context_node = null;

	/*
	 * Public Interface Functions
	 */

	/**
	 * Creates an HTML processor in the fragment parsing mode.
	 *
	 * Use this for cases where you are processing chunks of HTML that
	 * will be found within a bigger HTML document, such as rendered
	 * block output that exists within a post, or `the_content` inside a
	 * rendered site layout.
	 *
	 * Fragment parsing occurs within a context, which is an HTML element
	 * that the document will eventually be placed in. It becomes important
	 * when special elements have different rules than others, such as inside
	 * a TEXTAREA or a TITLE tag where things that look like tags are text,
	 * or inside a SCRIPT tag where things that look like HTML syntax are JS.
	 *
	 * The context value should be a representation of the tag in which the
	 * HTML is found. For most cases this will be the body element. The HTML
	 * form is provided because a context element may have attributes that
	 * impact the parse, such as with a SCRIPT tag and its `type` attribute.
	 *
	 * ## Current HTML Support
	 *
	 *  - The only supported context is `<body>`, which is the default value.
	 *  - The only supported document encoding is `UTF-8`, which is the default value.
	 *
	 * @since 6.4.0
	 * @since 6.6.0 Returns `static` instead of `self` so it can create subclass instances.
	 *
	 * @param string $html     Input HTML fragment to process.
	 * @param string $context  Context element for the fragment, must be default of `<body>`.
	 * @param string $encoding Text encoding of the document; must be default of 'UTF-8'.
	 * @return static|null The created processor if successful, otherwise null.
	 */
	public static function create_fragment( $html, $context = '<body>', $encoding = 'UTF-8' ) {
		if ( '<body>' !== $context || 'UTF-8' !== $encoding ) {
			return null;
		}

		$processor = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
		$processor->state->context_node = array( 'BODY', array() );
		$processor->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
		$processor->state->encoding = $encoding;
		$processor->state->encoding_confidence = 'certain';

		// @todo Create "fake" bookmarks for non-existent but implied nodes.
		$processor->bookmarks['root-node'] = new WP_HTML_Span( 0, 0 );
		$processor->bookmarks['context-node'] = new WP_HTML_Span( 0, 0 );

		$root_node = new WP_HTML_Token(
			'root-node',
			'HTML',
			false
		);

		$processor->state->stack_of_open_elements->push( $root_node );

		$context_node = new WP_HTML_Token(
			'context-node',
			$processor->state->context_node[0],
			false
		);

		$processor->context_node = $context_node;
		$processor->breadcrumbs = array( 'HTML', $context_node->node_name );

		return $processor;
	}

	/**
	 * Creates an HTML processor in the full parsing mode.
	 *
	 * It's likely that a fragment parser is more appropriate, unless sending an
	 * entire HTML document from start to finish. Consider a fragment parser with
	 * a context node of `<body>`.
	 *
	 * Since UTF-8 is the only currently-accepted charset, if working with a
	 * document that isn't UTF-8, it's important to convert the document before
	 * creating the processor: pass in the converted HTML.
	 *
	 * @param string      $html                    Input HTML document to process.
	 * @param string|null $known_definite_encoding Optional. If provided, specifies the charset used
	 *                                             in the input byte stream. Currently must be UTF-8.
	 * @return static|null The created processor if successful, otherwise null.
	 */
	public static function create_full_parser( $html, $known_definite_encoding = 'UTF-8' ) {
		if ( 'UTF-8' !== $known_definite_encoding ) {
			return null;
		}

		$processor = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
		$processor->state->encoding = $known_definite_encoding;
		$processor->state->encoding_confidence = 'certain';

		return $processor;
	}

	/**
	 * Constructor.
	 *
	 * Do not use this method. Use the static creator methods instead.
	 *
	 * @access private
	 *
	 * @since 6.4.0
	 *
	 * @see WP_HTML_Processor::create_fragment()
	 *
	 * @param string      $html                                  HTML to process.
	 * @param string|null $use_the_static_create_methods_instead This constructor should not be called manually.
	 */
	public function __construct( $html, $use_the_static_create_methods_instead = null ) {
		parent::__construct( $html );

		if ( self::CONSTRUCTOR_UNLOCK_CODE !== $use_the_static_create_methods_instead ) {
			_doing_it_wrong(
				__METHOD__,
				sprintf(
					/* translators: %s: WP_HTML_Processor::create_fragment(). */
					__( 'Call %s to create an HTML Processor instead of calling the constructor directly.' ),
					'<code>WP_HTML_Processor::create_fragment()</code>'
				),
				'6.4.0'
			);
		}

		$this->state = new WP_HTML_Processor_State();

		$this->state->stack_of_open_elements->set_push_handler(
			function ( WP_HTML_Token $token ): void {
				$is_virtual = ! isset( $this->state->current_token ) || $this->is_tag_closer();
				$same_node = isset( $this->state->current_token ) && $token->node_name === $this->state->current_token->node_name;
				$provenance = ( ! $same_node || $is_virtual ) ? 'virtual' : 'real';
				$this->element_queue[] = new WP_HTML_Stack_Event( $token, WP_HTML_Stack_Event::PUSH, $provenance );

				$this->change_parsing_namespace( $token->integration_node_type ? 'html' : $token->namespace );
			}
		);

		$this->state->stack_of_open_elements->set_pop_handler(
			function ( WP_HTML_Token $token ): void {
				$is_virtual = ! isset( $this->state->current_token ) || ! $this->is_tag_closer();
				$same_node = isset( $this->state->current_token ) && $token->node_name === $this->state->current_token->node_name;
				$provenance = ( ! $same_node || $is_virtual ) ? 'virtual' : 'real';
				$this->element_queue[] = new WP_HTML_Stack_Event( $token, WP_HTML_Stack_Event::POP, $provenance );

				$adjusted_current_node = $this->get_adjusted_current_node();

				if ( $adjusted_current_node ) {
					$this->change_parsing_namespace( $adjusted_current_node->integration_node_type ? 'html' : $adjusted_current_node->namespace );
				} else {
					$this->change_parsing_namespace( 'html' );
				}
			}
		);

		/*
		 * Create this wrapper so that it's possible to pass
		 * a private method into WP_HTML_Token classes without
		 * exposing it to any public API.
		 */
		$this->release_internal_bookmark_on_destruct = function ( string $name ): void {
			parent::release_bookmark( $name );
		};
	}

	/**
	 * Creates a fragment processor at the current node.
	 *
	 * HTML Fragment parsing always happens with a context node. HTML Fragment Processors can be
	 * instantiated with a `BODY` context node via `WP_HTML_Processor::create_fragment( $html )`.
	 *
	 * The context node may impact how a fragment of HTML is parsed. For example, consider the HTML
	 * fragment `<td />Inside TD?</td>`.
	 *
	 * A BODY context node will produce the following tree:
	 *
	 *     └─#text Inside TD?
	 *
	 * Notice that the `<td>` tags are completely ignored.
	 *
	 * Compare that with an SVG context node that produces the following tree:
	 *
	 *     ├─svg:td
	 *     └─#text Inside TD?
	 *
	 * Here, a `td` node in the `svg` namespace is created, and its self-closing flag is respected.
	 * This is a peculiarity of parsing HTML in foreign content like SVG.
	 *
	 * Finally, consider the tree produced with a TABLE context node:
	 *
	 *     └─TBODY
	 *       └─TR
	 *         └─TD
	 *           └─#text Inside TD?
	 *
	 * These examples demonstrate how important the context node may be when processing an HTML
	 * fragment. Special care must be taken when processing fragments that are expected to appear
	 * in specific contexts. SVG and TABLE are good examples, but there are others.
	 *
	 * @see https://html.spec.whatwg.org/multipage/parsing.html#html-fragment-parsing-algorithm
	 *
	 * @param string $html Input HTML fragment to process.
	 * @return static|null The created processor if successful, otherwise null.
	 */
	public function create_fragment_at_current_node( string $html ) {
		if ( $this->get_token_type() !== '#tag' || $this->is_tag_closer() ) {
			return null;
		}

		$namespace = $this->current_element->token->namespace;

		/*
		 * Prevent creating fragments at nodes that require a special tokenizer state.
		 * This is unsupported by the HTML Processor.
		 */
		if (
			'html' === $namespace &&
			in_array( $this->current_element->token->node_name, array( 'IFRAME', 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'XMP', 'PLAINTEXT' ), true )
		) {
			return null;
		}

		$fragment_processor = static::create_fragment( $html );
		if ( null === $fragment_processor ) {
			return null;
		}

		$fragment_processor->compat_mode = $this->compat_mode;

		$fragment_processor->context_node = clone $this->state->current_token;
		$fragment_processor->context_node->bookmark_name = 'context-node';
		$fragment_processor->context_node->on_destroy = null;

		$fragment_processor->state->context_node = array( $fragment_processor->context_node->node_name, array() );

		$attribute_names = $this->get_attribute_names_with_prefix( '' );
		if ( null !== $attribute_names ) {
			foreach ( $attribute_names as $name ) {
				$fragment_processor->state->context_node[1][ $name ] = $this->get_attribute( $name );
			}
		}

		$fragment_processor->breadcrumbs = array( 'HTML', $fragment_processor->context_node->node_name );

		if ( 'TEMPLATE' === $fragment_processor->context_node->node_name ) {
			$fragment_processor->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE;
		}

		$fragment_processor->reset_insertion_mode_appropriately();

		/*
		 * > Set the parser's form element pointer to the nearest node to the context element that
		 * > is a form element (going straight up the ancestor chain, and including the element
		 * > itself, if it is a form element), if any. (If there is no such form element, the
		 * > form element pointer keeps its initial value, null.)
		 */
		foreach ( $this->state->stack_of_open_elements->walk_up() as $element ) {
			if ( 'FORM' === $element->node_name && 'html' === $element->namespace ) {
				$fragment_processor->state->form_element = clone $element;
				$fragment_processor->state->form_element->bookmark_name = null;
				$fragment_processor->state->form_element->on_destroy = null;
				break;
			}
		}

		$fragment_processor->state->encoding_confidence = 'irrelevant';

		/*
		 * Update the parsing namespace near the end of the process.
		 * This is important so that any push/pop from the stack of open
		 * elements does not change the parsing namespace.
		 */
		$fragment_processor->change_parsing_namespace(
			$this->current_element->token->integration_node_type ? 'html' : $namespace
		);

		return $fragment_processor;
	}

	/**
	 * Stops the parser and terminates its execution when encountering unsupported markup.
	 *
	 * @throws WP_HTML_Unsupported_Exception Halts execution of the parser.
	 *
	 * @since 6.7.0
	 *
	 * @param string $message Explains what support is missing in order to parse the current node.
	 */
	private function bail( string $message ) {
		$here = $this->bookmarks[ $this->state->current_token->bookmark_name ];
		$token = substr( $this->html, $here->start, $here->length );

		$open_elements = array();
		foreach ( $this->state->stack_of_open_elements->stack as $item ) {
			$open_elements[] = $item->node_name;
		}

		$active_formats = array();
		foreach ( $this->state->active_formatting_elements->walk_down() as $item ) {
			$active_formats[] = $item->node_name;
		}

		$this->last_error = self::ERROR_UNSUPPORTED;

		$this->unsupported_exception = new WP_HTML_Unsupported_Exception(
			$message,
			$this->state->current_token->node_name,
			$here->start,
			$token,
			$open_elements,
			$active_formats
		);

		throw $this->unsupported_exception;
	}

	/**
	 * Returns the last error, if any.
	 *
	 * Various situations lead to parsing failure but this class will
	 * return `false` in all those cases. To determine why something
	 * failed it's possible to request the last error. This can be
	 * helpful to know to distinguish whether a given tag couldn't
	 * be found or if content in the document caused the processor
	 * to give up and abort processing.
	 *
	 * Example
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<template><strong><button><em><p><em>' );
	 *     false === $processor->next_tag();
	 *     WP_HTML_Processor::ERROR_UNSUPPORTED === $processor->get_last_error();
	 *
	 * @since 6.4.0
	 *
	 * @see self::ERROR_UNSUPPORTED
	 * @see self::ERROR_EXCEEDED_MAX_BOOKMARKS
	 *
	 * @return string|null The last error, if one exists, otherwise null.
	 */
	public function get_last_error(): ?string {
		return $this->last_error;
	}

	/**
	 * Returns context for why the parser aborted due to unsupported HTML, if it did.
	 *
	 * This is meant for debugging purposes, not for production use.
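	 *
	 * Example (a debugging sketch only; whether a given input is unsupported
	 * depends on the WordPress version, so `$html` is a placeholder, and
	 * `getMessage()` is the standard PHP exception accessor):
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( $html );
	 *     while ( $processor->next_tag() ) {
	 *         // Visit every tag until the document ends or parsing aborts.
	 *     }
	 *     if ( WP_HTML_Processor::ERROR_UNSUPPORTED === $processor->get_last_error() ) {
	 *         $exception = $processor->get_unsupported_exception();
	 *         error_log( $exception->getMessage() );
	 *     }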
	 *
	 * @since 6.7.0
	 *
	 * @see self::$unsupported_exception
	 *
	 * @return WP_HTML_Unsupported_Exception|null
	 */
	public function get_unsupported_exception() {
		return $this->unsupported_exception;
	}

	/**
	 * Finds the next tag matching the $query.
	 *
	 * @todo Support matching the class name and tag name.
	 *
	 * @since 6.4.0
	 * @since 6.6.0 Visits all tokens, including virtual ones.
	 *
	 * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
	 *
	 * @param array|string|null $query {
	 *     Optional. Which tag name to find, having which class, etc. Default is to find any tag.
	 *
	 *     @type string|null $tag_name     Which tag to find, or `null` for "any tag."
	 *     @type string      $tag_closers  'visit' to pause at tag closers, 'skip' or unset to only visit openers.
	 *     @type int|null    $match_offset Find the Nth tag matching all search criteria.
	 *                                     1 for "first" tag, 3 for "third," etc.
	 *                                     Defaults to first tag.
	 *     @type string|null $class_name   Tag must contain this whole class name to match.
	 *     @type string[]    $breadcrumbs  DOM sub-path at which element is found, e.g. `array( 'FIGURE', 'IMG' )`.
	 *                                     May also contain the wildcard `*` which matches a single element, e.g. `array( 'SECTION', '*' )`.
	 * }
	 * @return bool Whether a tag was matched.
	 */
	public function next_tag( $query = null ): bool {
		$visit_closers = isset( $query['tag_closers'] ) && 'visit' === $query['tag_closers'];

		if ( null === $query ) {
			while ( $this->next_token() ) {
				if ( '#tag' !== $this->get_token_type() ) {
					continue;
				}

				if ( ! $this->is_tag_closer() || $visit_closers ) {
					return true;
				}
			}

			return false;
		}

		if ( is_string( $query ) ) {
			$query = array( 'breadcrumbs' => array( $query ) );
		}

		if ( ! is_array( $query ) ) {
			_doing_it_wrong(
				__METHOD__,
				__( 'Please pass a query array to this function.' ),
				'6.4.0'
			);
			return false;
		}

		if ( isset( $query['tag_name'] ) ) {
			$query['tag_name'] = strtoupper( $query['tag_name'] );
		}

		$needs_class = ( isset( $query['class_name'] ) && is_string( $query['class_name'] ) )
			? $query['class_name']
			: null;

		if ( ! ( array_key_exists( 'breadcrumbs', $query ) && is_array( $query['breadcrumbs'] ) ) ) {
			while ( $this->next_token() ) {
				if ( '#tag' !== $this->get_token_type() ) {
					continue;
				}

				if ( isset( $query['tag_name'] ) && $query['tag_name'] !== $this->get_token_name() ) {
					continue;
				}

				if ( isset( $needs_class ) && ! $this->has_class( $needs_class ) ) {
					continue;
				}

				if ( ! $this->is_tag_closer() || $visit_closers ) {
					return true;
				}
			}

			return false;
		}

		$breadcrumbs = $query['breadcrumbs'];
		$match_offset = isset( $query['match_offset'] ) ? (int) $query['match_offset'] : 1;

		while ( $match_offset > 0 && $this->next_token() ) {
			if ( '#tag' !== $this->get_token_type() || $this->is_tag_closer() ) {
				continue;
			}

			if ( isset( $needs_class ) && ! $this->has_class( $needs_class ) ) {
				continue;
			}

			if ( $this->matches_breadcrumbs( $breadcrumbs ) && 0 === --$match_offset ) {
				return true;
			}
		}

		return false;
	}

	/**
	 * Finds the next token in the HTML document.
	 *
	 * This doesn't currently have a way to represent non-tags and doesn't process
	 * semantic rules for text nodes. For access to the raw tokens consider using
	 * WP_HTML_Tag_Processor instead.
	 *
	 * @since 6.5.0 Added for internal support; do not use.
	 * @since 6.7.1 Refactored so subclasses may extend.
	 *
	 * @return bool Whether a token was parsed.
	 */
	public function next_token(): bool {
		return $this->next_visitable_token();
	}

	/**
	 * Ensures internal accounting is maintained for HTML semantic rules while
	 * the underlying Tag Processor class is seeking to a bookmark.
	 *
	 * This doesn't currently have a way to represent non-tags and doesn't process
	 * semantic rules for text nodes. For access to the raw tokens consider using
	 * WP_HTML_Tag_Processor instead.
	 *
	 * Note that this method may call itself recursively. This is why it is not
	 * implemented as {@see WP_HTML_Processor::next_token()}, which instead calls
	 * this method similarly to how {@see WP_HTML_Tag_Processor::next_token()}
	 * calls the {@see WP_HTML_Tag_Processor::base_class_next_token()} method.
	 *
	 * @since 6.7.1 Added for internal support.
	 *
	 * @access private
	 *
	 * @return bool
	 */
	private function next_visitable_token(): bool {
		$this->current_element = null;

		if ( isset( $this->last_error ) ) {
			return false;
		}

		/*
		 * Prime the events if there are none.
		 *
		 * @todo In some cases, probably related to the adoption agency
		 *       algorithm, this call to step() doesn't create any new
		 *       events. Calling it again creates them. Figure out why
		 *       this is and if it's inherent or if it's a bug. Looping
		 *       until there are events or until there are no more
		 *       tokens works in the meantime and isn't obviously wrong.
		 */
		if ( empty( $this->element_queue ) && $this->step() ) {
			return $this->next_visitable_token();
		}

		// Process the next event on the queue.
		$this->current_element = array_shift( $this->element_queue );
		if ( ! isset( $this->current_element ) ) {
			// There are no tokens left, so close all remaining open elements.
			while ( $this->state->stack_of_open_elements->pop() ) {
				continue;
			}

			return empty( $this->element_queue ) ? false : $this->next_visitable_token();
		}

		$is_pop = WP_HTML_Stack_Event::POP === $this->current_element->operation;

		/*
		 * The root node only exists in the fragment parser, and closing it
		 * indicates that the parse is complete. Stop before popping it from
		 * the breadcrumbs.
		 */
		if ( 'root-node' === $this->current_element->token->bookmark_name ) {
			return $this->next_visitable_token();
		}

		// Adjust the breadcrumbs for this event.
		if ( $is_pop ) {
			array_pop( $this->breadcrumbs );
		} else {
			$this->breadcrumbs[] = $this->current_element->token->node_name;
		}

		// Avoid sending close events for elements which don't expect a closing.
		if ( $is_pop && ! $this->expects_closer( $this->current_element->token ) ) {
			return $this->next_visitable_token();
		}

		return true;
	}

	/**
	 * Indicates if the current tag token is a tag closer.
	 *
	 * Example:
	 *
	 *     $p = WP_HTML_Processor::create_fragment( '<div></div>' );
	 *     $p->next_tag( array( 'tag_name' => 'div', 'tag_closers' => 'visit' ) );
	 *     $p->is_tag_closer() === false;
	 *
	 *     $p->next_tag( array( 'tag_name' => 'div', 'tag_closers' => 'visit' ) );
	 *     $p->is_tag_closer() === true;
	 *
	 * @since 6.6.0 Subclassed for HTML Processor.
	 *
	 * @return bool Whether the current tag is a tag closer.
	 */
	public function is_tag_closer(): bool {
		return $this->is_virtual()
			? ( WP_HTML_Stack_Event::POP === $this->current_element->operation && '#tag' === $this->get_token_type() )
			: parent::is_tag_closer();
	}

	/**
	 * Indicates if the currently-matched token is virtual, created by a stack operation
	 * while processing HTML, rather than a token found in the HTML text itself.
	 *
	 * @since 6.6.0
	 *
	 * @return bool Whether the current token is virtual.
	 */
	private function is_virtual(): bool {
		return (
			isset( $this->current_element->provenance ) &&
			'virtual' === $this->current_element->provenance
		);
	}

	/**
	 * Indicates if the currently-matched tag matches the given breadcrumbs.
	 *
	 * A "*" represents a single tag wildcard, where any tag matches, but not no tags.
	 *
	 * At some point this function _may_ support a `**` syntax for matching any number
	 * of unspecified tags in the breadcrumb stack. This has been intentionally left
	 * out, however, to keep this function simple and to avoid introducing backtracking,
	 * which could open up surprising performance breakdowns.
	 *
	 * Example:
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<div><span><figure><img></figure></span></div>' );
	 *     $processor->next_tag( 'img' );
	 *     true  === $processor->matches_breadcrumbs( array( 'figure', 'img' ) );
	 *     true  === $processor->matches_breadcrumbs( array( 'span', 'figure', 'img' ) );
	 *     false === $processor->matches_breadcrumbs( array( 'span', 'img' ) );
	 *     true  === $processor->matches_breadcrumbs( array( 'span', '*', 'img' ) );
	 *
	 * @since 6.4.0
	 *
	 * @param string[] $breadcrumbs DOM sub-path at which element is found, e.g. `array( 'FIGURE', 'IMG' )`.
	 *                              May also contain the wildcard `*` which matches a single element, e.g. `array( 'SECTION', '*' )`.
	 * @return bool Whether the currently-matched tag is found at the given nested structure.
	 */
	public function matches_breadcrumbs( $breadcrumbs ): bool {
		// Everything matches when there are zero constraints.
		if ( 0 === count( $breadcrumbs ) ) {
			return true;
		}

		// Start at the last crumb.
		$crumb = end( $breadcrumbs );

		if ( '*' !== $crumb && $this->get_tag() !== strtoupper( $crumb ) ) {
			return false;
		}

		for ( $i = count( $this->breadcrumbs ) - 1; $i >= 0; $i-- ) {
			$node = $this->breadcrumbs[ $i ];
			$crumb = strtoupper( current( $breadcrumbs ) );

			if ( '*' !== $crumb && $node !== $crumb ) {
				return false;
			}

			if ( false === prev( $breadcrumbs ) ) {
				return true;
			}
		}

		return false;
	}

	/**
	 * Indicates if the currently-matched node expects a closing
	 * token, or if it will self-close on the next step.
	 *
	 * Most HTML elements expect a closer, such as a P element or
	 * a DIV element. Others, like an IMG element are void and don't
	 * have a closing tag. Special elements, such as SCRIPT and STYLE,
	 * are treated just like void tags. Text nodes and self-closing
	 * foreign content will also act just like a void tag, immediately
	 * closing as soon as the processor advances to the next token.
	 *
	 * @since 6.6.0
	 *
	 * @param WP_HTML_Token|null $node Optional. Node to examine, if provided.
	 *                                 Default is to examine current node.
	 * @return bool|null Whether to expect a closer for the currently-matched node,
	 *                   or `null` if not matched on any token.
	 */
	public function expects_closer( ?WP_HTML_Token $node = null ): ?bool {
		$token_name = $node->node_name ?? $this->get_token_name();

		if ( ! isset( $token_name ) ) {
			return null;
		}

		$token_namespace = $node->namespace ?? $this->get_namespace();
		$token_has_self_closing = $node->has_self_closing_flag ?? $this->has_self_closing_flag();

		return ! (
			// Comments, text nodes, and other atomic tokens.
			'#' === $token_name[0] ||
			// Doctype declarations.
			'html' === $token_name ||
			// Void elements.
			( 'html' === $token_namespace && self::is_void( $token_name ) ) ||
			// Special atomic elements.
			( 'html' === $token_namespace && in_array( $token_name, array( 'IFRAME', 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'XMP' ), true ) ) ||
			// Self-closing elements in foreign content.
			( 'html' !== $token_namespace && $token_has_self_closing )
		);
	}

	/**
	 * Steps through the HTML document and stops at the next tag, if any.
	 *
	 * @since 6.4.0
	 *
	 * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
	 *
	 * @see self::PROCESS_NEXT_NODE
	 * @see self::REPROCESS_CURRENT_NODE
	 *
	 * @param string $node_to_process Whether to parse the next node or reprocess the current node.
	 * @return bool Whether a tag was matched.
	 */
	public function step( $node_to_process = self::PROCESS_NEXT_NODE ): bool {
		// Refuse to proceed if there was a previous error.
		if ( null !== $this->last_error ) {
			return false;
		}

		if ( self::REPROCESS_CURRENT_NODE !== $node_to_process ) {
			/*
			 * Void elements still hop onto the stack of open elements even though
			 * there's no corresponding closing tag. This is important for managing
			 * stack-based operations such as "navigate to parent node" or checking
			 * on an element's breadcrumbs.
			 *
			 * When moving on to the next node, therefore, if the bottom-most element
			 * on the stack is a void element, it must be closed.
			 */
			$top_node = $this->state->stack_of_open_elements->current_node();
			if ( isset( $top_node ) && ! $this->expects_closer( $top_node ) ) {
				$this->state->stack_of_open_elements->pop();
			}
		}

		if ( self::PROCESS_NEXT_NODE === $node_to_process ) {
			parent::next_token();
			if ( WP_HTML_Tag_Processor::STATE_TEXT_NODE === $this->parser_state ) {
				parent::subdivide_text_appropriately();
			}
		}

		// Finish stepping when there are no more tokens in the document.
		if (
			WP_HTML_Tag_Processor::STATE_INCOMPLETE_INPUT === $this->parser_state ||
			WP_HTML_Tag_Processor::STATE_COMPLETE === $this->parser_state
		) {
			return false;
		}

		$adjusted_current_node = $this->get_adjusted_current_node();
		$is_closer = $this->is_tag_closer();
		$is_start_tag = WP_HTML_Tag_Processor::STATE_MATCHED_TAG === $this->parser_state && ! $is_closer;
		$token_name = $this->get_token_name();

		if ( self::REPROCESS_CURRENT_NODE !== $node_to_process ) {
			$this->state->current_token = new WP_HTML_Token(
				$this->bookmark_token(),
				$token_name,
				$this->has_self_closing_flag(),
				$this->release_internal_bookmark_on_destruct
			);
		}

		$parse_in_current_insertion_mode = (
			0 === $this->state->stack_of_open_elements->count() ||
			'html' === $adjusted_current_node->namespace ||
			(
				'math' === $adjusted_current_node->integration_node_type &&
				(
					( $is_start_tag && ! in_array( $token_name, array( 'MGLYPH', 'MALIGNMARK' ), true ) ) ||
					'#text' === $token_name
				)
			) ||
			(
				'math' === $adjusted_current_node->namespace &&
				'ANNOTATION-XML' === $adjusted_current_node->node_name &&
				$is_start_tag && 'SVG' === $token_name
			) ||
			(
				'html' === $adjusted_current_node->integration_node_type &&
				( $is_start_tag || '#text' === $token_name )
			)
		);

		try {
			if ( ! $parse_in_current_insertion_mode ) {
				return $this->step_in_foreign_content();
			}

			switch ( $this->state->insertion_mode ) {
				case WP_HTML_Processor_State::INSERTION_MODE_INITIAL:
					return $this->step_initial();

				case WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML:
					return $this->step_before_html();

				case WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD:
					return $this->step_before_head();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD:
					return $this->step_in_head();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD_NOSCRIPT:
					return $this->step_in_head_noscript();

				case WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD:
					return $this->step_after_head();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_BODY:
					return $this->step_in_body();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE:
					return $this->step_in_table();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_TEXT:
					return $this->step_in_table_text();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION:
					return $this->step_in_caption();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP:
					return $this->step_in_column_group();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY:
					return $this->step_in_table_body();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_ROW:
					return $this->step_in_row();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_CELL:
					return $this->step_in_cell();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT:
					return $this->step_in_select();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE:
					return $this->step_in_select_in_table();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE:
					return $this->step_in_template();

				case WP_HTML_Processor_State::INSERTION_MODE_AFTER_BODY:
					return $this->step_after_body();

				case WP_HTML_Processor_State::INSERTION_MODE_IN_FRAMESET:
					return $this->step_in_frameset();

				case WP_HTML_Processor_State::INSERTION_MODE_AFTER_FRAMESET:
					return $this->step_after_frameset();

				case WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_BODY:
					return $this->step_after_after_body();

				case WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_FRAMESET:
					return $this->step_after_after_frameset();

				// This should be unreachable but PHP doesn't have total type checking on switch.
				default:
					$this->bail( "Unaware of the requested parsing mode: '{$this->state->insertion_mode}'." );
			}
		} catch ( WP_HTML_Unsupported_Exception $e ) {
			/*
			 * Exceptions are used in this class to escape deep call stacks that
			 * otherwise might involve messier calling and return conventions.
			 */
			return false;
		}
	}

	/**
	 * Computes the HTML breadcrumbs for the currently-matched node, if matched.
	 *
	 * Breadcrumbs start at the outermost parent and descend toward the matched element.
	 * They always include the entire path from the root HTML node to the matched element.
	 *
	 * @todo It could be more efficient to expose a generator-based version of this function
	 *       to avoid creating the array copy on tag iteration. If this is done, it would likely
	 *       be more useful to walk up the stack when yielding instead of starting at the top.
	 *
	 * Example:
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<p><strong><em><img></em></strong></p>' );
	 *     $processor->next_tag( 'IMG' );
	 *     $processor->get_breadcrumbs() === array( 'HTML', 'BODY', 'P', 'STRONG', 'EM', 'IMG' );
	 *
	 * @since 6.4.0
	 *
	 * @return string[]|null Array of tag names representing path to matched node, if matched, otherwise NULL.
	 */
	public function get_breadcrumbs(): ?array {
		return $this->breadcrumbs;
	}

	/**
	 * Returns the nesting depth of the current location in the document.
	 *
	 * Example:
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<div><p></p></div>' );
	 *     // The processor starts in the BODY context, meaning it has depth from the start: HTML > BODY.
	 *     2 === $processor->get_current_depth();
	 *
	 *     // Opening the DIV element increases the depth.
	 *     $processor->next_token();
	 *     3 === $processor->get_current_depth();
	 *
	 *     // Opening the P element increases the depth.
	 *     $processor->next_token();
	 *     4 === $processor->get_current_depth();
	 *
	 *     // The P element is closed during `next_token()` so the depth is decreased to reflect that.
	 *     $processor->next_token();
	 *     3 === $processor->get_current_depth();
	 *
	 * @since 6.6.0
	 *
	 * @return int Nesting-depth of current location in the document.
	 */
	public function get_current_depth(): int {
		return count( $this->breadcrumbs );
	}

	/**
	 * Normalizes an HTML fragment by serializing it.
	 *
	 * This method assumes that the given HTML snippet is found in BODY context.
	 * For normalizing full documents or fragments found in other contexts, create
	 * a new processor using {@see WP_HTML_Processor::create_fragment} or
	 * {@see WP_HTML_Processor::create_full_parser} and call {@see WP_HTML_Processor::serialize}
	 * on the created instances.
	 *
	 * Many aspects of an input HTML fragment may be changed during normalization.
	 *
	 *  - Attribute values will be double-quoted.
	 *  - Duplicate attributes will be removed.
	 *  - Omitted tags will be added.
	 *  - Tag and attribute name casing will be lower-cased,
	 *    except for specific SVG and MathML tags or attributes.
	 *  - Text will be re-encoded, null bytes handled,
	 *    and invalid UTF-8 replaced with U+FFFD.
	 *  - Any incomplete syntax trailing at the end will be omitted,
	 *    for example, an unclosed comment opener will be removed.
	 *
	 * Example:
	 *
	 *     echo WP_HTML_Processor::normalize( '<a href=#anchor v=5 href="/" enabled>One</a another v=5><!--' );
	 *     // <a href="#anchor" v="5" enabled>One</a>
	 *
	 *     echo WP_HTML_Processor::normalize( '<div></p>fun<table><td>cell</div>' );
	 *     // <div><p></p>fun<table><tbody><tr><td>cell</td></tr></tbody></table></div>
	 *
	 *     echo WP_HTML_Processor::normalize( '<![CDATA[invalid comment]]> syntax < <> "oddities"' );
	 *     // <!--[CDATA[invalid comment]]--> syntax &lt; &lt;&gt; &quot;oddities&quot;
	 *
	 * @since 6.7.0
	 *
	 * @param string $html Input HTML to normalize.
	 *
	 * @return string|null Normalized output, or `null` if unable to normalize.
	 */
	public static function normalize( string $html ): ?string {
		return static::create_fragment( $html )->serialize();
	}

	/**
	 * Returns normalized HTML for a fragment by serializing it.
	 *
	 * This differs from {@see WP_HTML_Processor::normalize} in that it starts with
	 * a specific HTML Processor, which _must_ not have already started scanning;
	 * it must be in the initial ready state and will be in the completed state once
	 * serialization is complete.
	 *
	 * Many aspects of an input HTML fragment may be changed during normalization.
	 *
	 *  - Attribute values will be double-quoted.
	 *  - Duplicate attributes will be removed.
	 *  - Omitted tags will be added.
	 *  - Tag and attribute name casing will be lower-cased,
	 *    except for specific SVG and MathML tags or attributes.
	 *  - Text will be re-encoded, null bytes handled,
	 *    and invalid UTF-8 replaced with U+FFFD.
	 *  - Any incomplete syntax trailing at the end will be omitted,
	 *    for example, an unclosed comment opener will be removed.
	 *
	 * Example:
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<a href=#anchor v=5 href="/" enabled>One</a another v=5><!--' );
	 *     echo $processor->serialize();
	 *     // <a href="#anchor" v="5" enabled>One</a>
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<div></p>fun<table><td>cell</div>' );
	 *     echo $processor->serialize();
	 *     // <div><p></p>fun<table><tbody><tr><td>cell</td></tr></tbody></table></div>
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<![CDATA[invalid comment]]> syntax < <> "oddities"' );
	 *     echo $processor->serialize();
	 *     // <!--[CDATA[invalid comment]]--> syntax &lt; &lt;&gt; &quot;oddities&quot;
	 *
	 * @since 6.7.0
	 *
	 * @return string|null Normalized HTML markup represented by processor,
	 *                     or `null` if unable to generate serialization.
	 */
	public function serialize(): ?string {
		if ( WP_HTML_Tag_Processor::STATE_READY !== $this->parser_state ) {
			wp_trigger_error(
				__METHOD__,
				'An HTML Processor which has already started processing cannot serialize its contents. Serialize immediately after creating the instance.',
				E_USER_WARNING
			);
			return null;
		}

		$html = '';
		while ( $this->next_token() ) {
			$html .= $this->serialize_token();
		}

		if ( null !== $this->get_last_error() ) {
			wp_trigger_error(
				__METHOD__,
				"Cannot serialize HTML Processor with parsing error: {$this->get_last_error()}.",
				E_USER_WARNING
			);
			return null;
		}

		return $html;
	}

	/**
	 * Serializes the currently-matched token.
	 *
	 * This method produces a fully-normative HTML string for the currently-matched token,
	 * if able. If not matched at any token or if the token doesn't correspond to any HTML
	 * it will return an empty string (for example, presumptuous end tags are ignored).
	 *
	 * @see static::serialize()
	 *
	 * @since 6.7.0
	 *
	 * @return string Serialization of token, or empty string if no serialization exists.
	 */
	protected function serialize_token(): string {
		$html = '';
		$token_type = $this->get_token_type();

		switch ( $token_type ) {
			case '#doctype':
				$doctype = $this->get_doctype_info();
				if ( null === $doctype ) {
					break;
				}

				$html .= '<!DOCTYPE';

				if ( $doctype->name ) {
					$html .= " {$doctype->name}";
				}

				if ( null !== $doctype->public_identifier ) {
					$quote = str_contains( $doctype->public_identifier, '"' ) ? "'" : '"';
					$html .= " PUBLIC {$quote}{$doctype->public_identifier}{$quote}";
				}
				if ( null !== $doctype->system_identifier ) {
					if ( null === $doctype->public_identifier ) {
						$html .= ' SYSTEM';
					}
					$quote = str_contains( $doctype->system_identifier, '"' ) ? "'" : '"';
					$html .= " {$quote}{$doctype->system_identifier}{$quote}";
				}

				$html .= '>';
				break;

			case '#text':
				$html .= htmlspecialchars( $this->get_modifiable_text(), ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8' );
				break;

			// Unlike the `<>` which is interpreted as plaintext, this is ignored entirely.
			case '#presumptuous-tag':
				break;

			case '#funky-comment':
			case '#comment':
				$html .= "<!--{$this->get_full_comment_text()}-->";
				break;

			case '#cdata-section':
				$html .= "<![CDATA[{$this->get_modifiable_text()}]]>";
				break;
		}

		if ( '#tag' !== $token_type ) {
			return $html;
		}

		$tag_name = str_replace( "\x00", "\u{FFFD}", $this->get_tag() );
		$in_html = 'html' === $this->get_namespace();
		$qualified_name = $in_html ? strtolower( $tag_name ) : $this->get_qualified_tag_name();

		if ( $this->is_tag_closer() ) {
			$html .= "</{$qualified_name}>";
			return $html;
		}

		$attribute_names = $this->get_attribute_names_with_prefix( '' );
		if ( ! isset( $attribute_names ) ) {
			$html .= "<{$qualified_name}>";
			return $html;
		}

		$html .= "<{$qualified_name}";
		foreach ( $attribute_names as $attribute_name ) {
			$html .= " {$this->get_qualified_attribute_name( $attribute_name )}";
			$value = $this->get_attribute( $attribute_name );

			if ( is_string( $value ) ) {
				$html .= '="' . htmlspecialchars( $value, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5 ) . '"';
			}

			$html = str_replace( "\x00", "\u{FFFD}", $html );
		}

		if ( ! $in_html && $this->has_self_closing_flag() ) {
			$html .= ' /';
		}

		$html .= '>';

		// Flush out self-contained elements.
		if ( $in_html && in_array( $tag_name, array( 'IFRAME', 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'XMP' ), true ) ) {
			$text = $this->get_modifiable_text();

			switch ( $tag_name ) {
				case 'IFRAME':
				case 'NOEMBED':
				case 'NOFRAMES':
					$text = '';
					break;

				case 'SCRIPT':
				case 'STYLE':
					break;

				default:
					$text = htmlspecialchars( $text, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8' );
			}

			$html .= "{$text}</{$qualified_name}>";
		}

		return $html;
	}

	/**
	 * Parses next element in the 'initial' insertion mode.
	 *
	 * This internal function performs the 'initial' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#the-initial-insertion-mode
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_initial(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
		$op = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION,
			 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
			 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 *
			 * Parse error: ignore the token.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step();
				}
				goto initial_anything_else;
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				$doctype = $this->get_doctype_info();
				if ( null !== $doctype && 'quirks' === $doctype->indicated_compatability_mode ) {
					$this->compat_mode = WP_HTML_Tag_Processor::QUIRKS_MODE;
				}

				/*
				 * > Then, switch the insertion mode to "before html".
				 */
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML;
				$this->insert_html_element( $this->state->current_token );
				return true;
		}

		/*
		 * > Anything else
		 */
		initial_anything_else:
		$this->compat_mode = WP_HTML_Tag_Processor::QUIRKS_MODE;
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'before html' insertion mode.
	 *
	 * This internal function performs the 'before html' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#the-before-html-insertion-mode
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_before_html(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$is_closer = parent::is_tag_closer();
		$op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
		$op = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION,
			 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
			 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 *
			 * Parse error: ignore the token.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step();
				}
				goto before_html_anything_else;
				break;

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD;
				return true;

			/*
			 * > An end tag whose tag name is one of: "head", "body", "html", "br"
			 *
			 * Closing BR tags are always reported by the Tag Processor as opening tags.
			 */
			case '-HEAD':
			case '-BODY':
			case '-HTML':
				/*
				 * > Act as described in the "anything else" entry below.
				 */
				goto before_html_anything_else;
				break;
		}

		/*
		 * > Any other end tag
		 */
		if ( $is_closer ) {
			// Parse error: ignore the token.
			return $this->step();
		}

		/*
		 * > Anything else.
		 *
		 * > Create an html element whose node document is the Document object.
		 * > Append it to the Document object. Put this element in the stack of open elements.
		 * > Switch the insertion mode to "before head", then reprocess the token.
		 */
		before_html_anything_else:
		$this->insert_virtual_node( 'HTML' );
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'before head' insertion mode.
	 *
	 * This internal function performs the 'before head' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#the-before-head-insertion-mode
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_before_head(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$is_closer = parent::is_tag_closer();
		$op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
		$op = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION,
			 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
			 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 *
			 * Parse error: ignore the token.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step();
				}
				goto before_head_anything_else;
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A start tag whose tag name is "head"
			 */
			case '+HEAD':
				$this->insert_html_element( $this->state->current_token );
				$this->state->head_element = $this->state->current_token;
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
				return true;

			/*
			 * > An end tag whose tag name is one of: "head", "body", "html", "br"
			 * > Act as described in the "anything else" entry below.
			 *
			 * Closing BR tags are always reported by the Tag Processor as opening tags.
			 */
			case '-HEAD':
			case '-BODY':
			case '-HTML':
				goto before_head_anything_else;
				break;
		}

		if ( $is_closer ) {
			// Parse error: ignore the token.
			return $this->step();
		}

		/*
		 * > Anything else
		 *
		 * > Insert an HTML element for a "head" start tag token with no attributes.
		 */
		before_head_anything_else:
		$this->state->head_element = $this->insert_virtual_node( 'HEAD' );
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'in head' insertion mode.
	 *
	 * This internal function performs the 'in head' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inhead
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_head(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$is_closer = parent::is_tag_closer();
		$op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
		$op = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			case '#text':
				/*
				 * > A character token that is one of U+0009 CHARACTER TABULATION,
				 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
				 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
				 */
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					// Insert the character.
					$this->insert_html_element( $this->state->current_token );
					return true;
				}

				goto in_head_anything_else;
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A start tag whose tag name is one of: "base", "basefont", "bgsound", "link"
			 */
			case '+BASE':
			case '+BASEFONT':
			case '+BGSOUND':
			case '+LINK':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "meta"
			 */
			case '+META':
				$this->insert_html_element( $this->state->current_token );

				/*
				 * > If the active speculative HTML parser is null, then:
				 * >   - If the element has a charset attribute, and getting an encoding from
				 * >     its value results in an encoding, and the confidence is currently
				 * >     tentative, then change the encoding to the resulting encoding.
				 */
				$charset = $this->get_attribute( 'charset' );
				if ( is_string( $charset ) && 'tentative' === $this->state->encoding_confidence ) {
					$this->bail( 'Cannot yet process META tags with charset to determine encoding.' );
				}

				/*
				 * >   - Otherwise, if the element has an http-equiv attribute whose value is
				 * >     an ASCII case-insensitive match for the string "Content-Type", and
				 * >     the element has a content attribute, and applying the algorithm for
				 * >     extracting a character encoding from a meta element to that attribute's
				 * >     value returns an encoding, and the confidence is currently tentative,
				 * >     then change the encoding to the extracted encoding.
				 */
				$http_equiv = $this->get_attribute( 'http-equiv' );
				$content = $this->get_attribute( 'content' );
				if (
					is_string( $http_equiv ) &&
					is_string( $content ) &&
					0 === strcasecmp( $http_equiv, 'Content-Type' ) &&
					'tentative' === $this->state->encoding_confidence
				) {
					$this->bail( 'Cannot yet process META tags with http-equiv Content-Type to determine encoding.' );
				}

				return true;

			/*
			 * > A start tag whose tag name is "title"
			 */
			case '+TITLE':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "noscript", if the scripting flag is enabled
			 * > A start tag whose tag name is one of: "noframes", "style"
			 *
			 * The scripting flag is never enabled in this parser.
			 */
			case '+NOFRAMES':
			case '+STYLE':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "noscript", if the scripting flag is disabled
			 */
			case '+NOSCRIPT':
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD_NOSCRIPT;
				return true;

			/*
			 * > A start tag whose tag name is "script"
			 *
			 * @todo Could the adjusted insertion location be anything other than the current location?
			 */
			case '+SCRIPT':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > An end tag whose tag name is "head"
			 */
			case '-HEAD':
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD;
				return true;

			/*
			 * > An end tag whose tag name is one of: "body", "html", "br"
			 *
			 * BR tags are always reported by the Tag Processor as opening tags.
			 */
			case '-BODY':
			case '-HTML':
				/*
				 * > Act as described in the "anything else" entry below.
				 */
				goto in_head_anything_else;
				break;

			/*
			 * > A start tag whose tag name is "template"
			 *
			 * @todo Could the adjusted insertion location be anything other than the current location?
			 */
			case '+TEMPLATE':
				$this->state->active_formatting_elements->insert_marker();
				$this->state->frameset_ok = false;

				$this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE;
				$this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE;

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > An end tag whose tag name is "template"
			 */
			case '-TEMPLATE':
				if ( ! $this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) {
					// @todo Indicate a parse error once it's possible.
					return $this->step();
				}

				$this->generate_implied_end_tags_thoroughly();
				if ( ! $this->state->stack_of_open_elements->current_node_is( 'TEMPLATE' ) ) {
					// @todo Indicate a parse error once it's possible.
				}

				$this->state->stack_of_open_elements->pop_until( 'TEMPLATE' );
				$this->state->active_formatting_elements->clear_up_to_last_marker();
				array_pop( $this->state->stack_of_template_insertion_modes );
				$this->reset_insertion_mode_appropriately();
				return true;
		}

		/*
		 * > A start tag whose tag name is "head"
		 * > Any other end tag
		 */
		if ( '+HEAD' === $op || $is_closer ) {
			// Parse error: ignore the token.
			return $this->step();
		}

		/*
		 * > Anything else
		 */
		in_head_anything_else:
		$this->state->stack_of_open_elements->pop();
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'in head noscript' insertion mode.
	 *
	 * This internal function performs the 'in head noscript' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-inheadnoscript
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_head_noscript(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$is_closer  = parent::is_tag_closer();
		$op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION,
			 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
			 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 *
			 * Whitespace is processed using the rules for the "in head" insertion
			 * mode; any other text falls through to the "anything else" entry.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step_in_head();
				}

				goto in_head_noscript_anything_else;
				break;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > An end tag whose tag name is "noscript"
			 */
			case '-NOSCRIPT':
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
				return true;

			/*
			 * > A comment token
			 * >
			 * > A start tag whose tag name is one of: "basefont", "bgsound",
			 * > "link", "meta", "noframes", "style"
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
			case '+BASEFONT':
			case '+BGSOUND':
			case '+LINK':
			case '+META':
			case '+NOFRAMES':
			case '+STYLE':
				return $this->step_in_head();

			/*
			 * > An end tag whose tag name is "br"
			 *
			 * This should never happen, as the Tag Processor prevents showing a BR closing tag.
			 */
		}

		/*
		 * > A start tag whose tag name is one of: "head", "noscript"
		 * > Any other end tag
		 */
		if ( '+HEAD' === $op || '+NOSCRIPT' === $op || $is_closer ) {
			// Parse error: ignore the token.
			return $this->step();
		}

		/*
		 * > Anything else
		 *
		 * Anything here is a parse error.
		 */
		in_head_noscript_anything_else:
		$this->state->stack_of_open_elements->pop();
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'after head' insertion mode.
	 *
	 * This internal function performs the 'after head' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#the-after-head-insertion-mode
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_after_head(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$is_closer  = parent::is_tag_closer();
		$op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION,
			 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
			 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					// Insert the character.
					$this->insert_html_element( $this->state->current_token );
					return true;
				}
				goto after_head_anything_else;
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A start tag whose tag name is "body"
			 */
			case '+BODY':
				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok    = false;
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
				return true;

			/*
			 * > A start tag whose tag name is "frameset"
			 */
			case '+FRAMESET':
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_FRAMESET;
				return true;

			/*
			 * > A start tag whose tag name is one of: "base", "basefont", "bgsound",
			 * > "link", "meta", "noframes", "script", "style", "template", "title"
			 *
			 * Anything here is a parse error.
			 */
			case '+BASE':
			case '+BASEFONT':
			case '+BGSOUND':
			case '+LINK':
			case '+META':
			case '+NOFRAMES':
			case '+SCRIPT':
			case '+STYLE':
			case '+TEMPLATE':
			case '+TITLE':
				/*
				 * > Push the node pointed to by the head element pointer onto the stack of open elements.
				 * > Process the token using the rules for the "in head" insertion mode.
				 * > Remove the node pointed to by the head element pointer from the stack of open elements. (It might not be the current node at this point.)
				 */
				$this->bail( 'Cannot process elements after HEAD which reopen the HEAD element.' );
				/*
				 * Do not leave this break in when adding support; it's here to prevent
				 * WPCS from getting confused at the switch structure without a return,
				 * because it doesn't know that `bail()` always throws.
				 */
				break;

			/*
			 * > An end tag whose tag name is "template"
			 */
			case '-TEMPLATE':
				return $this->step_in_head();

			/*
			 * > An end tag whose tag name is one of: "body", "html", "br"
			 *
			 * Closing BR tags are always reported by the Tag Processor as opening tags.
			 */
			case '-BODY':
			case '-HTML':
				/*
				 * > Act as described in the "anything else" entry below.
				 */
				goto after_head_anything_else;
				break;
		}

		/*
		 * > A start tag whose tag name is "head"
		 * > Any other end tag
		 */
		if ( '+HEAD' === $op || $is_closer ) {
			// Parse error: ignore the token.
			return $this->step();
		}

		/*
		 * > Anything else
		 * > Insert an HTML element for a "body" start tag token with no attributes.
		 */
		after_head_anything_else:
		$this->insert_virtual_node( 'BODY' );
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'in body' insertion mode.
	 *
	 * This internal function performs the 'in body' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.4.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-inbody
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_body(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ?
			'-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			case '#text':
				/*
				 * > A character token that is U+0000 NULL
				 *
				 * Any successive sequence of NULL bytes is ignored and won't
				 * trigger active format reconstruction. Therefore, if the text
				 * only comprises NULL bytes then the token should be ignored
				 * here, but if there are any other characters in the stream
				 * the active formats should be reconstructed.
				 */
				if ( parent::TEXT_IS_NULL_SEQUENCE === $this->text_node_classification ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->reconstruct_active_formatting_elements();

				/*
				 * Whitespace-only text does not affect the frameset-ok flag.
				 * It is probably inter-element whitespace, but it may also
				 * contain character references which decode only to whitespace.
				 */
				if ( parent::TEXT_IS_GENERIC === $this->text_node_classification ) {
					$this->state->frameset_ok = false;
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 * > Parse error. Ignore the token.
			 */
			case 'html':
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				if ( ! $this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) {
					/*
					 * > Otherwise, for each attribute on the token, check to see if the attribute
					 * > is already present on the top element of the stack of open elements. If
					 * > it is not, add the attribute and its corresponding value to that element.
					 *
					 * This parser does not currently support this behavior: ignore the token.
					 */
				}

				// Ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is one of: "base", "basefont", "bgsound", "link",
			 * > "meta", "noframes", "script", "style", "template", "title"
			 * >
			 * > An end tag whose tag name is "template"
			 */
			case '+BASE':
			case '+BASEFONT':
			case '+BGSOUND':
			case '+LINK':
			case '+META':
			case '+NOFRAMES':
			case '+SCRIPT':
			case '+STYLE':
			case '+TEMPLATE':
			case '+TITLE':
			case '-TEMPLATE':
				return $this->step_in_head();

			/*
			 * > A start tag whose tag name is "body"
			 *
			 * This tag in the IN BODY insertion mode is a parse error.
			 */
			case '+BODY':
				if (
					1 === $this->state->stack_of_open_elements->count() ||
					'BODY' !== ( $this->state->stack_of_open_elements->at( 2 )->node_name ?? null ) ||
					$this->state->stack_of_open_elements->contains( 'TEMPLATE' )
				) {
					// Ignore the token.
					return $this->step();
				}

				/*
				 * > Otherwise, set the frameset-ok flag to "not ok"; then, for each attribute
				 * > on the token, check to see if the attribute is already present on the body
				 * > element (the second element) on the stack of open elements, and if it is
				 * > not, add the attribute and its corresponding value to that element.
				 *
				 * This parser does not currently support this behavior: ignore the token.
				 */
				$this->state->frameset_ok = false;
				return $this->step();

			/*
			 * > A start tag whose tag name is "frameset"
			 *
			 * This tag in the IN BODY insertion mode is a parse error.
			 */
			case '+FRAMESET':
				if (
					1 === $this->state->stack_of_open_elements->count() ||
					'BODY' !== ( $this->state->stack_of_open_elements->at( 2 )->node_name ?? null ) ||
					false === $this->state->frameset_ok
				) {
					// Ignore the token.
					return $this->step();
				}

				/*
				 * > Otherwise, run the following steps:
				 */
				$this->bail( 'Cannot process non-ignored FRAMESET tags.' );
				break;

			/*
			 * > An end tag whose tag name is "body"
			 */
			case '-BODY':
				if ( ! $this->state->stack_of_open_elements->has_element_in_scope( 'BODY' ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				/*
				 * > Otherwise, if there is a node in the stack of open elements that is not either a
				 * > dd element, a dt element, an li element, an optgroup element, an option element,
				 * > a p element, an rb element, an rp element, an rt element, an rtc element, a tbody
				 * > element, a td element, a tfoot element, a th element, a thead element, a tr
				 * > element, the body element, or the html element, then this is a parse error.
				 *
				 * There is nothing to do for this parse error, so don't check for it.
				 */

				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_BODY;
				return true;

			/*
			 * > An end tag whose tag name is "html"
			 */
			case '-HTML':
				if ( ! $this->state->stack_of_open_elements->has_element_in_scope( 'BODY' ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				/*
				 * > Otherwise, if there is a node in the stack of open elements that is not either a
				 * > dd element, a dt element, an li element, an optgroup element, an option element,
				 * > a p element, an rb element, an rp element, an rt element, an rtc element, a tbody
				 * > element, a td element, a tfoot element, a th element, a thead element, a tr
				 * > element, the body element, or the html element, then this is a parse error.
				 *
				 * There is nothing to do for this parse error, so don't check for it.
				 */

				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_BODY;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > A start tag whose tag name is one of: "address", "article", "aside",
			 * > "blockquote", "center", "details", "dialog", "dir", "div", "dl",
			 * > "fieldset", "figcaption", "figure", "footer", "header", "hgroup",
			 * > "main", "menu", "nav", "ol", "p", "search", "section", "summary", "ul"
			 */
			case '+ADDRESS':
			case '+ARTICLE':
			case '+ASIDE':
			case '+BLOCKQUOTE':
			case '+CENTER':
			case '+DETAILS':
			case '+DIALOG':
			case '+DIR':
			case '+DIV':
			case '+DL':
			case '+FIELDSET':
			case '+FIGCAPTION':
			case '+FIGURE':
			case '+FOOTER':
			case '+HEADER':
			case '+HGROUP':
			case '+MAIN':
			case '+MENU':
			case '+NAV':
			case '+OL':
			case '+P':
			case '+SEARCH':
			case '+SECTION':
			case '+SUMMARY':
			case '+UL':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6"
			 */
			case '+H1':
			case '+H2':
			case '+H3':
			case '+H4':
			case '+H5':
			case '+H6':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				if (
					in_array(
						$this->state->stack_of_open_elements->current_node()->node_name,
						array( 'H1', 'H2', 'H3', 'H4', 'H5', 'H6' ),
						true
					)
				) {
					// @todo Indicate a parse error once it's possible.
					$this->state->stack_of_open_elements->pop();
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is one of: "pre", "listing"
			 */
			case '+PRE':
			case '+LISTING':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				/*
				 * > If the next token is a U+000A LINE FEED (LF) character token,
				 * > then ignore that token and move on to the next one. (Newlines
				 * > at the start of pre blocks are ignored as an authoring convenience.)
				 *
				 * This is handled in `get_modifiable_text()`.
				 */

				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;
				return true;

			/*
			 * > A start tag whose tag name is "form"
			 */
			case '+FORM':
				$stack_contains_template = $this->state->stack_of_open_elements->contains( 'TEMPLATE' );

				if ( isset( $this->state->form_element ) && ! $stack_contains_template ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				$this->insert_html_element( $this->state->current_token );
				if ( !
					$stack_contains_template ) {
					$this->state->form_element = $this->state->current_token;
				}

				return true;

			/*
			 * > A start tag whose tag name is "li"
			 * > A start tag whose tag name is one of: "dd", "dt"
			 */
			case '+DD':
			case '+DT':
			case '+LI':
				$this->state->frameset_ok = false;
				$node                     = $this->state->stack_of_open_elements->current_node();
				$is_li                    = 'LI' === $token_name;

				in_body_list_loop:
				/*
				 * The logic for LI and DT/DD is the same except for one point: LI elements _only_
				 * close other LI elements, but a DT or DD element closes _any_ open DT or DD element.
				 */
				if ( $is_li ? 'LI' === $node->node_name : ( 'DD' === $node->node_name || 'DT' === $node->node_name ) ) {
					$node_name = $is_li ? 'LI' : $node->node_name;
					$this->generate_implied_end_tags( $node_name );
					if ( ! $this->state->stack_of_open_elements->current_node_is( $node_name ) ) {
						// @todo Indicate a parse error once it's possible. This error does not impact the logic here.
					}

					$this->state->stack_of_open_elements->pop_until( $node_name );
					goto in_body_list_done;
				}

				if (
					'ADDRESS' !== $node->node_name &&
					'DIV' !== $node->node_name &&
					'P' !== $node->node_name &&
					self::is_special( $node )
				) {
					/*
					 * > If node is in the special category, but is not an address, div,
					 * > or p element, then jump to the step labeled done below.
					 */
					goto in_body_list_done;
				} else {
					/*
					 * > Otherwise, set node to the previous entry in the stack of open elements
					 * > and return to the step labeled loop.
					 */
					foreach ( $this->state->stack_of_open_elements->walk_up( $node ) as $item ) {
						$node = $item;
						break;
					}
					goto in_body_list_loop;
				}

				in_body_list_done:
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			case '+PLAINTEXT':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				/*
				 * @todo This may need to be handled in the Tag Processor and turn into
				 *       a single self-contained tag like TEXTAREA, whose modifiable text
				 *       is the rest of the input document as plaintext.
				 */
				$this->bail( 'Cannot process PLAINTEXT elements.' );
				break;

			/*
			 * > A start tag whose tag name is "button"
			 */
			case '+BUTTON':
				if ( $this->state->stack_of_open_elements->has_element_in_scope( 'BUTTON' ) ) {
					// @todo Indicate a parse error once it's possible. This error does not impact the logic here.
					$this->generate_implied_end_tags();
					$this->state->stack_of_open_elements->pop_until( 'BUTTON' );
				}

				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;

				return true;

			/*
			 * > An end tag whose tag name is one of: "address", "article", "aside", "blockquote",
			 * > "button", "center", "details", "dialog", "dir", "div", "dl", "fieldset",
			 * > "figcaption", "figure", "footer", "header", "hgroup", "listing", "main",
			 * > "menu", "nav", "ol", "pre", "search", "section", "summary", "ul"
			 */
			case '-ADDRESS':
			case '-ARTICLE':
			case '-ASIDE':
			case '-BLOCKQUOTE':
			case '-BUTTON':
			case '-CENTER':
			case '-DETAILS':
			case '-DIALOG':
			case '-DIR':
			case '-DIV':
			case '-DL':
			case '-FIELDSET':
			case '-FIGCAPTION':
			case '-FIGURE':
			case '-FOOTER':
			case '-HEADER':
			case '-HGROUP':
			case '-LISTING':
			case '-MAIN':
			case '-MENU':
			case '-NAV':
			case '-OL':
			case '-PRE':
			case '-SEARCH':
			case '-SECTION':
			case '-SUMMARY':
			case '-UL':
				if ( ! $this->state->stack_of_open_elements->has_element_in_scope( $token_name ) ) {
					// @todo Report parse error.
					// Ignore the token.
					return $this->step();
				}

				$this->generate_implied_end_tags();
				if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
					// @todo Record parse error: this error doesn't impact parsing.
				}
				$this->state->stack_of_open_elements->pop_until( $token_name );
				return true;

			/*
			 * > An end tag whose tag name is "form"
			 */
			case '-FORM':
				if ( !
					$this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) {
					$node                      = $this->state->form_element;
					$this->state->form_element = null;

					/*
					 * > If node is null or if the stack of open elements does not have node
					 * > in scope, then this is a parse error; return and ignore the token.
					 *
					 * @todo It's necessary to check if the form token itself is in scope, not
					 *       simply whether any FORM is in scope.
					 */
					if (
						null === $node ||
						! $this->state->stack_of_open_elements->has_element_in_scope( 'FORM' )
					) {
						// Parse error: ignore the token.
						return $this->step();
					}

					$this->generate_implied_end_tags();
					if ( $node !== $this->state->stack_of_open_elements->current_node() ) {
						// @todo Indicate a parse error once it's possible. This error does not impact the logic here.
						$this->bail( 'Cannot close a FORM when other elements remain open as this would throw off the breadcrumbs for the following tokens.' );
					}

					$this->state->stack_of_open_elements->remove_node( $node );
					return true;
				} else {
					/*
					 * > If the stack of open elements does not have a form element in scope,
					 * > then this is a parse error; return and ignore the token.
					 *
					 * Note that unlike in the clause above, this is checking for any FORM in scope.
					 */
					if ( ! $this->state->stack_of_open_elements->has_element_in_scope( 'FORM' ) ) {
						// Parse error: ignore the token.
						return $this->step();
					}

					$this->generate_implied_end_tags();

					if ( ! $this->state->stack_of_open_elements->current_node_is( 'FORM' ) ) {
						// @todo Indicate a parse error once it's possible. This error does not impact the logic here.
					}

					$this->state->stack_of_open_elements->pop_until( 'FORM' );
					return true;
				}
				break;

			/*
			 * > An end tag whose tag name is "p"
			 */
			case '-P':
				if ( !
					$this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->insert_html_element( $this->state->current_token );
				}

				$this->close_a_p_element();
				return true;

			/*
			 * > An end tag whose tag name is "li"
			 * > An end tag whose tag name is one of: "dd", "dt"
			 */
			case '-DD':
			case '-DT':
			case '-LI':
				if (
					/*
					 * An end tag whose tag name is "li":
					 * If the stack of open elements does not have an li element in list item scope,
					 * then this is a parse error; ignore the token.
					 */
					(
						'LI' === $token_name &&
						! $this->state->stack_of_open_elements->has_element_in_list_item_scope( 'LI' )
					) ||
					/*
					 * An end tag whose tag name is one of: "dd", "dt":
					 * If the stack of open elements does not have an element in scope that is an
					 * HTML element with the same tag name as that of the token, then this is a
					 * parse error; ignore the token.
					 */
					(
						'LI' !== $token_name &&
						! $this->state->stack_of_open_elements->has_element_in_scope( $token_name )
					)
				) {
					/*
					 * This is a parse error, ignore the token.
					 *
					 * @todo Indicate a parse error once it's possible.
					 */
					return $this->step();
				}

				$this->generate_implied_end_tags( $token_name );

				if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
					// @todo Indicate a parse error once it's possible. This error does not impact the logic here.
				}

				$this->state->stack_of_open_elements->pop_until( $token_name );
				return true;

			/*
			 * > An end tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6"
			 */
			case '-H1':
			case '-H2':
			case '-H3':
			case '-H4':
			case '-H5':
			case '-H6':
				if ( ! $this->state->stack_of_open_elements->has_element_in_scope( '(internal: H1 through H6 - do not use)' ) ) {
					/*
					 * This is a parse error; ignore the token.
					 *
					 * @todo Indicate a parse error once it's possible.
					 */
					return $this->step();
				}

				$this->generate_implied_end_tags();

				if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
					// @todo Record parse error: this error doesn't impact parsing.
				}

				$this->state->stack_of_open_elements->pop_until( '(internal: H1 through H6 - do not use)' );
				return true;

			/*
			 * > A start tag whose tag name is "a"
			 */
			case '+A':
				foreach ( $this->state->active_formatting_elements->walk_up() as $item ) {
					switch ( $item->node_name ) {
						case 'marker':
							break 2;

						case 'A':
							$this->run_adoption_agency_algorithm();
							$this->state->active_formatting_elements->remove_node( $item );
							$this->state->stack_of_open_elements->remove_node( $item );
							break 2;
					}
				}

				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->active_formatting_elements->push( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is one of: "b", "big", "code", "em", "font", "i",
			 * > "s", "small", "strike", "strong", "tt", "u"
			 */
			case '+B':
			case '+BIG':
			case '+CODE':
			case '+EM':
			case '+FONT':
			case '+I':
			case '+S':
			case '+SMALL':
			case '+STRIKE':
			case '+STRONG':
			case '+TT':
			case '+U':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->active_formatting_elements->push( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "nobr"
			 */
			case '+NOBR':
				$this->reconstruct_active_formatting_elements();

				if ( $this->state->stack_of_open_elements->has_element_in_scope( 'NOBR' ) ) {
					// Parse error.
					$this->run_adoption_agency_algorithm();
					$this->reconstruct_active_formatting_elements();
				}

				$this->insert_html_element( $this->state->current_token );
				$this->state->active_formatting_elements->push( $this->state->current_token );
				return true;

			/*
			 * > An end tag whose tag name is one of: "a", "b", "big", "code", "em", "font", "i",
			 * > "nobr", "s", "small", "strike", "strong", "tt", "u"
			 */
			case '-A':
			case '-B':
			case '-BIG':
			case '-CODE':
			case '-EM':
			case '-FONT':
			case '-I':
			case '-NOBR':
			case '-S':
			case '-SMALL':
			case '-STRIKE':
			case '-STRONG':
			case '-TT':
			case '-U':
				$this->run_adoption_agency_algorithm();
				return true;

			/*
			 * > A start tag whose tag name is one of: "applet", "marquee", "object"
			 */
			case '+APPLET':
			case '+MARQUEE':
			case '+OBJECT':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->active_formatting_elements->insert_marker();
				$this->state->frameset_ok = false;
				return true;

			/*
			 * > An end tag token whose tag name is one of: "applet", "marquee", "object"
			 */
			case '-APPLET':
			case '-MARQUEE':
			case '-OBJECT':
				if ( ! $this->state->stack_of_open_elements->has_element_in_scope( $token_name ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->generate_implied_end_tags();
				if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
					// This is a parse error.
				}

				$this->state->stack_of_open_elements->pop_until( $token_name );
				$this->state->active_formatting_elements->clear_up_to_last_marker();
				return true;

			/*
			 * > A start tag whose tag name is "table"
			 */
			case '+TABLE':
				/*
				 * > If the Document is not set to quirks mode, and the stack of open elements
				 * > has a p element in button scope, then close a p element.
				 */
				if (
					WP_HTML_Tag_Processor::QUIRKS_MODE !== $this->compat_mode &&
					$this->state->stack_of_open_elements->has_p_in_button_scope()
				) {
					$this->close_a_p_element();
				}

				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok    = false;
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
				return true;

			/*
			 * > An end tag whose tag name is "br"
			 *
			 * This is prevented from happening because the Tag Processor
			 * reports all closing BR tags as if they were opening tags.
			 */

			/*
			 * > A start tag whose tag name is one of: "area", "br", "embed", "img", "keygen", "wbr"
			 */
			case '+AREA':
			case '+BR':
			case '+EMBED':
			case '+IMG':
			case '+KEYGEN':
			case '+WBR':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;
				return true;

			/*
			 * > A start tag whose tag name is "input"
			 */
			case '+INPUT':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );

				/*
				 * > If the token does not have an attribute with the name "type", or if it does,
				 * > but that attribute's value is not an ASCII case-insensitive match for the
				 * > string "hidden", then: set the frameset-ok flag to "not ok".
				 */
				$type_attribute = $this->get_attribute( 'type' );
				if ( !
is_string( $type_attribute ) || 'hidden' !== strtolower( $type_attribute ) ) { 2872 $this->state->frameset_ok = false; 2873 } 2874 2875 return true; 2876 2877 /* 2878 * > A start tag whose tag name is one of: "param", "source", "track" 2879 */ 2880 case '+PARAM': 2881 case '+SOURCE': 2882 case '+TRACK': 2883 $this->insert_html_element( $this->state->current_token ); 2884 return true; 2885 2886 /* 2887 * > A start tag whose tag name is "hr" 2888 */ 2889 case '+HR': 2890 if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) { 2891 $this->close_a_p_element(); 2892 } 2893 $this->insert_html_element( $this->state->current_token ); 2894 $this->state->frameset_ok = false; 2895 return true; 2896 2897 /* 2898 * > A start tag whose tag name is "image" 2899 */ 2900 case '+IMAGE': 2901 /* 2902 * > Parse error. Change the token's tag name to "img" and reprocess it. (Don't ask.) 2903 * 2904 * Note that this is handled elsewhere, so it should not be possible to reach this code. 2905 */ 2906 $this->bail( "Cannot process an IMAGE tag. (Don't ask.)" ); 2907 break; 2908 2909 /* 2910 * > A start tag whose tag name is "textarea" 2911 */ 2912 case '+TEXTAREA': 2913 $this->insert_html_element( $this->state->current_token ); 2914 2915 /* 2916 * > If the next token is a U+000A LINE FEED (LF) character token, then ignore 2917 * > that token and move on to the next one. (Newlines at the start of 2918 * > textarea elements are ignored as an authoring convenience.) 2919 * 2920 * This is handled in `get_modifiable_text()`. 2921 */ 2922 2923 $this->state->frameset_ok = false; 2924 2925 /* 2926 * > Switch the insertion mode to "text". 2927 * 2928 * As a self-contained node, this behavior is handled in the Tag Processor. 
				 */
				return true;

			/*
			 * > A start tag whose tag name is "xmp"
			 */
			case '+XMP':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				$this->reconstruct_active_formatting_elements();
				$this->state->frameset_ok = false;

				/*
				 * > Follow the generic raw text element parsing algorithm.
				 *
				 * As a self-contained node, this behavior is handled in the Tag Processor.
				 */
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "iframe"
			 */
			case '+IFRAME':
				$this->state->frameset_ok = false;

				/*
				 * > Follow the generic raw text element parsing algorithm.
				 *
				 * As a self-contained node, this behavior is handled in the Tag Processor.
				 */
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "noembed"
			 * > A start tag whose tag name is "noscript", if the scripting flag is enabled
			 *
			 * The scripting flag is never enabled in this parser.
			 */
			case '+NOEMBED':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "select"
			 */
			case '+SELECT':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;

				switch ( $this->state->insertion_mode ) {
					/*
					 * > If the insertion mode is one of "in table", "in caption", "in table body", "in row",
					 * > or "in cell", then switch the insertion mode to "in select in table".
					 */
					case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE:
					case WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION:
					case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY:
					case WP_HTML_Processor_State::INSERTION_MODE_IN_ROW:
					case WP_HTML_Processor_State::INSERTION_MODE_IN_CELL:
						$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE;
						break;

					/*
					 * > Otherwise, switch the insertion mode to "in select".
					 */
					default:
						$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT;
						break;
				}
				return true;

			/*
			 * > A start tag whose tag name is one of: "optgroup", "option"
			 */
			case '+OPTGROUP':
			case '+OPTION':
				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
					$this->state->stack_of_open_elements->pop();
				}
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is one of: "rb", "rtc"
			 */
			case '+RB':
			case '+RTC':
				if ( $this->state->stack_of_open_elements->has_element_in_scope( 'RUBY' ) ) {
					$this->generate_implied_end_tags();

					// > If the current node is not now a ruby element, this is a parse error.
					if ( ! $this->state->stack_of_open_elements->current_node_is( 'RUBY' ) ) {
						// @todo Indicate a parse error once it's possible.
					}
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is one of: "rp", "rt"
			 */
			case '+RP':
			case '+RT':
				if ( $this->state->stack_of_open_elements->has_element_in_scope( 'RUBY' ) ) {
					$this->generate_implied_end_tags( 'RTC' );

					// > If the current node is not now a rtc element or a ruby element, this is a parse error.
					$current_node_name = $this->state->stack_of_open_elements->current_node()->node_name;
					if ( 'RTC' !== $current_node_name && 'RUBY' !== $current_node_name ) {
						// @todo Indicate a parse error once it's possible.
					}
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "math"
			 */
			case '+MATH':
				$this->reconstruct_active_formatting_elements();

				/*
				 * @todo Adjust MathML attributes for the token. (This fixes the case of MathML attributes that are not all lowercase.)
				 * @todo Adjust foreign attributes for the token. (This fixes the use of namespaced attributes, in particular XLink.)
				 *
				 * These ought to be handled in the attribute methods.
				 */
				$this->state->current_token->namespace = 'math';
				$this->insert_html_element( $this->state->current_token );
				if ( $this->state->current_token->has_self_closing_flag ) {
					$this->state->stack_of_open_elements->pop();
				}
				return true;

			/*
			 * > A start tag whose tag name is "svg"
			 */
			case '+SVG':
				$this->reconstruct_active_formatting_elements();

				/*
				 * @todo Adjust SVG attributes for the token. (This fixes the case of SVG attributes that are not all lowercase.)
				 * @todo Adjust foreign attributes for the token. (This fixes the use of namespaced attributes, in particular XLink in SVG.)
				 *
				 * These ought to be handled in the attribute methods.
				 */
				$this->state->current_token->namespace = 'svg';
				$this->insert_html_element( $this->state->current_token );
				if ( $this->state->current_token->has_self_closing_flag ) {
					$this->state->stack_of_open_elements->pop();
				}
				return true;

			/*
			 * > A start tag whose tag name is one of: "caption", "col", "colgroup",
			 * > "frame", "head", "tbody", "td", "tfoot", "th", "thead", "tr"
			 */
			case '+CAPTION':
			case '+COL':
			case '+COLGROUP':
			case '+FRAME':
			case '+HEAD':
			case '+TBODY':
			case '+TD':
			case '+TFOOT':
			case '+TH':
			case '+THEAD':
			case '+TR':
				// Parse error. Ignore the token.
				return $this->step();
		}

		if ( ! parent::is_tag_closer() ) {
			/*
			 * > Any other start tag
			 */
			$this->reconstruct_active_formatting_elements();
			$this->insert_html_element( $this->state->current_token );
			return true;
		} else {
			/*
			 * > Any other end tag
			 */

			/*
			 * Find the corresponding tag opener in the stack of open elements, if
			 * it exists before reaching a special element, which provides a kind
			 * of boundary in the stack. For example, a `</custom-tag>` should not
			 * close anything beyond its containing `P` or `DIV` element.
			 */
			foreach ( $this->state->stack_of_open_elements->walk_up() as $node ) {
				if ( 'html' === $node->namespace && $token_name === $node->node_name ) {
					break;
				}

				if ( self::is_special( $node ) ) {
					// This is a parse error, ignore the token.
					return $this->step();
				}
			}

			$this->generate_implied_end_tags( $token_name );
			if ( $node !== $this->state->stack_of_open_elements->current_node() ) {
				// @todo Record parse error: this error doesn't impact parsing.
			}

			foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
				$this->state->stack_of_open_elements->pop();
				if ( $node === $item ) {
					return true;
				}
			}
		}

		$this->bail( 'Should not have been able to reach end of IN BODY processing. Check HTML API code.' );
		// This unnecessary return prevents tools from inaccurately reporting type errors.
		return false;
	}

	/**
	 * Parses next element in the 'in table' insertion mode.
	 *
	 * This internal function performs the 'in table' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-intable
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_table(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A character token, if the current node is table,
			 * > tbody, template, tfoot, thead, or tr element
			 */
			case '#text':
				$current_node      = $this->state->stack_of_open_elements->current_node();
				$current_node_name = $current_node ? $current_node->node_name : null;
				if (
					$current_node_name && (
						'TABLE' === $current_node_name ||
						'TBODY' === $current_node_name ||
						'TEMPLATE' === $current_node_name ||
						'TFOOT' === $current_node_name ||
						'THEAD' === $current_node_name ||
						'TR' === $current_node_name
					)
				) {
					/*
					 * If the text is empty after processing HTML entities and stripping
					 * U+0000 NULL bytes then ignore the token.
					 */
					if ( parent::TEXT_IS_NULL_SEQUENCE === $this->text_node_classification ) {
						return $this->step();
					}

					/*
					 * This follows the rules for "in table text" insertion mode.
					 *
					 * Whitespace-only text nodes are inserted in-place. Otherwise
					 * foster parenting is enabled and the nodes would be
					 * inserted out-of-place.
					 *
					 * > If any of the tokens in the pending table character tokens
					 * > list are character tokens that are not ASCII whitespace,
					 * > then this is a parse error: reprocess the character tokens
					 * > in the pending table character tokens list using the rules
					 * > given in the "anything else" entry in the "in table"
					 * > insertion mode.
					 * >
					 * > Otherwise, insert the characters given by the pending table
					 * > character tokens list.
					 *
					 * @see https://html.spec.whatwg.org/#parsing-main-intabletext
					 */
					if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
						$this->insert_html_element( $this->state->current_token );
						return true;
					}

					// Non-whitespace would trigger fostering, unsupported at this time.
					$this->bail( 'Foster parenting is not supported.' );
					break;
				}
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "caption"
			 */
			case '+CAPTION':
				$this->state->stack_of_open_elements->clear_to_table_context();
				$this->state->active_formatting_elements->insert_marker();
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION;
				return true;

			/*
			 * > A start tag whose tag name is "colgroup"
			 */
			case '+COLGROUP':
				$this->state->stack_of_open_elements->clear_to_table_context();
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
				return true;

			/*
			 * > A start tag whose tag name is "col"
			 */
			case '+COL':
				$this->state->stack_of_open_elements->clear_to_table_context();

				/*
				 * > Insert an HTML element for a "colgroup" start tag token with no attributes,
				 * > then switch the insertion mode to "in column group".
				 */
				$this->insert_virtual_node( 'COLGROUP' );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > A start tag whose tag name is one of: "tbody", "tfoot", "thead"
			 */
			case '+TBODY':
			case '+TFOOT':
			case '+THEAD':
				$this->state->stack_of_open_elements->clear_to_table_context();
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return true;

			/*
			 * > A start tag whose tag name is one of: "td", "th", "tr"
			 */
			case '+TD':
			case '+TH':
			case '+TR':
				$this->state->stack_of_open_elements->clear_to_table_context();
				/*
				 * > Insert an HTML element for a "tbody" start tag token with no attributes,
				 * > then switch the insertion mode to "in table body".
				 */
				$this->insert_virtual_node( 'TBODY' );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > A start tag whose tag name is "table"
			 *
			 * This tag in the IN TABLE insertion mode is a parse error.
			 */
			case '+TABLE':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TABLE' ) ) {
					return $this->step();
				}

				$this->state->stack_of_open_elements->pop_until( 'TABLE' );
				$this->reset_insertion_mode_appropriately();
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is "table"
			 */
			case '-TABLE':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TABLE' ) ) {
					// @todo Indicate a parse error once it's possible.
					return $this->step();
				}

				$this->state->stack_of_open_elements->pop_until( 'TABLE' );
				$this->reset_insertion_mode_appropriately();
				return true;

			/*
			 * > An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "tbody", "td", "tfoot", "th", "thead", "tr"
			 */
			case '-BODY':
			case '-CAPTION':
			case '-COL':
			case '-COLGROUP':
			case '-HTML':
			case '-TBODY':
			case '-TD':
			case '-TFOOT':
			case '-TH':
			case '-THEAD':
			case '-TR':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is one of: "style", "script", "template"
			 * > An end tag whose tag name is "template"
			 */
			case '+STYLE':
			case '+SCRIPT':
			case '+TEMPLATE':
			case '-TEMPLATE':
				/*
				 * > Process the token using the rules for the "in head" insertion mode.
				 */
				return $this->step_in_head();

			/*
			 * > A start tag whose tag name is "input"
			 *
			 * > If the token does not have an attribute with the name "type", or if it does, but
			 * > that attribute's value is not an ASCII case-insensitive match for the string
			 * > "hidden", then: act as described in the "anything else" entry below.
			 */
			case '+INPUT':
				$type_attribute = $this->get_attribute( 'type' );
				if ( ! is_string( $type_attribute ) || 'hidden' !== strtolower( $type_attribute ) ) {
					goto anything_else;
				}
				// @todo Indicate a parse error once it's possible.
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "form"
			 *
			 * This tag in the IN TABLE insertion mode is a parse error.
			 */
			case '+FORM':
				if (
					$this->state->stack_of_open_elements->has_element_in_scope( 'TEMPLATE' ) ||
					isset( $this->state->form_element )
				) {
					return $this->step();
				}

				// This FORM is special because it immediately closes and cannot have other children.
				$this->insert_html_element( $this->state->current_token );
				$this->state->form_element = $this->state->current_token;
				$this->state->stack_of_open_elements->pop();
				return true;
		}

		/*
		 * > Anything else
		 * > Parse error. Enable foster parenting, process the token using the rules for the
		 * > "in body" insertion mode, and then disable foster parenting.
		 *
		 * @todo Indicate a parse error once it's possible.
		 */
		anything_else:
		$this->bail( 'Foster parenting is not supported.' );
	}

	/**
	 * Parses next element in the 'in table text' insertion mode.
	 *
	 * This internal function performs the 'in table text' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-intabletext
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_table_text(): bool {
		$this->bail( 'No support for parsing in the ' . WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_TEXT . ' state.' );
	}

	/**
	 * Parses next element in the 'in caption' insertion mode.
	 *
	 * This internal function performs the 'in caption' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-incaption
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_caption(): bool {
		$tag_name = $this->get_tag();
		$op_sigil = $this->is_tag_closer() ? '-' : '+';
		$op       = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > An end tag whose tag name is "caption"
			 * > A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "td", "tfoot", "th", "thead", "tr"
			 * > An end tag whose tag name is "table"
			 *
			 * These tag handling rules are identical except for the final instruction.
			 * Handle them in a single block.
			 */
			case '-CAPTION':
			case '+CAPTION':
			case '+COL':
			case '+COLGROUP':
			case '+TBODY':
			case '+TD':
			case '+TFOOT':
			case '+TH':
			case '+THEAD':
			case '+TR':
			case '-TABLE':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'CAPTION' ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->generate_implied_end_tags();
				if ( ! $this->state->stack_of_open_elements->current_node_is( 'CAPTION' ) ) {
					// @todo Indicate a parse error once it's possible.
				}

				$this->state->stack_of_open_elements->pop_until( 'CAPTION' );
				$this->state->active_formatting_elements->clear_up_to_last_marker();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;

				// If this is not a CAPTION end tag, the token should be reprocessed.
				if ( '-CAPTION' === $op ) {
					return true;
				}
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "body", "col", "colgroup", "html", "tbody", "td", "tfoot", "th", "thead", "tr"
			 */
			case '-BODY':
			case '-COL':
			case '-COLGROUP':
			case '-HTML':
			case '-TBODY':
			case '-TD':
			case '-TFOOT':
			case '-TH':
			case '-THEAD':
			case '-TR':
				// Parse error: ignore the token.
				return $this->step();
		}

		/*
		 * > Anything else
		 * > Process the token using the rules for the "in body" insertion mode.
		 */
		return $this->step_in_body();
	}

	/**
	 * Parses next element in the 'in column group' insertion mode.
	 *
	 * This internal function performs the 'in column group' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-incolgroup
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_column_group(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
			 * > U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					// Insert the character.
					$this->insert_html_element( $this->state->current_token );
					return true;
				}

				goto in_column_group_anything_else;
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// @todo Indicate a parse error once it's possible.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A start tag whose tag name is "col"
			 */
			case '+COL':
				$this->insert_html_element( $this->state->current_token );
				$this->state->stack_of_open_elements->pop();
				return true;

			/*
			 * > An end tag whose tag name is "colgroup"
			 */
			case '-COLGROUP':
				if ( ! $this->state->stack_of_open_elements->current_node_is( 'COLGROUP' ) ) {
					// @todo Indicate a parse error once it's possible.
					return $this->step();
				}
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
				return true;

			/*
			 * > An end tag whose tag name is "col"
			 */
			case '-COL':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "template"
			 * > An end tag whose tag name is "template"
			 */
			case '+TEMPLATE':
			case '-TEMPLATE':
				return $this->step_in_head();
		}

		in_column_group_anything_else:
		/*
		 * > Anything else
		 */
		if ( ! $this->state->stack_of_open_elements->current_node_is( 'COLGROUP' ) ) {
			// @todo Indicate a parse error once it's possible.
			return $this->step();
		}
		$this->state->stack_of_open_elements->pop();
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'in table body' insertion mode.
	 *
	 * This internal function performs the 'in table body' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-intbody
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_table_body(): bool {
		$tag_name = $this->get_tag();
		$op_sigil = $this->is_tag_closer() ? '-' : '+';
		$op       = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > A start tag whose tag name is "tr"
			 */
			case '+TR':
				$this->state->stack_of_open_elements->clear_to_table_body_context();
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
				return true;

			/*
			 * > A start tag whose tag name is one of: "th", "td"
			 */
			case '+TH':
			case '+TD':
				// @todo Indicate a parse error once it's possible.
				$this->state->stack_of_open_elements->clear_to_table_body_context();
				$this->insert_virtual_node( 'TR' );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "tbody", "tfoot", "thead"
			 */
			case '-TBODY':
			case '-TFOOT':
			case '-THEAD':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $tag_name ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->state->stack_of_open_elements->clear_to_table_body_context();
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
				return true;

			/*
			 * > A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "tfoot", "thead"
			 * > An end tag whose tag name is "table"
			 */
			case '+CAPTION':
			case '+COL':
			case '+COLGROUP':
			case '+TBODY':
			case '+TFOOT':
			case '+THEAD':
			case '-TABLE':
				if (
					! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TBODY' ) &&
					! $this->state->stack_of_open_elements->has_element_in_table_scope( 'THEAD' ) &&
					! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TFOOT' )
				) {
					// Parse error: ignore the token.
					return $this->step();
				}
				$this->state->stack_of_open_elements->clear_to_table_body_context();
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "td", "th", "tr"
			 */
			case '-BODY':
			case '-CAPTION':
			case '-COL':
			case '-COLGROUP':
			case '-HTML':
			case '-TD':
			case '-TH':
			case '-TR':
				// Parse error: ignore the token.
				return $this->step();
		}

		/*
		 * > Anything else
		 * > Process the token using the rules for the "in table" insertion mode.
		 */
		return $this->step_in_table();
	}

	/**
	 * Parses next element in the 'in row' insertion mode.
	 *
	 * This internal function performs the 'in row' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-intr
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_row(): bool {
		$tag_name = $this->get_tag();
		$op_sigil = $this->is_tag_closer() ? '-' : '+';
		$op       = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > A start tag whose tag name is one of: "th", "td"
			 */
			case '+TH':
			case '+TD':
				$this->state->stack_of_open_elements->clear_to_table_row_context();
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_CELL;
				$this->state->active_formatting_elements->insert_marker();
				return true;

			/*
			 * > An end tag whose tag name is "tr"
			 */
			case '-TR':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TR' ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->state->stack_of_open_elements->clear_to_table_row_context();
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return true;

			/*
			 * > A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "tfoot", "thead", "tr"
			 * > An end tag whose tag name is "table"
			 */
			case '+CAPTION':
			case '+COL':
			case '+COLGROUP':
			case '+TBODY':
			case '+TFOOT':
			case '+THEAD':
			case '+TR':
			case '-TABLE':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TR' ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->state->stack_of_open_elements->clear_to_table_row_context();
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "tbody", "tfoot", "thead"
			 */
			case '-TBODY':
			case '-TFOOT':
			case '-THEAD':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $tag_name ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TR' ) ) {
					// Ignore the token.
					return $this->step();
				}

				$this->state->stack_of_open_elements->clear_to_table_row_context();
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "td", "th"
			 */
			case '-BODY':
			case '-CAPTION':
			case '-COL':
			case '-COLGROUP':
			case '-HTML':
			case '-TD':
			case '-TH':
				// Parse error: ignore the token.
				return $this->step();
		}

		/*
		 * > Anything else
		 * > Process the token using the rules for the "in table" insertion mode.
		 */
		return $this->step_in_table();
	}

	/**
	 * Parses next element in the 'in cell' insertion mode.
	 *
	 * This internal function performs the 'in cell' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-intd
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_cell(): bool {
		$tag_name = $this->get_tag();
		$op_sigil = $this->is_tag_closer() ? '-' : '+';
		$op       = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > An end tag whose tag name is one of: "td", "th"
			 */
			case '-TD':
			case '-TH':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $tag_name ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->generate_implied_end_tags();

				/*
				 * @todo This needs to check if the current node is an HTML element, meaning that
				 *       when SVG and MathML support is added, this needs to differentiate between an
				 *       HTML element of the given name, such as `<center>`, and a foreign element of
				 *       the same given name.
				 */
				if ( ! $this->state->stack_of_open_elements->current_node_is( $tag_name ) ) {
					// @todo Indicate a parse error once it's possible.
				}

				$this->state->stack_of_open_elements->pop_until( $tag_name );
				$this->state->active_formatting_elements->clear_up_to_last_marker();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
				return true;

			/*
			 * > A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "td",
			 * > "tfoot", "th", "thead", "tr"
			 */
			case '+CAPTION':
			case '+COL':
			case '+COLGROUP':
			case '+TBODY':
			case '+TD':
			case '+TFOOT':
			case '+TH':
			case '+THEAD':
			case '+TR':
				/*
				 * > Assert: The stack of open elements has a td or th element in table scope.
				 *
				 * Nothing to do here, except to verify in tests that this never appears.
				 */

				$this->close_cell();
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html"
			 */
			case '-BODY':
			case '-CAPTION':
			case '-COL':
			case '-COLGROUP':
			case '-HTML':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > An end tag whose tag name is one of: "table", "tbody", "tfoot", "thead", "tr"
			 */
			case '-TABLE':
			case '-TBODY':
			case '-TFOOT':
			case '-THEAD':
			case '-TR':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $tag_name ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}
				$this->close_cell();
				return $this->step( self::REPROCESS_CURRENT_NODE );
		}

		/*
		 * > Anything else
		 * > Process the token using the rules for the "in body" insertion mode.
		 */
		return $this->step_in_body();
	}

	/**
	 * Parses next element in the 'in select' insertion mode.
	 *
	 * This internal function performs the 'in select' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inselect
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_select(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > Any other character token
			 */
			case '#text':
				/*
				 * > A character token that is U+0000 NULL
				 *
				 * If a text node only comprises null bytes then it should be
				 * entirely ignored and should not return to calling code.
				 */
				if ( parent::TEXT_IS_NULL_SEQUENCE === $this->text_node_classification ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A start tag whose tag name is "option"
			 */
			case '+OPTION':
				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
					$this->state->stack_of_open_elements->pop();
				}
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "optgroup"
			 * > A start tag whose tag name is "hr"
			 *
			 * These rules are identical except for the treatment of the self-closing flag and
			 * the subsequent pop of the HR void element, all of which is handled elsewhere in the processor.
			 */
			case '+OPTGROUP':
			case '+HR':
				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
					$this->state->stack_of_open_elements->pop();
				}

				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTGROUP' ) ) {
					$this->state->stack_of_open_elements->pop();
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > An end tag whose tag name is "optgroup"
			 */
			case '-OPTGROUP':
				$current_node = $this->state->stack_of_open_elements->current_node();
				if ( $current_node && 'OPTION' === $current_node->node_name ) {
					foreach ( $this->state->stack_of_open_elements->walk_up( $current_node ) as $parent ) {
						break;
					}
					if ( $parent && 'OPTGROUP' === $parent->node_name ) {
						$this->state->stack_of_open_elements->pop();
					}
				}

				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTGROUP' ) ) {
					$this->state->stack_of_open_elements->pop();
					return true;
				}

				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > An end tag whose tag name is "option"
			 */
			case '-OPTION':
				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
					$this->state->stack_of_open_elements->pop();
					return true;
				}

				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > An end tag whose tag name is "select"
			 * > A start tag whose tag name is "select"
			 *
			 * > It just gets treated like an end tag.
			 */
			case '-SELECT':
			case '+SELECT':
				if ( ! $this->state->stack_of_open_elements->has_element_in_select_scope( 'SELECT' ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}
				$this->state->stack_of_open_elements->pop_until( 'SELECT' );
				$this->reset_insertion_mode_appropriately();
				return true;

			/*
			 * > A start tag whose tag name is one of: "input", "keygen", "textarea"
			 *
			 * All three of these tags are considered a parse error when found in this insertion mode.
			 */
			case '+INPUT':
			case '+KEYGEN':
			case '+TEXTAREA':
				if ( ! $this->state->stack_of_open_elements->has_element_in_select_scope( 'SELECT' ) ) {
					// Ignore the token.
					return $this->step();
				}
				$this->state->stack_of_open_elements->pop_until( 'SELECT' );
				$this->reset_insertion_mode_appropriately();
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > A start tag whose tag name is one of: "script", "template"
			 * > An end tag whose tag name is "template"
			 */
			case '+SCRIPT':
			case '+TEMPLATE':
			case '-TEMPLATE':
				return $this->step_in_head();
		}

		/*
		 * > Anything else
		 * > Parse error: ignore the token.
		 */
		return $this->step();
	}

	/**
	 * Parses next element in the 'in select in table' insertion mode.
	 *
	 * This internal function performs the 'in select in table' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-inselectintable
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_select_in_table(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A start tag whose tag name is one of: "caption", "table", "tbody", "tfoot", "thead", "tr", "td", "th"
			 */
			case '+CAPTION':
			case '+TABLE':
			case '+TBODY':
			case '+TFOOT':
			case '+THEAD':
			case '+TR':
			case '+TD':
			case '+TH':
				// @todo Indicate a parse error once it's possible.
				$this->state->stack_of_open_elements->pop_until( 'SELECT' );
				$this->reset_insertion_mode_appropriately();
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "caption", "table", "tbody", "tfoot", "thead", "tr", "td", "th"
			 */
			case '-CAPTION':
			case '-TABLE':
			case '-TBODY':
			case '-TFOOT':
			case '-THEAD':
			case '-TR':
			case '-TD':
			case '-TH':
				// @todo Indicate a parse error once it's possible.
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $token_name ) ) {
					return $this->step();
				}
				$this->state->stack_of_open_elements->pop_until( 'SELECT' );
				$this->reset_insertion_mode_appropriately();
				return $this->step( self::REPROCESS_CURRENT_NODE );
		}

		/*
		 * > Anything else
		 */
		return $this->step_in_select();
	}

	/**
	 * Parses next element in the 'in template' insertion mode.
	 *
	 * This internal function performs the 'in template' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-intemplate
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_template(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$is_closer  = $this->is_tag_closer();
		$op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A character token
			 * > A comment token
			 * > A DOCTYPE token
			 */
			case '#text':
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
			case 'html':
				return $this->step_in_body();

			/*
			 * > A start tag whose tag name is one of: "base", "basefont", "bgsound", "link",
			 * > "meta", "noframes", "script", "style", "template", "title"
			 * > An end tag whose tag name is "template"
			 */
			case '+BASE':
			case '+BASEFONT':
			case '+BGSOUND':
			case '+LINK':
			case '+META':
			case '+NOFRAMES':
			case '+SCRIPT':
			case '+STYLE':
			case '+TEMPLATE':
			case '+TITLE':
			case '-TEMPLATE':
				return $this->step_in_head();

			/*
			 * > A start tag whose tag name is one of: "caption", "colgroup", "tbody", "tfoot", "thead"
			 */
			case '+CAPTION':
			case '+COLGROUP':
			case '+TBODY':
			case '+TFOOT':
			case '+THEAD':
				array_pop( $this->state->stack_of_template_insertion_modes );
				$this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
				$this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > A start tag whose tag name is "col"
			 */
			case '+COL':
				array_pop( $this->state->stack_of_template_insertion_modes );
				$this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
				$this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > A start tag whose tag name is "tr"
			 */
			case '+TR':
				array_pop( $this->state->stack_of_template_insertion_modes );
				$this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				$this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > A start tag whose tag name is one of: "td", "th"
			 */
			case '+TD':
			case '+TH':
				array_pop( $this->state->stack_of_template_insertion_modes );
				$this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
				$this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
				return $this->step( self::REPROCESS_CURRENT_NODE );
		}

		/*
		 * > Any other start tag
		 */
		if ( ! $is_closer ) {
			array_pop( $this->state->stack_of_template_insertion_modes );
			$this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
			$this->state->insertion_mode                      = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
			return $this->step( self::REPROCESS_CURRENT_NODE );
		}

		/*
		 * > Any other end tag
		 */
		if ( $is_closer ) {
			// Parse error: ignore the token.
			return $this->step();
		}

		/*
		 * > An end-of-file token
		 */
		if ( ! $this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) {
			// Stop parsing.
			return false;
		}

		// @todo Indicate a parse error once it's possible.
		$this->state->stack_of_open_elements->pop_until( 'TEMPLATE' );
		$this->state->active_formatting_elements->clear_up_to_last_marker();
		array_pop( $this->state->stack_of_template_insertion_modes );
		$this->reset_insertion_mode_appropriately();
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'after body' insertion mode.
	 *
	 * This internal function performs the 'after body' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-afterbody
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_after_body(): bool {
		$tag_name   = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
			 * >   U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 *
			 * > Process the token using the rules for the "in body" insertion mode.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step_in_body();
				}
				goto after_body_anything_else;
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->bail( 'Content outside of BODY is unsupported.' );
				break;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > An end tag whose tag name is "html"
			 *
			 * > If the parser was created as part of the HTML fragment parsing algorithm,
			 * > this is a parse error; ignore the token. (fragment case)
			 * >
			 * > Otherwise, switch the insertion mode to "after after body".
			 */
			case '-HTML':
				if ( isset( $this->context_node ) ) {
					return $this->step();
				}

				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_BODY;
				return true;
		}

		/*
		 * > Parse error. Switch the insertion mode to "in body" and reprocess the token.
		 */
		after_body_anything_else:
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'in frameset' insertion mode.
	 *
	 * This internal function performs the 'in frameset' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-inframeset
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_frameset(): bool {
		$tag_name   = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
			 * >   U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 * >
			 * > Insert the character.
			 *
			 * This algorithm effectively strips non-whitespace characters from text and inserts
			 * them under HTML. This is not supported at this time.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step_in_body();
				}
				$this->bail( 'Non-whitespace characters cannot be handled in frameset.' );
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A start tag whose tag name is "frameset"
			 */
			case '+FRAMESET':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > An end tag whose tag name is "frameset"
			 */
			case '-FRAMESET':
				/*
				 * > If the current node is the root html element, then this is a parse error;
				 * > ignore the token. (fragment case)
				 */
				if ( $this->state->stack_of_open_elements->current_node_is( 'HTML' ) ) {
					return $this->step();
				}

				/*
				 * > Otherwise, pop the current node from the stack of open elements.
				 */
				$this->state->stack_of_open_elements->pop();

				/*
				 * > If the parser was not created as part of the HTML fragment parsing algorithm
				 * > (fragment case), and the current node is no longer a frameset element, then
				 * > switch the insertion mode to "after frameset".
				 */
				if ( ! isset( $this->context_node ) && ! $this->state->stack_of_open_elements->current_node_is( 'FRAMESET' ) ) {
					$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_FRAMESET;
				}

				return true;

			/*
			 * > A start tag whose tag name is "frame"
			 *
			 * > Insert an HTML element for the token. Immediately pop the
			 * > current node off the stack of open elements.
			 * >
			 * > Acknowledge the token's self-closing flag, if it is set.
			 */
			case '+FRAME':
				$this->insert_html_element( $this->state->current_token );
				$this->state->stack_of_open_elements->pop();
				return true;

			/*
			 * > A start tag whose tag name is "noframes"
			 */
			case '+NOFRAMES':
				return $this->step_in_head();
		}

		// Parse error: ignore the token.
		return $this->step();
	}

	/**
	 * Parses next element in the 'after frameset' insertion mode.
	 *
	 * This internal function performs the 'after frameset' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-afterframeset
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_after_frameset(): bool {
		$tag_name   = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
			 * >   U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 * >
			 * > Insert the character.
			 *
			 * This algorithm effectively strips non-whitespace characters from text and inserts
			 * them under HTML. This is not supported at this time.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step_in_body();
				}
				$this->bail( 'Non-whitespace characters cannot be handled in after frameset' );
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > An end tag whose tag name is "html"
			 */
			case '-HTML':
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_FRAMESET;
				return true;

			/*
			 * > A start tag whose tag name is "noframes"
			 */
			case '+NOFRAMES':
				return $this->step_in_head();
		}

		// Parse error: ignore the token.
		return $this->step();
	}

	/**
	 * Parses next element in the 'after after body' insertion mode.
	 *
	 * This internal function performs the 'after after body' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#the-after-after-body-insertion-mode
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_after_after_body(): bool {
		$tag_name   = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->bail( 'Content outside of HTML is unsupported.' );
				break;

			/*
			 * > A DOCTYPE token
			 * > A start tag whose tag name is "html"
			 *
			 * > Process the token using the rules for the "in body" insertion mode.
			 */
			case 'html':
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
			 * >   U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 * >
			 * > Process the token using the rules for the "in body" insertion mode.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step_in_body();
				}
				goto after_after_body_anything_else;
				break;
		}

		/*
		 * > Parse error. Switch the insertion mode to "in body" and reprocess the token.
		 */
		after_after_body_anything_else:
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

	/**
	 * Parses next element in the 'after after frameset' insertion mode.
	 *
	 * This internal function performs the 'after after frameset' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#the-after-after-frameset-insertion-mode
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_after_after_frameset(): bool {
		$tag_name   = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->bail( 'Content outside of HTML is unsupported.' );
				break;

			/*
			 * > A DOCTYPE token
			 * > A start tag whose tag name is "html"
			 *
			 * > Process the token using the rules for the "in body" insertion mode.
			 */
			case 'html':
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
			 * >   U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 * >
			 * > Process the token using the rules for the "in body" insertion mode.
			 *
			 * This algorithm effectively strips non-whitespace characters from text and inserts
			 * them under HTML. This is not supported at this time.
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					return $this->step_in_body();
				}
				$this->bail( 'Non-whitespace characters cannot be handled in after after frameset.' );
				break;

			/*
			 * > A start tag whose tag name is "noframes"
			 */
			case '+NOFRAMES':
				return $this->step_in_head();
		}

		// Parse error: ignore the token.
		return $this->step();
	}

	/**
	 * Parses next element in the 'in foreign content' insertion mode.
	 *
	 * This internal function performs the 'in foreign content' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-inforeign
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_foreign_content(): bool {
		$tag_name   = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$tag_name}";

		/*
		 * > A start tag whose name is "font", if the token has any attributes named "color", "face", or "size"
		 *
		 * This section drawn out above the switch to more easily incorporate
		 * the additional rules based on the presence of the attributes.
		 */
		if (
			'+FONT' === $op &&
			(
				null !== $this->get_attribute( 'color' ) ||
				null !== $this->get_attribute( 'face' ) ||
				null !== $this->get_attribute( 'size' )
			)
		) {
			$op = '+FONT with attributes';
		}

		switch ( $op ) {
			case '#text':
				/*
				 * > A character token that is U+0000 NULL
				 *
				 * This is handled by `get_modifiable_text()`.
				 */

				/*
				 * Whitespace-only text does not affect the frameset-ok flag.
				 * It is probably inter-element whitespace, but it may also
				 * contain character references which decode only to whitespace.
				 */
				if ( parent::TEXT_IS_GENERIC === $this->text_node_classification ) {
					$this->state->frameset_ok = false;
				}

				$this->insert_foreign_element( $this->state->current_token, false );
				return true;

			/*
			 * CDATA sections are alternate wrappers for text content and therefore
			 * ought to follow the same rules as text nodes.
			 */
			case '#cdata-section':
				/*
				 * NULL bytes and whitespace do not change the frameset-ok flag.
				 */
				$current_token        = $this->bookmarks[ $this->state->current_token->bookmark_name ];
				$cdata_content_start  = $current_token->start + 9;
				$cdata_content_length = $current_token->length - 12;
				if ( strspn( $this->html, "\0 \t\n\f\r", $cdata_content_start, $cdata_content_length ) !== $cdata_content_length ) {
					$this->state->frameset_ok = false;
				}

				$this->insert_foreign_element( $this->state->current_token, false );
				return true;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_foreign_element( $this->state->current_token, false );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "b", "big", "blockquote", "body", "br", "center",
			 * > "code", "dd", "div", "dl", "dt", "em", "embed", "h1", "h2", "h3", "h4", "h5",
			 * > "h6", "head", "hr", "i", "img", "li", "listing", "menu", "meta", "nobr", "ol",
			 * > "p", "pre", "ruby", "s", "small", "span", "strong", "strike", "sub", "sup",
			 * > "table", "tt", "u", "ul", "var"
			 *
			 * > A start tag whose name is "font", if the token has any attributes named "color", "face", or "size"
			 *
			 * > An end tag whose tag name is "br", "p"
			 *
			 * Closing BR tags are always reported by the Tag Processor as opening tags.
			 */
			case '+B':
			case '+BIG':
			case '+BLOCKQUOTE':
			case '+BODY':
			case '+BR':
			case '+CENTER':
			case '+CODE':
			case '+DD':
			case '+DIV':
			case '+DL':
			case '+DT':
			case '+EM':
			case '+EMBED':
			case '+H1':
			case '+H2':
			case '+H3':
			case '+H4':
			case '+H5':
			case '+H6':
			case '+HEAD':
			case '+HR':
			case '+I':
			case '+IMG':
			case '+LI':
			case '+LISTING':
			case '+MENU':
			case '+META':
			case '+NOBR':
			case '+OL':
			case '+P':
			case '+PRE':
			case '+RUBY':
			case '+S':
			case '+SMALL':
			case '+SPAN':
			case '+STRONG':
			case '+STRIKE':
			case '+SUB':
			case '+SUP':
			case '+TABLE':
			case '+TT':
			case '+U':
			case '+UL':
			case '+VAR':
			case '+FONT with attributes':
			case '-BR':
			case '-P':
				// @todo Indicate a parse error once it's possible.
				foreach ( $this->state->stack_of_open_elements->walk_up() as $current_node ) {
					if (
						'math' === $current_node->integration_node_type ||
						'html' === $current_node->integration_node_type ||
						'html' === $current_node->namespace
					) {
						break;
					}

					$this->state->stack_of_open_elements->pop();
				}
				goto in_foreign_content_process_in_current_insertion_mode;
		}

		/*
		 * > Any other start tag
		 */
		if ( ! $this->is_tag_closer() ) {
			$this->insert_foreign_element( $this->state->current_token, false );

			/*
			 * > If the token has its self-closing flag set, then run
			 * > the appropriate steps from the following list:
			 * >
			 * >   ↪ the token's tag name is "script", and the new current node is in the SVG namespace
			 * >         Acknowledge the token's self-closing flag, and then act as
			 * >         described in the steps for a "script" end tag below.
			 * >
			 * >   ↪ Otherwise
			 * >         Pop the current node off the stack of open elements and
			 * >         acknowledge the token's self-closing flag.
			 *
			 * Since the rules for SCRIPT below indicate to pop the element off of the stack of
			 * open elements, which is the same for the Otherwise condition, there's no need to
			 * separate these checks. The difference comes when a parser operates with the scripting
			 * flag enabled, and executes the script, which this parser does not support.
			 */
			if ( $this->state->current_token->has_self_closing_flag ) {
				$this->state->stack_of_open_elements->pop();
			}
			return true;
		}

		/*
		 * > An end tag whose name is "script", if the current node is an SVG script element.
		 */
		if ( $this->is_tag_closer() && 'SCRIPT' === $this->state->current_token->node_name && 'svg' === $this->state->current_token->namespace ) {
			$this->state->stack_of_open_elements->pop();
			return true;
		}

		/*
		 * > Any other end tag
		 */
		if ( $this->is_tag_closer() ) {
			$node = $this->state->stack_of_open_elements->current_node();
			if ( $tag_name !== $node->node_name ) {
				// @todo Indicate a parse error once it's possible.
			}
			in_foreign_content_end_tag_loop:
			if ( $node === $this->state->stack_of_open_elements->at( 1 ) ) {
				return true;
			}

			/*
			 * > If node's tag name, converted to ASCII lowercase, is the same as the tag name
			 * > of the token, pop elements from the stack of open elements until node has
			 * > been popped from the stack, and then return.
4936 */ 4937 if ( 0 === strcasecmp( $node->node_name, $tag_name ) ) { 4938 foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) { 4939 $this->state->stack_of_open_elements->pop(); 4940 if ( $node === $item ) { 4941 return true; 4942 } 4943 } 4944 } 4945 4946 foreach ( $this->state->stack_of_open_elements->walk_up( $node ) as $item ) { 4947 $node = $item; 4948 break; 4949 } 4950 4951 if ( 'html' !== $node->namespace ) { 4952 goto in_foreign_content_end_tag_loop; 4953 } 4954 4955 in_foreign_content_process_in_current_insertion_mode: 4956 switch ( $this->state->insertion_mode ) { 4957 case WP_HTML_Processor_State::INSERTION_MODE_INITIAL: 4958 return $this->step_initial(); 4959 4960 case WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML: 4961 return $this->step_before_html(); 4962 4963 case WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD: 4964 return $this->step_before_head(); 4965 4966 case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD: 4967 return $this->step_in_head(); 4968 4969 case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD_NOSCRIPT: 4970 return $this->step_in_head_noscript(); 4971 4972 case WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD: 4973 return $this->step_after_head(); 4974 4975 case WP_HTML_Processor_State::INSERTION_MODE_IN_BODY: 4976 return $this->step_in_body(); 4977 4978 case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE: 4979 return $this->step_in_table(); 4980 4981 case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_TEXT: 4982 return $this->step_in_table_text(); 4983 4984 case WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION: 4985 return $this->step_in_caption(); 4986 4987 case WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP: 4988 return $this->step_in_column_group(); 4989 4990 case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY: 4991 return $this->step_in_table_body(); 4992 4993 case WP_HTML_Processor_State::INSERTION_MODE_IN_ROW: 4994 return $this->step_in_row(); 4995 4996 case 
WP_HTML_Processor_State::INSERTION_MODE_IN_CELL: 4997 return $this->step_in_cell(); 4998 4999 case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT: 5000 return $this->step_in_select(); 5001 5002 case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE: 5003 return $this->step_in_select_in_table(); 5004 5005 case WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE: 5006 return $this->step_in_template(); 5007 5008 case WP_HTML_Processor_State::INSERTION_MODE_AFTER_BODY: 5009 return $this->step_after_body(); 5010 5011 case WP_HTML_Processor_State::INSERTION_MODE_IN_FRAMESET: 5012 return $this->step_in_frameset(); 5013 5014 case WP_HTML_Processor_State::INSERTION_MODE_AFTER_FRAMESET: 5015 return $this->step_after_frameset(); 5016 5017 case WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_BODY: 5018 return $this->step_after_after_body(); 5019 5020 case WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_FRAMESET: 5021 return $this->step_after_after_frameset(); 5022 5023 // This should be unreachable but PHP doesn't have total type checking on switch. 5024 default: 5025 $this->bail( "Unaware of the requested parsing mode: '{$this->state->insertion_mode}'." ); 5026 } 5027 } 5028 5029 $this->bail( 'Should not have been able to reach end of IN FOREIGN CONTENT processing. Check HTML API code.' ); 5030 // This unnecessary return prevents tools from inaccurately reporting type errors. 5031 return false; 5032 } 5033 5034 /* 5035 * Internal helpers 5036 */ 5037 5038 /** 5039 * Creates a new bookmark for the currently-matched token and returns the generated name. 5040 * 5041 * @since 6.4.0 5042 * @since 6.5.0 Renamed from bookmark_tag() to bookmark_token(). 5043 * 5044 * @throws Exception When unable to allocate requested bookmark. 5045 * 5046 * @return string|false Name of created bookmark, or false if unable to create. 5047 */ 5048 private function bookmark_token() { 5049 if ( ! 
parent::set_bookmark( ++$this->bookmark_counter ) ) {
			$this->last_error = self::ERROR_EXCEEDED_MAX_BOOKMARKS;
			throw new Exception( 'could not allocate bookmark' );
		}

		return "{$this->bookmark_counter}";
	}

	/*
	 * HTML semantic overrides for Tag Processor
	 */

	/**
	 * Indicates the namespace of the current token, or "html" if there is none.
	 *
	 * @return string One of "html", "math", or "svg".
	 */
	public function get_namespace(): string {
		if ( ! isset( $this->current_element ) ) {
			return parent::get_namespace();
		}

		return $this->current_element->token->namespace;
	}

	/**
	 * Returns the uppercase name of the matched tag.
	 *
	 * The semantic rules for HTML specify that certain tags be reprocessed
	 * with a different tag name. Because of this, the tag name presented
	 * by the HTML Processor may differ from the one reported by the HTML
	 * Tag Processor, which doesn't apply these semantic rules.
	 *
	 * Example:
	 *
	 *     $processor = new WP_HTML_Tag_Processor( '<div class="test">Test</div>' );
	 *     $processor->next_tag() === true;
	 *     $processor->get_tag() === 'DIV';
	 *
	 *     $processor->next_tag() === false;
	 *     $processor->get_tag() === null;
	 *
	 * @since 6.4.0
	 *
	 * @return string|null Name of currently matched tag in input HTML, or `null` if none found.
	 */
	public function get_tag(): ?string {
		if ( null !== $this->last_error ) {
			return null;
		}

		if ( $this->is_virtual() ) {
			return $this->current_element->token->node_name;
		}

		$tag_name = parent::get_tag();

		/*
		 * > A start tag whose tag name is "image"
		 * > Change the token's tag name to "img" and reprocess it. (Don't ask.)
		 */
		return ( 'IMAGE' === $tag_name && 'html' === $this->get_namespace() )
			?
'IMG' 5112 : $tag_name; 5113 } 5114 5115 /** 5116 * Indicates if the currently matched tag contains the self-closing flag. 5117 * 5118 * No HTML elements ought to have the self-closing flag and for those, the self-closing 5119 * flag will be ignored. For void elements this is benign because they "self close" 5120 * automatically. For non-void HTML elements though problems will appear if someone 5121 * intends to use a self-closing element in place of that element with an empty body. 5122 * For HTML foreign elements and custom elements the self-closing flag determines if 5123 * they self-close or not. 5124 * 5125 * This function does not determine if a tag is self-closing, 5126 * but only if the self-closing flag is present in the syntax. 5127 * 5128 * @since 6.6.0 Subclassed for the HTML Processor. 5129 * 5130 * @return bool Whether the currently matched tag contains the self-closing flag. 5131 */ 5132 public function has_self_closing_flag(): bool { 5133 return $this->is_virtual() ? false : parent::has_self_closing_flag(); 5134 } 5135 5136 /** 5137 * Returns the node name represented by the token. 5138 * 5139 * This matches the DOM API value `nodeName`. Some values 5140 * are static, such as `#text` for a text node, while others 5141 * are dynamically generated from the token itself. 5142 * 5143 * Dynamic names: 5144 * - Uppercase tag name for tag matches. 5145 * - `html` for DOCTYPE declarations. 5146 * 5147 * Note that if the Tag Processor is not matched on a token 5148 * then this function will return `null`, either because it 5149 * hasn't yet found a token or because it reached the end 5150 * of the document without matching a token. 5151 * 5152 * @since 6.6.0 Subclassed for the HTML Processor. 5153 * 5154 * @return string|null Name of the matched token. 5155 */ 5156 public function get_token_name(): ?string { 5157 return $this->is_virtual() 5158 ? 
$this->current_element->token->node_name 5159 : parent::get_token_name(); 5160 } 5161 5162 /** 5163 * Indicates the kind of matched token, if any. 5164 * 5165 * This differs from `get_token_name()` in that it always 5166 * returns a static string indicating the type, whereas 5167 * `get_token_name()` may return values derived from the 5168 * token itself, such as a tag name or processing 5169 * instruction tag. 5170 * 5171 * Possible values: 5172 * - `#tag` when matched on a tag. 5173 * - `#text` when matched on a text node. 5174 * - `#cdata-section` when matched on a CDATA node. 5175 * - `#comment` when matched on a comment. 5176 * - `#doctype` when matched on a DOCTYPE declaration. 5177 * - `#presumptuous-tag` when matched on an empty tag closer. 5178 * - `#funky-comment` when matched on a funky comment. 5179 * 5180 * @since 6.6.0 Subclassed for the HTML Processor. 5181 * 5182 * @return string|null What kind of token is matched, or null. 5183 */ 5184 public function get_token_type(): ?string { 5185 if ( $this->is_virtual() ) { 5186 /* 5187 * This logic comes from the Tag Processor. 5188 * 5189 * @todo It would be ideal not to repeat this here, but it's not clearly 5190 * better to allow passing a token name to `get_token_type()`. 5191 */ 5192 $node_name = $this->current_element->token->node_name; 5193 $starting_char = $node_name[0]; 5194 if ( 'A' <= $starting_char && 'Z' >= $starting_char ) { 5195 return '#tag'; 5196 } 5197 5198 if ( 'html' === $node_name ) { 5199 return '#doctype'; 5200 } 5201 5202 return $node_name; 5203 } 5204 5205 return parent::get_token_type(); 5206 } 5207 5208 /** 5209 * Returns the value of a requested attribute from a matched tag opener if that attribute exists. 
5210 * 5211 * Example: 5212 * 5213 * $p = WP_HTML_Processor::create_fragment( '<div enabled class="test" data-test-id="14">Test</div>' ); 5214 * $p->next_token() === true; 5215 * $p->get_attribute( 'data-test-id' ) === '14'; 5216 * $p->get_attribute( 'enabled' ) === true; 5217 * $p->get_attribute( 'aria-label' ) === null; 5218 * 5219 * $p->next_tag() === false; 5220 * $p->get_attribute( 'class' ) === null; 5221 * 5222 * @since 6.6.0 Subclassed for HTML Processor. 5223 * 5224 * @param string $name Name of attribute whose value is requested. 5225 * @return string|true|null Value of attribute or `null` if not available. Boolean attributes return `true`. 5226 */ 5227 public function get_attribute( $name ) { 5228 return $this->is_virtual() ? null : parent::get_attribute( $name ); 5229 } 5230 5231 /** 5232 * Updates or creates a new attribute on the currently matched tag with the passed value. 5233 * 5234 * For boolean attributes special handling is provided: 5235 * - When `true` is passed as the value, then only the attribute name is added to the tag. 5236 * - When `false` is passed, the attribute gets removed if it existed before. 5237 * 5238 * For string attributes, the value is escaped using the `esc_attr` function. 5239 * 5240 * @since 6.6.0 Subclassed for the HTML Processor. 5241 * 5242 * @param string $name The attribute name to target. 5243 * @param string|bool $value The new attribute value. 5244 * @return bool Whether an attribute value was set. 5245 */ 5246 public function set_attribute( $name, $value ): bool { 5247 return $this->is_virtual() ? false : parent::set_attribute( $name, $value ); 5248 } 5249 5250 /** 5251 * Remove an attribute from the currently-matched tag. 5252 * 5253 * @since 6.6.0 Subclassed for HTML Processor. 5254 * 5255 * @param string $name The attribute name to remove. 5256 * @return bool Whether an attribute was removed. 5257 */ 5258 public function remove_attribute( $name ): bool { 5259 return $this->is_virtual() ? 
false : parent::remove_attribute( $name ); 5260 } 5261 5262 /** 5263 * Gets lowercase names of all attributes matching a given prefix in the current tag. 5264 * 5265 * Note that matching is case-insensitive. This is in accordance with the spec: 5266 * 5267 * > There must never be two or more attributes on 5268 * > the same start tag whose names are an ASCII 5269 * > case-insensitive match for each other. 5270 * - HTML 5 spec 5271 * 5272 * Example: 5273 * 5274 * $p = new WP_HTML_Tag_Processor( '<div data-ENABLED class="test" DATA-test-id="14">Test</div>' ); 5275 * $p->next_tag( array( 'class_name' => 'test' ) ) === true; 5276 * $p->get_attribute_names_with_prefix( 'data-' ) === array( 'data-enabled', 'data-test-id' ); 5277 * 5278 * $p->next_tag() === false; 5279 * $p->get_attribute_names_with_prefix( 'data-' ) === null; 5280 * 5281 * @since 6.6.0 Subclassed for the HTML Processor. 5282 * 5283 * @see https://html.spec.whatwg.org/multipage/syntax.html#attributes-2:ascii-case-insensitive 5284 * 5285 * @param string $prefix Prefix of requested attribute names. 5286 * @return array|null List of attribute names, or `null` when no tag opener is matched. 5287 */ 5288 public function get_attribute_names_with_prefix( $prefix ): ?array { 5289 return $this->is_virtual() ? null : parent::get_attribute_names_with_prefix( $prefix ); 5290 } 5291 5292 /** 5293 * Adds a new class name to the currently matched tag. 5294 * 5295 * @since 6.6.0 Subclassed for the HTML Processor. 5296 * 5297 * @param string $class_name The class name to add. 5298 * @return bool Whether the class was set to be added. 5299 */ 5300 public function add_class( $class_name ): bool { 5301 return $this->is_virtual() ? false : parent::add_class( $class_name ); 5302 } 5303 5304 /** 5305 * Removes a class name from the currently matched tag. 5306 * 5307 * @since 6.6.0 Subclassed for the HTML Processor. 5308 * 5309 * @param string $class_name The class name to remove. 
	 * @return bool Whether the class was set to be removed.
	 */
	public function remove_class( $class_name ): bool {
		return $this->is_virtual() ? false : parent::remove_class( $class_name );
	}

	/**
	 * Returns whether a matched tag contains the given ASCII case-insensitive class name.
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @todo When reconstructing active formatting elements with attributes, find a way
	 *       to indicate if the virtually-reconstructed formatting elements contain the
	 *       wanted class name.
	 *
	 * @param string $wanted_class Look for this CSS class name, ASCII case-insensitive.
	 * @return bool|null Whether the matched tag contains the given class name, or null if not matched.
	 */
	public function has_class( $wanted_class ): ?bool {
		return $this->is_virtual() ? null : parent::has_class( $wanted_class );
	}

	/**
	 * Generator for a foreach loop to step through each class name for the matched tag.
	 *
	 * This generator function is designed to be used inside a "foreach" loop.
	 *
	 * Example:
	 *
	 *     $p = WP_HTML_Processor::create_fragment( "<div class='free <egg<\tlang-en'>" );
	 *     $p->next_tag();
	 *     foreach ( $p->class_list() as $class_name ) {
	 *         echo "{$class_name} ";
	 *     }
	 *     // Outputs: "free <egg< lang-en "
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 */
	public function class_list() {
		return $this->is_virtual() ? null : parent::class_list();
	}

	/**
	 * Returns the modifiable text for a matched token, or an empty string.
	 *
	 * Modifiable text is text content that may be read and changed without
	 * changing the HTML structure of the document around it.
	 * This includes the contents of `#text` nodes in the HTML as well as the
	 * inner contents of HTML comments, Processing Instructions, and others, even
	 * though these nodes aren't part of a parsed DOM tree. It also includes
	 * the contents of SCRIPT and STYLE tags, of TEXTAREA tags, and of any
	 * other section in an HTML document which cannot contain HTML markup (DATA).
	 *
	 * If a token has no modifiable text then an empty string is returned to
	 * avoid needless crashing or type errors. An empty return value therefore
	 * does not indicate whether a token has modifiable text, because a token
	 * with modifiable text may itself be empty (e.g. a comment with no contents).
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @return string Modifiable text for the matched token, or an empty string.
	 */
	public function get_modifiable_text(): string {
		return $this->is_virtual() ? '' : parent::get_modifiable_text();
	}

	/**
	 * Indicates what kind of comment produced the comment node.
	 *
	 * Because there are different kinds of HTML syntax which produce
	 * comments, the Tag Processor tracks and exposes this as a type
	 * for the comment. Nominally only regular HTML comments exist as
	 * they are commonly known, but a number of unrelated syntax errors
	 * also produce comments.
	 *
	 * @see self::COMMENT_AS_ABRUPTLY_CLOSED_COMMENT
	 * @see self::COMMENT_AS_CDATA_LOOKALIKE
	 * @see self::COMMENT_AS_INVALID_HTML
	 * @see self::COMMENT_AS_HTML_COMMENT
	 * @see self::COMMENT_AS_PI_NODE_LOOKALIKE
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @return string|null
	 */
	public function get_comment_type(): ?string {
		return $this->is_virtual() ? null : parent::get_comment_type();
	}

	/**
	 * Removes a bookmark that is no longer needed.
	 *
	 * Releasing a bookmark frees up the small
	 * performance overhead it requires.
5404 * 5405 * @since 6.4.0 5406 * 5407 * @param string $bookmark_name Name of the bookmark to remove. 5408 * @return bool Whether the bookmark already existed before removal. 5409 */ 5410 public function release_bookmark( $bookmark_name ): bool { 5411 return parent::release_bookmark( "_{$bookmark_name}" ); 5412 } 5413 5414 /** 5415 * Moves the internal cursor in the HTML Processor to a given bookmark's location. 5416 * 5417 * Be careful! Seeking backwards to a previous location resets the parser to the 5418 * start of the document and reparses the entire contents up until it finds the 5419 * sought-after bookmarked location. 5420 * 5421 * In order to prevent accidental infinite loops, there's a 5422 * maximum limit on the number of times seek() can be called. 5423 * 5424 * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document. 5425 * 5426 * @since 6.4.0 5427 * 5428 * @param string $bookmark_name Jump to the place in the document identified by this bookmark name. 5429 * @return bool Whether the internal cursor was successfully moved to the bookmark's location. 5430 */ 5431 public function seek( $bookmark_name ): bool { 5432 // Flush any pending updates to the document before beginning. 5433 $this->get_updated_html(); 5434 5435 $actual_bookmark_name = "_{$bookmark_name}"; 5436 $processor_started_at = $this->state->current_token 5437 ? $this->bookmarks[ $this->state->current_token->bookmark_name ]->start 5438 : 0; 5439 $bookmark_starts_at = $this->bookmarks[ $actual_bookmark_name ]->start; 5440 $direction = $bookmark_starts_at > $processor_started_at ? 'forward' : 'backward'; 5441 5442 /* 5443 * If seeking backwards, it's possible that the sought-after bookmark exists within an element 5444 * which has been closed before the current cursor; in other words, it has already been removed 5445 * from the stack of open elements. 
This means that it's insufficient to simply pop off elements 5446 * from the stack of open elements which appear after the bookmarked location and then jump to 5447 * that location, as the elements which were open before won't be re-opened. 5448 * 5449 * In order to maintain consistency, the HTML Processor rewinds to the start of the document 5450 * and reparses everything until it finds the sought-after bookmark. 5451 * 5452 * There are potentially better ways to do this: cache the parser state for each bookmark and 5453 * restore it when seeking; store an immutable and idempotent register of where elements open 5454 * and close. 5455 * 5456 * If caching the parser state it will be essential to properly maintain the cached stack of 5457 * open elements and active formatting elements when modifying the document. This could be a 5458 * tedious and time-consuming process as well, and so for now will not be performed. 5459 * 5460 * It may be possible to track bookmarks for where elements open and close, and in doing so 5461 * be able to quickly recalculate breadcrumbs for any element in the document. It may even 5462 * be possible to remove the stack of open elements and compute it on the fly this way. 5463 * If doing this, the parser would need to track the opening and closing locations for all 5464 * tokens in the breadcrumb path for any and all bookmarks. By utilizing bookmarks themselves 5465 * this list could be automatically maintained while modifying the document. Finding the 5466 * breadcrumbs would then amount to traversing that list from the start until the token 5467 * being inspected. Once an element closes, if there are no bookmarks pointing to locations 5468 * within that element, then all of these locations may be forgotten to save on memory use 5469 * and computation time. 5470 */ 5471 if ( 'backward' === $direction ) { 5472 5473 /* 5474 * When moving backward, stateful stacks should be cleared. 
5475 */ 5476 foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) { 5477 $this->state->stack_of_open_elements->remove_node( $item ); 5478 } 5479 5480 foreach ( $this->state->active_formatting_elements->walk_up() as $item ) { 5481 $this->state->active_formatting_elements->remove_node( $item ); 5482 } 5483 5484 /* 5485 * **After** clearing stacks, more processor state can be reset. 5486 * This must be done after clearing the stack because those stacks generate events that 5487 * would appear on a subsequent call to `next_token()`. 5488 */ 5489 $this->state->frameset_ok = true; 5490 $this->state->stack_of_template_insertion_modes = array(); 5491 $this->state->head_element = null; 5492 $this->state->form_element = null; 5493 $this->state->current_token = null; 5494 $this->current_element = null; 5495 $this->element_queue = array(); 5496 5497 /* 5498 * The absence of a context node indicates a full parse. 5499 * The presence of a context node indicates a fragment parser. 5500 */ 5501 if ( null === $this->context_node ) { 5502 $this->change_parsing_namespace( 'html' ); 5503 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_INITIAL; 5504 $this->breadcrumbs = array(); 5505 5506 $this->bookmarks['initial'] = new WP_HTML_Span( 0, 0 ); 5507 parent::seek( 'initial' ); 5508 unset( $this->bookmarks['initial'] ); 5509 } else { 5510 5511 /* 5512 * Push the root-node (HTML) back onto the stack of open elements. 5513 * 5514 * Fragment parsers require this extra bit of setup. 5515 * It's handled in full parsers by advancing the processor state. 5516 */ 5517 $this->state->stack_of_open_elements->push( 5518 new WP_HTML_Token( 5519 'root-node', 5520 'HTML', 5521 false 5522 ) 5523 ); 5524 5525 $this->change_parsing_namespace( 5526 $this->context_node->integration_node_type 5527 ? 
'html'
						: $this->context_node->namespace
				);

				if ( 'TEMPLATE' === $this->context_node->node_name ) {
					$this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE;
				}

				$this->reset_insertion_mode_appropriately();
				$this->breadcrumbs = array_slice( $this->breadcrumbs, 0, 2 );
				parent::seek( $this->context_node->bookmark_name );
			}
		}

		/*
		 * Here, the processor moves forward through the document until it matches the bookmark.
		 * do-while is used here because the processor is expected to already be stopped on
		 * a token that may match the bookmarked location.
		 */
		do {
			/*
			 * The processor will stop on virtual tokens, but bookmarks may not be set on them.
			 * They should not be matched when seeking a bookmark, so skip them.
			 */
			if ( $this->is_virtual() ) {
				continue;
			}
			if ( $bookmark_starts_at === $this->bookmarks[ $this->state->current_token->bookmark_name ]->start ) {
				return true;
			}
		} while ( $this->next_token() );

		return false;
	}

	/**
	 * Sets a bookmark in the HTML document.
	 *
	 * Bookmarks represent specific places or tokens in the HTML
	 * document, such as a tag opener or closer. When applying
	 * edits to a document, such as setting an attribute, the
	 * text offsets of that token may shift; the bookmark is
	 * kept updated with those shifts and remains stable unless
	 * the entire span of text in which the token sits is removed.
	 *
	 * Release bookmarks when they are no longer needed.
	 *
	 * Example:
	 *
	 *     <main><h2>Surprising fact you may not know!</h2></main>
	 *           ^  ^
	 *            \-|-- this `H2` opener bookmark tracks the token
	 *
	 *     <main class="clickbait"><h2>Surprising fact you may no…
	 *                             ^  ^
	 *                              \-|-- it shifts with edits
	 *
	 * Bookmarks provide the ability to seek to a previously-scanned
	 * place in the HTML document. This avoids the need to re-scan
	 * the entire document.
	 *
	 * Example:
	 *
	 *     <ul><li>One</li><li>Two</li><li>Three</li></ul>
	 *                                 ^^^^
	 *                                 want to note this last item
	 *
	 *     $p = new WP_HTML_Tag_Processor( $html );
	 *     $in_list = false;
	 *     while ( $p->next_tag( array( 'tag_closers' => $in_list ? 'visit' : 'skip' ) ) ) {
	 *         if ( 'UL' === $p->get_tag() ) {
	 *             if ( $p->is_tag_closer() ) {
	 *                 $in_list = false;
	 *                 $p->set_bookmark( 'resume' );
	 *                 if ( $p->seek( 'last-li' ) ) {
	 *                     $p->add_class( 'last-li' );
	 *                 }
	 *                 $p->seek( 'resume' );
	 *                 $p->release_bookmark( 'last-li' );
	 *                 $p->release_bookmark( 'resume' );
	 *             } else {
	 *                 $in_list = true;
	 *             }
	 *         }
	 *
	 *         if ( 'LI' === $p->get_tag() ) {
	 *             $p->set_bookmark( 'last-li' );
	 *         }
	 *     }
	 *
	 * Bookmarks intentionally hide the internal string offsets
	 * to which they refer. They are maintained internally as
	 * updates are applied to the HTML document and therefore
	 * retain their "position" - the location to which they
	 * originally pointed. The inability to use bookmarks with
	 * functions like `substr` is therefore intentional to guard
	 * against accidentally breaking the HTML.
	 *
	 * Because bookmarks allocate memory and require processing
	 * for every applied update, they are limited and require
	 * a name. They should not be created with programmatically-made
	 * names, such as "li_{$index}" with some loop.
As a general 5629 * rule they should only be created with string-literal names 5630 * like "start-of-section" or "last-paragraph". 5631 * 5632 * Bookmarks are a powerful tool to enable complicated behavior. 5633 * Consider double-checking that you need this tool if you are 5634 * reaching for it, as inappropriate use could lead to broken 5635 * HTML structure or unwanted processing overhead. 5636 * 5637 * @since 6.4.0 5638 * 5639 * @param string $bookmark_name Identifies this particular bookmark. 5640 * @return bool Whether the bookmark was successfully created. 5641 */ 5642 public function set_bookmark( $bookmark_name ): bool { 5643 return parent::set_bookmark( "_{$bookmark_name}" ); 5644 } 5645 5646 /** 5647 * Checks whether a bookmark with the given name exists. 5648 * 5649 * @since 6.5.0 5650 * 5651 * @param string $bookmark_name Name to identify a bookmark that potentially exists. 5652 * @return bool Whether that bookmark exists. 5653 */ 5654 public function has_bookmark( $bookmark_name ): bool { 5655 return parent::has_bookmark( "_{$bookmark_name}" ); 5656 } 5657 5658 /* 5659 * HTML Parsing Algorithms 5660 */ 5661 5662 /** 5663 * Closes a P element. 5664 * 5665 * @since 6.4.0 5666 * 5667 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input. 5668 * 5669 * @see https://html.spec.whatwg.org/#close-a-p-element 5670 */ 5671 private function close_a_p_element(): void { 5672 $this->generate_implied_end_tags( 'P' ); 5673 $this->state->stack_of_open_elements->pop_until( 'P' ); 5674 } 5675 5676 /** 5677 * Closes elements that have implied end tags. 5678 * 5679 * @since 6.4.0 5680 * @since 6.7.0 Full spec support. 5681 * 5682 * @see https://html.spec.whatwg.org/#generate-implied-end-tags 5683 * 5684 * @param string|null $except_for_this_element Perform as if this element doesn't exist in the stack of open elements. 
	 */
	private function generate_implied_end_tags( ?string $except_for_this_element = null ): void {
		$elements_with_implied_end_tags = array(
			'DD',
			'DT',
			'LI',
			'OPTGROUP',
			'OPTION',
			'P',
			'RB',
			'RP',
			'RT',
			'RTC',
		);

		$no_exclusions = ! isset( $except_for_this_element );

		while (
			( $no_exclusions || ! $this->state->stack_of_open_elements->current_node_is( $except_for_this_element ) ) &&
			in_array( $this->state->stack_of_open_elements->current_node()->node_name, $elements_with_implied_end_tags, true )
		) {
			$this->state->stack_of_open_elements->pop();
		}
	}

	/**
	 * Closes elements that have implied end tags, thoroughly.
	 *
	 * See the HTML specification for an explanation of why this is
	 * different from generating end tags in the normal sense.
	 *
	 * @since 6.4.0
	 * @since 6.7.0 Full spec support.
	 *
	 * @see WP_HTML_Processor::generate_implied_end_tags
	 * @see https://html.spec.whatwg.org/#generate-implied-end-tags
	 */
	private function generate_implied_end_tags_thoroughly(): void {
		$elements_with_implied_end_tags = array(
			'CAPTION',
			'COLGROUP',
			'DD',
			'DT',
			'LI',
			'OPTGROUP',
			'OPTION',
			'P',
			'RB',
			'RP',
			'RT',
			'RTC',
			'TBODY',
			'TD',
			'TFOOT',
			'TH',
			'THEAD',
			'TR',
		);

		while ( in_array( $this->state->stack_of_open_elements->current_node()->node_name, $elements_with_implied_end_tags, true ) ) {
			$this->state->stack_of_open_elements->pop();
		}
	}

	/**
	 * Returns the adjusted current node.
	 *
	 * > The adjusted current node is the context element if the parser was created as
	 * > part of the HTML fragment parsing algorithm and the stack of open elements
	 * > has only one element in it (fragment case); otherwise, the adjusted current
	 * > node is the current node.
5756 * 5757 * @see https://html.spec.whatwg.org/#adjusted-current-node 5758 * 5759 * @since 6.7.0 5760 * 5761 * @return WP_HTML_Token|null The adjusted current node. 5762 */ 5763 private function get_adjusted_current_node(): ?WP_HTML_Token { 5764 if ( isset( $this->context_node ) && 1 === $this->state->stack_of_open_elements->count() ) { 5765 return $this->context_node; 5766 } 5767 5768 return $this->state->stack_of_open_elements->current_node(); 5769 } 5770 5771 /** 5772 * Reconstructs the active formatting elements. 5773 * 5774 * > This has the effect of reopening all the formatting elements that were opened 5775 * > in the current body, cell, or caption (whichever is youngest) that haven't 5776 * > been explicitly closed. 5777 * 5778 * @since 6.4.0 5779 * 5780 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input. 5781 * 5782 * @see https://html.spec.whatwg.org/#reconstruct-the-active-formatting-elements 5783 * 5784 * @return bool Whether any formatting elements needed to be reconstructed. 5785 */ 5786 private function reconstruct_active_formatting_elements(): bool { 5787 /* 5788 * > If there are no entries in the list of active formatting elements, then there is nothing 5789 * > to reconstruct; stop this algorithm. 5790 */ 5791 if ( 0 === $this->state->active_formatting_elements->count() ) { 5792 return false; 5793 } 5794 5795 $last_entry = $this->state->active_formatting_elements->current_node(); 5796 if ( 5797 5798 /* 5799 * > If the last (most recently added) entry in the list of active formatting elements is a marker; 5800 * > stop this algorithm. 5801 */ 5802 'marker' === $last_entry->node_name || 5803 5804 /* 5805 * > If the last (most recently added) entry in the list of active formatting elements is an 5806 * > element that is in the stack of open elements, then there is nothing to reconstruct; 5807 * > stop this algorithm. 
5808 */ 5809 $this->state->stack_of_open_elements->contains_node( $last_entry ) 5810 ) { 5811 return false; 5812 } 5813 5814 $this->bail( 'Cannot reconstruct active formatting elements when advancing and rewinding is required.' ); 5815 } 5816 5817 /** 5818 * Runs the reset the insertion mode appropriately algorithm. 5819 * 5820 * @since 6.7.0 5821 * 5822 * @see https://html.spec.whatwg.org/multipage/parsing.html#reset-the-insertion-mode-appropriately 5823 */ 5824 private function reset_insertion_mode_appropriately(): void { 5825 // Set the first node. 5826 $first_node = null; 5827 foreach ( $this->state->stack_of_open_elements->walk_down() as $first_node ) { 5828 break; 5829 } 5830 5831 /* 5832 * > 1. Let _last_ be false. 5833 */ 5834 $last = false; 5835 foreach ( $this->state->stack_of_open_elements->walk_up() as $node ) { 5836 /* 5837 * > 2. Let _node_ be the last node in the stack of open elements. 5838 * > 3. _Loop_: If _node_ is the first node in the stack of open elements, then set _last_ 5839 * > to true, and, if the parser was created as part of the HTML fragment parsing 5840 * > algorithm (fragment case), set node to the context element passed to 5841 * > that algorithm. 5842 * > … 5843 */ 5844 if ( $node === $first_node ) { 5845 $last = true; 5846 if ( isset( $this->context_node ) ) { 5847 $node = $this->context_node; 5848 } 5849 } 5850 5851 // All of the following rules are for matching HTML elements. 5852 if ( 'html' !== $node->namespace ) { 5853 continue; 5854 } 5855 5856 switch ( $node->node_name ) { 5857 /* 5858 * > 4. If node is a `select` element, run these substeps: 5859 * > 1. If _last_ is true, jump to the step below labeled done. 5860 * > 2. Let _ancestor_ be _node_. 5861 * > 3. _Loop_: If _ancestor_ is the first node in the stack of open elements, 5862 * > jump to the step below labeled done. 5863 * > 4. Let ancestor be the node before ancestor in the stack of open elements. 5864 * > … 5865 * > 7. Jump back to the step labeled _loop_. 
5866 * > 8. _Done_: Switch the insertion mode to "in select" and return. 5867 */ 5868 case 'SELECT': 5869 if ( ! $last ) { 5870 foreach ( $this->state->stack_of_open_elements->walk_up( $node ) as $ancestor ) { 5871 if ( 'html' !== $ancestor->namespace ) { 5872 continue; 5873 } 5874 5875 switch ( $ancestor->node_name ) { 5876 /* 5877 * > 5. If _ancestor_ is a `template` node, jump to the step below 5878 * > labeled _done_. 5879 */ 5880 case 'TEMPLATE': 5881 break 2; 5882 5883 /* 5884 * > 6. If _ancestor_ is a `table` node, switch the insertion mode to 5885 * > "in select in table" and return. 5886 */ 5887 case 'TABLE': 5888 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE; 5889 return; 5890 } 5891 } 5892 } 5893 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT; 5894 return; 5895 5896 /* 5897 * > 5. If _node_ is a `td` or `th` element and _last_ is false, then switch the 5898 * > insertion mode to "in cell" and return. 5899 */ 5900 case 'TD': 5901 case 'TH': 5902 if ( ! $last ) { 5903 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_CELL; 5904 return; 5905 } 5906 break; 5907 5908 /* 5909 * > 6. If _node_ is a `tr` element, then switch the insertion mode to "in row" 5910 * > and return. 5911 */ 5912 case 'TR': 5913 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW; 5914 return; 5915 5916 /* 5917 * > 7. If _node_ is a `tbody`, `thead`, or `tfoot` element, then switch the 5918 * > insertion mode to "in table body" and return. 5919 */ 5920 case 'TBODY': 5921 case 'THEAD': 5922 case 'TFOOT': 5923 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY; 5924 return; 5925 5926 /* 5927 * > 8. If _node_ is a `caption` element, then switch the insertion mode to 5928 * > "in caption" and return. 
5929 */ 5930 case 'CAPTION': 5931 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION; 5932 return; 5933 5934 /* 5935 * > 9. If _node_ is a `colgroup` element, then switch the insertion mode to 5936 * > "in column group" and return. 5937 */ 5938 case 'COLGROUP': 5939 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP; 5940 return; 5941 5942 /* 5943 * > 10. If _node_ is a `table` element, then switch the insertion mode to 5944 * > "in table" and return. 5945 */ 5946 case 'TABLE': 5947 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE; 5948 return; 5949 5950 /* 5951 * > 11. If _node_ is a `template` element, then switch the insertion mode to the 5952 * > current template insertion mode and return. 5953 */ 5954 case 'TEMPLATE': 5955 $this->state->insertion_mode = end( $this->state->stack_of_template_insertion_modes ); 5956 return; 5957 5958 /* 5959 * > 12. If _node_ is a `head` element and _last_ is false, then switch the 5960 * > insertion mode to "in head" and return. 5961 */ 5962 case 'HEAD': 5963 if ( ! $last ) { 5964 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD; 5965 return; 5966 } 5967 break; 5968 5969 /* 5970 * > 13. If _node_ is a `body` element, then switch the insertion mode to "in body" 5971 * > and return. 5972 */ 5973 case 'BODY': 5974 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY; 5975 return; 5976 5977 /* 5978 * > 14. If _node_ is a `frameset` element, then switch the insertion mode to 5979 * > "in frameset" and return. (fragment case) 5980 */ 5981 case 'FRAMESET': 5982 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_FRAMESET; 5983 return; 5984 5985 /* 5986 * > 15. If _node_ is an `html` element, run these substeps: 5987 * > 1. If the head element pointer is null, switch the insertion mode to 5988 * > "before head" and return. (fragment case) 5989 * > 2. 
Otherwise, the head element pointer is not null, switch the insertion 5990 * > mode to "after head" and return. 5991 */ 5992 case 'HTML': 5993 $this->state->insertion_mode = isset( $this->state->head_element ) 5994 ? WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD 5995 : WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD; 5996 return; 5997 } 5998 } 5999 6000 /* 6001 * > 16. If _last_ is true, then switch the insertion mode to "in body" 6002 * > and return. (fragment case) 6003 * 6004 * This is only reachable if `$last` is true, as per the fragment parsing case. 6005 */ 6006 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY; 6007 } 6008 6009 /** 6010 * Runs the adoption agency algorithm. 6011 * 6012 * @since 6.4.0 6013 * 6014 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input. 6015 * 6016 * @see https://html.spec.whatwg.org/#adoption-agency-algorithm 6017 */ 6018 private function run_adoption_agency_algorithm(): void { 6019 $budget = 1000; 6020 $subject = $this->get_tag(); 6021 $current_node = $this->state->stack_of_open_elements->current_node(); 6022 6023 if ( 6024 // > If the current node is an HTML element whose tag name is subject 6025 $current_node && $subject === $current_node->node_name && 6026 // > the current node is not in the list of active formatting elements 6027 ! $this->state->active_formatting_elements->contains_node( $current_node ) 6028 ) { 6029 $this->state->stack_of_open_elements->pop(); 6030 return; 6031 } 6032 6033 $outer_loop_counter = 0; 6034 while ( $budget-- > 0 ) { 6035 if ( $outer_loop_counter++ >= 8 ) { 6036 return; 6037 } 6038 6039 /* 6040 * > Let formatting element be the last element in the list of active formatting elements that: 6041 * > - is between the end of the list and the last marker in the list, 6042 * > if any, or the start of the list otherwise, 6043 * > - and has the tag name subject. 
6044 */ 6045 $formatting_element = null; 6046 foreach ( $this->state->active_formatting_elements->walk_up() as $item ) { 6047 if ( 'marker' === $item->node_name ) { 6048 break; 6049 } 6050 6051 if ( $subject === $item->node_name ) { 6052 $formatting_element = $item; 6053 break; 6054 } 6055 } 6056 6057 // > If there is no such element, then return and instead act as described in the "any other end tag" entry above. 6058 if ( null === $formatting_element ) { 6059 $this->bail( 'Cannot run adoption agency when "any other end tag" is required.' ); 6060 } 6061 6062 // > If formatting element is not in the stack of open elements, then this is a parse error; remove the element from the list, and return. 6063 if ( ! $this->state->stack_of_open_elements->contains_node( $formatting_element ) ) { 6064 $this->state->active_formatting_elements->remove_node( $formatting_element ); 6065 return; 6066 } 6067 6068 // > If formatting element is in the stack of open elements, but the element is not in scope, then this is a parse error; return. 6069 if ( ! $this->state->stack_of_open_elements->has_element_in_scope( $formatting_element->node_name ) ) { 6070 return; 6071 } 6072 6073 /* 6074 * > Let furthest block be the topmost node in the stack of open elements that is lower in the stack 6075 * > than formatting element, and is an element in the special category. There might not be one. 
6076 */ 6077 $is_above_formatting_element = true; 6078 $furthest_block = null; 6079 foreach ( $this->state->stack_of_open_elements->walk_down() as $item ) { 6080 if ( $is_above_formatting_element && $formatting_element->bookmark_name !== $item->bookmark_name ) { 6081 continue; 6082 } 6083 6084 if ( $is_above_formatting_element ) { 6085 $is_above_formatting_element = false; 6086 continue; 6087 } 6088 6089 if ( self::is_special( $item ) ) { 6090 $furthest_block = $item; 6091 break; 6092 } 6093 } 6094 6095 /* 6096 * > If there is no furthest block, then the UA must first pop all the nodes from the bottom of the 6097 * > stack of open elements, from the current node up to and including formatting element, then 6098 * > remove formatting element from the list of active formatting elements, and finally return. 6099 */ 6100 if ( null === $furthest_block ) { 6101 foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) { 6102 $this->state->stack_of_open_elements->pop(); 6103 6104 if ( $formatting_element->bookmark_name === $item->bookmark_name ) { 6105 $this->state->active_formatting_elements->remove_node( $formatting_element ); 6106 return; 6107 } 6108 } 6109 } 6110 6111 $this->bail( 'Cannot extract common ancestor in adoption agency algorithm.' ); 6112 } 6113 6114 $this->bail( 'Cannot run adoption agency when looping required.' ); 6115 } 6116 6117 /** 6118 * Runs the "close the cell" algorithm. 6119 * 6120 * > Where the steps above say to close the cell, they mean to run the following algorithm: 6121 * > 1. Generate implied end tags. 6122 * > 2. If the current node is not now a td element or a th element, then this is a parse error. 6123 * > 3. Pop elements from the stack of open elements stack until a td element or a th element has been popped from the stack. 6124 * > 4. Clear the list of active formatting elements up to the last marker. 6125 * > 5. Switch the insertion mode to "in row". 
6126 * 6127 * @see https://html.spec.whatwg.org/multipage/parsing.html#close-the-cell 6128 * 6129 * @since 6.7.0 6130 */ 6131 private function close_cell(): void { 6132 $this->generate_implied_end_tags(); 6133 // @todo Parse error if the current node is not a "td" or "th" element. 6134 foreach ( $this->state->stack_of_open_elements->walk_up() as $element ) { 6135 $this->state->stack_of_open_elements->pop(); 6136 if ( 'TD' === $element->node_name || 'TH' === $element->node_name ) { 6137 break; 6138 } 6139 } 6140 $this->state->active_formatting_elements->clear_up_to_last_marker(); 6141 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW; 6142 } 6143 6144 /** 6145 * Inserts an HTML element on the stack of open elements. 6146 * 6147 * @since 6.4.0 6148 * 6149 * @see https://html.spec.whatwg.org/#insert-an-html-element 6150 * 6151 * @param WP_HTML_Token $token Token pointing to the element in the original input HTML. 6152 */ 6153 private function insert_html_element( WP_HTML_Token $token ): void { 6154 $this->state->stack_of_open_elements->push( $token ); 6155 } 6156 6157 /** 6158 * Inserts a foreign element on to the stack of open elements. 6159 * 6160 * @since 6.7.0 6161 * 6162 * @see https://html.spec.whatwg.org/#insert-a-foreign-element 6163 * 6164 * @param WP_HTML_Token $token Insert this token. The token's namespace and 6165 * insertion point will be updated correctly. 6166 * @param bool $only_add_to_element_stack Whether to skip the "insert an element at the adjusted 6167 * insertion location" algorithm when adding this element. 6168 */ 6169 private function insert_foreign_element( WP_HTML_Token $token, bool $only_add_to_element_stack ): void { 6170 $adjusted_current_node = $this->get_adjusted_current_node(); 6171 6172 $token->namespace = $adjusted_current_node ? 
$adjusted_current_node->namespace : 'html'; 6173 6174 if ( $this->is_mathml_integration_point() ) { 6175 $token->integration_node_type = 'math'; 6176 } elseif ( $this->is_html_integration_point() ) { 6177 $token->integration_node_type = 'html'; 6178 } 6179 6180 if ( false === $only_add_to_element_stack ) { 6181 /* 6182 * @todo Implement the "appropriate place for inserting a node" and the 6183 * "insert an element at the adjusted insertion location" algorithms. 6184 * 6185 * These algorithms mostly impact DOM tree construction and not the HTML API. 6186 * Here, there's no DOM node onto which the element will be appended, so the 6187 * parser will skip this step. 6188 * 6189 * @see https://html.spec.whatwg.org/#insert-an-element-at-the-adjusted-insertion-location 6190 */ 6191 } 6192 6193 $this->insert_html_element( $token ); 6194 } 6195 6196 /** 6197 * Inserts a virtual element on the stack of open elements. 6198 * 6199 * @since 6.7.0 6200 * 6201 * @param string $token_name Name of token to create and insert into the stack of open elements. 6202 * @param string|null $bookmark_name Optional. Name to give bookmark for created virtual node. 6203 * Defaults to auto-creating a bookmark name. 6204 * @return WP_HTML_Token Newly-created virtual token. 6205 */ 6206 private function insert_virtual_node( $token_name, $bookmark_name = null ): WP_HTML_Token { 6207 $here = $this->bookmarks[ $this->state->current_token->bookmark_name ]; 6208 $name = $bookmark_name ?? $this->bookmark_token(); 6209 6210 $this->bookmarks[ $name ] = new WP_HTML_Span( $here->start, 0 ); 6211 6212 $token = new WP_HTML_Token( $name, $token_name, false ); 6213 $this->insert_html_element( $token ); 6214 return $token; 6215 } 6216 6217 /* 6218 * HTML Specification Helpers 6219 */ 6220 6221 /** 6222 * Indicates if the current token is a MathML integration point. 
6223 * 6224 * @since 6.7.0 6225 * 6226 * @see https://html.spec.whatwg.org/#mathml-text-integration-point 6227 * 6228 * @return bool Whether the current token is a MathML integration point. 6229 */ 6230 private function is_mathml_integration_point(): bool { 6231 $current_token = $this->state->current_token; 6232 if ( ! isset( $current_token ) ) { 6233 return false; 6234 } 6235 6236 if ( 'math' !== $current_token->namespace || 'M' !== $current_token->node_name[0] ) { 6237 return false; 6238 } 6239 6240 $tag_name = $current_token->node_name; 6241 6242 return ( 6243 'MI' === $tag_name || 6244 'MO' === $tag_name || 6245 'MN' === $tag_name || 6246 'MS' === $tag_name || 6247 'MTEXT' === $tag_name 6248 ); 6249 } 6250 6251 /** 6252 * Indicates if the current token is an HTML integration point. 6253 * 6254 * Note that this method must be an instance method with access 6255 * to the current token, since it needs to examine the attributes 6256 * of the currently-matched tag, if it's in the MathML namespace. 6257 * Otherwise it would be required to scan the HTML and ensure that 6258 * no other accounting is overlooked. 6259 * 6260 * @since 6.7.0 6261 * 6262 * @see https://html.spec.whatwg.org/#html-integration-point 6263 * 6264 * @return bool Whether the current token is an HTML integration point. 6265 */ 6266 private function is_html_integration_point(): bool { 6267 $current_token = $this->state->current_token; 6268 if ( ! 
isset( $current_token ) ) { 6269 return false; 6270 } 6271 6272 if ( 'html' === $current_token->namespace ) { 6273 return false; 6274 } 6275 6276 $tag_name = $current_token->node_name; 6277 6278 if ( 'svg' === $current_token->namespace ) { 6279 return ( 6280 'DESC' === $tag_name || 6281 'FOREIGNOBJECT' === $tag_name || 6282 'TITLE' === $tag_name 6283 ); 6284 } 6285 6286 if ( 'math' === $current_token->namespace ) { 6287 if ( 'ANNOTATION-XML' !== $tag_name ) { 6288 return false; 6289 } 6290 6291 $encoding = $this->get_attribute( 'encoding' ); 6292 6293 return ( 6294 is_string( $encoding ) && 6295 ( 6296 0 === strcasecmp( $encoding, 'application/xhtml+xml' ) || 6297 0 === strcasecmp( $encoding, 'text/html' ) 6298 ) 6299 ); 6300 } 6301 6302 $this->bail( 'Should not have reached end of HTML Integration Point detection: check HTML API code.' ); 6303 // This unnecessary return prevents tools from inaccurately reporting type errors. 6304 return false; 6305 } 6306 6307 /** 6308 * Returns whether an element of a given name is in the HTML special category. 6309 * 6310 * @since 6.4.0 6311 * 6312 * @see https://html.spec.whatwg.org/#special 6313 * 6314 * @param WP_HTML_Token|string $tag_name Node to check, or only its name if in the HTML namespace. 6315 * @return bool Whether the element of the given name is in the special category. 6316 */ 6317 public static function is_special( $tag_name ): bool { 6318 if ( is_string( $tag_name ) ) { 6319 $tag_name = strtoupper( $tag_name ); 6320 } else { 6321 $tag_name = 'html' === $tag_name->namespace 6322 ? 
strtoupper( $tag_name->node_name ) 6323 : "{$tag_name->namespace} {$tag_name->node_name}"; 6324 } 6325 6326 return ( 6327 'ADDRESS' === $tag_name || 6328 'APPLET' === $tag_name || 6329 'AREA' === $tag_name || 6330 'ARTICLE' === $tag_name || 6331 'ASIDE' === $tag_name || 6332 'BASE' === $tag_name || 6333 'BASEFONT' === $tag_name || 6334 'BGSOUND' === $tag_name || 6335 'BLOCKQUOTE' === $tag_name || 6336 'BODY' === $tag_name || 6337 'BR' === $tag_name || 6338 'BUTTON' === $tag_name || 6339 'CAPTION' === $tag_name || 6340 'CENTER' === $tag_name || 6341 'COL' === $tag_name || 6342 'COLGROUP' === $tag_name || 6343 'DD' === $tag_name || 6344 'DETAILS' === $tag_name || 6345 'DIR' === $tag_name || 6346 'DIV' === $tag_name || 6347 'DL' === $tag_name || 6348 'DT' === $tag_name || 6349 'EMBED' === $tag_name || 6350 'FIELDSET' === $tag_name || 6351 'FIGCAPTION' === $tag_name || 6352 'FIGURE' === $tag_name || 6353 'FOOTER' === $tag_name || 6354 'FORM' === $tag_name || 6355 'FRAME' === $tag_name || 6356 'FRAMESET' === $tag_name || 6357 'H1' === $tag_name || 6358 'H2' === $tag_name || 6359 'H3' === $tag_name || 6360 'H4' === $tag_name || 6361 'H5' === $tag_name || 6362 'H6' === $tag_name || 6363 'HEAD' === $tag_name || 6364 'HEADER' === $tag_name || 6365 'HGROUP' === $tag_name || 6366 'HR' === $tag_name || 6367 'HTML' === $tag_name || 6368 'IFRAME' === $tag_name || 6369 'IMG' === $tag_name || 6370 'INPUT' === $tag_name || 6371 'KEYGEN' === $tag_name || 6372 'LI' === $tag_name || 6373 'LINK' === $tag_name || 6374 'LISTING' === $tag_name || 6375 'MAIN' === $tag_name || 6376 'MARQUEE' === $tag_name || 6377 'MENU' === $tag_name || 6378 'META' === $tag_name || 6379 'NAV' === $tag_name || 6380 'NOEMBED' === $tag_name || 6381 'NOFRAMES' === $tag_name || 6382 'NOSCRIPT' === $tag_name || 6383 'OBJECT' === $tag_name || 6384 'OL' === $tag_name || 6385 'P' === $tag_name || 6386 'PARAM' === $tag_name || 6387 'PLAINTEXT' === $tag_name || 6388 'PRE' === $tag_name || 6389 'SCRIPT' === $tag_name 
|| 6390 'SEARCH' === $tag_name || 6391 'SECTION' === $tag_name || 6392 'SELECT' === $tag_name || 6393 'SOURCE' === $tag_name || 6394 'STYLE' === $tag_name || 6395 'SUMMARY' === $tag_name || 6396 'TABLE' === $tag_name || 6397 'TBODY' === $tag_name || 6398 'TD' === $tag_name || 6399 'TEMPLATE' === $tag_name || 6400 'TEXTAREA' === $tag_name || 6401 'TFOOT' === $tag_name || 6402 'TH' === $tag_name || 6403 'THEAD' === $tag_name || 6404 'TITLE' === $tag_name || 6405 'TR' === $tag_name || 6406 'TRACK' === $tag_name || 6407 'UL' === $tag_name || 6408 'WBR' === $tag_name || 6409 'XMP' === $tag_name || 6410 6411 // MathML. 6412 'math MI' === $tag_name || 6413 'math MO' === $tag_name || 6414 'math MN' === $tag_name || 6415 'math MS' === $tag_name || 6416 'math MTEXT' === $tag_name || 6417 'math ANNOTATION-XML' === $tag_name || 6418 6419 // SVG. 6420 'svg DESC' === $tag_name || 6421 'svg FOREIGNOBJECT' === $tag_name || 6422 'svg TITLE' === $tag_name 6423 ); 6424 } 6425 6426 /** 6427 * Returns whether a given element is an HTML Void Element 6428 * 6429 * > area, base, br, col, embed, hr, img, input, link, meta, source, track, wbr 6430 * 6431 * @since 6.4.0 6432 * 6433 * @see https://html.spec.whatwg.org/#void-elements 6434 * 6435 * @param string $tag_name Name of HTML tag to check. 6436 * @return bool Whether the given tag is an HTML Void Element. 6437 */ 6438 public static function is_void( $tag_name ): bool { 6439 $tag_name = strtoupper( $tag_name ); 6440 6441 return ( 6442 'AREA' === $tag_name || 6443 'BASE' === $tag_name || 6444 'BASEFONT' === $tag_name || // Obsolete but still treated as void. 6445 'BGSOUND' === $tag_name || // Obsolete but still treated as void. 6446 'BR' === $tag_name || 6447 'COL' === $tag_name || 6448 'EMBED' === $tag_name || 6449 'FRAME' === $tag_name || 6450 'HR' === $tag_name || 6451 'IMG' === $tag_name || 6452 'INPUT' === $tag_name || 6453 'KEYGEN' === $tag_name || // Obsolete but still treated as void. 
6454 'LINK' === $tag_name || 6455 'META' === $tag_name || 6456 'PARAM' === $tag_name || // Obsolete but still treated as void. 6457 'SOURCE' === $tag_name || 6458 'TRACK' === $tag_name || 6459 'WBR' === $tag_name 6460 ); 6461 } 6462 6463 /** 6464 * Gets an encoding from a given string. 6465 * 6466 * This is an algorithm defined in the WHAT-WG specification. 6467 * 6468 * Example: 6469 * 6470 * 'UTF-8' === self::get_encoding( 'utf8' ); 6471 * 'UTF-8' === self::get_encoding( " \tUTF-8 " ); 6472 * null === self::get_encoding( 'UTF-7' ); 6473 * null === self::get_encoding( 'utf8; charset=' ); 6474 * 6475 * @see https://encoding.spec.whatwg.org/#concept-encoding-get 6476 * 6477 * @todo As this parser only supports UTF-8, only the UTF-8 6478 * encodings are detected. Add more as desired, but the 6479 * parser will bail on non-UTF-8 encodings. 6480 * 6481 * @since 6.7.0 6482 * 6483 * @param string $label A string which may specify a known encoding. 6484 * @return string|null Known encoding if matched, otherwise null. 6485 */ 6486 protected static function get_encoding( string $label ): ?string { 6487 /* 6488 * > Remove any leading and trailing ASCII whitespace from label. 6489 */ 6490 $label = trim( $label, " \t\f\r\n" ); 6491 6492 /* 6493 * > If label is an ASCII case-insensitive match for any of the labels listed in the 6494 * > table below, then return the corresponding encoding; otherwise return failure. 6495 */ 6496 switch ( strtolower( $label ) ) { 6497 case 'unicode-1-1-utf-8': 6498 case 'unicode11utf8': 6499 case 'unicode20utf8': 6500 case 'utf-8': 6501 case 'utf8': 6502 case 'x-unicode20utf8': 6503 return 'UTF-8'; 6504 6505 default: 6506 return null; 6507 } 6508 } 6509 6510 /* 6511 * Constants that would pollute the top of the class if they were found there. 6512 */ 6513 6514 /** 6515 * Indicates that the next HTML token should be parsed and processed. 
6516 * 6517 * @since 6.4.0 6518 * 6519 * @var string 6520 */ 6521 const PROCESS_NEXT_NODE = 'process-next-node'; 6522 6523 /** 6524 * Indicates that the current HTML token should be reprocessed in the newly-selected insertion mode. 6525 * 6526 * @since 6.4.0 6527 * 6528 * @var string 6529 */ 6530 const REPROCESS_CURRENT_NODE = 'reprocess-current-node'; 6531 6532 /** 6533 * Indicates that the current HTML token should be processed without advancing the parser. 6534 * 6535 * @since 6.5.0 6536 * 6537 * @var string 6538 */ 6539 const PROCESS_CURRENT_NODE = 'process-current-node'; 6540 6541 /** 6542 * Indicates that the parser encountered unsupported markup and has bailed. 6543 * 6544 * @since 6.4.0 6545 * 6546 * @var string 6547 */ 6548 const ERROR_UNSUPPORTED = 'unsupported'; 6549 6550 /** 6551 * Indicates that the parser encountered more HTML tokens than it 6552 * was able to process and has bailed. 6553 * 6554 * @since 6.4.0 6555 * 6556 * @var string 6557 */ 6558 const ERROR_EXCEEDED_MAX_BOOKMARKS = 'exceeded-max-bookmarks'; 6559 6560 /** 6561 * Unlock code that must be passed into the constructor to create this class. 6562 * 6563 * This class extends the WP_HTML_Tag_Processor, which has a public class 6564 * constructor. Therefore, it's not possible to have a private constructor here. 6565 * 6566 * This unlock code is used to ensure that anyone calling the constructor is 6567 * doing so with a full understanding that it's intended to be a private API. 6568 * 6569 * @access private 6570 */ 6571 const CONSTRUCTOR_UNLOCK_CODE = 'Use WP_HTML_Processor::create_fragment() instead of calling the class constructor directly.'; 6572 }
Generated : Sat Nov 23 08:20:01 2024 | Cross-referenced by PHPXref |