PHP Cross Reference of WordPress Trunk (Updated Daily) |
<?php
/**
 * Efficiently scan through block structure in document without parsing
 * the entire block tree and all of its JSON attributes into memory.
 *
 * @package WordPress
 * @subpackage Blocks
 * @since 6.9.0
 */

/**
 * Class for efficiently scanning through block structure in a document
 * without parsing the entire block tree and JSON attributes into memory.
 *
 * ## Overview
 *
 * This class is designed to help analyze and modify block structure in a
 * streaming fashion and to bridge the gap between parsed block trees and
 * the text representing them.
 *
 * Use-cases for this class include but are not limited to:
 *
 *  - Counting block types in a document.
 *  - Queuing stylesheets based on the presence of various block types.
 *  - Modifying blocks of a given type, e.g. migrations, updates, and styling.
 *  - Searching for content of specific kinds, e.g. checking for blocks
 *    with certain theme support attributes, or block bindings.
 *  - Adding CSS class names to the element wrapping a block’s inner blocks.
 *
 * > *Note!* If a fully-parsed block tree of a document is necessary, including
 * >         all the parsed JSON attributes, nested blocks, and HTML, consider
 * >         using {@see \parse_blocks()} instead which will parse the document
 * >         in one swift pass.
 *
 * For typical usage, jump first to the methods {@see self::next_block()},
 * {@see self::next_delimiter()}, or {@see self::next_token()}.
 *
 * ### Values
 *
 * As a lower-level interface than {@see parse_blocks()} this class follows
 * different performance-focused values:
 *
 *  - Minimize allocations so that documents of any size may be processed
 *    on a fixed or marginal amount of memory.
 *  - Make hidden costs explicit so that calling code only has to pay the
 *    performance penalty for features it needs.
 *  - Operate with a streaming and re-entrant design to make it possible
 *    to operate on chunks of a document and to resume after pausing.
 *
 * This means that some operations might appear more cumbersome than one
 * might expect. This design tradeoff opens up the opportunity to wrap this in
 * a convenience class to add higher-level functionality.
 *
 * ## Concepts
 *
 * All text documents can be considered a block document containing a combination
 * of “freeform HTML” and explicit block structure. Block structure forms through
 * special HTML comments called _delimiters_ which include a block type and,
 * optionally, block attributes encoded as a JSON object payload.
 *
 * This processor is designed to scan through a block document from delimiter to
 * delimiter, tracking how the delimiters impact the structure of the document.
 * Spans of HTML appear between delimiters. If these spans exist at the top level
 * of the document, meaning there is no containing block around them, they are
 * considered freeform HTML content. If, however, they appear _inside_ block
 * structure they are interpreted as `innerHTML` for the containing block.
 *
 * ### Tokens and scanning
 *
 * As the processor scans through a document it reports information about the token
 * on which it pauses. Tokens represent spans of text in the input comprising block
 * delimiters and spans of HTML.
 *
 *  - {@see self::next_token()} visits every contiguous subspan of text in the
 *    input document. This includes all explicit block comment delimiters and spans
 *    of HTML content (whether freeform or inner HTML).
 *  - {@see self::next_delimiter()} visits every explicit block comment delimiter
 *    unless passed a block type which covers freeform HTML content. In these cases
 *    it will stop at top-level spans of HTML and report a `null` block type.
 *  - {@see self::next_block()} visits every block delimiter which _opens_ a block.
 *    This includes opening block delimiters as well as void block delimiters. With
 *    the same exception as above for freeform HTML block types, this will visit
 *    top-level spans of HTML content.
 *
 * When matched on a particular token, the following methods provide structural
 * and textual information about it:
 *
 *  - {@see self::get_delimiter_type()} reports whether the delimiter is an opener,
 *    a closer, or if it represents a whole void block.
 *  - {@see self::get_block_type()} reports the fully-qualified block type which
 *    the delimiter represents.
 *  - {@see self::get_printable_block_type()} reports the fully-qualified block type,
 *    but returns `core/freeform` instead of `null` for top-level freeform HTML content.
 *  - {@see self::is_block_type()} indicates if the delimiter represents a block of
 *    the given block type, or wildcard or pseudo-block type described below.
 *  - {@see self::opens_block()} indicates if the delimiter opens a block of one
 *    of the provided block types. Opening, void, and top-level freeform HTML content
 *    all open blocks.
 *  - {@see static::get_attributes()} is currently reserved for a future streaming
 *    JSON parser class.
 *  - {@see self::allocate_and_return_parsed_attributes()} extracts the JSON attributes
 *    for delimiters which open blocks and returns the fully-parsed attributes as an
 *    associative array. {@see static::get_last_json_error()} for when this fails.
 *  - {@see self::is_html()} indicates if the token is a span of HTML which might
 *    be top-level freeform content or a block’s inner HTML.
 *  - {@see self::get_html_content()} returns the span of HTML.
 *  - {@see self::get_span()} for the byte offset and length into the input document
 *    representing the token.
 *
 * It’s possible for the processor to fail to scan forward if the input document ends
 * in a proper prefix of an explicit block comment delimiter.
 * For example, if the input ends in `<!-- wp:` then it _might_ be the start of another
 * delimiter. The parser cannot know, however, and therefore refuses to proceed.
 * {@see static::get_last_error()} to distinguish between a failure to find the next token
 * and an incomplete input.
 *
 * ### Block types
 *
 * A block’s “type” comprises an optional _namespace_ and _name_. If the namespace
 * isn’t provided it will be interpreted as the implicit `core` namespace. For example,
 * the type `gallery` is the name of the block in the `core` namespace, but the type
 * `abc/gallery` is the _fully-qualified_ block type for the block whose name is still
 * `gallery`, but in the `abc` namespace.
 *
 * Methods on this class are aware of this block naming semantic and anywhere a block
 * type is an argument to a method it will be normalized to account for implicit namespaces.
 * Passing `paragraph` is the same as passing `core/paragraph`. On the contrary, anywhere
 * this class returns a block type, it will return the fully-qualified and normalized form.
 * For example, for the `<!-- wp:group -->` delimiter it will return `core/group` as the
 * block type.
 *
 * There are two special block types that change the behavior of the processor:
 *
 *  - The wildcard `*` represents _any block_. In addition to matching all block types,
 *    it also represents top-level freeform HTML whose block type is reported as `null`.
 *
 *  - The `core/freeform` block type is a pseudo-block type which explicitly matches
 *    top-level freeform HTML.
 *
 * These special block types can be passed into any method which searches for blocks.
 *
 * There is one additional special block type which may be returned from
 * {@see self::get_printable_block_type()}. This is the `#innerHTML` type, which
 * indicates that the HTML span on which the processor is paused is inner HTML for
 * a containing block.
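 *
 * As a minimal sketch of the implicit-namespace rule described in this section,
 * the following standalone helper shows the expected mapping. The helper name is
 * hypothetical; it is not this class’s implementation, and it ignores the
 * special `*` block type.
 *
 *     // Assumed helper, for illustration only: a type without a namespace
 *     // separator is treated as belonging to the implicit `core` namespace.
 *     function example_qualify_block_type( string $block_type ): string {
 *         return false === strpos( $block_type, '/' ) ? "core/{$block_type}" : $block_type;
 *     }
 *
 *     example_qualify_block_type( 'paragraph' );   // 'core/paragraph'
 *     example_qualify_block_type( 'abc/gallery' ); // 'abc/gallery'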
 *
 * ### Spans of HTML
 *
 * Non-block content plays a complicated role in processing block documents. This
 * processor exposes tools to help work with these spans of HTML.
 *
 *  - {@see self::is_html()} indicates if the processor is paused at a span of
 *    HTML but does not differentiate between top-level freeform content and inner HTML.
 *  - {@see self::is_non_whitespace_html()} indicates not only if the processor
 *    is paused at a span of HTML, but also whether that span incorporates more than
 *    whitespace characters. Because block serialization often inserts newlines between
 *    block comment delimiters, this is useful for distinguishing “real” freeform
 *    content from purely aesthetic syntax.
 *  - {@see self::is_block_type()} matches top-level freeform HTML content when
 *    provided one of the special block types described above.
 *
 * ### Block structure
 *
 * As the processor traverses block delimiters it maintains a stack of which blocks are
 * open at the given place in the document where it’s paused. This stack represents the
 * block structure of a document and is used to determine where blocks end, which blocks
 * represent inner blocks, whether a span of HTML is top-level freeform content, and
 * more. Investigate the stack with {@see self::get_breadcrumbs()}, which returns an
 * array of block types starting at the outermost-open block and descending to the
 * currently-visited block.
 *
 * Unlike {@see \parse_blocks()}, spans of HTML appear in this structure as the special
 * reported block type `#html`. Such a span represents inner HTML for a block if the
 * depth reported by {@see self::get_depth()} is greater than one.
 *
 * It will generally not be necessary to inspect the stack of open blocks, though
 * depth may be important for finding where blocks end.
 * When visiting a block opener, the depth will have been increased before pausing; in
 * contrast the depth is decremented before visiting a closer. This makes the following
 * an easy way to determine if a block is still open.
 *
 * Example:
 *
 *     $depth = $processor->get_depth();
 *     while ( $processor->next_token() && $processor->get_depth() > $depth ) {
 *         continue;
 *     }
 *     // Processor is now paused at the token immediately following the closed block.
 *
 * #### Extracting blocks
 *
 * A unique feature of this processor is the ability to return the same output as
 * {@see \parse_blocks()} would produce, but for a subset of the input document.
 * For example, it’s possible to extract an image block, manipulate that parsed
 * block, and re-serialize it into the original document. It’s possible to do so
 * while skipping over the parse of the rest of the document.
 *
 * {@see self::extract_block()} will scan forward from the current block opener
 * and build the parsed block structure until the current block is closed. It will
 * include all inner HTML and inner blocks, and parse all of the inner blocks. It
 * can be used to extract a block at any depth in the document, helpful for operating
 * on blocks within nested structure.
 *
 * Example:
 *
 *     if ( ! $processor->next_block( 'gallery' ) ) {
 *         return $post_content;
 *     }
 *
 *     $gallery_at    = $processor->get_span()->start;
 *     $gallery_block = $processor->extract_block();
 *     $after_gallery = $processor->get_span()->start;
 *     return (
 *         substr( $post_content, 0, $gallery_at ) .
 *         serialize_block( modify_gallery( $gallery_block ) ) .
 *         substr( $post_content, $after_gallery )
 *     );
 *
 * #### Handling of malformed structure
 *
 * There are situations where closing block delimiters appear for which no open block
 * exists, or where a document ends before a block is closed, or where a closing block
 * delimiter appears but references a different block type than the most-recently
 * opened block does. In all of these cases, the stack of open blocks should mirror
 * the behavior in {@see \parse_blocks()}.
 *
 * Unlike {@see \parse_blocks()}, however, this processor can still operate on the
 * invalid block delimiters. It provides a few functions which can be used for building
 * custom and non-spec-compliant error handling.
 *
 *  - {@see self::has_closing_flag()} indicates if the block delimiter contains the
 *    closing flag at the end. Some invalid block delimiters might contain both the
 *    void and closing flag, in which case {@see self::get_delimiter_type()} will
 *    report that it’s a void block.
 *  - {@see static::get_last_error()} indicates if the processor reached an invalid
 *    block closing. Depending on the context, {@see \parse_blocks()} might instead
 *    ignore the token or treat it as freeform HTML content.
 *
 * ## Static helpers
 *
 * This class provides helpers for performing semantic block-related operations.
 *
 *  - {@see self::normalize_block_type()} takes a block type with or without the
 *    implicit `core` namespace and returns a fully-qualified block type.
 *  - {@see self::are_equal_block_types()} indicates if two spans across one or
 *    more input texts represent the same fully-qualified block type.
 *
 * ## Subclassing
 *
 * This processor is designed to accurately parse a block document. Therefore, many
 * of its methods are not meant for subclassing. However, overall this class supports
 * building higher-level convenience classes which may choose to subclass it.
 * For those classes, avoid re-implementing methods except for the list below. Instead, create
 * new names representing the higher-level concepts being introduced. For example, instead
 * of creating a new method named `next_block()` which only advances to blocks of a given
 * kind, consider creating a new method named something like `next_layout_block()` which
 * won’t interfere with the base class method.
 *
 *  - {@see static::get_last_error()} may be reimplemented to report new errors in the subclass
 *    which aren’t intrinsic to block parsing.
 *  - {@see static::get_attributes()} may be reimplemented to provide a streaming interface
 *    to reading and modifying a block’s JSON attributes. It should be fast and memory efficient.
 *  - {@see static::get_last_json_error()} may be reimplemented to report new errors introduced
 *    with a reimplementation of {@see static::get_attributes()}.
 *
 * @since 6.9.0
 */
class WP_Block_Processor {
	/**
	 * Indicates if the last operation failed, otherwise
	 * will be `null` for success.
	 *
	 * @since 6.9.0
	 *
	 * @var string|null
	 */
	private $last_error = null;

	/**
	 * Indicates failures from decoding JSON attributes.
	 *
	 * @since 6.9.0
	 *
	 * @see \json_last_error()
	 *
	 * @var int
	 */
	private $last_json_error = JSON_ERROR_NONE;

	/**
	 * Source text provided to processor.
	 *
	 * @since 6.9.0
	 *
	 * @var string
	 */
	protected $source_text;

	/**
	 * Byte offset into source text where a matched delimiter starts.
	 *
	 * Example:
	 *
	 *          5    10   15   20   25   30   35   40   45   50
	 *     <!-- wp:group --><!-- wp:void /--><!-- /wp:group -->
	 *                      ╰─ Starts at byte offset 17.
	 *
	 * @since 6.9.0
	 *
	 * @var int
	 */
	private $matched_delimiter_at = 0;

	/**
	 * Byte length of full span of a matched delimiter.
	 *
	 * Example:
	 *
	 *          5    10   15   20   25   30   35   40   45   50
	 *     <!-- wp:group --><!-- wp:void /--><!-- /wp:group -->
	 *                      ╰───────────────╯
	 *                        17 bytes long.
	 *
	 * @since 6.9.0
	 *
	 * @var int
	 */
	private $matched_delimiter_length = 0;

	/**
	 * First byte offset into source text following any previously-matched delimiter.
	 * Used to indicate where an HTML span starts.
	 *
	 * Example:
	 *
	 *          5    10   15   20   25   30   35   40   45   50   55
	 *     <!-- wp:paragraph --><p>Content</p><⃨!⃨-⃨-⃨ ⃨/⃨w⃨p⃨:⃨p⃨a⃨r⃨a⃨g⃨r⃨a⃨p⃨h⃨ ⃨-⃨-⃨>⃨
	 *                          │             ╰─ This delimiter was matched, and after matching,
	 *                          │                revealed the preceding HTML span.
	 *                          │
	 *                          ╰─ The first byte offset after the previous matched delimiter
	 *                             is 21. Because the matched delimiter starts at 35, which is after
	 *                             this, a span of HTML must exist between these boundaries.
	 *
	 * @since 6.9.0
	 *
	 * @var int
	 */
	private $after_previous_delimiter = 0;

	/**
	 * Byte offset where namespace span begins.
	 *
	 * When no namespace is present, this will be the same as the starting
	 * byte offset for the block name.
	 *
	 * Example:
	 *
	 *     <!-- wp:core/gallery -->
	 *             │    ╰─ Name starts here.
	 *             ╰─ Namespace starts here.
	 *
	 *     <!-- wp:gallery -->
	 *             ├─ The namespace would start here but is implied as “core.”
	 *             ╰─ The name starts here.
	 *
	 * @since 6.9.0
	 *
	 * @var int
	 */
	private $namespace_at = 0;

	/**
	 * Byte offset where block name span begins.
	 *
	 * When no namespace is present, this will be the same as the starting
	 * byte offset for the block namespace.
	 *
	 * Example:
	 *
	 *     <!-- wp:core/gallery -->
	 *             │    ╰─ Name starts here.
	 *             ╰─ Namespace starts here.
	 *
	 *     <!-- wp:gallery -->
	 *             ├─ The namespace would start here but is implied as “core.”
	 *             ╰─ The name starts here.
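	 *
	 * A hand-worked sketch of how the namespace and name offsets combine into
	 * the full block type span. The variable values are illustrative and were
	 * computed for this one input only; they are not read from the processor.
	 *
	 *     $text         = '<!-- wp:core/gallery -->';
	 *     $namespace_at = 8;  // Byte offset of “core”.
	 *     $name_at      = 13; // Byte offset of “gallery”.
	 *     $name_length  = 7;  // Byte length of “gallery”.
	 *
	 *     // The span runs from the start of the namespace to the end of the name.
	 *     $block_type = substr( $text, $namespace_at, $name_at + $name_length - $namespace_at );
	 *     // $block_type is 'core/gallery'.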
	 *
	 * @since 6.9.0
	 *
	 * @var int
	 */
	private $name_at = 0;

	/**
	 * Byte length of block name span.
	 *
	 * Example:
	 *
	 *          5    10   15   20   25
	 *     <!-- wp:core/gallery -->
	 *                  ╰─────╯
	 *                7 bytes long.
	 *
	 * @since 6.9.0
	 *
	 * @var int
	 */
	private $name_length = 0;

	/**
	 * Whether the delimiter contains the block-closing flag.
	 *
	 * This may be erroneous if present within a void block,
	 * therefore the {@see self::has_closing_flag()} can be used by
	 * calling code to perform custom error-handling.
	 *
	 * @since 6.9.0
	 *
	 * @var bool
	 */
	private $has_closing_flag = false;

	/**
	 * Byte offset where JSON attributes span begins.
	 *
	 * Example:
	 *
	 *          5    10   15   20   25   30   35   40
	 *     <!-- wp:paragraph {"dropCaps":true} -->
	 *                       ╰─ Starts at byte offset 18.
	 *
	 * @since 6.9.0
	 *
	 * @var int
	 */
	private $json_at;

	/**
	 * Byte length of JSON attributes span, or 0 if none are present.
	 *
	 * Example:
	 *
	 *          5    10   15   20   25   30   35   40
	 *     <!-- wp:paragraph {"dropCaps":true} -->
	 *                       ╰───────────────╯
	 *                         17 bytes long.
	 *
	 * @since 6.9.0
	 *
	 * @var int
	 */
	private $json_length = 0;

	/**
	 * Internal parser state, differentiating whether the instance is currently matched,
	 * on an implicit freeform node, in error, or ready to begin parsing.
	 *
	 * @see self::READY
	 * @see self::MATCHED
	 * @see self::HTML_SPAN
	 * @see self::INCOMPLETE_INPUT
	 * @see self::COMPLETE
	 *
	 * @since 6.9.0
	 *
	 * @var string
	 */
	protected $state = self::READY;

	/**
	 * Indicates what kind of block comment delimiter was matched.
	 *
	 * One of:
	 *
	 *  - {@see self::OPENER} If the delimiter is opening a block.
	 *  - {@see self::CLOSER} If the delimiter is closing an open block.
	 *  - {@see self::VOID} If the delimiter represents a void block with no inner content.
	 *
	 * If a parsed comment delimiter contains both the closing and the void
	 * flags then it will be interpreted as a void block to match the behavior
	 * of the official block parser, however, this is a syntax error and probably
	 * the block ought to close an open block of the same name, if one is open.
	 *
	 * @since 6.9.0
	 *
	 * @var string
	 */
	private $type;

	/**
	 * Whether the last-matched delimiter acts like a void block and should be
	 * popped from the stack of open blocks as soon as the parser advances.
	 *
	 * This applies to void block delimiters and to HTML spans.
	 *
	 * @since 6.9.0
	 *
	 * @var bool
	 */
	private $was_void = false;

	/**
	 * For every open block, in hierarchical order, this stores the byte offset
	 * into the source text where the block type starts, including for HTML spans.
	 *
	 * To avoid allocating and normalizing block names when they aren’t requested,
	 * the stack of open blocks is stored as the byte offsets and byte lengths of
	 * each open block’s block type. This allows for minimal tracking and quick
	 * reading or comparison of block types when requested.
	 *
	 * @since 6.9.0
	 *
	 * @see self::$open_blocks_length
	 *
	 * @var int[]
	 */
	private $open_blocks_at = array();

	/**
	 * For every open block, in hierarchical order, this stores the byte length
	 * of the block’s block type in the source text. For HTML spans this is 0.
	 *
	 * @since 6.9.0
	 *
	 * @see self::$open_blocks_at
	 *
	 * @var int[]
	 */
	private $open_blocks_length = array();

	/**
	 * Indicates which operation should apply to the stack of open blocks after
	 * processing any pending spans of HTML.
	 *
	 * Since HTML spans are discovered after matching block delimiters, those
	 * delimiters need to defer modifying the stack of open blocks. This value,
	 * if set, indicates what operation should be applied.
	 * The properties associated with token boundaries still point to the delimiters even
	 * when processing HTML spans, so there’s no need to track them independently.
	 *
	 * @var 'push'|'void'|'pop'|null
	 */
	private $next_stack_op = null;

	/**
	 * Creates a new block processor.
	 *
	 * Example:
	 *
	 *     $processor = new WP_Block_Processor( $post_content );
	 *     if ( $processor->next_block( 'core/image' ) ) {
	 *         echo "Found an image!\n";
	 *     }
	 *
	 * @see self::next_block() to advance to the start of the next block (skips closers).
	 * @see self::next_delimiter() to advance to the next explicit block delimiter.
	 * @see self::next_token() to advance to the next block delimiter or HTML span.
	 *
	 * @since 6.9.0
	 *
	 * @param string $source_text Input document potentially containing block content.
	 */
	public function __construct( string $source_text ) {
		$this->source_text = $source_text;
	}

	/**
	 * Advance to the next block delimiter which opens a block, indicating if one was found.
	 *
	 * Delimiters which open blocks include opening and void block delimiters. To visit
	 * freeform HTML content, pass the wildcard “*” as the block type.
	 *
	 * Use this function to walk through the blocks in a document, pausing where they open.
	 *
	 * Example blocks:
	 *
	 *     // The first delimiter opens the paragraph block.
	 *     <⃨!⃨-⃨-⃨ ⃨w⃨p⃨:⃨p⃨a⃨r⃨a⃨g⃨r⃨a⃨p⃨h⃨ ⃨-⃨-⃨>⃨<p>Content</p><!-- /wp:paragraph-->
	 *
	 *     // The void block is the first opener in this sequence of closers.
	 *     <!-- /wp:group --><⃨!⃨-⃨-⃨ ⃨w⃨p⃨:⃨s⃨p⃨a⃨c⃨e⃨r⃨ ⃨{⃨"⃨h⃨e⃨i⃨g⃨h⃨t⃨"⃨:⃨"⃨2⃨0⃨0⃨p⃨x⃨"⃨}⃨ ⃨/⃨-⃨-⃨>⃨<!-- /wp:group -->
	 *
	 *     // If, however, `*` is provided as the block type, freeform content is matched.
	 *     <⃨h⃨2⃨>⃨M⃨y⃨ ⃨s⃨y⃨n⃨o⃨p⃨s⃨i⃨s⃨<⃨/⃨h⃨2⃨>⃨\⃨n⃨<!-- wp:my/table-of-contents /-->
	 *
	 *     // Inner HTML is never freeform content, and will not be matched even with the wildcard.
	 *     <!-- /wp:list-item --></ul><!-- /wp:list --><⃨!⃨-⃨-⃨ ⃨w⃨p⃨:⃨p⃨a⃨r⃨a⃨g⃨r⃨a⃨p⃨h⃨ ⃨-⃨-⃨>⃨<p>
	 *
	 * Example:
	 *
	 *     // Find all textual ranges of image block opening delimiters.
	 *     $images    = array();
	 *     $processor = new WP_Block_Processor( $html );
	 *     while ( $processor->next_block( 'core/image' ) ) {
	 *         $images[] = $processor->get_span();
	 *     }
	 *
	 * In some cases it may be useful to conditionally visit the implicit freeform
	 * blocks, such as when determining if a post contains freeform content that
	 * isn’t purely whitespace.
	 *
	 * Example:
	 *
	 *     $seen_block_types = [];
	 *     $block_type       = '*';
	 *     $processor        = new WP_Block_Processor( $html );
	 *     while ( $processor->next_block( $block_type ) ) {
	 *         // Stop wasting time visiting freeform blocks after one has been found.
	 *         if (
	 *             '*' === $block_type &&
	 *             null === $processor->get_block_type() &&
	 *             $processor->is_non_whitespace_html()
	 *         ) {
	 *             $block_type = null;
	 *             $seen_block_types['core/freeform'] = true;
	 *             continue;
	 *         }
	 *
	 *         $seen_block_types[ $processor->get_block_type() ] = true;
	 *     }
	 *
	 * @since 6.9.0
	 *
	 * @see self::next_delimiter() to advance to the next explicit block delimiter.
	 * @see self::next_token() to advance to the next block delimiter or HTML span.
	 *
	 * @param string|null $block_type Optional. If provided, advance until a block of this type is found.
	 *                                Default is to stop at any block regardless of its type.
	 * @return bool Whether an opening delimiter for a block was found.
	 */
	public function next_block( ?string $block_type = null ): bool {
		while ( $this->next_delimiter( $block_type ) ) {
			if ( self::CLOSER !== $this->get_delimiter_type() ) {
				return true;
			}
		}

		return false;
	}

	/**
	 * Advance to the next block delimiter in a document, indicating if one was found.
	 *
	 * Delimiters may include invalid JSON.
	 * This parser does not attempt to parse the JSON attributes until requested; when
	 * invalid, the attributes will be null. This matches the behavior of {@see \parse_blocks()}.
	 * To visit freeform HTML content, pass the wildcard “*” as the block type.
	 *
	 * Use this function to walk through the block delimiters in a document.
	 *
	 * Example delimiters:
	 *
	 *     <!-- wp:paragraph {"dropCap": true} -->
	 *     <!-- wp:separator /-->
	 *     <!-- /wp:paragraph -->
	 *
	 *     // If the wildcard `*` is provided as the block type, freeform content is matched.
	 *     <⃨h⃨2⃨>⃨M⃨y⃨ ⃨s⃨y⃨n⃨o⃨p⃨s⃨i⃨s⃨<⃨/⃨h⃨2⃨>⃨\⃨n⃨<!-- wp:my/table-of-contents /-->
	 *
	 *     // Inner HTML is never freeform content, and will not be matched even with the wildcard.
	 *     ...</ul><⃨!⃨-⃨-⃨ ⃨/⃨w⃨p⃨:⃨l⃨i⃨s⃨t⃨ ⃨-⃨-⃨>⃨<!-- wp:paragraph --><p>
	 *
	 * Example:
	 *
	 *     // Double quotes so that `\n` is a real newline between the two delimiters.
	 *     $html      = "<!-- wp:void /-->\n<!-- wp:void /-->";
	 *     $processor = new WP_Block_Processor( $html );
	 *     while ( $processor->next_delimiter() ) {
	 *         // Runs twice, seeing both void blocks of type “core/void.”
	 *     }
	 *
	 *     $processor = new WP_Block_Processor( $html );
	 *     while ( $processor->next_delimiter( '*' ) ) {
	 *         // Runs thrice, seeing the void block, the newline span, and the void block.
	 *     }
	 *
	 * @since 6.9.0
	 *
	 * @param string|null $block_name Optional. Keep searching until a block of this name is found.
	 *                                Defaults to visit every block regardless of type.
	 * @return bool Whether a block delimiter was matched.
	 */
	public function next_delimiter( ?string $block_name = null ): bool {
		if ( ! isset( $block_name ) ) {
			while ( $this->next_token() ) {
				if ( ! $this->is_html() ) {
					return true;
				}
			}

			return false;
		}

		while ( $this->next_token() ) {
			if ( $this->is_block_type( $block_name ) ) {
				return true;
			}
		}

		return false;
	}

	/**
	 * Advance to the next block delimiter or HTML span in a document, indicating if one was found.
	 *
	 * This function steps through every syntactic chunk in a document. This includes explicit
	 * block comment delimiters, freeform non-block content, and inner HTML segments.
	 *
	 * Example tokens:
	 *
	 *     <!-- wp:paragraph {"dropCap": true} -->
	 *     <!-- wp:separator /-->
	 *     <!-- /wp:paragraph -->
	 *     <p>Normal HTML content</p>
	 *     Plaintext content too!
	 *
	 * Example:
	 *
	 *     // Find span containing wrapping HTML element surrounding inner blocks.
	 *     $processor = new WP_Block_Processor( $html );
	 *     if ( ! $processor->next_block( 'gallery' ) ) {
	 *         return null;
	 *     }
	 *
	 *     $containing_span = null;
	 *     while ( $processor->next_token() && $processor->is_html() ) {
	 *         $containing_span = $processor->get_span();
	 *     }
	 *
	 * This method will visit all HTML spans including those forming freeform non-block
	 * content as well as those which are part of a block’s inner HTML.
	 *
	 * @since 6.9.0
	 *
	 * @return bool Whether a token was matched or the end of the document was reached without finding any.
	 */
	public function next_token(): bool {
		if ( $this->last_error || self::COMPLETE === $this->state || self::INCOMPLETE_INPUT === $this->state ) {
			return false;
		}

		// Void tokens automatically pop off the stack of open blocks.
		if ( $this->was_void ) {
			array_pop( $this->open_blocks_at );
			array_pop( $this->open_blocks_length );
			$this->was_void = false;
		}

		$text = $this->source_text;
		$end  = strlen( $text );

		/*
		 * Because HTML spans are inferred after finding the next delimiter, it means that
		 * the parser must transition out of that HTML state and reuse the token boundaries
		 * it found after the HTML span. If those boundaries are before the end of the
		 * document it implies that a real delimiter was found; otherwise this must be the
		 * terminating HTML span and the parsing is complete.
		 */
		if ( self::HTML_SPAN === $this->state ) {
			if ( $this->matched_delimiter_at >= $end ) {
				$this->state = self::COMPLETE;
				return false;
			}

			switch ( $this->next_stack_op ) {
				case 'void':
					$this->was_void             = true;
					$this->open_blocks_at[]     = $this->namespace_at;
					$this->open_blocks_length[] = $this->name_at + $this->name_length - $this->namespace_at;
					break;

				case 'push':
					$this->open_blocks_at[]     = $this->namespace_at;
					$this->open_blocks_length[] = $this->name_at + $this->name_length - $this->namespace_at;
					break;

				case 'pop':
					array_pop( $this->open_blocks_at );
					array_pop( $this->open_blocks_length );
					break;
			}

			$this->next_stack_op = null;
			$this->state         = self::MATCHED;
			return true;
		}

		$this->state          = self::READY;
		$after_prev_delimiter = $this->matched_delimiter_at + $this->matched_delimiter_length;
		$at                   = $after_prev_delimiter;

		while ( $at < $end ) {
			/*
			 * Find the next possible start of a delimiter.
			 *
			 * This follows the behavior in the official block parser, which segments a post
			 * by the block comment delimiters. It is possible for an HTML attribute to contain
			 * what looks like a block comment delimiter but which is actually an HTML attribute
			 * value.
			 * In such a case, the parser here will break apart the HTML and create the block
			 * boundary inside the HTML attribute. In other words, the block parser isolates
			 * sections of HTML from each other, even if that leads to malformed markup.
			 *
			 * For a more robust parse, scan through the document with the HTML API and parse
			 * comments once they are matched to see if they are also block delimiters. In
			 * practice, this nuance has not caused any known problems since developing blocks.
			 *
			 * <⃨!⃨-⃨-⃨ /wp:core/paragraph {"dropCap":true} /-->
			 */
			$comment_opening_at = strpos( $text, '<!--', $at );

			/*
			 * Even if the start of a potential block delimiter is not found, the document
			 * might end in a prefix of such, and in that case there is incomplete input.
			 */
			if ( false === $comment_opening_at ) {
				if ( str_ends_with( $text, '<!-' ) ) {
					$backup = 3;
				} elseif ( str_ends_with( $text, '<!' ) ) {
					$backup = 2;
				} elseif ( str_ends_with( $text, '<' ) ) {
					$backup = 1;
				} else {
					$backup = 0;
				}

				// Whether or not there is a potential delimiter, there might be an HTML span.
				if ( $after_prev_delimiter < ( $end - $backup ) ) {
					$this->state                    = self::HTML_SPAN;
					$this->after_previous_delimiter = $after_prev_delimiter;
					$this->matched_delimiter_at     = $end - $backup;
					$this->matched_delimiter_length = $backup;
					$this->open_blocks_at[]         = $after_prev_delimiter;
					$this->open_blocks_length[]     = 0;
					$this->was_void                 = true;
					return true;
				}

				/*
				 * In the case that there is the start of an HTML comment, it means that there
				 * might be a block delimiter, but it’s not possible to know, therefore it’s incomplete.
				 */
				if ( $backup > 0 ) {
					goto incomplete;
				}

				// Otherwise this is the end.
846 $this->state = self::COMPLETE; 847 return false; 848 } 849 850 // <!-- ⃨/wp:core/paragraph {"dropCap":true} /--> 851 $opening_whitespace_at = $comment_opening_at + 4; 852 if ( $opening_whitespace_at >= $end ) { 853 goto incomplete; 854 } 855 856 $opening_whitespace_length = strspn( $text, " \t\f\r\n", $opening_whitespace_at ); 857 858 /* 859 * The `wp` prefix cannot come before this point, but it may come after it 860 * depending on the presence of the closer. This is detected next. 861 */ 862 $wp_prefix_at = $opening_whitespace_at + $opening_whitespace_length; 863 if ( $wp_prefix_at >= $end ) { 864 goto incomplete; 865 } 866 867 if ( 0 === $opening_whitespace_length ) { 868 $at = $this->find_html_comment_end( $comment_opening_at, $end ); 869 continue; 870 } 871 872 // <!-- /⃨wp:core/paragraph {"dropCap":true} /--> 873 $has_closer = false; 874 if ( '/' === $text[ $wp_prefix_at ] ) { 875 $has_closer = true; 876 ++$wp_prefix_at; 877 } 878 879 // <!-- /w⃨p⃨:⃨core/paragraph {"dropCap":true} /--> 880 if ( $wp_prefix_at < $end && 0 !== substr_compare( $text, 'wp:', $wp_prefix_at, 3 ) ) { 881 if ( 882 ( $wp_prefix_at + 2 >= $end && str_ends_with( $text, 'wp' ) ) || 883 ( $wp_prefix_at + 1 >= $end && str_ends_with( $text, 'w' ) ) 884 ) { 885 goto incomplete; 886 } 887 888 $at = $this->find_html_comment_end( $comment_opening_at, $end ); 889 continue; 890 } 891 892 /* 893 * If the block contains no namespace, this will end up masquerading as 894 * the block name. It’s easier to first detect the span and then determine 895 * if it’s a namespace or a name. 896 * 897 * <!-- /wp:c⃨o⃨r⃨e⃨/paragraph {"dropCap":true} /--> 898 */ 899 $namespace_at = $wp_prefix_at + 3; 900 if ( $namespace_at >= $end ) { 901 goto incomplete; 902 } 903 904 $start_of_namespace = $text[ $namespace_at ]; 905 906 // The namespace must start with a-z.
907 if ( 'a' > $start_of_namespace || 'z' < $start_of_namespace ) { 908 $at = $this->find_html_comment_end( $comment_opening_at, $end ); 909 continue; 910 } 911 912 $namespace_length = 1 + strspn( $text, 'abcdefghijklmnopqrstuvwxyz0123456789-_', $namespace_at + 1 ); 913 $separator_at = $namespace_at + $namespace_length; 914 if ( $separator_at >= $end ) { 915 goto incomplete; 916 } 917 918 // <!-- /wp:core/⃨paragraph {"dropCap":true} /--> 919 $has_separator = '/' === $text[ $separator_at ]; 920 if ( $has_separator ) { 921 $name_at = $separator_at + 1; 922 923 if ( $name_at >= $end ) { 924 goto incomplete; 925 } 926 927 // <!-- /wp:core/p⃨a⃨r⃨a⃨g⃨r⃨a⃨p⃨h⃨ {"dropCap":true} /--> 928 $start_of_name = $text[ $name_at ]; 929 if ( 'a' > $start_of_name || 'z' < $start_of_name ) { 930 $at = $this->find_html_comment_end( $comment_opening_at, $end ); 931 continue; 932 } 933 934 $name_length = 1 + strspn( $text, 'abcdefghijklmnopqrstuvwxyz0123456789-_', $name_at + 1 ); 935 } else { 936 $name_at = $namespace_at; 937 $name_length = $namespace_length; 938 } 939 940 if ( $name_at + $name_length >= $end ) { 941 goto incomplete; 942 } 943 944 /* 945 * For this next section of the delimiter, it could be the JSON attributes 946 * or it could be the end of the comment. Assume that the JSON is there and 947 * update if it’s not. 
948 */ 949 950 // <!-- /wp:core/paragraph ⃨{"dropCap":true} /--> 951 $after_name_whitespace_at = $name_at + $name_length; 952 $after_name_whitespace_length = strspn( $text, " \t\f\r\n", $after_name_whitespace_at ); 953 $json_at = $after_name_whitespace_at + $after_name_whitespace_length; 954 955 if ( $json_at >= $end ) { 956 goto incomplete; 957 } 958 959 if ( 0 === $after_name_whitespace_length ) { 960 $at = $this->find_html_comment_end( $comment_opening_at, $end ); 961 continue; 962 } 963 964 // <!-- /wp:core/paragraph {⃨"dropCap":true} /--> 965 $has_json = '{' === $text[ $json_at ]; 966 $json_length = 0; 967 968 /* 969 * For the final span of the delimiter it's most efficient to find the end of the 970 * HTML comment and work backwards. This prevents complicated parsing inside the 971 * JSON span, which is not allowed to contain the HTML comment terminator. 972 * 973 * This also matches the behavior in the official block parser, 974 * even though it allows for matching invalid JSON content. 975 * 976 * <!-- /wp:core/paragraph {"dropCap":true} /-⃨-⃨>⃨ 977 */ 978 $comment_closing_at = strpos( $text, '-->', $json_at ); 979 if ( false === $comment_closing_at ) { 980 goto incomplete; 981 } 982 983 // <!-- /wp:core/paragraph {"dropCap":true} /⃨--> 984 if ( '/' === $text[ $comment_closing_at - 1 ] ) { 985 $has_void_flag = true; 986 $void_flag_length = 1; 987 } else { 988 $has_void_flag = false; 989 $void_flag_length = 0; 990 } 991 992 /* 993 * If there's no JSON, then the span of text after the name 994 * until the comment closing must be completely whitespace. 995 * Otherwise it’s a normal HTML comment. 996 */ 997 if ( ! $has_json ) { 998 if ( $after_name_whitespace_at + $after_name_whitespace_length === $comment_closing_at - $void_flag_length ) { 999 // This must be a block delimiter! 
1000 $this->state = self::MATCHED; 1001 break; 1002 } 1003 1004 $at = $this->find_html_comment_end( $comment_opening_at, $end ); 1005 continue; 1006 } 1007 1008 /* 1009 * There's JSON, so attempt to find its boundary. 1010 * 1011 * @todo It’s likely faster to scan forward instead of in reverse. 1012 * 1013 * <!-- /wp:core/paragraph {"dropCap":true}⃨ ⃨/--> 1014 */ 1015 $after_json_whitespace_length = 0; 1016 for ( $char_at = $comment_closing_at - $void_flag_length - 1; $char_at > $json_at; $char_at-- ) { 1017 $char = $text[ $char_at ]; 1018 1019 switch ( $char ) { 1020 case ' ': 1021 case "\t": 1022 case "\f": 1023 case "\r": 1024 case "\n": 1025 ++$after_json_whitespace_length; 1026 continue 2; 1027 1028 case '}': 1029 $json_length = $char_at - $json_at + 1; 1030 break 2; 1031 1032 default: 1033 ++$at; 1034 continue 3; 1035 } 1036 } 1037 1038 /* 1039 * This covers cases where there is no terminating “}” or where 1040 * mandatory whitespace is missing. 1041 */ 1042 if ( 0 === $json_length || 0 === $after_json_whitespace_length ) { 1043 $at = $this->find_html_comment_end( $comment_opening_at, $end ); 1044 continue; 1045 } 1046 1047 // This must be a block delimiter! 1048 $this->state = self::MATCHED; 1049 break; 1050 } 1051 1052 // The end of the document was reached without a match. 1053 if ( self::MATCHED !== $this->state ) { 1054 $this->state = self::COMPLETE; 1055 return false; 1056 } 1057 1058 /* 1059 * From this point forward, a delimiter has been matched. There 1060 * might also be an HTML span that appears before the delimiter. 
1061 */ 1062 1063 $this->after_previous_delimiter = $after_prev_delimiter; 1064 1065 $this->matched_delimiter_at = $comment_opening_at; 1066 $this->matched_delimiter_length = $comment_closing_at + 3 - $comment_opening_at; 1067 1068 $this->namespace_at = $namespace_at; 1069 $this->name_at = $name_at; 1070 $this->name_length = $name_length; 1071 1072 $this->json_at = $json_at; 1073 $this->json_length = $json_length; 1074 1075 /* 1076 * When delimiters contain both the void flag and the closing flag 1077 * they shall be interpreted as void blocks, per the spec parser. 1078 */ 1079 if ( $has_void_flag ) { 1080 $this->type = self::VOID; 1081 $this->next_stack_op = 'void'; 1082 } elseif ( $has_closer ) { 1083 $this->type = self::CLOSER; 1084 $this->next_stack_op = 'pop'; 1085 1086 /* 1087 * @todo Check if the name matches and bail according to the spec parser. 1088 * The default parser doesn’t examine the names. 1089 */ 1090 } else { 1091 $this->type = self::OPENER; 1092 $this->next_stack_op = 'push'; 1093 } 1094 1095 $this->has_closing_flag = $has_closer; 1096 1097 // HTML spans are visited before the delimiter that follows them. 1098 if ( $comment_opening_at > $after_prev_delimiter ) { 1099 $this->state = self::HTML_SPAN; 1100 $this->open_blocks_at[] = $after_prev_delimiter; 1101 $this->open_blocks_length[] = 0; 1102 $this->was_void = true; 1103 1104 return true; 1105 } 1106 1107 // If there were no HTML spans then flush the enqueued stack operations immediately. 
1108 switch ( $this->next_stack_op ) { 1109 case 'void': 1110 $this->was_void = true; 1111 $this->open_blocks_at[] = $namespace_at; 1112 $this->open_blocks_length[] = $name_at + $name_length - $namespace_at; 1113 break; 1114 1115 case 'push': 1116 $this->open_blocks_at[] = $namespace_at; 1117 $this->open_blocks_length[] = $name_at + $name_length - $namespace_at; 1118 break; 1119 1120 case 'pop': 1121 array_pop( $this->open_blocks_at ); 1122 array_pop( $this->open_blocks_length ); 1123 break; 1124 } 1125 1126 $this->next_stack_op = null; 1127 1128 return true; 1129 1130 incomplete: 1131 $this->state = self::COMPLETE; 1132 $this->last_error = self::INCOMPLETE_INPUT; 1133 return false; 1134 } 1135 1136 /** 1137 * Returns an array containing the names of the currently-open blocks, in order 1138 * from outermost to innermost, with HTML spans indicated as “#html”. 1139 * 1140 * Example: 1141 * 1142 * // Freeform HTML content is an HTML span. 1143 * $processor = new WP_Block_Processor( 'Just text' ); 1144 * $processor->next_token(); 1145 * array( '#html' ) === $processor->get_breadcrumbs(); 1146 * 1147 * $processor = new WP_Block_Processor( '<!-- wp:a --><!-- wp:b --><!-- wp:c /--><!-- /wp:b --><!-- /wp:a -->' ); 1148 * $processor->next_token(); 1149 * array( 'core/a' ) === $processor->get_breadcrumbs(); 1150 * $processor->next_token(); 1151 * array( 'core/a', 'core/b' ) === $processor->get_breadcrumbs(); 1152 * $processor->next_token(); 1153 * // Void blocks are only open while visiting them. 1154 * array( 'core/a', 'core/b', 'core/c' ) === $processor->get_breadcrumbs(); 1155 * $processor->next_token(); 1156 * // Blocks are closed before visiting their closing delimiter. 1157 * array( 'core/a' ) === $processor->get_breadcrumbs(); 1158 * $processor->next_token(); 1159 * array() === $processor->get_breadcrumbs(); 1160 * 1161 * // Inner HTML is also an HTML span.
1162 * $processor = new WP_Block_Processor( '<!-- wp:a -->Inner HTML<!-- /wp:a -->' ); 1163 * $processor->next_token(); 1164 * $processor->next_token(); 1165 * array( 'core/a', '#html' ) === $processor->get_breadcrumbs(); 1166 * 1167 * @since 6.9.0 1168 * 1169 * @return string[] 1170 */ 1171 public function get_breadcrumbs(): array { 1172 $breadcrumbs = array_fill( 0, count( $this->open_blocks_at ), null ); 1173 1174 /* 1175 * Since HTML spans can only be at the very end, set the normalized block name for 1176 * each open element and then work backwards after creating the array. This allows 1177 * for the elimination of a conditional on each iteration of the loop. 1178 */ 1179 foreach ( $this->open_blocks_at as $i => $at ) { 1180 $block_type = substr( $this->source_text, $at, $this->open_blocks_length[ $i ] ); 1181 $breadcrumbs[ $i ] = self::normalize_block_type( $block_type ); 1182 } 1183 1184 if ( isset( $i ) && 0 === $this->open_blocks_length[ $i ] ) { 1185 $breadcrumbs[ $i ] = '#html'; 1186 } 1187 1188 return $breadcrumbs; 1189 } 1190 1191 /** 1192 * Returns the depth of the open blocks where the processor is currently matched. 1193 * 1194 * Depth increases before visiting openers and void blocks and decreases before 1195 * visiting closers. HTML spans behave like void blocks. 1196 * 1197 * @since 6.9.0 1198 * 1199 * @return int 1200 */ 1201 public function get_depth(): int { 1202 return count( $this->open_blocks_at ); 1203 } 1204 1205 /** 1206 * Extracts a block object, and all inner content, starting at a matched opening 1207 * block delimiter, or at a matched top-level HTML span as freeform HTML content. 1208 * 1209 * Use this function to extract some blocks within a document, but not all. For example, 1210 * one might want to find image galleries, parse them, modify them, and then reserialize 1211 * them in place. 1212 * 1213 * Once this function returns, the parser will be matched on the token following the close 1214 * of the given block.
1215 * 1216 * The return type of this method is compatible with the return of {@see \parse_blocks()}. 1217 * 1218 * Example: 1219 * 1220 * $processor = new WP_Block_Processor( $post_content ); 1221 * if ( ! $processor->next_block( 'gallery' ) ) { 1222 * return $post_content; 1223 * } 1224 * 1225 * $gallery_at = $processor->get_span()->start; 1226 * $gallery = $processor->extract_block(); 1227 * $ends_before = $processor->get_span(); 1228 * $ends_before = $ends_before->start ?? strlen( $post_content ); 1229 * 1230 * $new_gallery = update_gallery( $gallery ); 1231 * $new_gallery = serialize_block( $new_gallery ); 1232 * 1233 * return ( 1234 * substr( $post_content, 0, $gallery_at ) . 1235 * $new_gallery . 1236 * substr( $post_content, $ends_before ) 1237 * ); 1238 * 1239 * @since 6.9.0 1240 * 1241 * @return array[]|null { 1242 * Array of block structures. 1243 * 1244 * @type array ...$0 { 1245 * An associative array of a single parsed block object. See WP_Block_Parser_Block. 1246 * 1247 * @type string|null $blockName Name of block. 1248 * @type array $attrs Attributes from block comment delimiters. 1249 * @type array[] $innerBlocks List of inner blocks. An array of arrays that 1250 * have the same structure as this one. 1251 * @type string $innerHTML HTML from inside block comment delimiters. 1252 * @type array $innerContent List of string fragments and null markers where 1253 * inner blocks were found. 1254 * } 1255 * } 1256 */ 1257 public function extract_block(): ?array { 1258 if ( $this->is_html() ) { 1259 $chunk = $this->get_html_content(); 1260 1261 return array( 1262 'blockName' => null, 1263 'attrs' => array(), 1264 'innerBlocks' => array(), 1265 'innerHTML' => $chunk, 1266 'innerContent' => array( $chunk ), 1267 ); 1268 } 1269 1270 $block = array( 1271 'blockName' => $this->get_block_type(), 1272 'attrs' => $this->allocate_and_return_parsed_attributes() ?? 
array(), 1273 'innerBlocks' => array(), 1274 'innerHTML' => '', 1275 'innerContent' => array(), 1276 ); 1277 1278 $depth = $this->get_depth(); 1279 while ( $this->next_token() && $this->get_depth() > $depth ) { 1280 if ( $this->is_html() ) { 1281 $chunk = $this->get_html_content(); 1282 $block['innerHTML'] .= $chunk; 1283 $block['innerContent'][] = $chunk; 1284 continue; 1285 } 1286 1287 /** 1288 * Inner blocks. 1289 * 1290 * @todo This is a decent place to call {@link \render_block()} 1291 * @todo Use iteration instead of recursion, or at least refactor to tail-call form. 1292 */ 1293 if ( $this->opens_block() ) { 1294 $inner_block = $this->extract_block(); 1295 $block['innerBlocks'][] = $inner_block; 1296 $block['innerContent'][] = null; 1297 } 1298 } 1299 1300 return $block; 1301 } 1302 1303 /** 1304 * Returns the byte-offset after the ending character of an HTML comment, 1305 * assuming the proper starting byte offset. 1306 * 1307 * @since 6.9.0 1308 * 1309 * @param int $comment_starting_at Where the HTML comment started, the leading `<`. 1310 * @param int $search_end Last offset in which to search, for limiting search span. 1311 * @return int Offset after the current HTML comment ends, or `$search_end` if no end was found. 1312 */ 1313 private function find_html_comment_end( int $comment_starting_at, int $search_end ): int { 1314 $text = $this->source_text; 1315 1316 // Find span-of-dashes comments which look like `<!----->`. 1317 $span_of_dashes = strspn( $text, '-', $comment_starting_at + 2 ); 1318 if ( 1319 $comment_starting_at + 2 + $span_of_dashes < $search_end && 1320 '>' === $text[ $comment_starting_at + 2 + $span_of_dashes ] 1321 ) { 1322 return $comment_starting_at + 2 + $span_of_dashes + 1; 1323 } 1324 1325 // Otherwise, there are other characters inside the comment, find the first `-->` or `--!>`.
1326 $now_at = $comment_starting_at + 4; 1327 while ( $now_at < $search_end ) { 1328 $dashes_at = strpos( $text, '--', $now_at ); 1329 if ( false === $dashes_at ) { 1330 return $search_end; 1331 } 1332 1333 $closer_must_be_at = $dashes_at + 2 + strspn( $text, '-', $dashes_at + 2 ); 1334 if ( $closer_must_be_at < $search_end && '!' === $text[ $closer_must_be_at ] ) { 1335 $closer_must_be_at++; 1336 } 1337 1338 if ( $closer_must_be_at < $search_end && '>' === $text[ $closer_must_be_at ] ) { 1339 return $closer_must_be_at + 1; 1340 } 1341 1342 $now_at++; 1343 } 1344 1345 return $search_end; 1346 } 1347 1348 /** 1349 * Indicates if the last attempt to parse a block comment delimiter 1350 * failed, if set, otherwise `null` if the last attempt succeeded. 1351 * 1352 * @since 6.9.0 1353 * 1354 * @return string|null Error from last attempt at parsing next block delimiter, 1355 * or `null` if last attempt succeeded. 1356 */ 1357 public function get_last_error(): ?string { 1358 return $this->last_error; 1359 } 1360 1361 /** 1362 * Indicates if the last attempt to parse a block’s JSON attributes failed. 1363 * 1364 * @see \json_last_error() 1365 * 1366 * @since 6.9.0 1367 * 1368 * @return int JSON_ERROR_ code from last attempt to parse block JSON attributes. 1369 */ 1370 public function get_last_json_error(): int { 1371 return $this->last_json_error; 1372 } 1373 1374 /** 1375 * Returns the type of the block comment delimiter. 1376 * 1377 * One of: 1378 * 1379 * - {@see self::OPENER} 1380 * - {@see self::CLOSER} 1381 * - {@see self::VOID} 1382 * - `null` 1383 * 1384 * @since 6.9.0 1385 * 1386 * @return string|null type of the block comment delimiter, if currently matched. 
1387 */ 1388 public function get_delimiter_type(): ?string { 1389 switch ( $this->state ) { 1390 case self::HTML_SPAN: 1391 return self::VOID; 1392 1393 case self::MATCHED: 1394 return $this->type; 1395 1396 default: 1397 return null; 1398 } 1399 } 1400 1401 /** 1402 * Returns whether the delimiter contains the closing flag. 1403 * 1404 * This should be avoided except in cases of custom error-handling 1405 * with block closers containing the void flag. For normative use, 1406 * {@see self::get_delimiter_type()}. 1407 * 1408 * @since 6.9.0 1409 * 1410 * @return bool Whether the currently-matched block delimiter contains the closing flag. 1411 */ 1412 public function has_closing_flag(): bool { 1413 return $this->has_closing_flag; 1414 } 1415 1416 /** 1417 * Indicates if the block delimiter represents a block of the given type. 1418 * 1419 * Since the “core” namespace may be implicit, it’s allowable to pass 1420 * either the fully-qualified block type with namespace and block name 1421 * or the shorthand version only containing the block name, if 1422 * the desired block is in the “core” namespace. 1423 * 1424 * Since freeform HTML content is non-block content, it has no block type. 1425 * Passing the wildcard “*” will, however, return true for all block types, 1426 * even the implicit freeform content, though not for spans of inner HTML. 1427 * 1428 * Example: 1429 * 1430 * $is_core_paragraph = $processor->is_block_type( 'paragraph' ); 1431 * $is_core_paragraph = $processor->is_block_type( 'core/paragraph' ); 1432 * $is_formula = $processor->is_block_type( 'math-blocks/formula' ); 1433 * 1434 * @param string $block_type Block type name for the desired block. 1435 * E.g. "paragraph", "core/paragraph", "math-blocks/formula". 1436 * @return bool Whether this delimiter represents a block of the given type.
1437 */ 1438 public function is_block_type( string $block_type ): bool { 1439 if ( '*' === $block_type ) { 1440 return true; 1441 } 1442 1443 // This is a core/freeform text block, it’s special. 1444 if ( $this->is_html() && 0 === ( $this->open_blocks_length[0] ?? null ) ) { 1445 return ( 1446 'core/freeform' === $block_type || 1447 'freeform' === $block_type 1448 ); 1449 } 1450 1451 return $this->are_equal_block_types( $this->source_text, $this->namespace_at, $this->name_at - $this->namespace_at + $this->name_length, $block_type, 0, strlen( $block_type ) ); 1452 } 1453 1454 /** 1455 * Given two spans of text, indicate if they represent identical block types. 1456 * 1457 * This function normalizes block types to account for implicit core namespacing. 1458 * 1459 * Note! This function only returns valid results when the complete block types are 1460 * represented in the span offsets and lengths. This means that the full optional 1461 * namespace and block name must be represented in the input arguments. 1462 * 1463 * Example: 1464 * 1465 * 0 5 10 15 20 25 30 35 40 1466 * $text = '<!-- wp:block --><!-- /wp:core/block -->'; 1467 * 1468 * true === WP_Block_Processor::are_equal_block_types( $text, 9, 5, $text, 27, 10 ); 1469 * false === WP_Block_Processor::are_equal_block_types( $text, 9, 5, 'my/block', 0, 8 ); 1470 * 1471 * @since 6.9.0 1472 * 1473 * @param string $a_text Text in which first block type appears. 1474 * @param int $a_at Byte offset into text in which first block type starts. 1475 * @param int $a_length Byte length of first block type. 1476 * @param string $b_text Text in which second block type appears (may be the same as the first text). 1477 * @param int $b_at Byte offset into text in which second block type starts. 1478 * @param int $b_length Byte length of second block type. 1479 * @return bool Whether the spans of text represent identical block types, normalized for namespacing. 
1480 */ 1481 public static function are_equal_block_types( string $a_text, int $a_at, int $a_length, string $b_text, int $b_at, int $b_length ): bool { 1482 $a_ns_length = strcspn( $a_text, '/', $a_at, $a_length ); 1483 $b_ns_length = strcspn( $b_text, '/', $b_at, $b_length ); 1484 1485 $a_has_ns = $a_ns_length !== $a_length; 1486 $b_has_ns = $b_ns_length !== $b_length; 1487 1488 // Both contain namespaces. 1489 if ( $a_has_ns && $b_has_ns ) { 1490 if ( $a_length !== $b_length ) { 1491 return false; 1492 } 1493 1494 $a_block_type = substr( $a_text, $a_at, $a_length ); 1495 1496 return 0 === substr_compare( $b_text, $a_block_type, $b_at, $b_length ); 1497 } 1498 1499 if ( $a_has_ns ) { 1500 $b_block_type = 'core/' . substr( $b_text, $b_at, $b_length ); 1501 1502 return ( 1503 strlen( $b_block_type ) === $a_length && 1504 0 === substr_compare( $a_text, $b_block_type, $a_at, $a_length ) 1505 ); 1506 } 1507 1508 if ( $b_has_ns ) { 1509 $a_block_type = 'core/' . substr( $a_text, $a_at, $a_length ); 1510 1511 return ( 1512 strlen( $a_block_type ) === $b_length && 1513 0 === substr_compare( $b_text, $a_block_type, $b_at, $b_length ) 1514 ); 1515 } 1516 1517 // Neither contains a namespace. 1518 if ( $a_length !== $b_length ) { 1519 return false; 1520 } 1521 1522 $a_name = substr( $a_text, $a_at, $a_length ); 1523 1524 return 0 === substr_compare( $b_text, $a_name, $b_at, $b_length ); 1525 } 1526 1527 /** 1528 * Indicates if the matched delimiter is an opening or void delimiter of the given type, 1529 * if a type is provided, otherwise if it opens any block or implicit freeform HTML content. 1530 * 1531 * This is a helper method to ease handling of code inspecting where blocks start, and for 1532 * checking if the blocks are of a given type. The function is variadic to allow for 1533 * checking if the delimiter opens one of many possible block types. 1534 * 1535 * To advance to the start of a block {@see self::next_block()}. 
1536 * 1537 * Example: 1538 * 1539 * $processor = new WP_Block_Processor( $html ); 1540 * while ( $processor->next_delimiter() ) { 1541 * if ( $processor->opens_block( 'core/code', 'syntaxhighlighter/code' ) ) { 1542 * echo "Found code!"; 1543 * continue; 1544 * } 1545 * 1546 * if ( $processor->opens_block( 'core/image' ) ) { 1547 * echo "Found an image!"; 1548 * continue; 1549 * } 1550 * 1551 * if ( $processor->opens_block() ) { 1552 * echo "Found a new block!"; 1553 * } 1554 * } 1555 * 1556 * @since 6.9.0 1557 * 1558 * @see self::is_block_type() 1559 * 1560 * @param string[] $block_type Optional. Is the matched block type one of these? 1561 * If none are provided, will not test block type. 1562 * @return bool Whether the matched block delimiter opens a block, and whether it 1563 * opens a block of one of the given block types, if provided. 1564 */ 1565 public function opens_block( string ...$block_type ): bool { 1566 // HTML spans only open implicit freeform content at the top level. 1567 if ( self::HTML_SPAN === $this->state && 1 !== count( $this->open_blocks_at ) ) { 1568 return false; 1569 } 1570 1571 /* 1572 * Because HTML spans are discovered after the next delimiter is found, 1573 * the delimiter type when visiting HTML spans refers to the type of the 1574 * following delimiter. Therefore the HTML case is handled by checking 1575 * the state and depth of the stack of open blocks. 1576 */ 1577 if ( self::CLOSER === $this->type && ! $this->is_html() ) { 1578 return false; 1579 } 1580 1581 if ( count( $block_type ) === 0 ) { 1582 return true; 1583 } 1584 1585 foreach ( $block_type as $block ) { 1586 if ( $this->is_block_type( $block ) ) { 1587 return true; 1588 } 1589 } 1590 1591 return false; 1592 } 1593 1594 /** 1595 * Indicates if the matched delimiter is an HTML span. 1596 * 1597 * @since 6.9.0 1598 * 1599 * @see self::is_non_whitespace_html() 1600 * 1601 * @return bool Whether the processor is matched on an HTML span.
1602 */ 1603 public function is_html(): bool { 1604 return self::HTML_SPAN === $this->state; 1605 } 1606 1607 /** 1608 * Indicates if the matched delimiter is an HTML span and comprises more 1609 * than whitespace characters, i.e. contains real content. 1610 * 1611 * Many block serializers introduce newlines between block delimiters, 1612 * so the presence of top-level non-block content does not imply that 1613 * there are “real” freeform HTML blocks. Checking if there is content 1614 * beyond whitespace is a more certain check, such as for determining 1615 * whether to load CSS for the freeform or fallback block type. 1616 * 1617 * @since 6.9.0 1618 * 1619 * @see self::is_html() 1620 * 1621 * @return bool Whether the currently-matched delimiter is an HTML 1622 * span containing non-whitespace text. 1623 */ 1624 public function is_non_whitespace_html(): bool { 1625 if ( ! $this->is_html() ) { 1626 return false; 1627 } 1628 1629 $length = $this->matched_delimiter_at - $this->after_previous_delimiter; 1630 1631 $whitespace_length = strspn( 1632 $this->source_text, 1633 " \t\f\r\n", 1634 $this->after_previous_delimiter, 1635 $length 1636 ); 1637 1638 return $whitespace_length !== $length; 1639 } 1640 1641 /** 1642 * Returns the string content of a matched HTML span, or `null` otherwise. 1643 * 1644 * @since 6.9.0 1645 * 1646 * @return string|null Raw HTML content, or `null` if not currently matched on HTML. 1647 */ 1648 public function get_html_content(): ?string { 1649 if ( ! $this->is_html() ) { 1650 return null; 1651 } 1652 1653 return substr( 1654 $this->source_text, 1655 $this->after_previous_delimiter, 1656 $this->matched_delimiter_at - $this->after_previous_delimiter 1657 ); 1658 } 1659 1660 /** 1661 * Allocates a substring for the block type and returns the fully-qualified 1662 * name, including the namespace, if matched on a delimiter, otherwise `null`. 
1663 * 1664 * This function is like {@see self::get_printable_block_type()} but when 1665 * paused on a freeform HTML block, will return `null` instead of “core/freeform”. 1666 * The `null` behavior matches what {@see \parse_blocks()} returns but may not 1667 * be as useful as having a string value. 1668 * 1669 * This function allocates a substring for the given block type. This 1670 * allocation will be small and likely fine in most cases, but it's 1671 * preferable to call {@see self::is_block_type()} if only needing 1672 * to know whether the delimiter is for a given block type, as that 1673 * function is more efficient for this purpose and avoids the allocation. 1674 * 1675 * Example: 1676 * 1677 * // Avoid. 1678 * 'core/paragraph' === $processor->get_block_type(); 1679 * 1680 * // Prefer. 1681 * $processor->is_block_type( 'core/paragraph' ); 1682 * $processor->is_block_type( 'paragraph' ); 1683 * $processor->is_block_type( 'core/freeform' ); 1684 * 1685 * // Freeform HTML content has no block type. 1686 * $processor = new WP_Block_Processor( 'non-block content' ); 1687 * $processor->next_token(); 1688 * null === $processor->get_block_type(); 1689 * 1690 * @since 6.9.0 1691 * 1692 * @see self::are_equal_block_types() 1693 * 1694 * @return string|null Fully-qualified block namespace and type, e.g. "core/paragraph", 1695 * if matched on an explicit delimiter, otherwise `null`. 1696 */ 1697 public function get_block_type(): ?string { 1698 if ( 1699 self::READY === $this->state || 1700 self::COMPLETE === $this->state || 1701 self::INCOMPLETE_INPUT === $this->state 1702 ) { 1703 return null; 1704 } 1705 1706 // This is a core/freeform text block, it’s special.
1707 if ( $this->is_html() ) { 1708 return null; 1709 } 1710 1711 $block_type = substr( $this->source_text, $this->namespace_at, $this->name_at - $this->namespace_at + $this->name_length ); 1712 return self::normalize_block_type( $block_type ); 1713 } 1714 1715 /** 1716 * Allocates a printable substring for the block type and returns the fully-qualified 1717 * name, including the namespace, if matched on a delimiter or freeform block, otherwise `null`. 1718 * 1719 * This function is like {@see self::get_block_type()} but when paused on a freeform 1720 * HTML block, will return “core/freeform” instead of `null`. The `null` behavior matches 1721 * what {@see \parse_blocks()} returns but may not be as useful as having a string value. 1722 * 1723 * This function allocates a substring for the given block type. This 1724 * allocation will be small and likely fine in most cases, but it's 1725 * preferable to call {@see self::is_block_type()} if only needing 1726 * to know whether the delimiter is for a given block type, as that 1727 * function is more efficient for this purpose and avoids the allocation. 1728 * 1729 * Example: 1730 * 1731 * // Avoid. 1732 * 'core/paragraph' === $processor->get_printable_block_type(); 1733 * 1734 * // Prefer. 1735 * $processor->is_block_type( 'core/paragraph' ); 1736 * $processor->is_block_type( 'paragraph' ); 1737 * $processor->is_block_type( 'core/freeform' ); 1738 * 1739 * // Freeform HTML content is given an implicit type. 1740 * $processor = new WP_Block_Processor( 'non-block content' ); 1741 * $processor->next_token(); 1742 * 'core/freeform' === $processor->get_printable_block_type(); 1743 * 1744 * @since 6.9.0 1745 * 1746 * @see self::are_equal_block_types() 1747 * 1748 * @return string|null Fully-qualified block namespace and type, e.g. "core/paragraph", 1749 * if matched on an explicit delimiter or freeform block, otherwise `null`.
1750 */ 1751 public function get_printable_block_type(): ?string { 1752 if ( 1753 self::READY === $this->state || 1754 self::COMPLETE === $this->state || 1755 self::INCOMPLETE_INPUT === $this->state 1756 ) { 1757 return null; 1758 } 1759 1760 // This is a core/freeform text block, it’s special. 1761 if ( $this->is_html() ) { 1762 return 1 === count( $this->open_blocks_at ) 1763 ? 'core/freeform' 1764 : '#innerHTML'; 1765 } 1766 1767 $block_type = substr( $this->source_text, $this->namespace_at, $this->name_at - $this->namespace_at + $this->name_length ); 1768 return self::normalize_block_type( $block_type ); 1769 } 1770 1771 /** 1772 * Normalizes a block name to ensure that missing implicit “core” namespaces are present. 1773 * 1774 * Example: 1775 * 1776 * 'core/paragraph' === WP_Block_Processor::normalize_block_type( 'paragraph' ); 1777 * 'core/paragraph' === WP_Block_Processor::normalize_block_type( 'core/paragraph' ); 1778 * 'my/paragraph' === WP_Block_Processor::normalize_block_type( 'my/paragraph' ); 1779 * 1780 * @since 6.9.0 1781 * 1782 * @param string $block_type Valid block name, potentially without a namespace. 1783 * @return string Fully-qualified block type including namespace. 1784 */ 1785 public static function normalize_block_type( string $block_type ): string { 1786 return false === strpos( $block_type, '/' ) 1787 ? "core/{$block_type}" 1788 : $block_type; 1789 } 1790 1791 /** 1792 * Returns a lazy wrapper around the block attributes, which can be used 1793 * for efficiently interacting with the JSON attributes. 1794 * 1795 * This stub hints that there should be a lazy interface for parsing 1796 * block attributes but doesn’t define it. It serves both as a placeholder 1797 * for one to come as well as a guard against implementing an eager 1798 * function in its place. 1799 * 1800 * @throws Exception This function is a stub for subclasses to implement 1801 * when providing streaming attribute parsing.
	 *
	 * @since 6.9.0
	 *
	 * @see self::allocate_and_return_parsed_attributes()
	 *
	 * @return never
	 */
	public function get_attributes() {
		throw new Exception( 'Lazy attribute parsing not yet supported' );
	}

	/**
	 * Attempts to parse and return the entire JSON attributes from the delimiter,
	 * allocating memory and processing the JSON span in the process.
	 *
	 * This does not return any parsed attributes for a closing block delimiter
	 * even if there is a span of JSON content; this JSON is a parsing error.
	 *
	 * Consider calling {@see static::get_attributes()} instead if it's not
	 * necessary to read all the attributes at the same time, as that provides
	 * a more efficient mechanism for typical use cases.
	 *
	 * Since the JSON span inside the comment delimiter may not be valid JSON,
	 * this function will return `null` if it cannot parse the span and set the
	 * {@see static::get_last_json_error()} to the appropriate JSON_ERROR_ constant.
	 *
	 * If the delimiter contains no JSON span, it will also return `null`,
	 * but the last error will be set to {@see \JSON_ERROR_NONE}.
	 *
	 * Example:
	 *
	 *     $processor = new WP_Block_Processor( '<!-- wp:image {"url": "https://wordpress.org/favicon.ico"} -->' );
	 *     $processor->next_delimiter();
	 *     $memory_hungry_and_slow_attributes = $processor->allocate_and_return_parsed_attributes();
	 *     $memory_hungry_and_slow_attributes === array( 'url' => 'https://wordpress.org/favicon.ico' );
	 *
	 *     $processor = new WP_Block_Processor( '<!-- /wp:image {"url": "https://wordpress.org/favicon.ico"} -->' );
	 *     $processor->next_delimiter();
	 *     null            === $processor->allocate_and_return_parsed_attributes();
	 *     JSON_ERROR_NONE === $processor->get_last_json_error();
	 *
	 *     $processor = new WP_Block_Processor( '<!-- wp:separator {} /-->' );
	 *     $processor->next_delimiter();
	 *     array() === $processor->allocate_and_return_parsed_attributes();
	 *
	 *     $processor = new WP_Block_Processor( '<!-- wp:separator /-->' );
	 *     $processor->next_delimiter();
	 *     null === $processor->allocate_and_return_parsed_attributes();
	 *
	 *     $processor = new WP_Block_Processor( '<!-- wp:image {"url} -->' );
	 *     $processor->next_delimiter();
	 *     null                 === $processor->allocate_and_return_parsed_attributes();
	 *     JSON_ERROR_CTRL_CHAR === $processor->get_last_json_error();
	 *
	 * @since 6.9.0
	 *
	 * @return array|null Parsed JSON attributes, if present and valid, otherwise `null`.
	 */
	public function allocate_and_return_parsed_attributes(): ?array {
		$this->last_json_error = JSON_ERROR_NONE;

		if ( self::CLOSER === $this->type || $this->is_html() || 0 === $this->json_length ) {
			return null;
		}

		$json_span = substr( $this->source_text, $this->json_at, $this->json_length );
		$parsed    = json_decode( $json_span, null, 512, JSON_OBJECT_AS_ARRAY | JSON_INVALID_UTF8_SUBSTITUTE );

		$last_error            = json_last_error();
		$this->last_json_error = $last_error;

		return ( JSON_ERROR_NONE === $last_error && is_array( $parsed ) )
			? $parsed
			: null;
	}

	/**
	 * Returns the span representing the currently-matched delimiter, if matched, otherwise `null`.
	 *
	 * Example:
	 *
	 *     $processor = new WP_Block_Processor( '<!-- wp:void /-->' );
	 *     null === $processor->get_span();
	 *
	 *     $processor->next_delimiter();
	 *     WP_HTML_Span( 0, 17 ) === $processor->get_span();
	 *
	 * @since 6.9.0
	 *
	 * @return WP_HTML_Span|null Span of text in source text spanning matched delimiter.
	 */
	public function get_span(): ?WP_HTML_Span {
		switch ( $this->state ) {
			case self::HTML_SPAN:
				return new WP_HTML_Span( $this->after_previous_delimiter, $this->matched_delimiter_at - $this->after_previous_delimiter );

			case self::MATCHED:
				return new WP_HTML_Span( $this->matched_delimiter_at, $this->matched_delimiter_length );

			default:
				return null;
		}
	}

	//
	// Constant declarations that would otherwise pollute the top of the class.
	//

	/**
	 * Indicates that the block comment delimiter closes an open block.
	 *
	 * @see self::$type
	 *
	 * @since 6.9.0
	 */
	const CLOSER = 'closer';

	/**
	 * Indicates that the block comment delimiter opens a block.
	 *
	 * @see self::$type
	 *
	 * @since 6.9.0
	 */
	const OPENER = 'opener';

	/**
	 * Indicates that the block comment delimiter represents a void block
	 * with no inner content of any kind.
	 *
	 * @see self::$type
	 *
	 * @since 6.9.0
	 */
	const VOID = 'void';

	/**
	 * Indicates that the processor is ready to start parsing but hasn’t yet begun.
	 *
	 * @see self::$state
	 *
	 * @since 6.9.0
	 */
	const READY = 'processor-ready';

	/**
	 * Indicates that the processor is matched on an explicit block delimiter.
	 *
	 * @see self::$state
	 *
	 * @since 6.9.0
	 */
	const MATCHED = 'processor-matched';

	/**
	 * Indicates that the processor is matched on the opening of an implicit freeform delimiter.
	 *
	 * @see self::$state
	 *
	 * @since 6.9.0
	 */
	const HTML_SPAN = 'processor-html-span';

	/**
	 * Indicates that the parser started parsing a block comment delimiter, but
	 * the input document ended before it could finish. The document was likely truncated.
	 *
	 * @see self::$state
	 *
	 * @since 6.9.0
	 */
	const INCOMPLETE_INPUT = 'incomplete-input';

	/**
	 * Indicates that the processor has finished parsing and has nothing left to scan.
	 *
	 * @see self::$state
	 *
	 * @since 6.9.0
	 */
	const COMPLETE = 'processor-complete';
}
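
The namespace normalization and the `null`-versus-array contract of the attribute parser documented above can be tried without loading WordPress. The following is a minimal sketch, not the class itself: `sketch_normalize_block_type()` and `sketch_parse_attributes()` are hypothetical stand-alone helpers that mirror the bodies of `normalize_block_type()` and `allocate_and_return_parsed_attributes()`, and they take the JSON span as a plain string rather than reading it from a matched delimiter.

```php
<?php
/**
 * Hypothetical stand-in for WP_Block_Processor::normalize_block_type():
 * adds the implicit "core" namespace when the name has no slash.
 */
function sketch_normalize_block_type( string $block_type ): string {
	return false === strpos( $block_type, '/' )
		? "core/{$block_type}"
		: $block_type;
}

/**
 * Hypothetical stand-in for the JSON handling in
 * allocate_and_return_parsed_attributes(). Assumes the caller has already
 * extracted the JSON span from an opening or void delimiter.
 *
 * @param string $json_span Raw JSON text, or an empty string if none was present.
 * @return array Pair of ( parsed attributes or null, JSON_ERROR_* code ).
 */
function sketch_parse_attributes( string $json_span ): array {
	// No JSON span at all: report `null` with no error.
	if ( '' === $json_span ) {
		return array( null, JSON_ERROR_NONE );
	}

	$parsed = json_decode( $json_span, null, 512, JSON_OBJECT_AS_ARRAY | JSON_INVALID_UTF8_SUBSTITUTE );
	$error  = json_last_error();

	// Only a successfully decoded array counts as attributes.
	$attributes = ( JSON_ERROR_NONE === $error && is_array( $parsed ) ) ? $parsed : null;

	return array( $attributes, $error );
}

list( $attributes, $error ) = sketch_parse_attributes( '{"url": "https://wordpress.org/favicon.ico"}' );
var_dump( sketch_normalize_block_type( 'paragraph' ), $attributes, $error );
```

Returning the error code alongside the value lets a caller tell "no attributes present" apart from "attributes present but unparsable", which is the role `get_last_json_error()` plays in the class.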
Generated : Sat Oct 18 08:20:04 2025 | Cross-referenced by PHPXref |