PHP Cross Reference of WordPress Trunk (Updated Daily)
<?php

/**
 * Class for efficiently looking up and mapping string keys to string values, with limits.
 *
 * @package WordPress
 * @since 6.6.0
 */

/**
 * WP_Token_Map class.
 *
 * Use this class in specific circumstances with a static set of lookup keys which map to
 * a static set of transformed values. For example, this class is used to map HTML named
 * character references to their equivalent UTF-8 values.
 *
 * This class works differently from code that calls `in_array()` or similar functions.
 * It internalizes lookup logic and provides helper interfaces to optimize lookup and
 * transformation. It also provides a method for precomputing the lookup tables and
 * storing them as PHP source code.
 *
 * All tokens and substitutions must be shorter than 256 bytes.
 *
 * Example:
 *
 *     $smilies = WP_Token_Map::from_array( array(
 *         '8O' => '😯',
 *         ':(' => '🙁',
 *         ':)' => '🙂',
 *         ':?' => '😕',
 *     ) );
 *
 *     true  === $smilies->contains( ':)' );
 *     false === $smilies->contains( 'simile' );
 *
 *     '😕' === $smilies->read_token( 'Not sure :?.', 9, $length_of_smiley_syntax );
 *     2    === $length_of_smiley_syntax;
 *
 * ## Precomputing the Token Map.
 *
 * Creating the class involves some work sorting and organizing the tokens and their
 * replacement values. To skip this work, the class can export its state as PHP
 * source code, which can then be loaded directly.
 *
 * Example:
 *
 *     // Export with four spaces as the indent, only for the sake of this docblock.
 *     // The default indent is a tab character.
 *     $indent = '    ';
 *     echo $smilies->precomputed_php_source_table( $indent );
 *
 *     // Output, to be pasted into a PHP source file. All four smilies are two bytes
 *     // long, so with a key length of 2 they are all stored as small words:
 *     WP_Token_Map::from_precomputed_table(
 *         array(
 *             "storage_version" => "6.6.0-trunk",
 *             "key_length" => 2,
 *             "groups" => "",
 *             "large_words" => array(),
 *             "small_words" => "8O\x00:(\x00:)\x00:?\x00",
 *             "small_mappings" => array( "😯", "🙁", "🙂", "😕" )
 *         )
 *     );
 *
 * ## Large vs. small words.
 *
 * This class uses a short prefix called the "key" to optimize lookup of its tokens.
 * This means that some tokens may be shorter than or equal in length to that key.
 * Words that are longer than the key are called "large," while those shorter
 * than or equal to the key length are called "small."
 *
 * This separation of large and small words is incidental to the way this class
 * optimizes lookup, and should be considered an internal implementation detail
 * of the class. It may still be important to be aware of it, however.
 *
 * ## Determining Key Length.
 *
 * The key length should be chosen based on the data being stored in the token map.
 * It should divide the data as evenly as possible, but should not create so many
 * groups that a large fraction of the groups contain only a single token.
 *
 * For the HTML5 named character references, a key length of 2 was found to provide a
 * sufficient spread and should be a good default for relatively large sets of tokens.
 *
 * However, for some data sets this might be too long. For example, a list of smilies
 * may be too small for a key length of 2, and 1 may be more appropriate. It's best
 * to experiment and determine empirically which values are appropriate.
 *
 * ## Generate Pre-Computed Source Code.
 *
 * Since the `WP_Token_Map` is designed for relatively static lookups, it can be
 * advantageous to precompute the values and instantiate a table that has already
 * sorted and grouped the tokens and built the lookup strings.
 *
 * This can be done with `WP_Token_Map::precomputed_php_source_table()`.
 *
 * Note that if there is a leading character that all tokens need, such as `&` for
 * HTML named character references, it can be beneficial to exclude this from the
 * token map. Instead, find occurrences of the leading character and then use the
 * token map to see if the following characters complete the token.
 *
 * Example:
 *
 *     $map = WP_Token_Map::from_array( array( 'simple_smile:' => '🙂', 'sob:' => '😭', 'soba:' => '🍜' ) );
 *     echo $map->precomputed_php_source_table();
 *     // Output
 *     WP_Token_Map::from_precomputed_table(
 *         array(
 *             "storage_version" => "6.6.0-trunk",
 *             "key_length" => 2,
 *             "groups" => "si\x00so\x00",
 *             "large_words" => array(
 *                 // simple_smile:[🙂].
 *                 "\x0bmple_smile:\x04🙂",
 *                 // soba:[🍜] sob:[😭].
 *                 "\x03ba:\x04🍜\x02b:\x04😭",
 *             ),
 *             "small_words" => "",
 *             "small_mappings" => array()
 *         )
 *     );
 *
 * This precomputed value can be stored directly in source code and will skip the
 * startup cost of generating the lookup strings. See `$html5_named_character_entities`.
 *
 * Note that any updates to the precomputed format should update the storage version
 * constant. It would also be best to provide an update function to take older known
 * versions and upgrade them in place when loading into `from_precomputed_table()`.
 *
 * ## Future Direction.
 *
 * It may be viable to dynamically increase the length limits such that there's no need to impose them.
 * The limit exists because the packing structure uses a single byte to record how many bytes each
 * segment of text in the lookup tables spans. If, however, care were taken to track the longest word
 * length, then the packing structure could change its representation to allow for that. Each additional
 * byte storing length, however, increases the memory overhead and lookup runtime.
 *
 * An alternative approach could be to borrow the UTF-8 variable-length encoding and store lengths of
 * 127 or less as a single byte with the high bit unset, storing longer lengths as the combination of
 * continuation bytes.
 *
 * Since it has not been shown during the development of this class that longer strings are required, this
 * update is deferred until such a need is clear.
 *
 * @since 6.6.0
 */
class WP_Token_Map {
	/**
	 * Denotes the version of the code which produces pre-computed source tables.
	 *
	 * This version is used not only to verify pre-computed data, but also
	 * to upgrade pre-computed data from older versions. Choosing a name that
	 * corresponds to the WordPress release will help people identify where an
	 * old copy of data came from.
	 */
	const STORAGE_VERSION = '6.6.0-trunk';

	/**
	 * Exclusive upper bound on the byte length of each key and each transformed value.
	 *
	 * Lengths are stored in a single unsigned byte, so keys and values must be
	 * shorter than this limit (at most 255 bytes).
	 *
	 * @since 6.6.0
	 */
	const MAX_LENGTH = 256;

	/**
	 * How many bytes of each key are used to form a group key for lookup.
	 * This also determines whether a word is considered small or large.
	 *
	 * @since 6.6.0
	 *
	 * @var int
	 */
	private $key_length = 2;

	/**
	 * Stores an optimized form of the word set, where words are grouped
	 * by their first `$key_length` bytes and then collapsed into a string.
	 *
	 * In each group, the keys and their values form a packed data structure.
	 * The keys in the string are stripped of their "group key," which is
	 * the prefix of length `$this->key_length` shared by all of the items
	 * in the group. The rest of each key, and each value, is prefixed by a
	 * single byte whose raw unsigned integer value is how many bytes follow.
	 *
	 *     ┌────────────────┬───────────────┬─────────────────┬────────┐
	 *     │ Length of rest │ Rest of key   │ Length of value │ Value  │
	 *     │ of key (bytes) │               │ (bytes)         │        │
	 *     ├────────────────┼───────────────┼─────────────────┼────────┤
	 *     │ 0x08           │ nterDot;      │ 0x02            │ ·      │
	 *     └────────────────┴───────────────┴─────────────────┴────────┘
	 *
	 * In this example, the key `CenterDot;` has a group key `Ce`, leaving
	 * eight bytes for the rest of the key, `nterDot;`, and two bytes for
	 * the transformed value `·` (U+00B7, encoded in UTF-8 as "\xC2\xB7").
	 *
	 * Example:
	 *
	 *     // Stores array( 'CenterDot;' => '·', 'Cedilla;' => '¸' ).
	 *     $groups      = "Ce\x00";
	 *     $large_words = array( "\x08nterDot;\x02·\x06dilla;\x02¸" );
	 *
	 * The prefixes appear in the `$groups` string, each followed by a null
	 * byte. This makes for quick lookup of where in the group string the key
	 * is found; dividing that offset by `$key_length + 1` then gives the index
	 * in the `$large_words` array where the group string is to be found.
	 *
	 * This lookup data structure is designed to optimize cache locality and
	 * minimize indirect memory reads when matching strings in the set.
	 *
	 * @since 6.6.0
	 *
	 * @var string[]
	 */
	private $large_words = array();

	/**
	 * Stores the group keys for sequential string lookup.
	 *
	 * The byte offset at which a group key appears in this string, divided by
	 * `$key_length + 1`, is the index into `$large_words` where the rest of that
	 * group is stored. This is an optimization to improve cache locality while
	 * searching and minimize indirect memory accesses.
	 *
	 * @since 6.6.0
	 *
	 * @var string
	 */
	private $groups = '';

	/**
	 * Stores an optimized row of small words, where every entry is
	 * `$this->key_length + 1` bytes long and zero-extended.
	 *
	 * This packing allows a small word to be found with a single string
	 * search once it is zero-extended to `$this->key_length + 1` bytes.
	 *
	 * Example:
	 *
	 *     // Stores array( 'GT', 'LT', 'gt', 'lt' ).
	 *     "GT\x00LT\x00gt\x00lt\x00"
	 *
	 * @since 6.6.0
	 *
	 * @var string
	 */
	private $small_words = '';

	/**
	 * Replacements for the small words, in the same order they appear.
	 *
	 * Given the byte offset of a small word in the `$small_words` string,
	 * dividing it by `$key_length + 1` gives the index of its replacement
	 * in the `$small_mappings` array.
	 *
	 * Example:
	 *
	 *     array( '>', '<', '>', '<' )
	 *
	 * @since 6.6.0
	 *
	 * @var string[]
	 */
	private $small_mappings = array();

	/**
	 * Creates a token map using an associative array of key/value pairs as the input.
	 *
	 * Example:
	 *
	 *     $smilies = WP_Token_Map::from_array( array(
	 *         '8O' => '😯',
	 *         ':(' => '🙁',
	 *         ':)' => '🙂',
	 *         ':?' => '😕',
	 *     ) );
	 *
	 * @since 6.6.0
	 *
	 * @param array $mappings   Maps each string key to its string replacement.
	 * @param int   $key_length Determines the group key length. Leave at the default value
	 *                          of 2 unless there's an empirical reason to change it.
	 *
	 * @return WP_Token_Map|null Token map, or null if any key or value is too long.
	 */
	public static function from_array( array $mappings, int $key_length = 2 ): ?WP_Token_Map {
		$map             = new WP_Token_Map();
		$map->key_length = $key_length;

		// Start by grouping words.

		$groups = array();
		$shorts = array();
		foreach ( $mappings as $word => $mapping ) {
			if (
				self::MAX_LENGTH <= strlen( $word ) ||
				self::MAX_LENGTH <= strlen( $mapping )
			) {
				_doing_it_wrong(
					__METHOD__,
					sprintf(
						/* translators: 1: maximum byte length (a count) */
						__( 'Token Map tokens and substitutions must all be shorter than %1$d bytes.' ),
						self::MAX_LENGTH
					),
					'6.6.0'
				);
				return null;
			}

			$length = strlen( $word );

			if ( $key_length >= $length ) {
				$shorts[] = $word;
			} else {
				$group = substr( $word, 0, $key_length );

				if ( ! isset( $groups[ $group ] ) ) {
					$groups[ $group ] = array();
				}

				$groups[ $group ][] = array( substr( $word, $key_length ), $mapping );
			}
		}

		/*
		 * Sort the words to ensure that no smaller substring of a match masks the full match.
		 * For example, `Cap` should not match before `CapitalDifferentialD`.
		 */
		usort( $shorts, 'WP_Token_Map::longest_first_then_alphabetical' );
		foreach ( $groups as $group_key => $group ) {
			usort(
				$groups[ $group_key ],
				static function ( array $a, array $b ): int {
					return self::longest_first_then_alphabetical( $a[0], $b[0] );
				}
			);
		}

		// Finally construct the optimized lookups.

		foreach ( $shorts as $word ) {
			$map->small_words     .= str_pad( $word, $key_length + 1, "\x00", STR_PAD_RIGHT );
			$map->small_mappings[] = $mappings[ $word ];
		}

		$group_keys = array_keys( $groups );
		sort( $group_keys );

		foreach ( $group_keys as $group ) {
			$map->groups .= "{$group}\x00";

			$group_string = '';

			foreach ( $groups[ $group ] as $group_word ) {
				list( $word, $mapping ) = $group_word;

				$word_length    = pack( 'C', strlen( $word ) );
				$mapping_length = pack( 'C', strlen( $mapping ) );
				$group_string  .= "{$word_length}{$word}{$mapping_length}{$mapping}";
			}

			$map->large_words[] = $group_string;
		}

		return $map;
	}

	/**
	 * Creates a token map from a pre-computed table.
	 * This skips the initialization cost of generating the table.
	 *
	 * This function should only be used to load data created with
	 * WP_Token_Map::precomputed_php_source_table().
	 *
	 * @since 6.6.0
	 *
	 * @param array $state {
	 *     Stores pre-computed state for directly loading into a Token Map.
	 *
	 *     @type string $storage_version Which version of the code produced this state.
	 *     @type int    $key_length      Group key length.
	 *     @type string $groups          Group lookup index.
	 *     @type array  $large_words     Large word groups and packed strings.
	 *     @type string $small_words     Small words packed string.
	 *     @type array  $small_mappings  Small word mappings.
	 * }
	 *
	 * @return WP_Token_Map|null Map with precomputed data loaded, or null if the state
	 *                           is incomplete or was produced by an incompatible version.
	 */
	public static function from_precomputed_table( $state ): ?WP_Token_Map {
		$has_necessary_state = isset(
			$state['storage_version'],
			$state['key_length'],
			$state['groups'],
			$state['large_words'],
			$state['small_words'],
			$state['small_mappings']
		);

		if ( ! $has_necessary_state ) {
			_doing_it_wrong(
				__METHOD__,
				__( 'Missing required inputs to pre-computed WP_Token_Map.' ),
				'6.6.0'
			);
			return null;
		}

		if ( self::STORAGE_VERSION !== $state['storage_version'] ) {
			_doing_it_wrong(
				__METHOD__,
				/* translators: 1: version string, 2: version string. */
				sprintf( __( 'Loaded version \'%1$s\' incompatible with expected version \'%2$s\'.' ), $state['storage_version'], self::STORAGE_VERSION ),
				'6.6.0'
			);
			return null;
		}

		$map = new WP_Token_Map();

		$map->key_length     = $state['key_length'];
		$map->groups         = $state['groups'];
		$map->large_words    = $state['large_words'];
		$map->small_words    = $state['small_words'];
		$map->small_mappings = $state['small_mappings'];

		return $map;
	}

	/**
	 * Indicates if a given word is a lookup key in the map.
	 *
	 * Example:
	 *
	 *     true  === $smilies->contains( ':)' );
	 *     false === $smilies->contains( 'simile' );
	 *
	 * @since 6.6.0
	 *
	 * @param string $word             Determine if this word is a lookup key in the map.
	 * @param string $case_sensitivity Optional. Pass 'ascii-case-insensitive' to ignore ASCII case when matching. Default 'case-sensitive'.
	 * @return bool Whether there's an entry for the given word in the map.
	 */
	public function contains( string $word, string $case_sensitivity = 'case-sensitive' ): bool {
		if ( str_contains( $word, "\x00" ) ) {
			return false;
		}

		$ignore_case = 'ascii-case-insensitive' === $case_sensitivity;

		if ( $this->key_length >= strlen( $word ) ) {
			if ( 0 === strlen( $this->small_words ) ) {
				return false;
			}

			$term    = str_pad( $word, $this->key_length + 1, "\x00", STR_PAD_RIGHT );
			$word_at = $ignore_case ? stripos( $this->small_words, $term ) : strpos( $this->small_words, $term );
			if ( false === $word_at ) {
				return false;
			}

			return true;
		}

		$group_key = substr( $word, 0, $this->key_length );
		$group_at  = $ignore_case ? stripos( $this->groups, $group_key ) : strpos( $this->groups, $group_key );
		if ( false === $group_at ) {
			return false;
		}
		$group        = $this->large_words[ $group_at / ( $this->key_length + 1 ) ];
		$group_length = strlen( $group );
		$slug         = substr( $word, $this->key_length );
		$length       = strlen( $slug );
		$at           = 0;

		while ( $at < $group_length ) {
			$token_length   = unpack( 'C', $group[ $at++ ] )[1];
			$token_at       = $at;
			$at            += $token_length;
			$mapping_length = unpack( 'C', $group[ $at++ ] )[1];
			$mapping_at     = $at;

			if ( $token_length === $length && 0 === substr_compare( $group, $slug, $token_at, $token_length, $ignore_case ) ) {
				return true;
			}

			$at = $mapping_at + $mapping_length;
		}

		return false;
	}

	/**
	 * If the text starting at a given offset is a lookup key in the map,
	 * returns the corresponding transformation from the map, otherwise `null`.
	 *
	 * This function returns the translated string, but accepts an optional
	 * parameter `$matched_token_byte_length`, which communicates how many
	 * bytes long the lookup key was, if it found one. This can be used to
	 * advance a cursor in calling code if a lookup key was found.
	 *
	 * Example:
	 *
	 *     null === $smilies->read_token( 'Not sure :?.', 0, $token_byte_length );
	 *     '😕' === $smilies->read_token( 'Not sure :?.', 9, $token_byte_length );
	 *     2    === $token_byte_length;
	 *
	 * Example:
	 *
	 *     $output = '';
	 *     $at     = 0;
	 *     while ( $at < strlen( $input ) ) {
	 *         $next_at = strpos( $input, ':', $at );
	 *         if ( false === $next_at ) {
	 *             break;
	 *         }
	 *
	 *         $smiley = $smilies->read_token( $input, $next_at, $token_byte_length );
	 *         if ( null === $smiley ) {
	 *             // Not a smiley: copy everything through the colon and keep scanning.
	 *             $output .= substr( $input, $at, $next_at - $at + 1 );
	 *             $at      = $next_at + 1;
	 *             continue;
	 *         }
	 *
	 *         $prefix  = substr( $input, $at, $next_at - $at );
	 *         $output .= "{$prefix}{$smiley}";
	 *         $at      = $next_at + $token_byte_length;
	 *     }
	 *     $output .= substr( $input, $at );
	 *
	 * @since 6.6.0
	 *
	 * @param string   $text                       String in which to search for a lookup key.
	 * @param int      $offset                     Optional. How many bytes into the string where the lookup key ought to start. Default 0.
	 * @param int|null &$matched_token_byte_length Optional. Holds the byte length of the matched token, otherwise not set. Default null.
	 * @param string   $case_sensitivity           Optional. Pass 'ascii-case-insensitive' to ignore ASCII case when matching. Default 'case-sensitive'.
	 *
	 * @return string|null Mapped value of lookup key if found, otherwise `null`.
	 */
	public function read_token( string $text, int $offset = 0, &$matched_token_byte_length = null, $case_sensitivity = 'case-sensitive' ): ?string {
		$ignore_case = 'ascii-case-insensitive' === $case_sensitivity;
		$text_length = strlen( $text );

		// Search for a long word first, if enough text remains after the offset, and if that fails, a short one.
		if ( $text_length - $offset > $this->key_length ) {
			/*
			 * Keys cannot contain null bytes, which is taken care of for the full words,
			 * but here it's required to reject group keys with null bytes so that the
			 * lookup doesn't get off track when scanning the group string.
			 */
			if ( strcspn( $text, "\x00", $offset, $this->key_length ) < $this->key_length ) {
				return null;
			}

			$group_key = substr( $text, $offset, $this->key_length );
			$group_at  = $ignore_case ? stripos( $this->groups, $group_key ) : strpos( $this->groups, $group_key );
			if ( false === $group_at ) {
				// Perhaps a short word then.
				return strlen( $this->small_words ) > 0
					? $this->read_small_token( $text, $offset, $matched_token_byte_length, $case_sensitivity )
					: null;
			}

			$group        = $this->large_words[ $group_at / ( $this->key_length + 1 ) ];
			$group_length = strlen( $group );
			$at           = 0;
			while ( $at < $group_length ) {
				$token_length   = unpack( 'C', $group[ $at++ ] )[1];
				$token          = substr( $group, $at, $token_length );
				$at            += $token_length;
				$mapping_length = unpack( 'C', $group[ $at++ ] )[1];
				$mapping_at     = $at;

				if ( 0 === substr_compare( $text, $token, $offset + $this->key_length, $token_length, $ignore_case ) ) {
					$matched_token_byte_length = $this->key_length + $token_length;
					return substr( $group, $mapping_at, $mapping_length );
				}

				$at = $mapping_at + $mapping_length;
			}
		}

		// Perhaps a short word then.
		return strlen( $this->small_words ) > 0
			? $this->read_small_token( $text, $offset, $matched_token_byte_length, $case_sensitivity )
			: null;
	}

	/**
	 * Finds a match for a short word at the index.
	 *
	 * @since 6.6.0
	 *
	 * @param string   $text                       String in which to search for a lookup key.
	 * @param int      $offset                     Optional. How many bytes into the string where the lookup key ought to start. Default 0.
	 * @param int|null &$matched_token_byte_length Optional. Holds byte-length of found lookup key if matched, otherwise not set. Default null.
	 * @param string   $case_sensitivity           Optional. Pass 'ascii-case-insensitive' to ignore ASCII case when matching. Default 'case-sensitive'.
	 *
	 * @return string|null Mapped value of lookup key if found, otherwise `null`.
	 */
	private function read_small_token( string $text, int $offset = 0, &$matched_token_byte_length = null, $case_sensitivity = 'case-sensitive' ): ?string {
		$ignore_case  = 'ascii-case-insensitive' === $case_sensitivity;
		$small_length = strlen( $this->small_words );
		$search_text  = substr( $text, $offset, $this->key_length );
		if ( '' === $search_text ) {
			return null;
		}
		if ( $ignore_case ) {
			$search_text = strtoupper( $search_text );
		}
		$starting_char = $search_text[0];

		$at = 0;
		while ( $at < $small_length ) {
			if (
				$starting_char !== $this->small_words[ $at ] &&
				( ! $ignore_case || strtoupper( $this->small_words[ $at ] ) !== $starting_char )
			) {
				$at += $this->key_length + 1;
				continue;
			}

			for ( $adjust = 1; $adjust < $this->key_length; $adjust++ ) {
				if ( "\x00" === $this->small_words[ $at + $adjust ] ) {
					$matched_token_byte_length = $adjust;
					return $this->small_mappings[ $at / ( $this->key_length + 1 ) ];
				}

				// The text ends before this small word does, so it cannot match.
				if ( ! isset( $search_text[ $adjust ] ) ) {
					$at += $this->key_length + 1;
					continue 2;
				}

				if (
					$search_text[ $adjust ] !== $this->small_words[ $at + $adjust ] &&
					( ! $ignore_case || strtoupper( $this->small_words[ $at + $adjust ] ) !== $search_text[ $adjust ] )
				) {
					$at += $this->key_length + 1;
					continue 2;
				}
			}

			$matched_token_byte_length = $adjust;
			return $this->small_mappings[ $at / ( $this->key_length + 1 ) ];
		}

		return null;
	}

	/**
	 * Exports the token map into an associative array of key/value pairs.
	 *
	 * Example:
	 *
	 *     $smilies->to_array() === array(
	 *         '8O' => '😯',
	 *         ':(' => '🙁',
	 *         ':)' => '🙂',
	 *         ':?' => '😕',
	 *     );
	 *
	 * @return array The lookup key/substitution values as an associative array.
	 */
	public function to_array(): array {
		$tokens = array();

		$at            = 0;
		$small_mapping = 0;
		$small_length  = strlen( $this->small_words );
		while ( $at < $small_length ) {
			$key            = rtrim( substr( $this->small_words, $at, $this->key_length + 1 ), "\x00" );
			$value          = $this->small_mappings[ $small_mapping++ ];
			$tokens[ $key ] = $value;

			$at += $this->key_length + 1;
		}

		foreach ( $this->large_words as $index => $group ) {
			// Each group key is `$key_length` bytes long, followed by a null byte.
			$prefix       = substr( $this->groups, $index * ( $this->key_length + 1 ), $this->key_length );
			$group_length = strlen( $group );
			$at           = 0;
			while ( $at < $group_length ) {
				$length = unpack( 'C', $group[ $at++ ] )[1];
				$key    = $prefix . substr( $group, $at, $length );

				$at    += $length;
				$length = unpack( 'C', $group[ $at++ ] )[1];
				$value  = substr( $group, $at, $length );

				$tokens[ $key ] = $value;
				$at            += $length;
			}
		}

		return $tokens;
	}

	/**
	 * Exports the token map for quick loading in PHP source code.
	 *
	 * This function has a specific purpose: to make loading of static token maps fast.
	 * It's used to ensure that the HTML character reference lookups add a minimal cost
	 * to initializing the PHP process.
	 *
	 * Example:
	 *
	 *     echo $smilies->precomputed_php_source_table();
	 *
	 *     // Output.
	 *     WP_Token_Map::from_precomputed_table(
	 *         array(
	 *             "storage_version" => "6.6.0-trunk",
	 *             "key_length" => 2,
	 *             "groups" => "",
	 *             "large_words" => array(),
	 *             "small_words" => "8O\x00:(\x00:)\x00:?\x00",
	 *             "small_mappings" => array( "😯", "🙁", "🙂", "😕" )
	 *         )
	 *     );
	 *
	 * @since 6.6.0
	 *
	 * @param string $indent Optional. Use this string for indentation, or rely on the default horizontal tab character. Default "\t".
	 * @return string Value which can be pasted into a PHP source file for quick loading of table.
	 */
	public function precomputed_php_source_table( string $indent = "\t" ): string {
		$i1 = $indent;
		$i2 = $i1 . $indent;
		$i3 = $i2 . $indent;

		$class_version = self::STORAGE_VERSION;

		/*
		 * Escapes raw bytes for use inside a double-quoted PHP string literal.
		 * Control characters become two-digit hex escapes (a one-digit escape
		 * such as `\x5` would absorb a following hex-like character), and `"`,
		 * `$`, and `\` are backslash-escaped so they cannot end the literal or
		 * trigger variable interpolation.
		 */
		$escape = static function ( string $text ): string {
			return preg_replace_callback(
				"~[\\x00-\\x1f\\x22\\x24\\x5c]~",
				static function ( $match_result ) {
					switch ( $match_result[0] ) {
						case '"':
							return '\\"';

						case '$':
							return '\\$';

						case '\\':
							return '\\\\';

						default:
							return sprintf( '\\x%02x', ord( $match_result[0] ) );
					}
				},
				$text
			);
		};

		$output  = self::class . "::from_precomputed_table(\n";
		$output .= "{$i1}array(\n";
		$output .= "{$i2}\"storage_version\" => \"{$class_version}\",\n";
		$output .= "{$i2}\"key_length\" => {$this->key_length},\n";

		$group_line = $escape( $this->groups );
		$output    .= "{$i2}\"groups\" => \"{$group_line}\",\n";

		$output .= "{$i2}\"large_words\" => array(\n";

		$prefixes = explode( "\x00", $this->groups );
		foreach ( $prefixes as $index => $prefix ) {
			if ( '' === $prefix ) {
				break;
			}
			$group        = $this->large_words[ $index ];
			$group_length = strlen( $group );
			$comment_line = "{$i3}//";
			$data_line    = "{$i3}\"";
			$at           = 0;
			while ( $at < $group_length ) {
				$token_length   = unpack( 'C', $group[ $at++ ] )[1];
				$token          = $escape( substr( $group, $at, $token_length ) );
				$at            += $token_length;
				$mapping_length = unpack( 'C', $group[ $at++ ] )[1];
				$mapping        = $escape( substr( $group, $at, $mapping_length ) );
				$at            += $mapping_length;

				$token_digits   = str_pad( dechex( $token_length ), 2, '0', STR_PAD_LEFT );
				$mapping_digits = str_pad( dechex( $mapping_length ), 2, '0', STR_PAD_LEFT );

				$comment_line .= " {$prefix}{$token}[{$mapping}]";
				$data_line    .= "\\x{$token_digits}{$token}\\x{$mapping_digits}{$mapping}";
			}
			$comment_line .= ".\n";
			$data_line    .= "\",\n";

			$output .= $comment_line;
			$output .= $data_line;
		}

		$output .= "{$i2}),\n";

		// The small words are already zero-padded; only the bytes need escaping.
		$small_text = $escape( $this->small_words );
		$output    .= "{$i2}\"small_words\" => \"{$small_text}\",\n";

		$output .= "{$i2}\"small_mappings\" => array(\n";
		foreach ( $this->small_mappings as $mapping ) {
			$escaped_mapping = $escape( $mapping );
			$output         .= "{$i3}\"{$escaped_mapping}\",\n";
		}
		$output .= "{$i2})\n";
		$output .= "{$i1})\n";
		$output .= ')';

		return $output;
	}

	/**
	 * Comparison callback that orders longer strings first and, among strings
	 * of the same length, orders them by byte value as `strcmp()` does.
	 *
	 * This is an important sort when building the token map because
	 * it should not form a match on a prefix of a longer potential
	 * match. For example, it should not detect `Cap` when matching
	 * against the string `CapitalDifferentialD`.
	 *
	 * @since 6.6.0
	 *
	 * @param string $a First string to compare.
	 * @param string $b Second string to compare.
	 * @return int Negative if `$a` sorts before `$b`, positive if it sorts after `$b`, and 0 if they are equal.
	 */
	private static function longest_first_then_alphabetical( string $a, string $b ): int {
		if ( $a === $b ) {
			return 0;
		}

		$length_a = strlen( $a );
		$length_b = strlen( $b );

		// Longer strings are less-than for comparison's sake.
		if ( $length_a !== $length_b ) {
			return $length_b - $length_a;
		}

		return strcmp( $a, $b );
	}
}
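The packed record layout described in the `$large_words` docblock above can be illustrated with a standalone sketch. The helpers `sketch_pack_group()` and `sketch_unpack_group()` are hypothetical names, not part of `WP_Token_Map`, and they assume the group key has already been stripped from each key. Because each length is a single unsigned byte, a segment of 256 bytes or more cannot be represented, which is the reason for the `MAX_LENGTH` limit.

```php
<?php
// Hypothetical helpers, not part of WP_Token_Map. Each record is:
// [1-byte length][rest of key][1-byte length][value].

function sketch_pack_group( array $entries ): string {
	$packed = '';
	foreach ( $entries as $rest_of_key => $value ) {
		$rest_of_key = (string) $rest_of_key; // Numeric array keys arrive as ints.
		if ( strlen( $rest_of_key ) > 255 || strlen( $value ) > 255 ) {
			throw new LengthException( 'Each segment must fit a one-byte length.' );
		}
		$packed .= pack( 'C', strlen( $rest_of_key ) ) . $rest_of_key;
		$packed .= pack( 'C', strlen( $value ) ) . $value;
	}
	return $packed;
}

function sketch_unpack_group( string $packed ): array {
	$entries = array();
	$at      = 0;
	$length  = strlen( $packed );
	while ( $at < $length ) {
		// ord() reads the same unsigned byte as unpack( 'C', ... )[1].
		$key_length   = ord( $packed[ $at++ ] );
		$rest_of_key  = substr( $packed, $at, $key_length );
		$at          += $key_length;
		$value_length = ord( $packed[ $at++ ] );

		$entries[ $rest_of_key ] = substr( $packed, $at, $value_length );
		$at                     += $value_length;
	}
	return $entries;
}

// Group "Ce": 'CenterDot;' => U+00B7 and 'Cedilla;' => U+00B8, longest first.
$group = sketch_pack_group(
	array(
		'nterDot;' => "\u{00B7}",
		'dilla;'   => "\u{00B8}",
	)
);
```

The entries are passed in already sorted longest first, matching the order `from_array()` produces; the sketch itself does not sort.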
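The offset-to-index step that `contains()` and `read_token()` use to locate a group can be shown in isolation. This sketch (the function name `sketch_find_large()` is hypothetical) performs an exact, case-sensitive match in the spirit of `contains()`. It assumes a key length of at least 1 and keys without null bytes, which is what keeps any `strpos()` hit in `$groups` aligned to a group boundary.

```php
<?php
// Hypothetical sketch of the large-word lookup, not the WP_Token_Map API.
function sketch_find_large( string $groups, array $large_words, int $key_length, string $word ): ?string {
	// Small words and words containing null bytes are not stored in groups.
	if ( strlen( $word ) <= $key_length || str_contains( $word, "\x00" ) ) {
		return null;
	}

	$group_at = strpos( $groups, substr( $word, 0, $key_length ) );
	if ( false === $group_at ) {
		return null;
	}

	// Each group key takes $key_length bytes plus one null separator,
	// so the byte offset divided by that width is the group's index.
	$group  = $large_words[ intdiv( $group_at, $key_length + 1 ) ];
	$rest   = substr( $word, $key_length );
	$length = strlen( $group );
	$at     = 0;
	while ( $at < $length ) {
		$token_length = ord( $group[ $at++ ] );
		$token        = substr( $group, $at, $token_length );
		$at          += $token_length;
		$value_length = ord( $group[ $at++ ] );
		if ( $token === $rest ) {
			return substr( $group, $at, $value_length );
		}
		$at += $value_length;
	}
	return null;
}

$groups      = "Ce\x00si\x00";
$large_words = array(
	"\x08nterDot;\x02\xC2\xB7\x06dilla;\x02\xC2\xB8", // Group "Ce".
	"\x0bmple_smile:\x01!",                          // Group "si".
);
```

Looking up `simple_smile:` finds `si` at byte offset 3, and `intdiv( 3, 3 )` selects the second group string.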
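The fixed-width `$small_words` row documented above can be sketched the same way: every entry is zero-padded to `$key_length + 1` bytes, so dividing a match offset by that width gives the index into the replacement list. The names `sketch_build_small()` and `sketch_lookup_small()` are hypothetical, the input is assumed to be in the desired order already, and the boundary check in the loop is an extra safeguard of this sketch rather than something the class documents.

```php
<?php
// Hypothetical sketch of the small-word table, not the WP_Token_Map API.
function sketch_build_small( array $mappings, int $key_length ): array {
	$row    = '';
	$values = array();
	foreach ( $mappings as $word => $value ) {
		// Zero-extend every word to exactly $key_length + 1 bytes.
		$row     .= str_pad( (string) $word, $key_length + 1, "\x00", STR_PAD_RIGHT );
		$values[] = $value;
	}
	return array( $row, $values );
}

function sketch_lookup_small( string $row, array $values, int $key_length, string $word ): ?string {
	if ( '' === $word || strlen( $word ) > $key_length || str_contains( $word, "\x00" ) ) {
		return null;
	}

	$width = $key_length + 1;
	$term  = str_pad( $word, $width, "\x00", STR_PAD_RIGHT );
	$at    = strpos( $row, $term );

	// Only accept a match that starts on an entry boundary.
	while ( false !== $at && 0 !== $at % $width ) {
		$at = strpos( $row, $term, $at + 1 );
	}

	return false === $at ? null : $values[ intdiv( $at, $width ) ];
}

list( $row, $values ) = sketch_build_small(
	array( 'GT' => '>', 'LT' => '<', 'gt' => '>', 'lt' => '<' ),
	2
);
```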
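The ordering enforced by `longest_first_then_alphabetical()` matters whenever tokens are tried one after another against running text, because a shorter token that is a prefix of a longer one would otherwise win. The sketch below uses hypothetical helper names and a plain linear scan (not the grouped lookup the class performs) to show the difference on the `Cap` / `CapitalDifferentialD` case from the docblock.

```php
<?php
// Hypothetical comparator with the same rule as longest_first_then_alphabetical():
// longer strings sort first; equal lengths fall back to strcmp() byte order.
function sketch_longest_first( string $a, string $b ): int {
	$by_length = strlen( $b ) - strlen( $a );
	return 0 !== $by_length ? $by_length : strcmp( $a, $b );
}

// Returns the first candidate found in $text at $offset, trying candidates in order.
function sketch_first_prefix_match( array $candidates, string $text, int $offset = 0 ): ?string {
	foreach ( $candidates as $candidate ) {
		if ( 0 === substr_compare( $text, $candidate, $offset, strlen( $candidate ) ) ) {
			return $candidate;
		}
	}
	return null;
}

$tokens = array( 'Cap', 'Cayleys', 'CapitalDifferentialD' );
$text   = 'CapitalDifferentialD;';

$unsorted_match = sketch_first_prefix_match( $tokens, $text ); // 'Cap' masks the longer token.

usort( $tokens, 'sketch_longest_first' );
$sorted_match = sketch_first_prefix_match( $tokens, $text );   // The full 'CapitalDifferentialD'.
```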
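The advice in the class docblock to leave a shared leading character such as `&` out of the keys can be sketched as a scanner that finds each `&` and then asks the lookup whether the following bytes complete a name. Here a plain associative array tried longest first stands in for a `WP_Token_Map`; `sketch_decode_references()` and its four-entry table are hypothetical and far smaller than the real HTML5 list.

```php
<?php
// Hypothetical sketch: named references stored WITHOUT the leading '&'.
function sketch_decode_references( array $names, string $html ): string {
	// Try longer names first so 'notin;' wins over its prefix 'not'.
	$keys = array_map( 'strval', array_keys( $names ) );
	usort(
		$keys,
		static fn( string $a, string $b ): int => ( strlen( $b ) - strlen( $a ) ) ?: strcmp( $a, $b )
	);

	$output = '';
	$at     = 0;
	while ( false !== ( $amp_at = strpos( $html, '&', $at ) ) ) {
		$output .= substr( $html, $at, $amp_at - $at );
		$at      = $amp_at + 1; // The lookup starts after the '&'.
		foreach ( $keys as $key ) {
			if ( substr( $html, $at, strlen( $key ) ) === $key ) {
				$output .= $names[ $key ];
				$at     += strlen( $key );
				continue 2;
			}
		}
		$output .= '&'; // Not a known name: keep the ampersand as-is.
	}
	return $output . substr( $html, $at );
}

$names = array(
	'amp;'   => '&',
	'lt;'    => '<',
	'not'    => "\u{00AC}",
	'notin;' => "\u{2209}",
);
```

Because the keys omit `&`, the cheap `strpos()` scan does the work of skipping plain text, and the lookup only runs where a reference could begin.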
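The "Determining Key Length" guidance to experiment empirically can be made concrete by counting how a candidate key length partitions a word list. This hypothetical helper reports how many words would be small, how many groups would form, the size of the largest group, and the fraction of groups holding a single token. The sample words are illustrative only, not the real HTML5 data.

```php
<?php
// Hypothetical helper for comparing candidate key lengths on a word list.
function sketch_key_length_stats( array $words, int $key_length ): array {
	$group_sizes = array();
	$small       = 0;
	foreach ( $words as $word ) {
		if ( strlen( $word ) <= $key_length ) {
			++$small; // Would be stored in the small-word table.
			continue;
		}
		$prefix                 = substr( $word, 0, $key_length );
		$group_sizes[ $prefix ] = ( $group_sizes[ $prefix ] ?? 0 ) + 1;
	}

	$groups     = count( $group_sizes );
	$singletons = count( array_filter( $group_sizes, static fn( int $size ): bool => 1 === $size ) );

	return array(
		'small_words'        => $small,
		'groups'             => $groups,
		'largest_group'      => $groups > 0 ? max( $group_sizes ) : 0,
		'singleton_fraction' => $groups > 0 ? (float) $singletons / $groups : 0.0,
	);
}

$words = array( 'amp;', 'aacute;', 'acirc;', 'lt;', 'gt;', 'not' );
foreach ( array( 1, 2, 3 ) as $candidate ) {
	printf( "key_length=%d %s\n", $candidate, json_encode( sketch_key_length_stats( $words, $candidate ) ) );
}
```

On this tiny list a key length of 1 already leaves most groups with a single token, which mirrors the docblock's point that small data sets favor shorter keys.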
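`precomputed_php_source_table()` has to turn arbitrary bytes into double-quoted PHP string literals. PHP's `\x` escape consumes one or two hex digits, so a one-digit escape such as `\x5` followed by a literal `e` would be read back as `\x5e` (`^`). The hypothetical `sketch_php_double_quoted()` below pads every hex escape to two digits and also escapes `$`, which would otherwise start variable interpolation in the generated source.

```php
<?php
// Hypothetical helper: quote arbitrary bytes as a double-quoted PHP literal.
function sketch_php_double_quoted( string $bytes ): string {
	$escaped = preg_replace_callback(
		'~[\x00-\x1f"$\\\\]~',
		static function ( array $match ): string {
			switch ( $match[0] ) {
				case '"':
					return '\\"';
				case '$':
					return '\\$'; // Prevent "$name" from interpolating.
				case '\\':
					return '\\\\';
				default:
					// Always two hex digits, so a following hex-like character is not absorbed.
					return sprintf( '\\x%02x', ord( $match[0] ) );
			}
		},
		$bytes
	);
	return '"' . $escaped . '"';
}

$literal = sketch_php_double_quoted( "\x05e" ); // Yields "\x05e", not "\x5e".
```

Bytes at or above `0x80`, such as the UTF-8 emoji in the examples, pass through unchanged, so the generated source stays readable.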
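The "Future Direction" note about variable-length lengths can be sketched as a LEB128-style varint: seven payload bits per byte, with the high bit set on every byte except the last. This is a swapped-in technique named plainly here; it is not UTF-8's own framing (UTF-8 marks the sequence length in its lead byte), `WP_Token_Map` does not currently use it, and the function names are hypothetical.

```php
<?php
// Hypothetical varint length prefix: 7 payload bits per byte, low bits first,
// high bit set on every byte except the last.
function sketch_encode_length( int $length ): string {
	if ( $length < 0 ) {
		throw new InvalidArgumentException( 'Length must be non-negative.' );
	}
	$bytes = '';
	do {
		$byte     = $length & 0x7F;
		$length >>= 7;
		$bytes   .= chr( $length > 0 ? $byte | 0x80 : $byte );
	} while ( $length > 0 );
	return $bytes;
}

// Decodes a length starting at $at and advances $at past the prefix.
function sketch_decode_length( string $data, int &$at ): int {
	$length = 0;
	$shift  = 0;
	do {
		$byte    = ord( $data[ $at++ ] );
		$length |= ( $byte & 0x7F ) << $shift;
		$shift  += 7;
	} while ( $byte & 0x80 );
	return $length;
}
```

Lengths up to 127 still cost a single byte, so a table dominated by short entries would grow very little, while longer segments pay one extra byte per additional seven bits.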
Generated: Sat Jun 27 08:20:12 2026 | Cross-referenced by PHPXref