PHP Cross Reference of WordPress Trunk (Updated Daily)
File Size: 293 lines (11 kb)
Included or required: 0 times
Referenced: 0 times
Includes or requires: 0 files
_wp_scan_utf8( string $bytes, int &$at, int &$invalid_length, ?int $max_bytes = null, ?int $max_code_points = null )

Finds spans of valid and invalid UTF-8 bytes in a given string.

This is a low-level tool to power various UTF-8 functionality. It scans through a string until it finds an invalid byte span. At that point it does three things:

- Assigns `$at` to the position after the last successful code point.
- Assigns `$invalid_length` to the length of the maximal subpart of the invalid bytes starting at `$at`.
- Returns how many code points were successfully scanned.

This information is enough to build a number of useful UTF-8 functions.

Example:

    // ñ is U+F1, which in `ISO-8859-1`/`latin1`/`Windows-1252`/`cp1252` is 0xF1.
    "Pi\xF1a" === $pineapple = mb_convert_encoding( "Piña", 'Windows-1252', 'UTF-8' );
    $at = $invalid_length = 0;

    // The first step finds the invalid 0xF1 byte.
    2 === _wp_scan_utf8( $pineapple, $at, $invalid_length );
    $at === 2; $invalid_length === 1;

    // The second step continues to the end of the string.
    1 === _wp_scan_utf8( $pineapple, $at, $invalid_length );
    $at === 4; $invalid_length === 0;

Note! This function’s many arguments are passed without an “options” array. This choice is based on the fact that this is a low-level function and there’s no need to create an array of items on every invocation.

return: int How many code points were successfully scanned.
param: string $bytes UTF-8 encoded string which might include invalid spans of bytes.
param: int $at Where to start scanning.
param: int $invalid_length Will be set to how many bytes are to be ignored after `$at`.
param: int|null $max_bytes Stop scanning after this many bytes have been seen.
param: int|null $max_code_points Stop scanning after this many code points have been seen.

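The scanning contract described above can be sketched as a loop that alternates between scanning valid text and skipping an invalid span. This is a minimal sketch, not WordPress's code: `sketch_scan_utf8()` and `sketch_survey_utf8()` are hypothetical names, the scanner is a regex-based stand-in that omits `$max_bytes` and `$max_code_points`, and it assumes the caller advances `$at` by `$invalid_length` before scanning again (the `$invalid_length` description suggests this, but the documented example does not show that step).

```php
<?php
/**
 * Hypothetical stand-in for _wp_scan_utf8(), for illustration only.
 * It follows the documented contract for $bytes, $at and $invalid_length
 * and omits the optional $max_bytes and $max_code_points limits.
 */
function sketch_scan_utf8( string $bytes, int &$at, int &$invalid_length ): int {
    // One well-formed UTF-8 sequence; \G anchors the match at the offset.
    $well_formed = '/\G(?:[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]'
        . '|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
        . '|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}'
        . '|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})/';

    // A truncated prefix of a well-formed sequence (a multi-byte maximal subpart).
    $truncated = '/\G(?:\xE0[\xA0-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]'
        . '|\xED[\x80-\x9F]|\xF0[\x90-\xBF][\x80-\xBF]?'
        . '|[\xF1-\xF3][\x80-\xBF]{1,2}|\xF4[\x80-\x8F][\x80-\xBF]?)/';

    $invalid_length = 0;
    $count          = 0;
    $end            = strlen( $bytes );

    while ( $at < $end ) {
        if ( 1 === preg_match( $well_formed, $bytes, $match, 0, $at ) ) {
            $at += strlen( $match[0] );
            ++$count;
            continue;
        }

        // Invalid bytes: report the span length and stop without moving $at.
        $invalid_length = ( 1 === preg_match( $truncated, $bytes, $match, 0, $at ) )
            ? strlen( $match[0] )
            : 1;
        break;
    }

    return $count;
}

/**
 * Hypothetical driver: counts code points and invalid spans in one pass.
 */
function sketch_survey_utf8( string $bytes ): array {
    $at             = 0;
    $invalid_length = 0;
    $code_points    = 0;
    $invalid_spans  = 0;
    $end            = strlen( $bytes );

    while ( $at < $end ) {
        $code_points += sketch_scan_utf8( $bytes, $at, $invalid_length );

        if ( $invalid_length > 0 ) {
            ++$invalid_spans;
            $at += $invalid_length; // Assumption: the caller skips the invalid span.
        }
    }

    return array(
        'code_points'   => $code_points,
        'invalid_spans' => $invalid_spans,
    );
}
```

Under these assumptions, `sketch_survey_utf8( "Pi\xF1a" )` reports three code points and one invalid span, and a string is valid exactly when no invalid span is reported. Passing the cursor and span length by reference keeps the loop free of per-call allocations, which is the same trade-off the note about avoiding an “options” array describes.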
_wp_is_valid_utf8_fallback( string $bytes )

Fallback mechanism for safely validating UTF-8 bytes.

return: bool Whether the provided bytes can decode as valid UTF-8.
param: string $bytes String which might contain text encoded as UTF-8.

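This page does not say when the fallback is chosen over a native validator. As a hedged illustration only, the sketch below assumes a caller prefers `mb_check_encoding()` when the mbstring extension is loaded and otherwise uses a substitute check. The name `sketch_is_valid_utf8()` is hypothetical, and the `preg_match( '//u', ... )` branch is a common stand-in that relies on PCRE rejecting subjects that are not valid UTF-8; it is not WordPress's fallback.

```php
<?php
/**
 * Hypothetical dispatcher: prefer the native validator, otherwise fall back.
 * Not WordPress's implementation; shown only to illustrate the idea of a fallback.
 */
function sketch_is_valid_utf8( string $bytes ): bool {
    if ( function_exists( 'mb_check_encoding' ) ) {
        return mb_check_encoding( $bytes, 'UTF-8' );
    }

    // With the /u modifier, preg_match() returns false when the subject
    // is not valid UTF-8, and 1 when the empty pattern matches.
    return 1 === preg_match( '//u', $bytes );
}
```

Both branches return a plain boolean, so calling code does not need to know which validator ran.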
_wp_scrub_utf8_fallback( string $bytes )

Fallback mechanism for replacing invalid spans of UTF-8 bytes.

Example:

    'Pi�a' === _wp_scrub_utf8_fallback( "Pi\xF1a" ); // “ñ” is 0xF1 in Windows-1252.

return: string Input string with spans of invalid bytes swapped with the replacement character.
param: string $bytes UTF-8 encoded string which might contain spans of invalid bytes.

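The scrubbing behavior described above can be approximated in a single regex pass. This is a sketch under assumptions, not the function's actual implementation: `sketch_scrub_utf8()` is a hypothetical name, and it assumes each invalid span is a maximal subpart (a truncated prefix of a well-formed sequence, or otherwise a single byte) replaced by one U+FFFD.

```php
<?php
/**
 * Hypothetical sketch of scrubbing invalid UTF-8; the name and the
 * regex-based approach are assumptions, not WordPress's implementation.
 */
function sketch_scrub_utf8( string $bytes ): string {
    // One well-formed UTF-8 sequence.
    $well_formed = '[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]'
        . '|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
        . '|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}'
        . '|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2}';

    // A truncated prefix of a well-formed sequence.
    $truncated = '\xE0[\xA0-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]'
        . '|\xED[\x80-\x9F]|\xF0[\x90-\xBF][\x80-\xBF]?'
        . '|[\xF1-\xF3][\x80-\xBF]{1,2}|\xF4[\x80-\x8F][\x80-\xBF]?';

    // Group 1 keeps one valid sequence; any other match is one invalid span.
    $pattern = '/(' . $well_formed . ')|(?:' . $truncated . ')|./s';

    $scrubbed = preg_replace_callback(
        $pattern,
        static function ( array $match ): string {
            return ( $match[1] ?? '' ) !== '' ? $match[1] : "\u{FFFD}";
        },
        $bytes
    );

    // preg_replace_callback() returns null on a PCRE error.
    return is_string( $scrubbed ) ? $scrubbed : '';
}
```

Because the well-formed alternative is tried first, the truncated-prefix alternative only matches bytes that cannot start a complete sequence at that position, so a cut-off multi-byte character collapses into a single replacement character instead of one per byte.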
Generated : Fri Oct 10 08:20:03 2025 | Cross-referenced by PHPXref |