PHP Cross Reference of WordPress Trunk (Updated Daily) |
| File Size: | 565 lines (20 kb) |
| Included or required: | 0 times |
| Referenced: | 0 times |
| Includes or requires: | 0 files |
| _wp_scan_utf8( string $bytes, int &$at, int &$invalid_length, ?int $max_bytes = null, ?int $max_code_points = null, ?bool &$has_noncharacters = null ) |
| Finds spans of valid and invalid UTF-8 bytes in a given string.

This is a low-level tool to power various UTF-8 functionality. It scans through a string until it finds invalid byte spans. When it does this, it does three things:
- Assigns `$at` to the position after the last successful code point.
- Assigns `$invalid_length` to the length of the maximal subpart of the invalid bytes starting at `$at`.
- Returns how many code points were successfully scanned.

This information is enough to build a number of useful UTF-8 functions.

Example:

    // ñ is U+F1, which in `ISO-8859-1`/`latin1`/`Windows-1252`/`cp1252` is 0xF1.
    "Pi\xF1a" === $pineapple = mb_convert_encoding( "Piña", 'Windows-1252', 'UTF-8' );
    $at = $invalid_length = 0;

    // The first step finds the invalid 0xF1 byte.
    2 === _wp_scan_utf8( $pineapple, $at, $invalid_length );
    $at === 2; $invalid_length === 1;

    // The second step continues to the end of the string.
    1 === _wp_scan_utf8( $pineapple, $at, $invalid_length );
    $at === 4; $invalid_length === 0;

Note! While passing an options array here might be convenient from a calling-code standpoint, this function is intended to serve as a very low-level foundation upon which to build higher level functionality. For the sake of keeping costs explicit all arguments are passed directly.

param: string $bytes UTF-8 encoded string which might include invalid spans of bytes.
param: int $at Where to start scanning.
param: int $invalid_length Will be set to how many bytes are to be ignored after `$at`.
param: int|null $max_bytes Stop scanning after this many bytes have been seen.
param: int|null $max_code_points Stop scanning after this many code points have been seen.
param: bool|null $has_noncharacters Set to indicate if scanned string contained noncharacters.
return: int How many code points were successfully scanned. |
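The scanning contract described above can be sketched in plain PHP. The block below is a simplified, hypothetical stand-in: `sketch_scan_utf8` and both pattern constants are names invented here, not WordPress code. It leaves out `$max_bytes`, `$max_code_points`, and `$has_noncharacters`, and it assumes the caller skips the reported invalid span before scanning again.

```php
<?php
// Hypothetical stand-in for illustration only; not the WordPress implementation.

// One well-formed UTF-8 sequence (Unicode Standard, Table 3-7), anchored at the offset.
const SKETCH_VALID = '/\G(?:[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]'
    . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]'
    . '|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})/';

// Maximal subpart: the longest truncated prefix of a well-formed sequence, else one byte.
const SKETCH_INVALID = '/\G(?:\xE0[\xA0-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]|\xED[\x80-\x9F]'
    . '|\xF0[\x90-\xBF][\x80-\xBF]?|[\xF1-\xF3][\x80-\xBF]{1,2}|\xF4[\x80-\x8F][\x80-\xBF]?|.)/s';

function sketch_scan_utf8( string $bytes, int &$at, int &$invalid_length ): int {
    $count          = 0;
    $invalid_length = 0;
    $end            = strlen( $bytes );

    while ( $at < $end ) {
        if ( 1 === preg_match( SKETCH_VALID, $bytes, $match, 0, $at ) ) {
            // A whole code point was decoded: step over it and keep going.
            $at += strlen( $match[0] );
            ++$count;
            continue;
        }

        // Stop at the first invalid span and report its length; `$at` stays put.
        preg_match( SKETCH_INVALID, $bytes, $match, 0, $at );
        $invalid_length = strlen( $match[0] );
        break;
    }

    return $count;
}

$pineapple      = "Pi\xF1a"; // "Piña" encoded as Windows-1252.
$at             = 0;
$invalid_length = 0;

// First step: two code points, then the scan stops at the invalid 0xF1 byte.
$first       = sketch_scan_utf8( $pineapple, $at, $invalid_length );
$first_state = array( $at, $invalid_length );

// Assumed caller duty in this sketch: skip the invalid span before rescanning.
$at += $invalid_length;

// Second step: one more code point, reaching the end of the string.
$second       = sketch_scan_utf8( $pineapple, $at, $invalid_length );
$second_state = array( $at, $invalid_length );
```

Keeping `$at` and `$invalid_length` as by-reference arguments lets a caller decide what to do with each invalid span (count it, replace it, or reject the input) without the scanner allocating anything.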
| _wp_is_valid_utf8_fallback( string $bytes ) |
| Fallback mechanism for safely validating UTF-8 bytes.

param: string $bytes String which might contain text encoded as UTF-8.
return: bool Whether the provided bytes can decode as valid UTF-8. |
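One common way to approximate this kind of check without `mbstring` is to let PCRE validate the subject: `preg_match()` with the `u` modifier fails on malformed UTF-8. The sketch below assumes PCRE was built with Unicode support. `sketch_is_valid_utf8` is a hypothetical name, and this is not necessarily how the WordPress fallback works.

```php
<?php
// Hypothetical helper for illustration; not the WordPress implementation.
function sketch_is_valid_utf8( string $bytes ): bool {
    // With the `u` modifier PCRE validates the subject first; a malformed
    // subject makes preg_match() return false instead of 1.
    return 1 === @preg_match( '//u', $bytes );
}

$valid_text     = sketch_is_valid_utf8( "Pi\u{F1}a" );   // "Piña" as UTF-8.
$latin1_bytes   = sketch_is_valid_utf8( "Pi\xF1a" );     // Windows-1252 bytes.
$overlong_bytes = sketch_is_valid_utf8( "\xC0\x80" );    // Overlong encoding of NUL.
$surrogate      = sketch_is_valid_utf8( "\xED\xA0\x80" ); // Encoded surrogate half.
```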
| _wp_scrub_utf8_fallback( string $bytes ) |
| Fallback mechanism for replacing invalid spans of UTF-8 bytes.

Example:

    'Pi�a' === _wp_scrub_utf8_fallback( "Pi\xF1a" ); // “ñ” is 0xF1 in Windows-1252.

param: string $bytes UTF-8 encoded string which might contain spans of invalid bytes.
return: string Input string with spans of invalid bytes swapped with the replacement character. |
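The replacement behavior described above can be illustrated with a byte-level regular expression. This is a hedged sketch, not the WordPress implementation: `sketch_scrub_utf8` and `SKETCH_SCRUB_PATTERN` are hypothetical names, and the pattern is assumed to follow the well-formed byte ranges of the Unicode Standard (Table 3-7), with each maximal invalid subpart becoming one U+FFFD.

```php
<?php
// Hypothetical helper for illustration; not the WordPress implementation.

// Group 1 captures a run of well-formed UTF-8 sequences. The remaining
// alternatives match one maximal invalid subpart: the longest truncated
// prefix of a well-formed sequence, or else a single stray byte.
const SKETCH_SCRUB_PATTERN = '/((?:[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]'
    . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]'
    . '|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})++)'
    . '|\xE0[\xA0-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]|\xED[\x80-\x9F]'
    . '|\xF0[\x90-\xBF][\x80-\xBF]?|[\xF1-\xF3][\x80-\xBF]{1,2}|\xF4[\x80-\x8F][\x80-\xBF]?|./s';

function sketch_scrub_utf8( string $bytes ): string {
    $scrubbed = preg_replace_callback(
        SKETCH_SCRUB_PATTERN,
        static function ( array $match ): string {
            // Valid runs pass through untouched; anything else becomes U+FFFD.
            return isset( $match[1] ) && '' !== $match[1] ? $match[1] : "\u{FFFD}";
        },
        $bytes
    );

    if ( null === $scrubbed ) {
        throw new RuntimeException( 'PCRE failed while scrubbing.' );
    }

    return $scrubbed;
}

$pineapple = sketch_scrub_utf8( "Pi\xF1a" );
$mixed     = sketch_scrub_utf8( "test\x90wp\xE2\x80\xC0test" );
```

Here `"\xE2\x80"` collapses into a single replacement character because it is a truncated prefix of a three-byte sequence, while `"\x90"` and `"\xC0"` are each replaced on their own.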
| _wp_utf8_codepoint_count( string $text, ?int $byte_offset = 0, ?int $max_byte_length = PHP_INT_MAX ) |
| Returns how many code points are found in the given UTF-8 string.

Invalid spans of bytes count as a single code point according to the maximal subpart rule. This function is a fallback method for calling `mb_strlen( $text, 'UTF-8' )`. When a negative value is provided for the byte offset or length, this will always report zero code points.

Example:

    4 === _wp_utf8_codepoint_count( 'text' );

    // Groups are 'test', "\x90" as '�', 'wp', "\xE2\x80" as '�', "\xC0" as '�', and 'test'.
    13 === _wp_utf8_codepoint_count( "test\x90wp\xE2\x80\xC0test" );

param: string $text Count code points in this string.
param: ?int $byte_offset Start counting after this many bytes in `$text`. Must not be negative.
param: ?int $max_byte_length Optional. Stop counting after having scanned past this many bytes.
return: int How many code points were found. |
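The counting rule above (one count per well-formed sequence, one per maximal invalid subpart) can be reproduced with a single byte-level pattern. This is a sketch under stated assumptions: `sketch_codepoint_count` and `SKETCH_UNIT_PATTERN` are hypothetical names, the whole string is counted, and the `$byte_offset` and `$max_byte_length` arguments are left out.

```php
<?php
// Hypothetical helper for illustration; not the WordPress implementation.

// Each match is one countable unit. Well-formed sequences (Unicode Standard,
// Table 3-7) are tried first, then truncated prefixes, then any single byte.
const SKETCH_UNIT_PATTERN = '/[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]'
    . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]'
    . '|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2}'
    . '|\xE0[\xA0-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]|\xED[\x80-\x9F]'
    . '|\xF0[\x90-\xBF][\x80-\xBF]?|[\xF1-\xF3][\x80-\xBF]{1,2}|\xF4[\x80-\x8F][\x80-\xBF]?|./s';

function sketch_codepoint_count( string $text ): int {
    // preg_match_all() returns the number of matches, which is the unit count.
    return (int) preg_match_all( SKETCH_UNIT_PATTERN, $text );
}

$plain_count = sketch_codepoint_count( 'text' );
$mixed_count = sketch_codepoint_count( "test\x90wp\xE2\x80\xC0test" );
$utf8_count  = sketch_codepoint_count( "Pi\u{F1}a" );
```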
| _wp_utf8_codepoint_span( string $text, int $byte_offset, int $max_code_points, ?int &$found_code_points = 0 ) |
| Given a starting offset within a string and a maximum number of code points, returns how many bytes are occupied by the span of characters.

Invalid spans of bytes count as a single code point according to the maximal subpart rule. This function is a fallback method for calling `strlen( mb_substr( substr( $text, $byte_offset ), 0, $max_code_points ) )`.

param: string $text Count bytes of span in this text.
param: int $byte_offset Start counting at this byte offset.
param: int $max_code_points Stop counting after this many code points have been seen.
param: ?int $found_code_points Optional. Will be set to the number of code points found.
return: int Number of bytes spanned by the code points. |
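To make the byte-span idea concrete, the sketch below walks forward from a byte offset one unit at a time and reports how many bytes were covered. It is illustrative only: `sketch_codepoint_span` and `SKETCH_SPAN_PATTERN` are hypothetical names, the offset is assumed to be non-negative, and the pattern is assumed to follow the Unicode Standard's well-formed byte ranges (Table 3-7).

```php
<?php
// Hypothetical helper for illustration; not the WordPress implementation.

// One unit anchored at the current offset: a well-formed sequence, else the
// longest truncated prefix of one, else a single stray byte.
const SKETCH_SPAN_PATTERN = '/\G(?:[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]'
    . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]'
    . '|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2}'
    . '|\xE0[\xA0-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]|\xED[\x80-\x9F]'
    . '|\xF0[\x90-\xBF][\x80-\xBF]?|[\xF1-\xF3][\x80-\xBF]{1,2}|\xF4[\x80-\x8F][\x80-\xBF]?|.)/s';

function sketch_codepoint_span( string $text, int $byte_offset, int $max_code_points, ?int &$found_code_points = 0 ): int {
    $at    = $byte_offset;
    $end   = strlen( $text );
    $found = 0;

    // Consume one unit per iteration until the limit or the end of the string.
    while ( $found < $max_code_points && $at < $end
        && 1 === preg_match( SKETCH_SPAN_PATTERN, $text, $match, 0, $at ) ) {
        $at += strlen( $match[0] );
        ++$found;
    }

    $found_code_points = $found;
    return $at - $byte_offset;
}

$text = "Pi\u{F1}a"; // Five bytes: "P", "i", a two-byte "ñ", and "a".

$three_span  = sketch_codepoint_span( $text, 0, 3, $found );
$three_found = $found;

// Asking for more code points than remain stops at the end of the string.
$tail_span  = sketch_codepoint_span( $text, 2, 10, $found );
$tail_found = $found;

// A truncated three-byte sequence counts as one code point spanning two bytes.
$broken_span  = sketch_codepoint_span( "a\xE2\x80b", 0, 2, $found );
$broken_found = $found;
```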
| _wp_has_noncharacters_fallback( string $text ) |
| Fallback support for determining if a string contains Unicode noncharacters.

param: string $text Are there noncharacters in this string?
return: bool Whether noncharacters were found in the string. |
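Unicode defines 66 noncharacters: U+FDD0 through U+FDEF, plus the last two code points of every plane (U+FFFE, U+FFFF, U+1FFFE, and so on up to U+10FFFF). A minimal sketch of detecting them by their UTF-8 byte patterns follows. `sketch_has_noncharacters` is a hypothetical name, the input is assumed to already be valid UTF-8, and this is not necessarily how the WordPress fallback works.

```php
<?php
// Hypothetical helper for illustration; not the WordPress implementation.
// Assumes `$text` is already valid UTF-8, so byte matches align to code points.
function sketch_has_noncharacters( string $text ): bool {
    return 1 === preg_match(
        '/\xEF\xB7[\x90-\xAF]'                        // U+FDD0 through U+FDEF.
        . '|\xEF\xBF[\xBE\xBF]'                       // U+FFFE and U+FFFF.
        . '|[\xF0-\xF4][\x8F\x9F\xAF\xBF]\xBF[\xBE\xBF]/', // U+nFFFE and U+nFFFF, planes 1 to 16.
        $text
    );
}

$has_bmp_range   = sketch_has_noncharacters( "a\u{FDD0}b" );
$has_plane_end   = sketch_has_noncharacters( "\u{10FFFF}" );
$has_replacement = sketch_has_noncharacters( "Pi\u{FFFD}a" ); // U+FFFD is an ordinary character.
```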
| _wp_utf8_encode_fallback( $iso_8859_1_text ) |
| Converts a string from ISO-8859-1 to UTF-8, maintaining backwards compatibility with `utf8_encode()`, the deprecated function from the PHP standard library.

param: string $iso_8859_1_text Text treated as ISO-8859-1 (latin1) bytes.
return: string Text converted into UTF-8. |
| _wp_utf8_decode_fallback( $utf8_text ) |
| Converts a string from UTF-8 to ISO-8859-1, maintaining backwards compatibility with `utf8_decode()`, the deprecated function from the PHP standard library.

param: string $utf8_text Text treated as UTF-8 bytes.
return: string Text converted into ISO-8859-1. |
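Because ISO-8859-1 maps each byte directly to the code point of the same value, both conversions reduce to simple bit arithmetic. The round trip below is a sketch, not the WordPress fallbacks: both function names are hypothetical, and the decoder assumes valid UTF-8 input and writes `?` for any code point above U+00FF (how the real functions treat malformed input is not shown here).

```php
<?php
// Hypothetical helpers for illustration; not the WordPress implementation.

function sketch_latin1_to_utf8( string $latin1 ): string {
    $utf8   = '';
    $length = strlen( $latin1 );

    for ( $i = 0; $i < $length; $i++ ) {
        $byte = ord( $latin1[ $i ] );
        // ASCII passes through; 0x80-0xFF becomes a two-byte UTF-8 sequence.
        $utf8 .= $byte < 0x80
            ? $latin1[ $i ]
            : chr( 0xC0 | ( $byte >> 6 ) ) . chr( 0x80 | ( $byte & 0x3F ) );
    }

    return $utf8;
}

// Assumes `$utf8` is valid UTF-8.
function sketch_utf8_to_latin1( string $utf8 ): string {
    return (string) preg_replace_callback(
        '/([\xC2\xC3])([\x80-\xBF])|[\xC4-\xF4][\x80-\xBF]+/',
        static function ( array $match ): string {
            if ( isset( $match[2] ) ) {
                // U+0080 through U+00FF: rebuild the single Latin-1 byte.
                return chr( ( ( ord( $match[1] ) & 0x03 ) << 6 ) | ( ord( $match[2] ) & 0x3F ) );
            }
            // Anything above U+00FF has no Latin-1 equivalent.
            return '?';
        },
        $utf8
    );
}

$encoded   = sketch_latin1_to_utf8( "Pi\xF1a" );        // Latin-1 "Piña".
$decoded   = sketch_utf8_to_latin1( $encoded );
$euro_sign = sketch_utf8_to_latin1( "5 \u{20AC}" );     // U+20AC is outside Latin-1.

// Every possible byte should survive an encode/decode round trip.
$all_bytes  = implode( '', array_map( 'chr', range( 0, 255 ) ) );
$round_trip = sketch_utf8_to_latin1( sketch_latin1_to_utf8( $all_bytes ) );
```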
| Generated : Thu Oct 30 08:20:06 2025 | Cross-referenced by PHPXref |