PHP Cross Reference of WordPress Trunk (Updated Daily)

/wp-includes/ -> compat-utf8.php (summary)

(no description)

File Size: 293 lines (11 kb)
Included or required: 0 times
Referenced: 0 times
Includes or requires: 0 files

Defines 3 functions

  _wp_scan_utf8()
  _wp_is_valid_utf8_fallback()
  _wp_scrub_utf8_fallback()

Functions
Functions that are not part of a class:

_wp_scan_utf8( string $bytes, int &$at, int &$invalid_length, ?int $max_bytes = null, ?int $max_code_points = null )
Finds spans of valid and invalid UTF-8 bytes in a given string.

This is a low-level tool to power various UTF-8 functionality.
It scans through a string until it finds an invalid span of bytes.
When it stops, it does three things:

- Assigns `$at` to the position after the last successful code point.
- Assigns `$invalid_length` to the length of the maximal subpart of
  the invalid bytes starting at `$at`.
- Returns how many code points were successfully scanned.

This information is enough to build a number of useful UTF-8 functions.

Example:

// ñ is U+00F1, which in `ISO-8859-1`/`latin1`/`Windows-1252`/`cp1252` is 0xF1.
"Pi\xF1a" === $pineapple = mb_convert_encoding( "Piña", 'Windows-1252', 'UTF-8' );
$at = $invalid_length = 0;

// The first step finds the invalid 0xF1 byte.
2 === _wp_scan_utf8( $pineapple, $at, $invalid_length );
$at === 2; $invalid_length === 1;

// The second step skips the invalid span and continues to the end of the string.
$at += $invalid_length;
1 === _wp_scan_utf8( $pineapple, $at, $invalid_length );
$at === 4; $invalid_length === 0;

Note! This function’s many arguments are passed without an “options”
array. This choice is based on the fact that this is a low-level function
and there’s no need to create an array of items on every invocation.

return: int How many code points were successfully scanned.
param: string   $bytes           UTF-8 encoded string which might include invalid spans of bytes.
param: int      $at              Where to start scanning.
param: int      $invalid_length  Will be set to how many bytes are to be ignored after `$at`.
param: int|null $max_bytes       Stop scanning after this many bytes have been seen.
param: int|null $max_code_points Stop scanning after this many code points have been seen.
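
The scan-and-skip loop that this contract implies can be sketched as follows. This is a minimal illustration, not the WordPress implementation: `sketch_scan_utf8()` is a hypothetical stand-in written with PCRE byte patterns, it omits `$max_bytes` and `$max_code_points`, and it assumes the caller is responsible for advancing `$at` past each invalid span.

```php
<?php
// Hypothetical stand-in for illustration only; NOT WordPress's _wp_scan_utf8().
// It follows the documented contract but has no $max_bytes/$max_code_points.
function sketch_scan_utf8( string $bytes, int &$at, int &$invalid_length ): int {
	// Exactly one well-formed UTF-8 sequence (Unicode Standard, Table 3-7).
	$valid = '/\G(?:[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]'
		. '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]'
		. '|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}'
		. '|\xF4[\x80-\x8F][\x80-\xBF]{2})/';
	// Longest truncated prefix of a well-formed sequence, else a single byte.
	$invalid = '/\G(?:\xE0[\xA0-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]|\xED[\x80-\x9F]'
		. '|\xF0[\x90-\xBF][\x80-\xBF]?|[\xF1-\xF3][\x80-\xBF]{1,2}'
		. '|\xF4[\x80-\x8F][\x80-\xBF]?|[\x80-\xFF])/';

	$count          = 0;
	$invalid_length = 0;
	$end            = strlen( $bytes );

	while ( $at < $end ) {
		if ( 1 === preg_match( $valid, $bytes, $m, 0, $at ) ) {
			$at += strlen( $m[0] ); // Step over one valid code point.
			++$count;
			continue;
		}
		// Stop at the first invalid byte and report the span's length.
		preg_match( $invalid, $bytes, $m, 0, $at );
		$invalid_length = strlen( $m[0] );
		break;
	}

	return $count;
}

$pineapple = "Pi\xF1a"; // "Piña" in Windows-1252, as in the example above.
$at        = 0;
$invalid_length = 0;
$total     = 0;       // Valid code points seen so far.
$spans     = array(); // [ offset, length ] of each invalid span.

while ( $at < strlen( $pineapple ) ) {
	$total += sketch_scan_utf8( $pineapple, $at, $invalid_length );
	if ( $invalid_length > 0 ) {
		$spans[] = array( $at, $invalid_length );
		$at     += $invalid_length; // The caller skips the invalid span.
	}
}
// $total is now 3 and $spans is [ [ 2, 1 ] ].
```

The same loop shape could count code points, collect error offsets, or copy valid spans into a new string, depending on what the caller does between scans.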

_wp_is_valid_utf8_fallback( string $bytes )   X-Ref
Fallback mechanism for safely validating UTF-8 bytes.

return: bool Whether the provided bytes can decode as valid UTF-8.
param: string $bytes String which might contain text encoded as UTF-8.
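
This summary does not show how the fallback validates its input. As a rough point of comparison only, one well-known way to check UTF-8 validity in plain PHP is PCRE's `u` modifier; the helper name `sketch_is_valid_utf8()` below is hypothetical and this is a swapped-in technique, not the WordPress code.

```php
<?php
// Hypothetical helper for illustration; not the WordPress implementation.
function sketch_is_valid_utf8( string $bytes ): bool {
	// With the `u` modifier PCRE checks that the subject is valid UTF-8
	// before matching; preg_match() returns false (not 0) if it is not.
	return 1 === preg_match( '//u', $bytes );
}

var_dump( sketch_is_valid_utf8( "Pi\u{F1}a" ) ); // bool(true): ñ encoded as UTF-8.
var_dump( sketch_is_valid_utf8( "Pi\xF1a" ) );   // bool(false): ñ as a lone Windows-1252 byte.
```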

_wp_scrub_utf8_fallback( string $bytes )   X-Ref
Fallback mechanism for replacing invalid spans of UTF-8 bytes.

Example:

'Pi�a' === _wp_scrub_utf8_fallback( "Pi\xF1a" ); // “ñ” is 0xF1 in Windows-1252.

return: string Input string with spans of invalid bytes swapped with the replacement character.
param: string $bytes UTF-8 encoded string which might contain spans of invalid bytes.
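
The replacement behavior described above can be approximated in a self-contained way. The sketch below is an assumption-laden illustration, not the WordPress code: `sketch_scrub_utf8()` is a hypothetical name, and it uses PCRE byte patterns to copy well-formed sequences through and to swap each maximal invalid span for U+FFFD.

```php
<?php
// Hypothetical helper for illustration; not the WordPress implementation.
function sketch_scrub_utf8( string $bytes ): string {
	// One well-formed UTF-8 sequence (Unicode Standard, Table 3-7).
	$valid = '[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]'
		. '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]'
		. '|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}'
		. '|\xF4[\x80-\x8F][\x80-\xBF]{2}';
	// Longest truncated prefix of a well-formed sequence, else a single byte.
	$invalid = '\xE0[\xA0-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]|\xED[\x80-\x9F]'
		. '|\xF0[\x90-\xBF][\x80-\xBF]?|[\xF1-\xF3][\x80-\xBF]{1,2}'
		. '|\xF4[\x80-\x8F][\x80-\xBF]?|[\x80-\xFF]';

	$scrubbed = preg_replace_callback(
		'/(' . $valid . ')|(?:' . $invalid . ')/',
		static function ( array $m ): string {
			// Group 1 is present only when a valid sequence matched.
			return isset( $m[1] ) ? $m[1] : "\u{FFFD}";
		},
		$bytes
	);

	if ( null === $scrubbed ) {
		throw new RuntimeException( 'PCRE failed while scrubbing.' );
	}

	return $scrubbed;
}

var_dump( "Pi\u{FFFD}a" === sketch_scrub_utf8( "Pi\xF1a" ) ); // bool(true)
```

Because the valid alternative is tried first at every position, the invalid alternatives only ever see bytes that cannot start or continue a well-formed sequence there, so a truncated multi-byte sequence yields a single replacement character rather than one per byte.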



Generated : Fri Oct 10 08:20:03 2025 Cross-referenced by PHPXref