A WordPress-centric search engine for devs and theme authors



_mb_substr ›

Since3.2.0
Deprecatedn/a
_mb_substr ( $str, $start, $length = null, $encoding = null )
Parameters: (4)
  • (string) $str The string to extract the substring from.
    Required: Yes
  • (int) $start Position to being extraction from in `$str`.
    Required: Yes
  • (int|null) $length Optional. Maximum number of characters to extract from `$str`. Default null.
    Required: No
    Default: null
  • (string|null) $encoding Optional. Character encoding to use. Default null.
    Required: No
    Default: null
Returns:
  • (string) Extracted substring.
Defined at:
Codex:

Internal compat function to mimic mb_substr().

Only understands UTF-8 and 8bit. All other character sets will be treated as 8bit. For $encoding === UTF-8, the $str input is expected to be a valid UTF-8 byte sequence. The behavior of this function for invalid inputs is undefined.



Source

function _mb_substr( $str, $start, $length = null, $encoding = null ) {
	if ( null === $encoding ) {
		$encoding = get_option( 'blog_charset' );
	}

	/*
	 * The solution below works only for UTF-8, so in case of a different
	 * charset just use built-in substr().
	 */
	if ( ! in_array( $encoding, array( 'utf8', 'utf-8', 'UTF8', 'UTF-8' ) ) ) {
		return is_null( $length ) ? substr( $str, $start ) : substr( $str, $start, $length );
	}

	if ( _wp_can_use_pcre_u() ) {
		// Use the regex unicode support to separate the UTF-8 characters into an array.
		preg_match_all( '/./us', $str, $match );
		$chars = is_null( $length ) ? array_slice( $match[0], $start ) : array_slice( $match[0], $start, $length );
		return implode( '', $chars );
	}

	$regex = '/(
		  [\x00-\x7F]                  # single-byte sequences   0xxxxxxx
		| [\xC2-\xDF][\x80-\xBF]       # double-byte sequences   110xxxxx 10xxxxxx
		| \xE0[\xA0-\xBF][\x80-\xBF]   # triple-byte sequences   1110xxxx 10xxxxxx * 2
		| [\xE1-\xEC][\x80-\xBF]{2}
		| \xED[\x80-\x9F][\x80-\xBF]
		| [\xEE-\xEF][\x80-\xBF]{2}
		| \xF0[\x90-\xBF][\x80-\xBF]{2} # four-byte sequences   11110xxx 10xxxxxx * 3
		| [\xF1-\xF3][\x80-\xBF]{3}
		| \xF4[\x80-\x8F][\x80-\xBF]{2}
	)/x';

	// Start with 1 element instead of 0 since the first thing we do is pop.
	$chars = array( '' );
	do {
		// We had some string left over from the last round, but we counted it in that last round.
		array_pop( $chars );

		/*
		 * Split by UTF-8 character, limit to 1000 characters (last array element will contain
		 * the rest of the string).
		 */
		$pieces = preg_split( $regex, $str, 1000, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY );

		$chars = array_merge( $chars, $pieces );

		// If there's anything left over, repeat the loop.
	} while ( count( $pieces ) > 1 && $str = array_pop( $pieces ) );

	return join( '', array_slice( $chars, $start, $length ) );
}