PHP Cross Reference of WordPress Trunk (Updated Daily) |
HTML API: WP_HTML_Tag_Processor class
Scans through an HTML document to find specific tags, then transforms those tags by adding, removing, or updating the values of the HTML attributes within that tag (opener).
File Size: | 4558 lines (147 kb) |
Included or required: | 0 times |
Referenced: | 0 times |
Includes or requires: | 0 files |
WP_HTML_Tag_Processor:: (47 methods):
__construct()
change_parsing_namespace()
next_tag()
next_token()
base_class_next_token()
paused_at_incomplete_token()
class_list()
has_class()
set_bookmark()
release_bookmark()
skip_rawtext()
skip_rcdata()
skip_script_data()
parse_next_tag()
parse_next_attribute()
skip_whitespace()
after_tag()
class_name_updates_to_attributes_updates()
apply_attributes_updates()
has_bookmark()
seek()
sort_start_ascending()
get_enqueued_attribute_value()
get_attribute()
get_attribute_names_with_prefix()
get_namespace()
get_tag()
get_qualified_tag_name()
get_qualified_attribute_name()
has_self_closing_flag()
is_tag_closer()
get_token_type()
get_token_name()
get_comment_type()
get_full_comment_text()
subdivide_text_appropriately()
get_modifiable_text()
set_modifiable_text()
set_attribute()
remove_attribute()
add_class()
remove_class()
__toString()
get_updated_html()
parse_query()
matches()
get_doctype_info()
Class: WP_HTML_Tag_Processor - X-Ref
Core class used to modify attributes in an HTML document for tags matching a query.
__construct( $html ) X-Ref |
Constructor. param: string $html HTML to process. |
change_parsing_namespace( string $new_namespace ) X-Ref |
Switches parsing mode into a new namespace, such as when encountering an SVG tag and entering foreign content. return: bool Whether the namespace was valid and changed. param: string $new_namespace One of 'html', 'svg', or 'math' indicating into what |
next_tag( $query = null ) X-Ref |
Finds the next tag matching the $query. return: bool Whether a tag was matched. param: array|string|null $query { |
next_token() X-Ref |
Finds the next token in the HTML document. An HTML document can be viewed as a stream of tokens, where tokens are things like HTML tags, HTML comments, text nodes, etc. This method finds the next token in the HTML document and returns whether it found one. If it starts parsing a token and reaches the end of the document then it will seek to the start of the last token and pause, returning `false` to indicate that it failed to find a complete token.

Possible token types, based on the HTML specification:
 - an HTML tag, whether opening, closing, or void.
 - a text node - the plaintext inside tags.
 - an HTML comment.
 - a DOCTYPE declaration.
 - a processing instruction, e.g. `<?xml version="1.0" ?>`.

The Tag Processor currently only supports the tag token.
return: bool Whether a token was parsed. |
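The token stream described for `next_token()` can be walked with a short loop. This is a minimal sketch, assuming WordPress is already loaded and recent enough to expose a public `next_token()` along with `get_token_type()` and `get_token_name()`. The helper name `example_list_token_names()` is hypothetical and not part of WordPress.

```php
<?php
// Assumption: WordPress is loaded, so WP_HTML_Tag_Processor is available.

/**
 * Hypothetical helper: returns the name of every token found in $html.
 * Tag closers are prefixed with "/" so they can be told apart from openers.
 */
function example_list_token_names( string $html ): array {
	$processor = new WP_HTML_Tag_Processor( $html );
	$names     = array();

	// next_token() returns false at the end of the document or when it
	// pauses on an incomplete token.
	while ( $processor->next_token() ) {
		$name = $processor->get_token_name();

		if ( '#tag' === $processor->get_token_type() && $processor->is_tag_closer() ) {
			$name = "/{$name}";
		}

		$names[] = $name;
	}

	return $names;
}
```

Called with `<p>Hi<!-- note --></p>`, the helper should report the `P` opener, a `#text` node, a `#comment`, and the `P` closer, in that order.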
base_class_next_token() X-Ref |
Internal method which finds the next token in the HTML document. This method is a protected internal function which implements the logic for finding the next token in a document. It exists so that the parser can update its state without affecting the location of the cursor in the document and without triggering subclass methods for things like `next_token()`, e.g. when applying patches before searching for the next token. return: bool Whether a token was parsed. |
paused_at_incomplete_token() X-Ref |
Whether the processor paused because the input HTML document ended in the middle of a syntax element, such as in the middle of a tag.

Example:

    $processor = new WP_HTML_Tag_Processor( '<input type="text" value="Th' );
    false === $processor->next_tag();
    true === $processor->paused_at_incomplete_token();

return: bool Whether the parse paused at the start of an incomplete token. |
class_list() X-Ref |
Generator for a foreach loop to step through each class name for the matched tag. This generator function is designed to be used inside a "foreach" loop.

Example:

    $p = new WP_HTML_Tag_Processor( "<div class='free &lt;egg&gt;\tlang-en'>" );
    $p->next_tag();
    foreach ( $p->class_list() as $class_name ) {
        echo "{$class_name} ";
    }
    // Outputs: "free <egg> lang-en " |
has_class( $wanted_class ) X-Ref |
Returns if a matched tag contains the given ASCII case-insensitive class name. return: bool|null Whether the matched tag contains the given class name, or null if not matched. param: string $wanted_class Look for this CSS class name, ASCII case-insensitive. |
set_bookmark( $name ) X-Ref |
Sets a bookmark in the HTML document. Bookmarks represent specific places or tokens in the HTML document, such as a tag opener or closer. When applying edits to a document, such as setting an attribute, the text offsets of that token may shift; the bookmark is kept updated with those shifts and remains stable unless the entire span of text in which the token sits is removed. Release bookmarks when they are no longer needed.

Example:

    <main><h2>Surprising fact you may not know!</h2></main>
          ^  ^
           \-|-- this `H2` opener bookmark tracks the token

    <main class="clickbait"><h2>Surprising fact you may no…
                            ^  ^
                             \-|-- it shifts with edits

Bookmarks provide the ability to seek to a previously-scanned place in the HTML document. This avoids the need to re-scan the entire document.

Example:

    <ul><li>One</li><li>Two</li><li>Three</li></ul>
                                ^^^^
                                want to note this last item

    $p = new WP_HTML_Tag_Processor( $html );
    $in_list = false;
    while ( $p->next_tag( array( 'tag_closers' => $in_list ? 'visit' : 'skip' ) ) ) {
        if ( 'UL' === $p->get_tag() ) {
            if ( $p->is_tag_closer() ) {
                $in_list = false;
                $p->set_bookmark( 'resume' );
                if ( $p->seek( 'last-li' ) ) {
                    $p->add_class( 'last-li' );
                }
                $p->seek( 'resume' );
                $p->release_bookmark( 'last-li' );
                $p->release_bookmark( 'resume' );
            } else {
                $in_list = true;
            }
        }

        // Bookmark only LI openers: closers are also visited inside the list,
        // and a class cannot be added to a tag closer.
        if ( 'LI' === $p->get_tag() && ! $p->is_tag_closer() ) {
            $p->set_bookmark( 'last-li' );
        }
    }

Bookmarks intentionally hide the internal string offsets to which they refer. They are maintained internally as updates are applied to the HTML document and therefore retain their "position" - the location to which they originally pointed. The inability to use bookmarks with functions like `substr` is therefore intentional to guard against accidentally breaking the HTML.

Because bookmarks allocate memory and require processing for every applied update, they are limited and require a name. They should not be created with programmatically-made names, such as "li_{$index}" with some loop. As a general rule they should only be created with string-literal names like "start-of-section" or "last-paragraph".

Bookmarks are a powerful tool to enable complicated behavior. Consider double-checking that you need this tool if you are reaching for it, as inappropriate use could lead to broken HTML structure or unwanted processing overhead.
return: bool Whether the bookmark was successfully created. param: string $name Identifies this particular bookmark. |
release_bookmark( $name ) X-Ref |
Removes a bookmark that is no longer needed. Releasing a bookmark frees up the small performance overhead it requires. return: bool Whether the bookmark already existed before removal. param: string $name Name of the bookmark to remove. |
skip_rawtext( string $tag_name ) X-Ref |
Skips contents of generic rawtext elements. return: bool Whether an end to the RAWTEXT region was found before the end of the document. param: string $tag_name The uppercase tag name which will close the RAWTEXT region. |
skip_rcdata( string $tag_name ) X-Ref |
Skips contents of RCDATA elements, namely title and textarea tags. return: bool Whether an end to the RCDATA region was found before the end of the document. param: string $tag_name The uppercase tag name which will close the RCDATA region. |
skip_script_data() X-Ref |
Skips contents of script tags. return: bool Whether the script tag was closed before the end of the document. |
parse_next_tag() X-Ref |
Parses the next tag. This will find and start parsing the next tag, including the opening `<`, the potential closer `/`, and the tag name. It does not parse the attributes or scan to the closing `>`; these are left for other methods. return: bool Whether a tag was found before the end of the document. |
parse_next_attribute() X-Ref |
Parses the next attribute. return: bool Whether an attribute was found before the end of the document. |
skip_whitespace() X-Ref |
Move the internal cursor past any immediate successive whitespace. |
after_tag() X-Ref |
Applies attribute updates and cleans up once a tag is fully parsed. |
class_name_updates_to_attributes_updates() X-Ref |
Converts class name updates into tag attributes updates (they are accumulated in different data formats for performance). |
apply_attributes_updates( int $shift_this_point ) X-Ref |
Applies attribute updates to HTML document. return: int How many bytes the given pointer moved in response to the updates. param: int $shift_this_point Accumulate and return shift for this position. |
has_bookmark( $bookmark_name ) X-Ref |
Checks whether a bookmark with the given name exists. return: bool Whether that bookmark exists. param: string $bookmark_name Name to identify a bookmark that potentially exists. |
seek( $bookmark_name ) X-Ref |
Move the internal cursor in the Tag Processor to a given bookmark's location. In order to prevent accidental infinite loops, there's a maximum limit on the number of times seek() can be called. return: bool Whether the internal cursor was successfully moved to the bookmark's location. param: string $bookmark_name Jump to the place in the document identified by this bookmark name. |
sort_start_ascending( WP_HTML_Text_Replacement $a, WP_HTML_Text_Replacement $b ) X-Ref |
Compare two WP_HTML_Text_Replacement objects. return: int Comparison value for string order. param: WP_HTML_Text_Replacement $a First attribute update. param: WP_HTML_Text_Replacement $b Second attribute update. |
get_enqueued_attribute_value( string $comparable_name ) X-Ref |
Return the enqueued value for a given attribute, if one exists.

Enqueued updates can take different data types:
 - If an update is enqueued and is boolean, the return will be `true`.
 - If an update is otherwise enqueued, the return will be the string value of that update.
 - If an attribute is enqueued to be removed, the return will be `null` to indicate that.
 - If no updates are enqueued, the return will be `false` to differentiate from "removed."

return: string|boolean|null Value of enqueued update if present, otherwise false. param: string $comparable_name The attribute name in its comparable form. |
get_attribute( $name ) X-Ref |
Returns the value of a requested attribute from a matched tag opener if that attribute exists.

Example:

    $p = new WP_HTML_Tag_Processor( '<div enabled class="test" data-test-id="14">Test</div>' );
    $p->next_tag( array( 'class_name' => 'test' ) ) === true;
    $p->get_attribute( 'data-test-id' ) === '14';
    $p->get_attribute( 'enabled' ) === true;
    $p->get_attribute( 'aria-label' ) === null;

    $p->next_tag() === false;
    $p->get_attribute( 'class' ) === null;

return: string|true|null Value of attribute or `null` if not available. Boolean attributes return `true`. param: string $name Name of attribute whose value is requested. |
get_attribute_names_with_prefix( $prefix ) X-Ref |
Gets lowercase names of all attributes matching a given prefix in the current tag. Note that matching is case-insensitive. This is in accordance with the spec:

    > There must never be two or more attributes on
    > the same start tag whose names are an ASCII
    > case-insensitive match for each other.
    - HTML 5 spec

Example:

    $p = new WP_HTML_Tag_Processor( '<div data-ENABLED class="test" DATA-test-id="14">Test</div>' );
    $p->next_tag( array( 'class_name' => 'test' ) ) === true;
    $p->get_attribute_names_with_prefix( 'data-' ) === array( 'data-enabled', 'data-test-id' );

    $p->next_tag() === false;
    $p->get_attribute_names_with_prefix( 'data-' ) === null;

return: array|null List of attribute names, or `null` when no tag opener is matched. param: string $prefix Prefix of requested attribute names. |
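Prefix matching pairs naturally with `remove_attribute()` when a whole family of attributes has to go. The following is a minimal sketch, assuming WordPress is loaded; `example_strip_data_attributes()` is a hypothetical helper name, not a WordPress function.

```php
<?php
// Assumption: WordPress is loaded, so WP_HTML_Tag_Processor is available.

/**
 * Hypothetical helper: removes every `data-*` attribute from every tag opener.
 */
function example_strip_data_attributes( string $html ): string {
	$processor = new WP_HTML_Tag_Processor( $html );

	// Without a query, next_tag() visits each tag opener in turn.
	while ( $processor->next_tag() ) {
		// The method is documented to return null when no tag opener is
		// matched, so fall back to an empty list defensively.
		$names = $processor->get_attribute_names_with_prefix( 'data-' ) ?? array();

		foreach ( $names as $name ) {
			$processor->remove_attribute( $name );
		}
	}

	return $processor->get_updated_html();
}
```

Because the returned names are lowercase and attribute matching is ASCII case-insensitive, attributes written as `DATA-test-id` in the source are removed as well.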
get_namespace() X-Ref |
Returns the namespace of the matched token. return: string One of 'html', 'math', or 'svg'. |
get_tag() X-Ref |
Returns the uppercase name of the matched tag.

Example:

    $p = new WP_HTML_Tag_Processor( '<div class="test">Test</div>' );
    $p->next_tag() === true;
    $p->get_tag() === 'DIV';

    $p->next_tag() === false;
    $p->get_tag() === null;

return: string|null Name of currently matched tag in input HTML, or `null` if none found. |
get_qualified_tag_name() X-Ref |
Returns the adjusted tag name for a given token, taking into account the current parsing context, whether HTML, SVG, or MathML. return: string|null Name of current tag name. |
get_qualified_attribute_name( $attribute_name ) X-Ref |
Returns the adjusted attribute name for a given attribute, taking into account the current parsing context, whether HTML, SVG, or MathML. return: string|null param: string $attribute_name Which attribute to adjust. |
has_self_closing_flag() X-Ref |
Indicates if the currently matched tag contains the self-closing flag. No HTML elements ought to have the self-closing flag and for those, the self-closing flag will be ignored. For void elements this is benign because they "self close" automatically. For non-void HTML elements though problems will appear if someone intends to use a self-closing element in place of that element with an empty body. For HTML foreign elements and custom elements the self-closing flag determines if they self-close or not. This function does not determine if a tag is self-closing, but only if the self-closing flag is present in the syntax. return: bool Whether the currently matched tag contains the self-closing flag. |
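Since `has_self_closing_flag()` reports syntax only, one use for it is auditing markup for tags written with a trailing `/`. The following is a minimal sketch, assuming WordPress is loaded; the helper name `example_tags_with_self_closing_flag()` is hypothetical.

```php
<?php
// Assumption: WordPress is loaded, so WP_HTML_Tag_Processor is available.

/**
 * Hypothetical helper: lists the uppercase names of all tag openers that are
 * written with the self-closing flag, whether or not HTML honors the flag
 * for that element.
 */
function example_tags_with_self_closing_flag( string $html ): array {
	$processor = new WP_HTML_Tag_Processor( $html );
	$flagged   = array();

	while ( $processor->next_tag() ) {
		if ( $processor->has_self_closing_flag() ) {
			$flagged[] = $processor->get_tag();
		}
	}

	return $flagged;
}
```

For input such as `<div /><br/><p>`, both `DIV` and `BR` are reported. Deciding which of them is a problem (here the non-void `DIV`) is left to the caller, because the method does not say whether a tag actually self-closes.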
is_tag_closer() X-Ref |
Indicates if the current tag token is a tag closer.

Example:

    $p = new WP_HTML_Tag_Processor( '<div></div>' );
    $p->next_tag( array( 'tag_name' => 'div', 'tag_closers' => 'visit' ) );
    $p->is_tag_closer() === false;

    $p->next_tag( array( 'tag_name' => 'div', 'tag_closers' => 'visit' ) );
    $p->is_tag_closer() === true;

return: bool Whether the current tag is a tag closer. |
get_token_type() X-Ref |
Indicates the kind of matched token, if any. This differs from `get_token_name()` in that it always returns a static string indicating the type, whereas `get_token_name()` may return values derived from the token itself, such as a tag name or processing instruction tag.

Possible values:
 - `#tag` when matched on a tag.
 - `#text` when matched on a text node.
 - `#cdata-section` when matched on a CDATA node.
 - `#comment` when matched on a comment.
 - `#doctype` when matched on a DOCTYPE declaration.
 - `#presumptuous-tag` when matched on an empty tag closer.
 - `#funky-comment` when matched on a funky comment.

return: string|null What kind of token is matched, or null. |
get_token_name() X-Ref |
Returns the node name represented by the token. This matches the DOM API value `nodeName`. Some values are static, such as `#text` for a text node, while others are dynamically generated from the token itself. Dynamic names: - Uppercase tag name for tag matches. - `html` for DOCTYPE declarations. Note that if the Tag Processor is not matched on a token then this function will return `null`, either because it hasn't yet found a token or because it reached the end of the document without matching a token. return: string|null Name of the matched token. |
get_comment_type() X-Ref |
Indicates what kind of comment produced the comment node. Because there are different kinds of HTML syntax which produce comments, the Tag Processor tracks and exposes this as a type for the comment. Nominally only regular HTML comments exist as they are commonly known, but a number of unrelated syntax errors also produce comments. return: string|null |
get_full_comment_text() X-Ref |
Returns the text of a matched comment or null if not on a comment type node. This method returns the entire text content of a comment node as it would appear in the browser. This differs from {@see ::get_modifiable_text()} in that certain comment types in the HTML API cannot allow their entire comment text content to be modified. Namely, "bogus comments" of the form `<?not allowed in html>` will create a comment whose text content starts with `?`. Note that if that character were modified, it would be possible to change the node type. return: string|null The comment text as it would appear in the browser or null |
subdivide_text_appropriately() X-Ref |
Subdivides a matched text node, splitting NULL byte sequences and decoded whitespace as distinct node prefixes. Note that once anything that's neither a NULL byte nor decoded whitespace is encountered, then the remainder of the text node is left intact as generic text.
 - The HTML Processor uses this to apply distinct rules for different kinds of text.
 - Inter-element whitespace can be detected and skipped with this method.

Text nodes aren't eagerly subdivided because there's no need to split them unless decisions are being made on NULL byte sequences or whitespace-only text.

Example:

    $processor = new WP_HTML_Tag_Processor( "\x00Apples & Oranges" );
    true  === $processor->next_token();                   // Text is "Apples & Oranges".
    true  === $processor->subdivide_text_appropriately(); // Text is "".
    true  === $processor->next_token();                   // Text is "Apples & Oranges".
    false === $processor->subdivide_text_appropriately();

    $processor = new WP_HTML_Tag_Processor( " \r\n\tMore" );
    true  === $processor->next_token();                   // Text is " ␉More".
    true  === $processor->subdivide_text_appropriately(); // Text is " ␉".
    true  === $processor->next_token();                   // Text is "More".
    false === $processor->subdivide_text_appropriately();

return: bool Whether the text node was subdivided. |
get_modifiable_text() X-Ref |
Returns the modifiable text for a matched token, or an empty string. Modifiable text is text content that may be read and changed without changing the HTML structure of the document around it. This includes the contents of `#text` nodes in the HTML as well as the inner contents of HTML comments, Processing Instructions, and others, even though these nodes aren't part of a parsed DOM tree. They also contain the contents of SCRIPT and STYLE tags, of TEXTAREA tags, and of any other section in an HTML document which cannot contain HTML markup (DATA). If a token has no modifiable text then an empty string is returned to avoid needless crashing or type errors. An empty string does not mean that a token has modifiable text, and a token with modifiable text may have an empty string (e.g. a comment with no contents). Limitations: - This function will not strip the leading newline appropriately after seeking into a LISTING or PRE element. To ensure that the newline is treated properly, seek to the LISTING or PRE opening tag instead of to the first text node inside the element. return: string |
set_modifiable_text( string $plaintext_content ) X-Ref |
Sets the modifiable text for the matched token, if matched. Modifiable text is text content that may be read and changed without changing the HTML structure of the document around it. This includes the contents of `#text` nodes in the HTML as well as the inner contents of HTML comments, Processing Instructions, and others, even though these nodes aren't part of a parsed DOM tree. They also contain the contents of SCRIPT and STYLE tags, of TEXTAREA tags, and of any other section in an HTML document which cannot contain HTML markup (DATA).

Not all modifiable text may be set by this method, and not all content may be set as modifiable text. In the case that this fails it will return `false` indicating as much. For instance, it will not allow inserting the string `</script` into a SCRIPT element, because the rules for escaping that safely are complicated. Similarly, it will not allow setting content into a comment which would prematurely terminate the comment.

Example:

    // Add a preface to all STYLE contents (CSS comments use the block form).
    while ( $processor->next_tag( 'STYLE' ) ) {
        $style = $processor->get_modifiable_text();
        $processor->set_modifiable_text( "/* Made with love on the World Wide Web */\n{$style}" );
    }

    // Replace smiley text with Emoji smilies.
    while ( $processor->next_token() ) {
        if ( '#text' !== $processor->get_token_name() ) {
            continue;
        }

        $chunk = $processor->get_modifiable_text();
        if ( ! str_contains( $chunk, ':)' ) ) {
            continue;
        }

        $processor->set_modifiable_text( str_replace( ':)', '🙂', $chunk ) );
    }

return: bool Whether the text was able to update. param: string $plaintext_content New text content to represent in the matched token. |
set_attribute( $name, $value ) X-Ref |
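Updates or creates a new attribute on the currently matched tag with the passed value. Passing `true` adds a boolean attribute (name only); passing `false` removes the attribute if it exists. return: bool Whether an attribute value was set. param: string $name The attribute name to target. param: string|bool $value The new attribute value. |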
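A typical use of `set_attribute()` is to add an attribute only where one is missing. The following is a minimal sketch, assuming WordPress is loaded; `example_lazy_load_images()` is a hypothetical helper, and the `loading="lazy"` policy is purely illustrative.

```php
<?php
// Assumption: WordPress is loaded, so WP_HTML_Tag_Processor is available.

/**
 * Hypothetical helper: adds loading="lazy" to IMG tags that do not already
 * declare a `loading` attribute, leaving explicit choices untouched.
 */
function example_lazy_load_images( string $html ): string {
	$processor = new WP_HTML_Tag_Processor( $html );

	// A string query is treated as a tag name.
	while ( $processor->next_tag( 'img' ) ) {
		// get_attribute() returns null when the attribute is absent.
		if ( null === $processor->get_attribute( 'loading' ) ) {
			$processor->set_attribute( 'loading', 'lazy' );
		}
	}

	return $processor->get_updated_html();
}
```

Checking with `get_attribute()` first keeps an existing `loading="eager"` intact; calling `set_attribute()` unconditionally would overwrite it.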
remove_attribute( $name ) X-Ref |
Remove an attribute from the currently-matched tag. return: bool Whether an attribute was removed. param: string $name The attribute name to remove. |
add_class( $class_name ) X-Ref |
Adds a new class name to the currently matched tag. return: bool Whether the class was set to be added. param: string $class_name The class name to add. |
remove_class( $class_name ) X-Ref |
Removes a class name from the currently matched tag. return: bool Whether the class was set to be removed. param: string $class_name The class name to remove. |
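`add_class()` and `remove_class()` are often used together to rename a class across a fragment. The following is a minimal sketch, assuming WordPress is loaded; the helper name `example_swap_class()` is hypothetical.

```php
<?php
// Assumption: WordPress is loaded, so WP_HTML_Tag_Processor is available.

/**
 * Hypothetical helper: replaces class $old with class $new on every tag
 * that carries $old, leaving all other classes in place.
 */
function example_swap_class( string $html, string $old, string $new ): string {
	$processor = new WP_HTML_Tag_Processor( $html );

	// The `class_name` query key restricts matches to tags with that class.
	while ( $processor->next_tag( array( 'class_name' => $old ) ) ) {
		$processor->remove_class( $old );
		$processor->add_class( $new );
	}

	return $processor->get_updated_html();
}
```

Both calls only enqueue changes; the updated markup is read back through `get_updated_html()` after the loop has finished.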
__toString() X-Ref |
Returns the string representation of the HTML Tag Processor. return: string The processed HTML. |
get_updated_html() X-Ref |
Returns the string representation of the HTML Tag Processor. return: string The processed HTML. |
parse_query( $query ) X-Ref |
Parses tag query input into internal search criteria. param: array|string|null $query { |
matches() X-Ref |
Checks whether a given tag and its attributes match the search criteria. return: bool Whether the given tag and its attribute match the search criteria. |
get_doctype_info() X-Ref |
Gets DOCTYPE declaration info from a DOCTYPE token. DOCTYPE tokens may appear in many places in an HTML document. In most places, they are simply ignored. The main parsing functions find the basic shape of DOCTYPE tokens but do not perform detailed parsing. This method can be called to perform a full parse of the DOCTYPE token and retrieve its information. return: WP_HTML_Doctype_Info|null The DOCTYPE declaration information or `null` if not |
Generated : Tue Jan 21 08:20:01 2025 | Cross-referenced by PHPXref |