Ruby 4.1.0dev (2026-04-04 revision 3b6245536cf55da9e8bfcdb03c845fe9ef931d7f)
pm_regexp_parser_t Struct Reference

This is the parser that is going to handle parsing regular expressions.

Data Fields

pm_parser_t * parser
 The parser that is currently being used.
 
const uint8_t * start
 A pointer to the start of the source that we are parsing.
 
const uint8_t * cursor
 A pointer to the current position in the source.
 
const uint8_t * end
 A pointer to the end of the source that we are parsing.
 
const pm_encoding_t * encoding
 The encoding of the source.
 
pm_regexp_name_callback_t name_callback
 The callback to call when a named capture group is found.
 
pm_regexp_name_data_t * name_data
 The data to pass to the name callback.
 
const uint8_t * node_start
 The start of the regexp node (for error locations).
 
const uint8_t * node_end
 The end of the regexp node (for error locations).
 
const pm_encoding_t * explicit_encoding
 The explicit encoding determined by escape sequences.
 
const uint8_t * property_name
 Pointer to the first non-POSIX property name (for /n error messages).
 
size_t property_name_length
 Length of the first non-POSIX property name found.
 
const uint8_t * unicode_property_name
 Pointer to the first Unicode-only property name (for /e, /s error messages).
 
size_t unicode_property_name_length
 Length of the first Unicode-only property name found.
 
pm_buffer_t hex_escape_buffer
 Buffer of hex escape byte values >= 0x80, separated by 0x00 sentinels.
 
uint32_t non_ascii_literal_count
 Count of non-ASCII literal bytes (not from escapes).
 
bool extended_mode
 Whether or not the regular expression currently being parsed is in extended mode, wherein whitespace is ignored and comments are allowed.
 
bool encoding_changed
 Whether the encoding has changed from the default.
 
bool shared
 Whether the source content is shared (for named capture callback).
 
bool has_unicode_escape
 Whether a \u{...} escape with value >= 0x80 was seen.
 
bool has_hex_escape
 Whether a \xNN escape (or \M-x, etc.) with value >= 0x80 was seen.
 
bool last_escape_was_unicode
 Tracks whether the last encoding-setting escape was \u (true) or \x (false).
 
bool has_property_escape
 Whether any \p{...} or \P{...} property escape was found.
 
bool has_unicode_property_escape
 Whether a Unicode-only property escape was found (not POSIX or script).
 
bool invalid_unicode_range
 Whether a \u escape with invalid range (surrogate or > 0x10FFFF) was seen.
 
bool hex_group_active
 Whether we are accumulating consecutive hex escape bytes.
 
bool has_invalid_multibyte
 Whether an invalid multibyte character was found during parsing.
 

Detailed Description

This is the parser that is going to handle parsing regular expressions.

Definition at line 23 of file regexp.c.

Field Documentation

◆ cursor

const uint8_t* pm_regexp_parser_t::cursor

A pointer to the current position in the source.

Definition at line 31 of file regexp.c.
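
The start, cursor, and end fields follow the usual three-pointer scanning idiom. The sketch below illustrates that idiom only; the type sketch_scanner_t and the helpers sketch_at_end, sketch_accept, and sketch_offset are hypothetical names and are not taken from regexp.c. It also assumes that end points one past the last byte of the source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the three source pointers of the real struct. */
typedef struct {
    const uint8_t *start;  /* first byte of the regexp source */
    const uint8_t *cursor; /* next byte to be consumed */
    const uint8_t *end;    /* assumed: one past the last byte of the source */
} sketch_scanner_t;

/* True once every byte has been consumed. */
static bool sketch_at_end(const sketch_scanner_t *s) {
    return s->cursor >= s->end;
}

/* Consume the next byte only if it equals `expected`. */
static bool sketch_accept(sketch_scanner_t *s, uint8_t expected) {
    if (!sketch_at_end(s) && *s->cursor == expected) {
        s->cursor++;
        return true;
    }
    return false;
}

/* Number of bytes consumed so far, e.g. for reporting an error offset. */
static size_t sketch_offset(const sketch_scanner_t *s) {
    return (size_t) (s->cursor - s->start);
}
```

Keeping start alongside cursor means an offset can be recovered at any point without a separate counter.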

◆ encoding

const pm_encoding_t* pm_regexp_parser_t::encoding

The encoding of the source.

Definition at line 37 of file regexp.c.

◆ encoding_changed

bool pm_regexp_parser_t::encoding_changed

Whether the encoding has changed from the default.

Definition at line 91 of file regexp.c.

◆ end

const uint8_t* pm_regexp_parser_t::end

A pointer to the end of the source that we are parsing.

Definition at line 34 of file regexp.c.

◆ explicit_encoding

const pm_encoding_t* pm_regexp_parser_t::explicit_encoding

The explicit encoding determined by escape sequences.

NULL if no encoding-setting escape has been seen, UTF-8 for \u escapes, or the source encoding for \x escapes.

Definition at line 56 of file regexp.c.
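
The three states described above (NULL, UTF-8, source encoding) can be sketched as follows. This is an illustration under assumptions, not the code in regexp.c: sketch_encoding_t, sketch_state_t, and sketch_note_escape are hypothetical names, the caller is assumed to invoke the helper only for escapes that actually set an encoding, and the most recent escape is assumed to overwrite any earlier value.

```c
#include <stddef.h>

/* Hypothetical encoding handle; the real struct uses pm_encoding_t. */
typedef struct {
    const char *name;
} sketch_encoding_t;

static const sketch_encoding_t SKETCH_UTF8 = { "UTF-8" };

typedef enum {
    SKETCH_ESCAPE_UNICODE, /* a \u escape */
    SKETCH_ESCAPE_HEX      /* a \x escape */
} sketch_escape_t;

typedef struct {
    const sketch_encoding_t *encoding;          /* encoding of the source */
    const sketch_encoding_t *explicit_encoding; /* NULL until an escape sets it */
} sketch_state_t;

/* Record the encoding implied by an encoding-setting escape. */
static void sketch_note_escape(sketch_state_t *state, sketch_escape_t kind) {
    if (kind == SKETCH_ESCAPE_UNICODE) {
        state->explicit_encoding = &SKETCH_UTF8;    /* \u implies UTF-8 */
    } else {
        state->explicit_encoding = state->encoding; /* \x implies the source encoding */
    }
}
```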

◆ extended_mode

bool pm_regexp_parser_t::extended_mode

Whether or not the regular expression currently being parsed is in extended mode, wherein whitespace is ignored and comments are allowed.

Definition at line 88 of file regexp.c.

◆ has_hex_escape

bool pm_regexp_parser_t::has_hex_escape

Whether a \xNN escape (or \M-x, etc.) with value >= 0x80 was seen.

Definition at line 100 of file regexp.c.

◆ has_invalid_multibyte

bool pm_regexp_parser_t::has_invalid_multibyte

Whether an invalid multibyte character was found during parsing.

Definition at line 121 of file regexp.c.

◆ has_property_escape

bool pm_regexp_parser_t::has_property_escape

Whether any \p{...} or \P{...} property escape was found.

Definition at line 109 of file regexp.c.

◆ has_unicode_escape

bool pm_regexp_parser_t::has_unicode_escape

Whether a \u{...} escape with value >= 0x80 was seen.

Definition at line 97 of file regexp.c.

◆ has_unicode_property_escape

bool pm_regexp_parser_t::has_unicode_property_escape

Whether a Unicode-only property escape was found (not POSIX or script).

Definition at line 112 of file regexp.c.

◆ hex_escape_buffer

pm_buffer_t pm_regexp_parser_t::hex_escape_buffer

Buffer of hex escape byte values >= 0x80, separated by 0x00 sentinels.

Definition at line 79 of file regexp.c.
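
One way to picture the layout: runs of consecutive hex escape bytes are appended to the buffer, and a 0x00 byte closes each run. Because only values >= 0x80 are stored, 0x00 can never be mistaken for data. The sketch below is an assumption about how such a buffer could be filled, not the code in regexp.c; sketch_hex_state_t, sketch_append, sketch_end_group, and sketch_hex_escape are hypothetical, a fixed array stands in for pm_buffer_t, and treating a hex escape below 0x80 as the end of a run is a guess.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CAPACITY 64

/* Hypothetical fixed-size stand-in for pm_buffer_t plus the group flag. */
typedef struct {
    uint8_t bytes[SKETCH_CAPACITY];
    size_t length;
    bool hex_group_active; /* true while a run of hex escapes is open */
} sketch_hex_state_t;

/* Append one byte, silently dropping it if the sketch buffer is full. */
static void sketch_append(sketch_hex_state_t *s, uint8_t byte) {
    if (s->length < SKETCH_CAPACITY) s->bytes[s->length++] = byte;
}

/* Close the current run, if any, by writing the 0x00 sentinel. */
static void sketch_end_group(sketch_hex_state_t *s) {
    if (s->hex_group_active) {
        sketch_append(s, 0x00);
        s->hex_group_active = false;
    }
}

/* Record a hex escape; values below 0x80 are not stored (assumed to end a run). */
static void sketch_hex_escape(sketch_hex_state_t *s, uint8_t value) {
    if (value < 0x80) {
        sketch_end_group(s);
        return;
    }
    sketch_append(s, value);
    s->hex_group_active = true;
}
```

In this sketch, sketch_end_group would be called whenever something other than a hex escape is parsed, and once more at the end of the pattern.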

◆ hex_group_active

bool pm_regexp_parser_t::hex_group_active

Whether we are accumulating consecutive hex escape bytes.

Definition at line 118 of file regexp.c.

◆ invalid_unicode_range

bool pm_regexp_parser_t::invalid_unicode_range

Whether a \u escape with invalid range (surrogate or > 0x10FFFF) was seen.

Definition at line 115 of file regexp.c.
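
The range test named above is small enough to write out. The helper name sketch_invalid_unicode_range is hypothetical; the bounds are the standard ones for Unicode, where the surrogate block spans 0xD800 through 0xDFFF and the largest code point is 0x10FFFF.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `value` cannot be a Unicode scalar value. */
static bool sketch_invalid_unicode_range(uint32_t value) {
    bool surrogate = value >= 0xD800 && value <= 0xDFFF;
    bool too_large = value > 0x10FFFF;
    return surrogate || too_large;
}
```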

◆ last_escape_was_unicode

bool pm_regexp_parser_t::last_escape_was_unicode

Tracks whether the last encoding-setting escape was \u (true) or \x (false).

This matters for error messages when both types are mixed.

Definition at line 106 of file regexp.c.

◆ name_callback

pm_regexp_name_callback_t pm_regexp_parser_t::name_callback

The callback to call when a named capture group is found.

Definition at line 40 of file regexp.c.

◆ name_data

pm_regexp_name_data_t* pm_regexp_parser_t::name_data

The data to pass to the name callback.

Definition at line 43 of file regexp.c.

◆ node_end

const uint8_t* pm_regexp_parser_t::node_end

The end of the regexp node (for error locations).

Definition at line 49 of file regexp.c.

◆ node_start

const uint8_t* pm_regexp_parser_t::node_start

The start of the regexp node (for error locations).

Definition at line 46 of file regexp.c.

◆ non_ascii_literal_count

uint32_t pm_regexp_parser_t::non_ascii_literal_count

Count of non-ASCII literal bytes (not from escapes).

Definition at line 82 of file regexp.c.

◆ parser

pm_parser_t* pm_regexp_parser_t::parser

The parser that is currently being used.

Definition at line 25 of file regexp.c.

◆ property_name

const uint8_t* pm_regexp_parser_t::property_name

Pointer to the first non-POSIX property name (for /n error messages).

POSIX properties (Alnum, Alpha, etc.) work in all encodings. Script properties (Hiragana, Katakana, etc.) work in /e, /s, /u. Unicode-only properties (L, Ll, etc.) work only in /u.

Definition at line 64 of file regexp.c.
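
The compatibility rules above can be summarised as a small lookup. This is a sketch of the rules as stated, not the code in regexp.c: the enums and sketch_property_allowed are hypothetical names, and the rejection of script properties under /n is inferred from the fact that this field tracks the first non-POSIX name for /n error messages.

```c
#include <stdbool.h>

/* The three property classes described above. */
typedef enum {
    SKETCH_PROPERTY_POSIX,       /* Alnum, Alpha, ... */
    SKETCH_PROPERTY_SCRIPT,      /* Hiragana, Katakana, ... */
    SKETCH_PROPERTY_UNICODE_ONLY /* L, Ll, ... */
} sketch_property_class_t;

/* The regexp encoding modifiers /n, /e, /s and /u. */
typedef enum {
    SKETCH_MODIFIER_N,
    SKETCH_MODIFIER_E,
    SKETCH_MODIFIER_S,
    SKETCH_MODIFIER_U
} sketch_modifier_t;

/* True if a property of class `klass` is usable under `modifier`. */
static bool sketch_property_allowed(sketch_property_class_t klass, sketch_modifier_t modifier) {
    switch (klass) {
        case SKETCH_PROPERTY_POSIX:
            return true; /* works in all encodings */
        case SKETCH_PROPERTY_SCRIPT:
            return modifier != SKETCH_MODIFIER_N; /* /e, /s, /u */
        case SKETCH_PROPERTY_UNICODE_ONLY:
            return modifier == SKETCH_MODIFIER_U; /* /u only */
    }
    return false;
}
```

Under this reading, property_name is enough to report an error for /n, while /e and /s need the separate unicode_property_name field because script properties are still valid there.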

◆ property_name_length

size_t pm_regexp_parser_t::property_name_length

Length of the first non-POSIX property name found.

Definition at line 67 of file regexp.c.

◆ shared

bool pm_regexp_parser_t::shared

Whether the source content is shared (for named capture callback).

Definition at line 94 of file regexp.c.

◆ start

const uint8_t* pm_regexp_parser_t::start

A pointer to the start of the source that we are parsing.

Definition at line 28 of file regexp.c.

◆ unicode_property_name

const uint8_t* pm_regexp_parser_t::unicode_property_name

Pointer to the first Unicode-only property name (for /e, /s error messages).

NULL if only POSIX or script properties have been seen.

Definition at line 73 of file regexp.c.

◆ unicode_property_name_length

size_t pm_regexp_parser_t::unicode_property_name_length

Length of the first Unicode-only property name found.

Definition at line 76 of file regexp.c.


The documentation for this struct was generated from the following file: