Ruby 4.1.0dev (2026-03-20 revision eb04ab9117336f2b3613244cfe8a528c52faf6d6)
Data Structures | Typedefs | Functions
regexp.h File Reference

(eb04ab9117336f2b3613244cfe8a528c52faf6d6)

A regular expression parser. More...

#include "prism/defines.h"
#include "prism/parser.h"
#include "prism/encoding.h"
#include "prism/util/pm_memchr.h"
#include "prism/util/pm_string.h"
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
Include dependency graph for regexp.h:
This graph shows which files directly or indirectly include this file:

Go to the source code of this file.

Data Structures

struct  pm_regexp_name_data_t
 Accumulation state for named capture groups found during regexp parsing. More...
 

Typedefs

typedef void(* pm_regexp_name_callback_t) (pm_parser_t *parser, const pm_string_t *name, bool shared, pm_regexp_name_data_t *data)
 Callback invoked by pm_regexp_parse() for each named capture group found.
 

Functions

PRISM_EXPORTED_FUNCTION pm_node_flags_t pm_regexp_parse (pm_parser_t *parser, pm_regular_expression_node_t *node, pm_regexp_name_callback_t name_callback, pm_regexp_name_data_t *name_data)
 Parse a regular expression, validate its encoding, and optionally extract named capture groups.
 
void pm_regexp_parse_named_captures (pm_parser_t *parser, const uint8_t *source, size_t size, bool shared, bool extended_mode, pm_regexp_name_callback_t name_callback, pm_regexp_name_data_t *name_data)
 Parse an interpolated regular expression for named capture groups only.
 

Detailed Description

A regular expression parser.

Definition in file regexp.h.

Typedef Documentation

◆ pm_regexp_name_callback_t

typedef void(* pm_regexp_name_callback_t) (pm_parser_t *parser, const pm_string_t *name, bool shared, pm_regexp_name_data_t *data)

Callback invoked by pm_regexp_parse() for each named capture group found.

Parameters
parserThe main parser.
nameThe name of the capture group.
sharedWhether the source content is shared (impacts constant storage).
dataThe accumulation state for named captures.

Definition at line 44 of file regexp.h.

Function Documentation

◆ pm_regexp_parse()

PRISM_EXPORTED_FUNCTION pm_node_flags_t pm_regexp_parse ( pm_parser_t parser,
pm_regular_expression_node_t node,
pm_regexp_name_callback_t  name_callback,
pm_regexp_name_data_t name_data 
)

Parse a regular expression, validate its encoding, and optionally extract named capture groups.

Returns the encoding flags to set on the node.

Parameters
parserThe parser that is currently being used.
nodeThe regular expression node to parse and validate.
name_callbackThe optional callback to call when a named capture group is found.
name_dataThe optional accumulation state for named captures.
Returns
The encoding flags to set on the node (e.g., FORCED_UTF8_ENCODING).

Encoding validation walks the raw source (content_loc) to distinguish escape-produced bytes from literal bytes. Named capture extraction walks the unescaped content since escape sequences in group names (e.g., line continuations) have already been processed by the lexer.

Definition at line 1599 of file regexp.c.

◆ pm_regexp_parse_named_captures()

void pm_regexp_parse_named_captures ( pm_parser_t parser,
const uint8_t *  source,
size_t  size,
bool  shared,
bool  extended_mode,
pm_regexp_name_callback_t  name_callback,
pm_regexp_name_data_t name_data 
)

Parse an interpolated regular expression for named capture groups only.

No encoding validation is performed.

Parameters
parserThe parser that is currently being used.
sourceThe source content to parse.
sizeThe length of the source content.
sharedWhether the source points into the parser's source buffer.
extended_modeWhether or not the regular expression is in extended mode.
name_callbackThe callback to call when a named capture group is found.
name_dataThe accumulation state for named captures.

This is used for the =~ operator with interpolated regexps where we don't have a pm_regular_expression_node_t. No encoding validation is performed.

Note: The encoding-tracking fields (has_unicode_escape, has_hex_escape, etc.) are initialized but not used for the result. They exist because the parsing functions (pm_regexp_parse_backslash_escape, etc.) unconditionally update them as they walk through the content.

Definition at line 1673 of file regexp.c.