Ruby
3.4.0dev (2024-12-06 revision 892c46283a5ea4179500d951c9d4866c0051f27b)
This struct represents the overall parser.
#include <parser.h>
Data Fields
uint32_t | node_id |
The next node identifier that will be assigned.
pm_lex_state_t | lex_state |
The current state of the lexer.
int | enclosure_nesting |
Tracks the current nesting of (), [], and {}.
int | lambda_enclosure_nesting |
Used to temporarily track the nesting of enclosures to determine if a { is the beginning of a lambda following the parameters of a lambda.
int | brace_nesting |
Used to track the nesting of braces to ensure we get the correct value when we are interpolating blocks with braces.
pm_state_stack_t | do_loop_stack |
The stack used to determine if a do keyword belongs to the predicate of a while, until, or for loop.
pm_state_stack_t | accepts_block_stack |
The stack used to determine if a do keyword belongs to the beginning of a block.
struct {
pm_lex_mode_t * current
The current mode of the lexer.
pm_lex_mode_t stack [PM_LEX_STACK_SIZE]
The stack of lexer modes.
size_t index
The current index into the lexer mode stack.
} | lex_modes |
A stack of lex modes.
const uint8_t * | start |
The pointer to the start of the source.
const uint8_t * | end |
The pointer to the end of the source.
pm_token_t | previous |
The previous token we were considering.
pm_token_t | current |
The current token we're considering.
const uint8_t * | next_start |
This is a special field set on the parser when we need the parser to jump to a specific location when lexing the next token, as opposed to just using the end of the previous token.
const uint8_t * | heredoc_end |
This field indicates the end of a heredoc whose identifier was found on the current line.
pm_list_t | comment_list |
The list of comments that have been found while parsing.
pm_list_t | magic_comment_list |
The list of magic comments that have been found while parsing.
pm_location_t | data_loc |
An optional location that represents the location of the __END__ marker and the rest of the content of the file.
pm_list_t | warning_list |
The list of warnings that have been found while parsing.
pm_list_t | error_list |
The list of errors that have been found while parsing.
pm_scope_t * | current_scope |
The current local scope.
pm_context_node_t * | current_context |
The current parsing context.
pm_static_literals_t * | current_hash_keys |
The hash keys for the hash that is currently being parsed.
const pm_encoding_t * | encoding |
The encoding functions for the current file are attached to the parser as it's parsing so that the encoding can change with a magic comment.
pm_encoding_changed_callback_t | encoding_changed_callback |
When the encoding that is being used to parse the source is changed by prism, we provide the ability here to call out to a user-defined function.
const uint8_t * | encoding_comment_start |
This pointer indicates where a comment must start if it is to be considered an encoding comment.
pm_lex_callback_t * | lex_callback |
This is an optional callback that can be attached to the parser that will be called whenever a new token is lexed by the parser.
pm_string_t | filepath |
This is the path of the file being parsed.
pm_constant_pool_t | constant_pool |
This constant pool keeps all of the constants defined throughout the file so that we can reference them later.
pm_newline_list_t | newline_list |
This is the list of newline offsets in the source file.
pm_node_flags_t | integer_base |
We want to add a flag to integer nodes that indicates their base.
pm_string_t | current_string |
This string is used to pass information from the lexer to the parser.
int32_t | start_line |
The line number at the start of the parse.
const pm_encoding_t * | explicit_encoding |
When a string-like expression is being lexed, any byte or escape sequence that resolves to a value whose top bit is set (i.e., >= 0x80) will explicitly set the encoding to the same encoding as the source.
pm_node_list_t * | current_block_exits |
When parsing block exits (e.g., break, next, redo), we need to validate that they are in correct contexts.
pm_options_version_t | version |
The version of prism that we should use to parse.
uint8_t | command_line |
The command line flags given from the options.
int8_t | frozen_string_literal |
Whether or not we have found a frozen_string_literal magic comment with a true or false value.
bool | parsing_eval |
Whether or not we are parsing an eval string.
bool | partial_script |
Whether or not we are parsing a "partial" script, which is a script that will be evaluated in the context of another script, so we should not check jumps (next/break/etc.) for validity.
bool | command_start |
Whether or not we're at the beginning of a command.
bool | recovering |
Whether or not we're currently recovering from a syntax error.
bool | encoding_locked |
This is very specialized behavior for when you want to parse in a context that does not respect encoding comments.
bool | encoding_changed |
Whether or not the encoding has been changed by a magic comment.
bool | pattern_matching_newlines |
This flag indicates that we are currently parsing a pattern matching expression and impacts the calculation of newlines.
bool | in_keyword_arg |
This flag indicates that we are currently parsing a keyword argument.
bool | semantic_token_seen |
Whether or not the parser has seen a token that has semantic meaning (i.e., a token that is not a comment or whitespace).
bool | current_regular_expression_ascii_only |
True if the current regular expression being lexed contains only ASCII characters.
bool | warn_mismatched_indentation |
By default, Ruby always warns about mismatched indentation.
This struct represents the overall parser.
It contains a reference to the source file, as well as pointers that indicate where in the source it's currently parsing. It also contains the previous token and the current token that it's considering.
pm_state_stack_t pm_parser::accepts_block_stack |
int pm_parser::brace_nesting |
uint8_t pm_parser::command_line |
bool pm_parser::command_start |
pm_list_t pm_parser::comment_list |
The list of comments that have been found while parsing.
Definition at line 718 of file parser.h.
Referenced by pm_parser_free(), and pm_serialize_parse_comments().
pm_constant_pool_t pm_parser::constant_pool |
This constant pool keeps all of the constants defined throughout the file so that we can reference them later.
Definition at line 786 of file parser.h.
Referenced by pm_parser_free(), and pm_serialize_content().
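As a rough illustration of why a pool lets constants be referenced later, the following minimal sketch interns names into small integer identifiers. All sketch_* names are hypothetical and this is not the layout of pm_constant_pool_t; a linear scan is used purely for brevity, and identifier 0 is reserved here to mean "not inserted".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a constant pool. It is NOT prism's
 * pm_constant_pool_t, only an illustration of interning names to small ids. */
#define SKETCH_POOL_CAPACITY 64

typedef struct {
    const uint8_t *start; /* points into the source; not owned by the pool */
    size_t length;        /* length of the name in bytes */
} sketch_constant_t;

typedef struct {
    sketch_constant_t constants[SKETCH_POOL_CAPACITY];
    uint32_t size; /* number of distinct names stored so far */
} sketch_pool_t;

/* Return the 1-based id for the given name, inserting it if it is new.
 * Returns 0 when the pool is full. */
static uint32_t
sketch_pool_insert(sketch_pool_t *pool, const uint8_t *start, size_t length) {
    for (uint32_t index = 0; index < pool->size; index++) {
        const sketch_constant_t *constant = &pool->constants[index];
        if (constant->length == length && memcmp(constant->start, start, length) == 0) {
            return index + 1; /* already interned: reuse the existing id */
        }
    }

    if (pool->size == SKETCH_POOL_CAPACITY) return 0;

    pool->constants[pool->size] = (sketch_constant_t) { start, length };
    return ++pool->size;
}
```

Inserting the same name twice yields the same identifier, so later stages can compare identifiers instead of comparing bytes.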
pm_lex_mode_t* pm_parser::current |
pm_token_t pm_parser::current |
pm_node_list_t* pm_parser::current_block_exits |
When parsing block exits (e.g., break, next, redo), we need to validate that they are in correct contexts.
For the most part we can do this by looking at our parent contexts. However, modifier while and until expressions can change that context to make block exits valid. In these cases, we need to keep track of the block exits and then validate them after the expression has been parsed.
We use a pointer here because we don't want to keep a whole list attached since this will only be used in the context of begin/end expressions.
pm_context_node_t* pm_parser::current_context |
pm_static_literals_t* pm_parser::current_hash_keys |
The hash keys for the hash that is currently being parsed.
This is not usually necessary because the hash keys can be passed down the various call chains, but in the event that you're parsing a hash that is being directly pushed into another hash with **, we need to share the hash keys so that we can warn for the nested hash as well.
bool pm_parser::current_regular_expression_ascii_only |
pm_scope_t* pm_parser::current_scope |
pm_string_t pm_parser::current_string |
pm_location_t pm_parser::data_loc |
pm_state_stack_t pm_parser::do_loop_stack |
int pm_parser::enclosure_nesting |
const pm_encoding_t* pm_parser::encoding |
The encoding functions for the current file are attached to the parser as it's parsing so that the encoding can change with a magic comment.
Definition at line 755 of file parser.h.
Referenced by pm_regexp_parse(), pm_serialize_parse_comments(), and pm_strpbrk().
bool pm_parser::encoding_changed |
Whether or not the encoding has been changed by a magic comment.
We use this to provide a fast path for the lexer instead of going through the function pointer.
Definition at line 903 of file parser.h.
Referenced by pm_regexp_parse(), and pm_strpbrk().
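The fast path mentioned for encoding_changed can be pictured with the sketch below: while the flag is false, the default check is called directly, and only after a change does the call go through the function pointer. The sketch_* types and functions are hypothetical simplifications (prism's pm_encoding_t has more members), and the default check handles only ASCII letters for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified encoding table with a single function pointer. */
typedef struct {
    /* Returns 1 if the byte at b is alphabetic in this encoding, else 0. */
    size_t (*alpha_char)(const uint8_t *b, ptrdiff_t n);
} sketch_encoding_t;

typedef struct {
    const sketch_encoding_t *encoding;
    bool encoding_changed;
} sketch_parser_t;

/* Default check, callable directly without an indirect call. */
static size_t
sketch_default_alpha_char(const uint8_t *b, ptrdiff_t n) {
    if (n < 1) return 0;
    return ((*b >= 'a' && *b <= 'z') || (*b >= 'A' && *b <= 'Z')) ? 1 : 0;
}

/* ISO-8859-1 check: bytes 0xC0..0xFF are letters except 0xD7 and 0xF7. */
static size_t
sketch_latin1_alpha_char(const uint8_t *b, ptrdiff_t n) {
    if (n < 1) return 0;
    if (sketch_default_alpha_char(b, n) == 1) return 1;
    return (*b >= 0xC0 && *b != 0xD7 && *b != 0xF7) ? 1 : 0;
}

static const sketch_encoding_t sketch_default_encoding = { sketch_default_alpha_char };
static const sketch_encoding_t sketch_latin1_encoding = { sketch_latin1_alpha_char };

/* Take the direct call unless a magic comment has changed the encoding. */
static size_t
sketch_alpha_char(const sketch_parser_t *parser, const uint8_t *b, ptrdiff_t n) {
    if (!parser->encoding_changed) {
        return sketch_default_alpha_char(b, n); /* fast path */
    }
    return parser->encoding->alpha_char(b, n);  /* slow path via pointer */
}
```

The design trade-off is one extra branch per check in exchange for avoiding an indirect call in the common case where the encoding never changes.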
pm_encoding_changed_callback_t pm_parser::encoding_changed_callback |
When the encoding that is being used to parse the source is changed by prism, we provide the ability here to call out to a user-defined function.
Definition at line 762 of file parser.h.
Referenced by pm_parser_register_encoding_changed_callback().
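The callback mechanism can be sketched as follows. This is a self-contained model rather than prism's code: the sketch_* names, the encoding_name field, and the notifications counter are all assumptions made for illustration, and the callback signature (a single parser pointer argument) is assumed as well.

```c
#include <stddef.h>

/* Hypothetical, simplified parser that only models the callback hook. */
typedef struct sketch_parser sketch_parser_t;
typedef void (*sketch_encoding_changed_callback_t)(sketch_parser_t *parser);

struct sketch_parser {
    const char *encoding_name;
    sketch_encoding_changed_callback_t encoding_changed_callback;
    int notifications; /* exists only so the sketch can observe the calls */
};

/* Attach a user-defined function to be called when the encoding changes. */
static void
sketch_register_encoding_changed_callback(sketch_parser_t *parser, sketch_encoding_changed_callback_t callback) {
    parser->encoding_changed_callback = callback;
}

/* Switch encodings and notify the user-defined function, if any. */
static void
sketch_change_encoding(sketch_parser_t *parser, const char *name) {
    parser->encoding_name = name;
    if (parser->encoding_changed_callback != NULL) {
        parser->encoding_changed_callback(parser);
    }
}

/* Example callback: count how many times the encoding changed. */
static void
sketch_count_notification(sketch_parser_t *parser) {
    parser->notifications++;
}
```

Because the callback is optional, the notifying side checks for NULL before calling it, so callers that never register a function pay nothing beyond that check.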
const uint8_t* pm_parser::encoding_comment_start |
bool pm_parser::encoding_locked |
This is very specialized behavior for when you want to parse in a context that does not respect encoding comments.
Its main use case is translating into the whitequark/parser AST which re-encodes source files in UTF-8 before they are parsed and ignores encoding comments.
const uint8_t* pm_parser::end |
pm_list_t pm_parser::error_list |
The list of errors that have been found while parsing.
Definition at line 734 of file parser.h.
Referenced by pm_parse_stream(), pm_parse_success_p(), and pm_parser_free().
const pm_encoding_t* pm_parser::explicit_encoding |
When a string-like expression is being lexed, any byte or escape sequence that resolves to a value whose top bit is set (i.e., >= 0x80) will explicitly set the encoding to the same encoding as the source.
Alternatively, if a Unicode escape sequence is used (e.g., \u{80}) that resolves to a value whose top bit is set, then the encoding will be explicitly set to UTF-8.
The next time this happens, if the encoding that is about to become the explicitly set encoding does not match the previously set explicit encoding, a mixed encoding error will be emitted.
When the expression is finished being lexed, the explicit encoding controls the encoding of the expression. For the most part this means that the expression will either be encoded in the source encoding or UTF-8. This holds for all encodings except US-ASCII. If the source is US-ASCII and an explicit encoding was set that was not UTF-8, then the expression will be encoded as ASCII-8BIT.
Note that if the expression is a list, different elements within the same list can have different encodings, so this will get reset between each element. Furthermore, all of this only applies to lists that support interpolation, because otherwise escapes that could change the encoding are ignored.
At first glance, it may make more sense for this to live on the lexer mode, but we need it here to communicate back to the parser for character literals that do not push a new lexer mode.
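The rules described for explicit_encoding can be modeled as a small state machine. The sketch below follows the prose literally and is not prism's implementation: the enum, the sketch_* names, and the choice to fall back to the source encoding when no explicit encoding was set are assumptions, and the real lexer may apply additional rules.

```c
#include <stdbool.h>

/* Hypothetical encodings; SKETCH_ENC_NONE models "no explicit encoding". */
typedef enum {
    SKETCH_ENC_NONE,
    SKETCH_ENC_US_ASCII,
    SKETCH_ENC_UTF_8,
    SKETCH_ENC_SHIFT_JIS,
    SKETCH_ENC_ASCII_8BIT
} sketch_enc_t;

typedef struct {
    sketch_enc_t source;            /* encoding of the source file */
    sketch_enc_t explicit_encoding; /* set while lexing one expression */
    bool mixed_encoding_error;      /* set when two explicit encodings clash */
} sketch_lexer_t;

/* Record a new explicit encoding, flagging a clash with the previous one. */
static void
sketch_set_explicit(sketch_lexer_t *lexer, sketch_enc_t next) {
    if (lexer->explicit_encoding != SKETCH_ENC_NONE && lexer->explicit_encoding != next) {
        lexer->mixed_encoding_error = true;
    }
    lexer->explicit_encoding = next;
}

/* A byte or escape >= 0x80 pins the expression to the source encoding. */
static void
sketch_high_byte(sketch_lexer_t *lexer) {
    sketch_set_explicit(lexer, lexer->source);
}

/* A Unicode escape >= 0x80 pins the expression to UTF-8. */
static void
sketch_unicode_escape(sketch_lexer_t *lexer) {
    sketch_set_explicit(lexer, SKETCH_ENC_UTF_8);
}

/* Decide the expression's encoding, then reset for the next element. */
static sketch_enc_t
sketch_finish(sketch_lexer_t *lexer) {
    sketch_enc_t result;

    if (lexer->explicit_encoding == SKETCH_ENC_NONE) {
        result = lexer->source; /* assumption: nothing forced a choice */
    } else if (lexer->source == SKETCH_ENC_US_ASCII && lexer->explicit_encoding != SKETCH_ENC_UTF_8) {
        result = SKETCH_ENC_ASCII_8BIT; /* the US-ASCII exception */
    } else {
        result = lexer->explicit_encoding;
    }

    lexer->explicit_encoding = SKETCH_ENC_NONE;
    return result;
}
```

In this model, a Shift_JIS source containing a high byte followed by a Unicode escape sets mixed_encoding_error, while a US-ASCII source containing only a high byte finishes as ASCII-8BIT.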
pm_string_t pm_parser::filepath |
This is the path of the file being parsed.
We use the filepath when constructing SourceFileNodes.
Definition at line 780 of file parser.h.
Referenced by pm_parser_free().
int8_t pm_parser::frozen_string_literal |
const uint8_t* pm_parser::heredoc_end |
bool pm_parser::in_keyword_arg |
size_t pm_parser::index |
The current index into the lexer mode stack.
Definition at line 687 of file parser.h.
Referenced by pm_parse_stream(), and pm_parser_free().
pm_node_flags_t pm_parser::integer_base |
int pm_parser::lambda_enclosure_nesting |
pm_lex_callback_t* pm_parser::lex_callback |
This is an optional callback that can be attached to the parser that will be called whenever a new token is lexed by the parser.
Definition at line 774 of file parser.h.
Referenced by pm_serialize_lex(), and pm_serialize_parse_lex().
struct { ... } pm_parser::lex_modes |
A stack of lex modes.
Referenced by pm_parse_stream(), and pm_parser_free().
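The relationship between current, stack, and index can be sketched with a fixed-size stack, as below. The sketch_* names and mode kinds are hypothetical, and this sketch simply refuses to push when the array is full, which may not match how prism handles deeper nesting.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_LEX_STACK_SIZE 4

/* Hypothetical mode kinds; the real lexer has many more. */
typedef enum {
    SKETCH_LEX_DEFAULT,
    SKETCH_LEX_STRING,
    SKETCH_LEX_EMBEXPR
} sketch_lex_mode_kind_t;

typedef struct {
    sketch_lex_mode_kind_t kind;
} sketch_lex_mode_t;

typedef struct {
    sketch_lex_mode_t *current;                     /* always &stack[index] */
    sketch_lex_mode_t stack[SKETCH_LEX_STACK_SIZE]; /* the modes themselves */
    size_t index;                                   /* index of the top mode */
} sketch_lex_modes_t;

/* Start with a single default mode at the bottom of the stack. */
static void
sketch_lex_modes_init(sketch_lex_modes_t *modes) {
    modes->index = 0;
    modes->stack[0].kind = SKETCH_LEX_DEFAULT;
    modes->current = &modes->stack[0];
}

/* Push a new mode; returns false if the fixed-size array is full. */
static bool
sketch_lex_modes_push(sketch_lex_modes_t *modes, sketch_lex_mode_kind_t kind) {
    if (modes->index + 1 >= SKETCH_LEX_STACK_SIZE) return false;

    modes->index++;
    modes->stack[modes->index].kind = kind;
    modes->current = &modes->stack[modes->index];
    return true;
}

/* Pop back to the enclosing mode; the bottom default mode is never popped. */
static bool
sketch_lex_modes_pop(sketch_lex_modes_t *modes) {
    if (modes->index == 0) return false;

    modes->index--;
    modes->current = &modes->stack[modes->index];
    return true;
}
```

Entering a string and then an interpolated expression would push twice; leaving them pops back in reverse order, and current always points at the mode the lexer should use next.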
pm_lex_state_t pm_parser::lex_state |
pm_list_t pm_parser::magic_comment_list |
The list of magic comments that have been found while parsing.
Definition at line 721 of file parser.h.
Referenced by pm_parser_free().
pm_newline_list_t pm_parser::newline_list |
This is the list of newline offsets in the source file.
Definition at line 789 of file parser.h.
Referenced by pm_parser_free().
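One common use of a list of newline offsets is mapping a byte offset back to a line. The sketch below records the offset at which each line starts and binary-searches that list. It is a simplified model with hypothetical sketch_* names and a fixed capacity, not the layout of pm_newline_list_t.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_LINES 128

/* Hypothetical, simplified list of line-start offsets. */
typedef struct {
    size_t offsets[SKETCH_MAX_LINES]; /* byte offset where each line starts */
    size_t size;                      /* number of lines recorded */
} sketch_newline_list_t;

/* Scan the source once, recording the offset that follows each newline.
 * Lines beyond SKETCH_MAX_LINES are ignored in this sketch. */
static void
sketch_newline_list_build(sketch_newline_list_t *list, const uint8_t *start, const uint8_t *end) {
    list->offsets[0] = 0;
    list->size = 1;

    for (const uint8_t *cursor = start; cursor < end; cursor++) {
        if (*cursor == '\n' && list->size < SKETCH_MAX_LINES) {
            list->offsets[list->size++] = (size_t) (cursor - start) + 1;
        }
    }
}

/* Return the 0-based index of the line containing the given byte offset,
 * that is, the last entry whose start is <= offset. */
static size_t
sketch_newline_list_line(const sketch_newline_list_t *list, size_t offset) {
    size_t left = 0;
    size_t right = list->size;

    /* Invariant: offsets[left] <= offset and the answer lies in [left, right). */
    while (right - left > 1) {
        size_t mid = left + (right - left) / 2;
        if (list->offsets[mid] <= offset) {
            left = mid;
        } else {
            right = mid;
        }
    }

    return left;
}
```

The column can then be computed as the offset minus offsets[line], and a starting line number can be added to the returned index when the source does not begin at line one.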
const uint8_t* pm_parser::next_start |
uint32_t pm_parser::node_id |
The next node identifier that will be assigned.
This is a unique identifier used to track nodes such that the syntax tree can be dropped but the node can be found through another parse.
Definition at line 646 of file parser.h.
Referenced by pm_parser_init().
bool pm_parser::parsing_eval |
bool pm_parser::partial_script |
bool pm_parser::pattern_matching_newlines |
pm_token_t pm_parser::previous |
bool pm_parser::recovering |
bool pm_parser::semantic_token_seen |
pm_lex_mode_t pm_parser::stack[PM_LEX_STACK_SIZE] |
const uint8_t* pm_parser::start |
The pointer to the start of the source.
Definition at line 691 of file parser.h.
Referenced by pm_regexp_parse(), and pm_serialize_content().
int32_t pm_parser::start_line |
The line number at the start of the parse.
This will be used to offset the line numbers of all of the locations.
Definition at line 809 of file parser.h.
Referenced by pm_serialize_parse_comments().
pm_options_version_t pm_parser::version |
bool pm_parser::warn_mismatched_indentation |
pm_list_t pm_parser::warning_list |
The list of warnings that have been found while parsing.
Definition at line 731 of file parser.h.
Referenced by pm_parser_free().