Ruby  3.4.0dev (2024-12-06 revision 892c46283a5ea4179500d951c9d4866c0051f27b)
ctype.h File Reference


Our own, locale independent, character handling routines. More...

#include "ruby/internal/config.h"
#include "ruby/internal/attr/artificial.h"
#include "ruby/internal/attr/const.h"
#include "ruby/internal/attr/constexpr.h"
#include "ruby/internal/attr/nonnull.h"
#include "ruby/internal/dllexport.h"

Macros

Old character classification macros

What is this ISPRINT business? Well, according to our VCS and some internet surfing, it appears that the initial intent of these macros was to mimic code that appears in common across several GNU projects.

As far as @shyouhei can detect, they seem to originate from GNU regex (the standalone one rather than Gnulib or Glibc), and date back to at least 1995.

Let me lawfully quote from a GNU coreutils commit https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=49803907f5dbd7646184a8912c9db9b09dcd0f22

Jim Meyering writes:

"... Some ctype macros are valid only for character codes that isascii says are ASCII (SGI's IRIX-4.0.5 is one such system –when using /bin/cc or gcc but without giving an ansi option). So, all ctype uses should be through macros like ISPRINT... If STDC_HEADERS is defined, then autoconf has verified that the ctype macros don't need to be guarded with references to isascii. ... Defining isascii to 1 should let any compiler worth its salt eliminate the && through constant folding."

Bruno Haible adds:

"... Furthermore, isupper(c) etc. have an undefined result if c is outside the range -1 <= c <= 255. One is tempted to write isupper(c) with c being of type ‘char’, but this is wrong if c is an 8-bit character >= 128 which gets sign-extended to a negative value. The macro ISUPPER protects against this as well."
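The sign-extension hazard described in the quote above can be sketched in a few lines. This is only an illustration: the `demo_` names are hypothetical helpers invented here, not part of ruby or GNU code, and the sketch assumes 8-bit bytes and an ASCII-compatible execution character set.

```c
/* Hypothetical helper: an ASCII-only uppercase test written as a plain
 * range check, so it is well defined for every int value, including
 * negative ones. */
static int demo_isupper(int c)
{
    return 'A' <= c && c <= 'Z';
}

/* What happens implicitly when a signed 8-bit byte is passed to a function
 * taking int: the value is sign-extended, so bytes >= 0x80 turn negative. */
static int demo_promote_raw(signed char byte)
{
    return byte;
}

/* The usual guard before calling the system <ctype.h> functions: convert
 * through unsigned char first so the result lies in 0..255. */
static int demo_promote_guarded(signed char byte)
{
    return (unsigned char)byte;
}
```

With these definitions, the byte 0xC0 stored in a `signed char` (that is, -64) promotes to -64 without the guard and to 192 with it. A range-check classifier such as `demo_isupper` simply answers false in both cases, whereas the system `isupper` is only defined for the guarded value.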

So the intent was to reroute old problematic systems that no longer exist. At the same time the problems described above no longer hurt us, because we decided to completely avoid using system-provided isupper etc. to reinvent the wheel. These macros are entirely legacy; please ignore them.

But let me also stress that the GNU people are wise; they use those macros only inside their own implementations and never make them public. Ruby, on the other hand, has thoughtlessly exposed them to 3rd party libraries since its beginning, which is a very bad idea. These macros conflict all too easily with definitions elsewhere.

New programs should stick to the rb_ prefixed names.

Note
It seems we just mimic the API. We do not share their implementation with GPL-ed programs.
#define ISASCII   rb_isascii
 Old name of rb_isascii.

#define ISPRINT   rb_isprint
 Old name of rb_isprint.

#define ISGRAPH   rb_isgraph
 Old name of rb_isgraph.

#define ISSPACE   rb_isspace
 Old name of rb_isspace.

#define ISUPPER   rb_isupper
 Old name of rb_isupper.

#define ISLOWER   rb_islower
 Old name of rb_islower.

#define ISALNUM   rb_isalnum
 Old name of rb_isalnum.

#define ISALPHA   rb_isalpha
 Old name of rb_isalpha.

#define ISDIGIT   rb_isdigit
 Old name of rb_isdigit.

#define ISXDIGIT   rb_isxdigit
 Old name of rb_isxdigit.

#define ISBLANK   rb_isblank
 Old name of rb_isblank.

#define ISCNTRL   rb_iscntrl
 Old name of rb_iscntrl.

#define ISPUNCT   rb_ispunct
 Old name of rb_ispunct.

#define TOUPPER   rb_toupper
 Old name of rb_toupper.

#define TOLOWER   rb_tolower
 Old name of rb_tolower.

#define STRCASECMP   st_locale_insensitive_strcasecmp
 Old name of st_locale_insensitive_strcasecmp.

#define STRNCASECMP   st_locale_insensitive_strncasecmp
 Old name of st_locale_insensitive_strncasecmp.

#define STRTOUL   ruby_strtoul
 Old name of ruby_strtoul.
 

Functions

locale insensitive functions
int st_locale_insensitive_strcasecmp (const char *s1, const char *s2)
 Our own locale-insensitive version of strcasecmp(3).

int st_locale_insensitive_strncasecmp (const char *s1, const char *s2, size_t n)
 Our own locale-insensitive version of strncasecmp(3).

unsigned long ruby_strtoul (const char *str, char **endptr, int base)
 Our own locale-insensitive version of strtoul(3).

static int rb_isascii (int c)
 Our own locale-insensitive version of isascii(3).

static int rb_isupper (int c)
 Our own locale-insensitive version of isupper(3).

static int rb_islower (int c)
 Our own locale-insensitive version of islower(3).

static int rb_isalpha (int c)
 Our own locale-insensitive version of isalpha(3).

static int rb_isdigit (int c)
 Our own locale-insensitive version of isdigit(3).

static int rb_isalnum (int c)
 Our own locale-insensitive version of isalnum(3).

static int rb_isxdigit (int c)
 Our own locale-insensitive version of isxdigit(3).

static int rb_isblank (int c)
 Our own locale-insensitive version of isblank(3).

static int rb_isspace (int c)
 Our own locale-insensitive version of isspace(3).

static int rb_iscntrl (int c)
 Our own locale-insensitive version of iscntrl(3).

static int rb_isprint (int c)
 Identical to rb_isgraph(), except it also returns true for ' '.

static int rb_ispunct (int c)
 Our own locale-insensitive version of ispunct(3).

static int rb_isgraph (int c)
 Our own locale-insensitive version of isgraph(3).

static int rb_tolower (int c)
 Our own locale-insensitive version of tolower(3).

static int rb_toupper (int c)
 Our own locale-insensitive version of toupper(3).
 

Detailed Description

Our own, locale independent, character handling routines.

Author
Ruby developers ruby-core@ruby-lang.org
Warning
Symbols prefixed with either RBIMPL or rbimpl are implementation details. Don't take them as canon. They could rapidly appear then vanish. The name (path) of this header file is also an implementation detail. Do not expect it to persist at the place it is now. Developers are free to move it anywhere anytime at will.
Note
To ruby-core: remember that this header can be included, possibly recursively, from extension libraries written in C++. Do not expect, for instance, that __VA_ARGS__ is always available. We assume C99 for ruby itself, but we make no assumption about the language of extension libraries. They could be written in C++98.

Definition in file ctype.h.

Function Documentation

◆ rb_isalnum()

static int rb_isalnum ( int  c)
inline static

Our own locale-insensitive version of isalnum(3).

Parameters
[in] c: Byte in question to query.
Return values
true: c is listed in IEEE 1003.1 section 7.3.1.1 "upper", "lower", or "digit".
false: Anything else.
Note
Not only does this function work under the POSIX Locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.
Warning
c is an int. This means that when you pass a char value here, it experiences "integer promotion" as defined in ISO/IEC 9899:2018 section 6.3.1.1 paragraph 1.

Definition at line 326 of file ctype.h.

Referenced by rb_ispunct().
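The "Referenced by" notes on this page suggest that the classifiers are layered on top of each other (for example, rb_isalpha() is referenced by rb_isalnum()). A minimal sketch of that layering, assuming an ASCII-compatible execution character set, could look like the following. The `sketch_` names are hypothetical and this is not ruby's actual implementation.

```c
/* Hypothetical ASCII-only classifiers built from plain range checks.
 * They assume the letters and digits are contiguous, as they are in
 * ASCII-compatible character sets (but not in EBCDIC). */
static int sketch_isupper(int c) { return 'A' <= c && c <= 'Z'; }
static int sketch_islower(int c) { return 'a' <= c && c <= 'z'; }
static int sketch_isdigit(int c) { return '0' <= c && c <= '9'; }

/* alpha is the union of upper and lower. */
static int sketch_isalpha(int c)
{
    return sketch_isupper(c) || sketch_islower(c);
}

/* alnum is the union of alpha and digit. */
static int sketch_isalnum(int c)
{
    return sketch_isalpha(c) || sketch_isdigit(c);
}
```

Because every check is a comparison on an int, values outside the ASCII range, including negative values produced by promoting a signed char, are simply classified as false.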

◆ rb_isalpha()

static int rb_isalpha ( int  c)
inline static

Our own locale-insensitive version of isalpha(3).

Parameters
[in] c: Byte in question to query.
Return values
true: c is listed in either IEEE 1003.1 section 7.3.1.1 "upper" or "lower".
false: Anything else.
Note
Not only does this function work under the POSIX Locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.
Warning
c is an int. This means that when you pass a char value here, it experiences "integer promotion" as defined in ISO/IEC 9899:2018 section 6.3.1.1 paragraph 1.

Definition at line 279 of file ctype.h.

Referenced by rb_isalnum().

◆ rb_isascii()

static int rb_isascii ( int  c)
inline static

Our own locale-insensitive version of isascii(3).

Parameters
[in] c: Byte in question to query.
Return values
false: c is out of range of ASCII character set.
true: Yes it is.
Warning
c is an int. This means that when you pass a char value here, it experiences "integer promotion" as defined in ISO/IEC 9899:2018 section 6.3.1.1 paragraph 1.

Definition at line 209 of file ctype.h.

◆ rb_isblank()

static int rb_isblank ( int  c)
inline static

Our own locale-insensitive version of isblank(3).

Parameters
[in] c: Byte in question to query.
Return values
true: c is listed in IEEE 1003.1 section 7.3.1.1 "blank".
false: Anything else.
Note
Not only does this function work under the POSIX Locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.
Warning
c is an int. This means that when you pass a char value here, it experiences "integer promotion" as defined in ISO/IEC 9899:2018 section 6.3.1.1 paragraph 1.

Definition at line 372 of file ctype.h.

◆ rb_iscntrl()

static int rb_iscntrl ( int  c)
inline static

Our own locale-insensitive version of iscntrl(3).

Parameters
[in] c: Byte in question to query.
Return values
true: c is listed in IEEE 1003.1 section 7.3.1.1 "cntrl".
false: Anything else.
Note
Not only does this function work under the POSIX Locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.
Warning
c is an int. This means that when you pass a char value here, it experiences "integer promotion" as defined in ISO/IEC 9899:2018 section 6.3.1.1 paragraph 1.

Definition at line 418 of file ctype.h.

◆ rb_isdigit()

static int rb_isdigit ( int  c)
inline static

Our own locale-insensitive version of isdigit(3).

Parameters
[in] c: Byte in question to query.
Return values
true: c is listed in IEEE 1003.1 section 7.3.1.1 "digit".
false: Anything else.
Note
Not only does this function work under the POSIX Locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.
Warning
c is an int. This means that when you pass a char value here, it experiences "integer promotion" as defined in ISO/IEC 9899:2018 section 6.3.1.1 paragraph 1.

Definition at line 302 of file ctype.h.

Referenced by rb_isalnum(), and rb_isxdigit().

◆ rb_isgraph()

static int rb_isgraph ( int  c)
inline static

Our own locale-insensitive version of isgraph(3).

Parameters
[in] c: Byte in question to query.
Return values
true: c is listed in IEEE 1003.1 section 7.3.1.1 "upper", "lower", "digit", or "punct".
false: Anything else.
Note
Not only does this function work under the POSIX Locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.
Warning
c is an int. This means that when you pass a char value here, it experiences "integer promotion" as defined in ISO/IEC 9899:2018 section 6.3.1.1 paragraph 1.

Definition at line 489 of file ctype.h.

◆ rb_islower()

static int rb_islower ( int  c)
inline static

Our own locale-insensitive version of islower(3).

Parameters
[in] c: Byte in question to query.
Return values
true: c is listed in IEEE 1003.1 section 7.3.1.1 "lower".
false: Anything else.
Note
Not only does this function work under the POSIX Locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.
Warning
c is an int. This means that when you pass a char value here, it experiences "integer promotion" as defined in ISO/IEC 9899:2018 section 6.3.1.1 paragraph 1.

Definition at line 255 of file ctype.h.

Referenced by rb_isalpha(), and rb_toupper().

◆ rb_isprint()

static int rb_isprint ( int  c)
inline static

Identical to rb_isgraph(), except it also returns true for ' '.

Parameters
[in] c: Byte in question to query.
Return values
true: c is listed in IEEE 1003.1 section 7.3.1.1 "upper", "lower", "digit", or "punct", or is a ' '.
false: Anything else.
Note
Not only does this function work under the POSIX Locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.
Warning
c is an int. This means that when you pass a char value here, it experiences "integer promotion" as defined in ISO/IEC 9899:2018 section 6.3.1.1 paragraph 1.

Definition at line 442 of file ctype.h.
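The relation between "graph" and "print" is easy to state as byte ranges. In ASCII, the union of "upper", "lower", "digit" and "punct" is the contiguous range 0x21 ('!') through 0x7e ('~'), and "print" adds the space character 0x20. The sketch below uses hypothetical `range_` names and is an illustration of that relation, not ruby's actual code.

```c
/* graph: every visible ASCII character, 0x21 ('!') through 0x7e ('~'). */
static int range_isgraph(int c)
{
    return 0x21 <= c && c <= 0x7e;
}

/* print: everything in graph, plus the space character. */
static int range_isprint(int c)
{
    return c == ' ' || range_isgraph(c);
}
```

Note that a horizontal tab is neither "graph" nor "print"; only the space character itself is added.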

◆ rb_ispunct()

static int rb_ispunct ( int  c)
inline static

Our own locale-insensitive version of ispunct(3).

Parameters
[in] c: Byte in question to query.
Return values
true: c is listed in IEEE 1003.1 section 7.3.1.1 "punct".
false: Anything else.
Note
Not only does this function work under the POSIX Locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.
Warning
c is an int. This means that when you pass a char value here, it experiences "integer promotion" as defined in ISO/IEC 9899:2018 section 6.3.1.1 paragraph 1.

Definition at line 465 of file ctype.h.

◆ rb_isspace()

static int rb_isspace ( int  c)
inline static

Our own locale-insensitive version of isspace(3).

Parameters
[in] c: Byte in question to query.
Return values
true: c is listed in IEEE 1003.1 section 7.3.1.1 "space".
false: Anything else.
Note
Not only does this function work under the POSIX Locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.
Warning
c is an int. This means that when you pass a char value here, it experiences "integer promotion" as defined in ISO/IEC 9899:2018 section 6.3.1.1 paragraph 1.

Definition at line 395 of file ctype.h.

◆ rb_isupper()

static int rb_isupper ( int  c)
inline static

Our own locale-insensitive version of isupper(3).

Parameters
[in] c: Byte in question to query.
Return values
true: c is listed in IEEE 1003.1 section 7.3.1.1 "upper".
false: Anything else.
Note
Not only does this function work under the POSIX Locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.
Warning
c is an int. This means that when you pass a char value here, it experiences "integer promotion" as defined in ISO/IEC 9899:2018 section 6.3.1.1 paragraph 1.

Definition at line 232 of file ctype.h.

Referenced by rb_isalpha(), and rb_tolower().

◆ rb_isxdigit()

static int rb_isxdigit ( int  c)
inline static

Our own locale-insensitive version of isxdigit(3).

Parameters
[in] c: Byte in question to query.
Return values
true: c is listed in IEEE 1003.1 section 7.3.1.1 "xdigit".
false: Anything else.
Note
Not only does this function work under the POSIX Locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.
Warning
c is an int. This means that when you pass a char value here, it experiences "integer promotion" as defined in ISO/IEC 9899:2018 section 6.3.1.1 paragraph 1.

Definition at line 349 of file ctype.h.

◆ rb_tolower()

static int rb_tolower ( int  c)
inline static

Our own locale-insensitive version of tolower(3).

Parameters
[in] c: Byte in question to convert.
Return values
c: The byte is not listed in IEEE 1003.1 section 7.3.1.1 "upper".
otherwise: Byte converted using the map defined in IEEE 1003.1 section 7.3.1 "tolower".
Note
Not only does this function work under the POSIX Locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.
Warning
c is an int. This means that when you pass a char value here, it experiences "integer promotion" as defined in ISO/IEC 9899:2018 section 6.3.1.1 paragraph 1.

Definition at line 514 of file ctype.h.

◆ rb_toupper()

static int rb_toupper ( int  c)
inline static

Our own locale-insensitive version of toupper(3).

Parameters
[in] c: Byte in question to convert.
Return values
c: The byte is not listed in IEEE 1003.1 section 7.3.1.1 "lower".
otherwise: Byte converted using the map defined in IEEE 1003.1 section 7.3.1 "toupper".
Note
Not only does this function work under the POSIX Locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.
Warning
c is an int. This means that when you pass a char value here, it experiences "integer promotion" as defined in ISO/IEC 9899:2018 section 6.3.1.1 paragraph 1.

Definition at line 539 of file ctype.h.
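The return-value contract of the two conversion functions (return c unchanged unless it belongs to the opposite case class) can be sketched as follows. The `ascii_` names are hypothetical, and the sketch assumes an ASCII-compatible execution character set, where the two cases sit a constant offset apart. It is not ruby's actual implementation.

```c
/* Map 'A'..'Z' to 'a'..'z'; every other value is returned unchanged. */
static int ascii_tolower(int c)
{
    return ('A' <= c && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Map 'a'..'z' to 'A'..'Z'; every other value is returned unchanged. */
static int ascii_toupper(int c)
{
    return ('a' <= c && c <= 'z') ? c - ('a' - 'A') : c;
}
```

Unlike a locale-sensitive toupper, this always maps 'i' to 'I', and leaves bytes such as 0xE9 untouched regardless of the runtime locale.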

◆ ruby_strtoul()

unsigned long ruby_strtoul ( const char *str, char **endptr, int base )

Our own locale-insensitive version of strtoul(3).

The conversion is done as if the current locale is set to the "C" locale, no matter actual runtime locale settings.

Note
This is needed because strtoul("i", 0, 36) would return zero if it is locale sensitive and the current locale is tr_TR.
Parameters
[in] str: String of digits, optionally preceded by whitespace (ignored) and an optional + or - sign.
[out] endptr: NULL, or an arbitrary pointer (overwritten on return).
[in] base: A base from 2 to 36 inclusive, or the special case 0 to detect the base from the contents of the string.
Returns
Converted integer, cast to unsigned long.
Postcondition
If endptr is not NULL, it is updated to point to the first byte at which conversion failed.
Note
This function sets errno on failure.
  • EINVAL: Passed base is out of range.
  • ERANGE: Converted integer is out of range of long.
Warning
As far as @shyouhei reads ISO/IEC 9899:2018 section 7.22.1.4, a conforming strtoul implementation shall render ERANGE whenever it finds that the input string represents a negative integer. Such a value can never be represented as an unsigned long. However, this implementation does not honour that language. It just casts the negative value to the return type, resulting in a very big return value. This behaviour is at least questionable, but we can no longer change it at this point.
Note
Not only does this function work under the "C" locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.

Definition at line 138 of file util.c.
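The endptr and errno conventions described above are the same as those of the standard strtoul(3). The sketch below therefore uses the standard function as a stand-in, running in the default "C" locale, so that it compiles without linking against ruby. `parse_or` is a hypothetical helper invented for this illustration, and substituting ruby_strtoul for strtoul in it is an assumption based only on the matching prototype.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse the whole of str as an unsigned long in the
 * given base, returning fallback when the conversion fails or when any
 * unparsed bytes remain. */
static unsigned long parse_or(const char *str, int base, unsigned long fallback)
{
    char *end;
    unsigned long value;

    errno = 0;                       /* strtoul only sets errno on failure */
    value = strtoul(str, &end, base);
    if (errno != 0) return fallback; /* e.g. ERANGE */
    if (end == str) return fallback; /* no digits were consumed */
    if (*end != '\0') return fallback; /* conversion stopped early */
    return value;
}
```

In the "C" locale, `parse_or("i", 36, 0)` yields 18, because 'i' is digit 18 in base 36; this is the case that a locale-sensitive implementation could get wrong under tr_TR. With base 0, `parse_or("0x1f", 0, 0)` detects the hexadecimal prefix and yields 31.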

◆ st_locale_insensitive_strcasecmp()

int st_locale_insensitive_strcasecmp ( const char *s1, const char *s2 )

Our own locale-insensitive version of strcasecmp(3).

The "case" here always means that of the POSIX Locale. It doesn't depend on runtime locale settings.

Parameters
[in] s1: Comparison LHS.
[in] s2: Comparison RHS.
Return values
-1: s1 is "less" than s2.
0: Both strings converted into lowercase would be identical.
1: s1 is "greater" than s2.
Note
Not only does this function work under the POSIX Locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.

Definition at line 2002 of file st.c.
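A comparison with the return values listed above can be sketched by lowercasing each byte with an ASCII-only rule and stopping at the first difference. The `ci_` names are hypothetical and this is not the code in st.c. In particular, comparing bytes as unsigned char is a choice made for this sketch, so the ordering of bytes at or above 0x80 may differ from the real function.

```c
/* ASCII-only lowercase mapping; independent of the runtime locale. */
static int ci_lower(int c)
{
    return ('A' <= c && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Hypothetical case-insensitive comparison returning -1, 0 or 1. */
static int ci_strcasecmp(const char *s1, const char *s2)
{
    for (;; s1++, s2++) {
        int c1 = ci_lower((unsigned char)*s1);
        int c2 = ci_lower((unsigned char)*s2);

        if (c1 != c2) return c1 < c2 ? -1 : 1; /* first differing byte decides */
        if (c1 == '\0') return 0;              /* both strings ended together */
    }
}
```

A shorter string that is a prefix of the longer one compares as "less", because its terminating NUL byte is smaller than any other byte.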

◆ st_locale_insensitive_strncasecmp()

int st_locale_insensitive_strncasecmp ( const char *s1, const char *s2, size_t n )

Our own locale-insensitive version of strncasecmp(3).

The "case" here always means that of the POSIX Locale. It doesn't depend on runtime locale settings.

Parameters
[in] s1: Comparison LHS.
[in] s2: Comparison RHS.
[in] n: Comparison shall stop after the first n bytes are scanned.
Return values
-1: s1 is "less" than s2.
0: Both strings converted into lowercase would be identical.
1: s1 is "greater" than s2.
Note
Not only does this function work under the POSIX Locale, it also assumes that its execution character set is what ruby calls an ASCII-compatible character set, which does not include, for instance, EBCDIC or UTF-16LE.
Warning
This function is not timing safe.

Definition at line 2026 of file st.c.