Request For Comment on Draft Specification for The CXING Programming Language.

Greetings all. This is a draft specification of a proposed new programming language. The BDFL of this project is DannyNiu/NJF. The intention of this request for comments is to solicit ideas - advice, suggestions for improvement, as well as critique of perceived defects.

While all ideas are welcome, they're better received if they're accompanied by counter-arguments, usage illustrations, and/or a sketch of implementation. The decision on adoption is ultimately made by the BDFL of the project.

You may submit your ideas and/or queries by opening Issues on GitHub or Gitee; both English and Chinese are accepted.

Table of Contents
  1. Introduction
  2. Features
  3. Lexical Elements
  4. Expressions
    4.1. Grouping, Postfix, and Unaries
    4.2. Arithmetic Binary Operations
    4.3. Bit Shifting Operations
    4.4. Arithmetic Relations
      4.4.1. Details of Loose and Strict Equality and Ordering Relation Comparison
    4.5. Bitwise Operations
    4.6. Boolean Logics
    4.7. Compounds
  5. Phrases
  6. Statements
    6.1. Condition Statements
    6.2. Loops
    6.3. Statements List
    6.4. Declarations
  7. Functions
  8. Translation Unit Interface
    8.1. Translation Unit Source Code Syntax
    8.2. Source Code Inclusion
    8.3. Constants Definition
  9. Language Semantics
    9.1. Objects and Values
    9.2. Object/Value Key Access
    9.3. Automatic Resource Management
    9.4. Subroutines and Methods
  10. Types and Special Values
    10.1. Implicit Type and Value Conversion
  11. Type Definition and Object Initialization Syntax
  12. Numerics and Maths
    12.1. Rounding
    12.2. Exceptional Conditions
    12.3. Reproducibility and Robustness
    12.4. Recommended Applications of Floating Points
  13. Runtime Semantics
    13.1. Binary Linking Compatibility
    13.2. Calling Conventions and Foreign Function Interface
    13.3. Finalization and Garbage Collection
  14. Standard Library
  15. Library for the String Data Type
  16. Library for Describing Data Structure Layout
  17. Dynamic Data Structure Types
  18. Type Reflection
  19. Library for Floating Point Environment
  20. Regex
  21. Library for Multi-Threading
    21.1. Exclusive and Sharable Objects and Mutices (Mutex)
    21.2. Condition Variables
    21.3. Thread Management
  22. Library for I/O
    22.1. Simple Input/Output
    22.2. Generic File
    22.3. Regular Files
    22.4. Unidirectional Communication
    22.5. Filesystem Operations
  23. Library for Process Management
  Annex A. Identifier Namespace
    A.1. Reserved Identifiers
    A.2. Conventions for Identifiers

The cxing Programming Language

Build Info: This build of the (draft) spec is based on git commit 8b2ba2204e06264302f3b766d189b5110bf34820

The 2025-12-26 revision of the draft spec is the 2nd feature-complete beta, and is ready to be implemented for testing.

Introduction

Goal

The 'cxing' programming language (with or without caps) is a general-purpose programming language with a C-like syntax that is memory-safe, aims to be thread-safe, and has surprise-free semantics. It aims to fit into and interoperate with the existing ecosystem written in other languages, with C as its starting point.

It attempts to pioneer in the field of efficient, expressive, and robust error handling using language design toolsets.

The language is meant to be an open standard with multiple independent implementations that are widely interoperable. It can be implemented either as an interpreter or as a compiler. Programs written in cxing should be no less portable than when they're written in C.

Features are introduced on a strictly maintainable basis. The reference implementation will be an AST-based interpreter (or a transpiler to C?), which will serve as an instrument of verification for additional implementations. The version of the language (if it ever changes) will be independent of the versions of the implementations.

See section 2. Features for more information on how the goals are achieved.

Naming

Just as Java is a beautiful island in Indonesia, we wanted a name that reflects our pride as Earth-loving Chinese here in Shanghai, so we chose to name our language after the National Nature Reserve Park of Changxing Island. However, the name is too long to be used directly, and "changx" looked too much like 'clang', so we simplified it to "cxing", which we find both pleasing to look at and suggestive of an information technology product.

License

The language itself and the reference implementation are released into the public domain.

Features

To best reflect the intent of the design, the specification shall be programmer-oriented. The purpose of features will be explained, with examples provided on how they're to be used. The syntax and semantic definitions follow.

Memory and Thread Safety

The language does not expose pointers - to data or to functions - only opaque object handles. It uses reference counting with garbage collection to ensure memory safety. It has a separate type domain of sharable types catered to multi-threaded access, and exclusive types for efficient access within a single thread; only sharable types can be declared globally.

Null Safety

It's typical to desire that some result come out of a failing program; it is even more desirable that the failure of a single component doesn't deny service to users. It's very desirable that error recovery be easy to program, and it's undesirable that errors go undetected.

In cxing, errors occur in the form of nullish values. For the special value null, accessing any member of it yields null, and calling a null as a function returns null. Nullish values can be substituted with alternative values so that programs can recover from errors.

// We do not know the schema of this object, but we know it can be
// one of the two alternatives. Here the "??" punctuation is the
// nullish coalescing operator:
timescale = mp4box.movie.timescale ??
            mp4box.fragments[0].timescale ??
            mp4file.timescale;
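The propagation rules above can be modelled outside the language. The following Python sketch is only an illustration and not part of cxing: the Null and Obj classes, the coalesce helper, and the sample data are hypothetical names introduced here. It assumes that the right-hand side of ?? is only evaluated when needed, and it leaves nullish NaNs out.

```python
class Null:
    """Hypothetical stand-in for cxing's null."""
    _instance = None

    def __new__(cls):
        # A single shared instance keeps identity checks simple.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __getattr__(self, name):
        return self          # null.member yields null

    def __getitem__(self, key):
        return self          # null[key] yields null

    def __call__(self, *args, **kwargs):
        return self          # calling null returns null

    def __repr__(self):
        return "null"


null = Null()


class Obj:
    """Hypothetical object whose missing members read as null."""

    def __init__(self, **members):
        self._members = members

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        return self.__dict__.get("_members", {}).get(name, null)


def coalesce(*thunks):
    """Model of `a ?? b ?? c`: the first non-null result, or null."""
    for thunk in thunks:
        value = thunk()
        if value is not null:
            return value
    return null


mp4box = Obj(movie=Obj())          # movie exists, but has no timescale
mp4file = Obj(timescale=90000)     # made-up sample value

timescale = coalesce(
    lambda: mp4box.movie.timescale,
    lambda: mp4box.fragments[0].timescale,
    lambda: mp4file.timescale,
)
print(timescale)
```

Neither of the first two lookups fails loudly; each simply evaluates to null, so the third alternative supplies the value.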

Nullish NaNs

A bit of background first.

The IEEE-754 standard for floating point arithmetic specifies handling of exceptional conditions for computations. These conditions can be handled in the default way (default exception handling) or in some alternative ways (alternative exception handling).

The 1985 edition of the standard described exceptions and their default handling in section 7, and handling using traps in section 8. These were revised as "exceptions and default exception handling" in section 7 as well as "alternate exception handling attributes" in section 8 in the 2008 edition of the standard - these "attributes" are associated with "blocks" which (as most would expect) are group(s) of statements. Alternate exception handling is used in many advanced numerical programs to improve robustness.

As a prescriptive standard, it intended for language standards to describe constructs for handling floating point errors in a generic way that abstracts away the underlying detail of system and hardware implementations. In doing so, the standard itself became non-generic, and described features specific to some languages that were not present in others.

The cxing language employs null coalescing operators as general-purpose error-handling syntax, and makes them cover NaNs by making NaNs nullish. As an unsolicited half-improvement, I (@dannyniu) propose the following alternative description for "alternate exception handling":

Languages ought to specify ways for programs to transfer the control of execution, or to evaluate certain expressions, when a subset (some or all) of exceptions occur.

As an example, the continued fraction function in code example A-16 from "Numerical Computing Guide" of Sun ONE Studio 8 (https://www5.in.tum.de/~huckle/numericalcomputationguide.pdf , accessed 2025-08-15) can be written in cxing as:

subr continued_fraction(N, a, b, x, out)
{
    decl f, f1, d, d1, pd1, q;
    decl j;

    f1 = 0.0;
    f = a[N];
    for(j=N-1; j>=0; j--)
    {
        d = x + f;
        d1 = 1.0 + f1;
        q = b[j] / d;
        f1 = (-d1 / d) * q _Fallback f1 = b[j] * pd1 / b[j+1];
        pd1 = d1;
        f = a[j] + q;
    }
    out.f = f;
    out.f1 = f1;
}

Reproducibility issues treated in the standard are further discussed in 12.3. Reproducibility and Robustness.

Lexical Elements

For the purpose of this section, the POSIX Extended Regular Expressions (ERE) syntax is used to describe the production of lexical elements. POSIX regular expressions are chosen for being vendor-neutral. There's a difference between the POSIX semantics of regular expressions and the PCRE semantics, the latter of which are widely used in many programming languages even on POSIX platforms, most notably Perl, Python, and PHP, and have been adopted by JavaScript. Care has been taken to ensure the expressions used in this chapter are interpreted identically under both semantics.

Comments

Comments in the language begin with 2 forward slashes: //, or 1 hash sign: #, and span to the end of the line. Another form of comment begins with /* and ends with */ - this form of comment can span multiple lines.

Comments in the following explanatory code blocks use the same notation as in the actual language.

Identifiers and Keywords

An identifier has the following production: [_[:alpha:]][_[:alnum:]]*. A keyword is an identifier that matches one of the following:

// Special Values:
true false null

// Phrases:
return break continue and or _Fallback

// Statements and Declarations:
decl

// Control Flows:
if else elif while do for

// Functions:
subr method this

// Translation Unit Interface:
_Include extern const

Numbers

Decimal integer literals have the following production: [1-9][0-9]*[uU]?. When the literal has the "U" suffix, the literal has type ulong, otherwise, the literal has type long.

Octal integer literals have the following production: 0o?[0-7]*. An octal literal always has type ulong.

Note: As it has been a common mistake among newcomers to zero-pad a decimal number only to realize it has become an octal literal, it is recommended that implementations issue warnings when a number is zero-padded, and recommend that users prefix the literal with 0o when they do intend to use octals. Likewise, for some functions (e.g. chmod in POSIX), users may actually intend to use octals but forget the zero prefix that makes them octal literals - in these cases, it is recommended that semantic analysis be performed using syntax information (if possible) and appropriate warnings be given.

Hexadecimal integer literals have the following production: 0[xX][0-9a-fA-F]+. A hexadecimal literal always has type ulong.

Radix-64 literals have the following production: 0\\[A-Za-z0-9._]+. The primary use of radix-64 literals is as option flags to functions, as bitwise compositions are obscure, and symbolic constants need verbose prefixes to not pollute the global name space. A radix-64 literal always has type ulong. The characters following the backslash have the same numerical values as those in the Base 64 Encoding with URL and Filename Safe Alphabet, except that the minus sign (-) is replaced with a period (.) due to possible ambiguity with the subtraction operator, and that there are no padding characters.
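As an illustration of the digit values just described, here is a minimal Python sketch of how a radix-64 literal could be turned into an integer. It is not taken from any implementation: the function name radix64_value is hypothetical, and the sketch assumes that digits are written most significant first (as with the other integer literals) and that values wrap to 64 bits, neither of which is stated in the text.

```python
import string

# Digit values follow the URL- and filename-safe Base 64 alphabet,
# with "-" (value 62) replaced by "." as the text describes.
RADIX64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "._"
)


def radix64_value(literal):
    """Return the integer value of a radix-64 literal (hypothetical helper)."""
    if not literal.startswith("0\\") or len(literal) < 3:
        raise ValueError("not a radix-64 literal")
    value = 0
    for ch in literal[2:]:
        # str.index raises ValueError for characters outside the alphabet.
        value = value * 64 + RADIX64_ALPHABET.index(ch)
    # Assumption: the result is reduced to a 64-bit ulong.
    return value & 0xFFFFFFFFFFFFFFFF


print(radix64_value("0\\B"))    # 'B' is digit 1
print(radix64_value("0\\BA"))   # 1 * 64 + 0
```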

Fraction literals have the following production: [0-9]+\.[0-9]*|\.[0-9]+. The literal always has type double.

A decimal scientific literal is a fraction literal further suffixed by a decimal exponent literal, which has the production: [eE][-+]?[0-9]+. The digits of the exponent indicate the power of 10 by which the fraction part is multiplied.

A hexadecimal fraction literal has the following production: 0[xX]([0-9a-fA-F]+\.[0-9a-fA-F]*|\.[0-9a-fA-F]+) - this production is NOT a valid lexical element in the language by itself, but the hexadecimal scientific literal is, which is defined as a hexadecimal fraction literal followed by a hexadecimal exponent literal having the production: [pP][-+]?[0-9]+. The digits of the exponent indicate the power of 2 by which the fraction part is multiplied.
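To show how the numeric productions fit together, the following Python sketch classifies a whole token by trying each production in turn. It is an illustration only: the names PRODUCTIONS and classify are hypothetical, the dot in the fraction productions is written escaped, and the type of scientific literals is left as None because the text quoted here does not state it.

```python
import re

HEX_FRACTION = r"0[xX]([0-9a-fA-F]+\.[0-9a-fA-F]*|\.[0-9a-fA-F]+)"
FRACTION = r"([0-9]+\.[0-9]*|\.[0-9]+)"

# (kind, production, type); tried in order against the whole token.
PRODUCTIONS = [
    ("hex-scientific", HEX_FRACTION + r"[pP][-+]?[0-9]+", None),
    ("hexadecimal", r"0[xX][0-9a-fA-F]+", "ulong"),
    ("radix-64", r"0\\[A-Za-z0-9._]+", "ulong"),
    ("decimal-scientific", FRACTION + r"[eE][-+]?[0-9]+", None),
    ("fraction", FRACTION, "double"),
    ("octal", r"0o?[0-7]*", "ulong"),
    ("decimal", r"[1-9][0-9]*[uU]?", "long"),
]


def classify(token):
    """Return (kind, type) for a numeric literal, or None if none matches."""
    for kind, production, type_name in PRODUCTIONS:
        if re.fullmatch(production, token):
            if kind == "decimal" and token[-1] in "uU":
                type_name = "ulong"   # the "U" suffix selects ulong
            return kind, type_name
    return None


for tok in ["42", "42u", "0755", "0x1F", "1.5", ".5e-3", "0x1.8p3", "0x1.8", "089"]:
    print(tok, classify(tok))
```

Note that "0x1.8" and "089" match nothing: a bare hexadecimal fraction is not a lexical element, and a zero-padded number containing 8 or 9 is neither octal nor decimal.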

Characters and Strings

Character and string literals have the following production: ['"]([^\]|\\(["'abefnrtv]|x[0-9a-fA-F]{2,2}|[0-7]{1,3}))['"]

In the 2nd subexpression, each alternative has the following meaning:

  1. Escaping
  2. Hexadecimal byte literal. The first character is interpreted as the high nibble of the byte, while the second the low.
  3. Octal byte literal. The characters (total 3 at most) are interpreted as an octal integer literal used as value for the byte. If there are 3 digits, then the first digit must be between 0 and 3.
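The three alternatives can be sketched as a small decoder. This Python example is illustrative only: the helper name escape_byte is hypothetical, and the byte values assigned to the single-letter escapes (including \e as the ESC byte) are an assumption based on their conventional C meanings, since the text only lists the letters.

```python
import re

# Assumption: conventional C byte values for the single-character escapes.
SIMPLE = {'"': 0x22, "'": 0x27, "a": 0x07, "b": 0x08, "e": 0x1B,
          "f": 0x0C, "n": 0x0A, "r": 0x0D, "t": 0x09, "v": 0x0B}

# The escape part of the production given in the text.
ESCAPE = re.compile(r"""\\(["'abefnrtv]|x[0-9a-fA-F]{2,2}|[0-7]{1,3})""")


def escape_byte(seq):
    """Return the byte value denoted by one escape sequence (hypothetical helper)."""
    m = ESCAPE.fullmatch(seq)
    if m is None:
        raise ValueError("not an escape sequence")
    body = m.group(1)
    if body in SIMPLE:
        return SIMPLE[body]               # alternative 1: escaping
    if body[0] == "x":
        return int(body[1:], 16)          # alternative 2: high nibble, then low
    if len(body) == 3 and body[0] > "3":
        raise ValueError("a 3-digit octal escape must start with 0 to 3")
    return int(body, 8)                   # alternative 3: octal byte


print(escape_byte("\\n"), escape_byte("\\x41"), escape_byte("\\101"))
```

The restriction on the first digit of a 3-digit octal escape keeps the value within a single byte.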

When single-quoted, the literal is a character literal having the value of the first character as type long; the behavior is implementation-defined if there are multiple characters.

When double-quoted, the literal is a string literal having type str.

Raw string literals have the following production: \\("[^"]*"|'[^']*')

In a raw string literal, there is no escape sequence. Single quotes cannot appear in single-quoted raw string literals, and double quotes cannot appear in double-quoted raw string literals.

Raw string literals are primarily intended for writing regular expressions.

Any number of raw strings and double-quoted strings may be concatenated into one string object by virtue of being placed adjacent to each other with no characters in between other than whitespace. The set of whitespace characters is defined to be exactly the following: U+0020 (space), U+000D (carriage return), U+000B (vertical tab), U+000A (line-feed), U+0009 (horizontal tab).

Punctuations

A punctuation is one of the following:

( ) [ ] =? . ++ -- + - ~ ! * / %
<< >> >>> < > & ^ |
<= >= == != === !== && || ?? ? :
= *= /= %= += -= <<= >>= >>>= &= ^= |= ,
; { }

Expressions

Grouping, Postfix, and Unaries

primary-expr % primary
: "(" expressions-list ")" % paren
| identifier % ident
| constant % const
;
postfix-expr % postfix
: primary-expr % degenerate
| postfix-expr "=?" primary-expr % nullcoalesce
| postfix-expr "[" assign-expr "]" % indirect
| postfix-expr "." identifier % member
| postfix-expr "++" % inc
| postfix-expr "--" % dec
| function-call % funccall
| object-notation % objdef
;

function-call % funccall
: postfix-expr "(" ")" % noarg
| funccall-start-nocomma ")" % somearg
;

funccall-start-nocomma % funcinvokenocomma
: postfix-expr "(" assign-expr % base
| funccall-start-nocomma "," assign-expr % genrule
;

Note: Previously, the close-binding null-coalescing operator was ->. This was changed because it is desired to reserve -> for a 'trait' static call syntax, where the first argument of a subroutine (i.e. non-method function) receives the value of, or a reference to, the left-hand operand of the operator. This is tentative and no commitment to it has been made yet. All in all, the close-binding null-coalescing operator is now =?. (Note dated 2025-09-26.)

unary-expr % unary
: postfix-expr % degenerate
| "++" unary-expr % inc
| "--" unary-expr % dec
| "+" unary-expr % positive
| "-" unary-expr % negative
| "~" unary-expr % bitcompl
| "!" unary-expr % logicnot
;

For inc and dec in unary and postfix, and for positive and negative, the operations are computed under arithmetic context. For bitcompl and logicnot, the operations are computed under integer context.

Arithmetic Binary Operations

mul-expr % mulexpr
: unary-expr % degenerate
| mul-expr "*" unary-expr % multiply
| mul-expr "/" unary-expr % divide
| mul-expr "%" unary-expr % remainder
;

The result of division on integers SHALL round towards 0.

The remainder computed SHALL be such that (a/b)*b + a%b == a is true.

If the divisor is 0, then the quotient of division becomes positive/negative infinity of type double if the signs of both operands are the same/different, while the remainder becomes NaN, with the "invalid" floating point exception signalled.

For the purpose of determining the sign of operands, the integer 0 in ulong and two's complement signed long are considered to be positive.

Editorial Note: The first 3 of the above 4 paragraphs were together 1 paragraph in a previous version of the draft before 2025-08-25. This had the potential of causing the confusion that remainder is only applicable to integers. Because now remainder is also applicable to floating points, this is first separated into its own paragraph. The rule regarding type conversion on division by 0 is of separate interest, so it's also an individual paragraph now. The 4th paragraph is added on 2025-08-25.

Note: The condition for determining remainder is equivalent to:

The remainder x % y shall be the value x-ny, for some integer n such that, if y is non-zero, the result has the same sign as x and magnitude less than that of y.

In the C language, these are separate descriptions for the integer modulo operator and the floating point fmod function; as such, an implementation may utilize these facilities in C. Any inconsistency between these 2 definitions in C is supposedly unintentional from the standard developer's perspective.
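The rules for division and remainder can be checked with a short Python sketch. It is only a model under stated assumptions: the helper names int_div, int_rem and float_rem are hypothetical, Python integers are unbounded so 64-bit overflow is not modelled, and the signalling of the "invalid" exception is omitted. Python's own // and % round toward negative infinity, so they are not used directly.

```python
import math


def int_div(a, b):
    """Integer quotient rounded toward 0; an infinity if the divisor is 0."""
    if b == 0:
        # Integer 0 counts as positive, so the signs differ only when a < 0.
        return math.inf if a >= 0 else -math.inf
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def int_rem(a, b):
    """Remainder such that (a/b)*b + a%b == a; NaN if the divisor is 0."""
    if b == 0:
        return math.nan
    return a - int_div(a, b) * b


def float_rem(x, y):
    """Floating point remainder for non-zero y, as C's fmod computes it."""
    return math.fmod(x, y)


print(int_div(-7, 2), int_rem(-7, 2))   # quotient rounds toward 0
print(float_rem(-7.5, 2.0))             # result takes the sign of x
```

In both cases the remainder has the sign of the dividend, which is what makes the integer and floating point descriptions agree.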

All of mulexpr are computed under arithmetic context.

add-expr % addexpr
: mul-expr % degenerate
| add-expr "+" mul-expr % add
| add-expr "-" mul-expr % subtract
;

All of addexpr are computed under arithmetic context.

Bit Shifting Operations

bit-shift-expr % shiftexpr
: add-expr % degenerate
| bit-shift-expr "<<" add-expr % lshift
| bit-shift-expr ">>" add-expr % arshift
| bit-shift-expr ">>>" add-expr % rshift
;

All of shiftexpr are computed under integer context.

Side Note: There were left and right rotate operators. Since there's only a single 64-bit width in native integer types, bit rotation operators became meaningless. Therefore those functionalities will be offered as method functions in the standard library.

Arithmetic Relations

rel-expr % relops
: bit-shift-expr % degenerate
| rel-expr "<" bit-shift-expr % lt
| rel-expr ">" bit-shift-expr % gt
| rel-expr "<=" bit-shift-expr % le
| rel-expr ">=" bit-shift-expr % ge
;

All of the ordering relations of relops are evaluated under arithmetic context. If either operand is NaN or null, then the value of the expression is false.

eq-expr % eqops
: rel-expr % degenerate
| eq-expr "==" rel-expr % eq
| eq-expr "!=" rel-expr % ne
| eq-expr "===" rel-expr % ideq
| eq-expr "!==" rel-expr % idne
;

Details of Loose and Strict Equality and Ordering Relation Comparison

To evaluate whether two operands are equal:

To evaluate the ordering relation of 2 operands:

Note: The equals() method is never used for ordering relations including the <= and the >= operators because an object that's missing cmpwith() has no reasonable definition of ordering relations. Conversely, the cmpwith() method is not used with the strict equality test, because the ordering of objects doesn't necessarily reflect their identity.

Bitwise Operations

bit-and % bitand
: eq-expr % degenerate
| bit-and "&" eq-expr % bitand
;

bit-xor % bitxor
: bit-and % degenerate
| bit-xor "^" bit-and % bitxor
;

bit-or % bitor
: bit-xor % degenerate
| bit-or "|" bit-xor % bitor
;

All of the bitwise operations are computed under integer context.

Boolean Logics

logic-and % logicand
: bit-or % degenerate
| logic-and "&&" bit-or % logicand
;

logic-or % logicor
: logic-and % degenerate
| logic-or "||" logic-and % logicor
| logic-or "??" logic-and % nullcoalesce
;

Compounds

cond-expr % tenary
: logic-or % degenerate
| logic-or "?" expressions-list ":" cond-expr % tenary
;
assign-expr % assignment
: cond-expr % degenerate
| unary-expr "=" assign-expr % directassign
| unary-expr "*=" assign-expr % mulassign
| unary-expr "/=" assign-expr % divassign
| unary-expr "%=" assign-expr % remassign
| unary-expr "+=" assign-expr % addassign
| unary-expr "-=" assign-expr % subassign
| unary-expr "<<=" assign-expr % lshiftassign
| unary-expr ">>=" assign-expr % arshiftassign
| unary-expr ">>>=" assign-expr % rshiftassign
| unary-expr "&=" assign-expr % andassign
| unary-expr "^=" assign-expr % xorassign
| unary-expr "|=" assign-expr % orassign
;

See 9.2. Object/Value Key Access for further discussion.

expressions-list % exprlist
: assign-expr % degenerate
| expressions-list "," assign-expr % exprlist
;

Phrases

Between expressions and statements, there are phrases.

Phrases are like expressions, and have values, but due to grammatical constraints, they lack the usage flexibility of expressions. For example, phrases cannot be used as arguments to function calls, since phrases are not comma-delimited; nor can they be assigned to variables, since assignment operators bind more tightly than phrase delimiters. On the other hand, phrases provide flexibility in combining full expressions in ways that expressions alone could not express as clearly, because of the parentheses they would require.

conj-ion % and_phrase_ion
: conj-atom "and" % and
| conj-atom "_Then" % then
;

conj-atom % and_phrase_atom
: expressions-list % degenerate
| conj-ion expressions-list % atomize
;

disj-ion % or_phrase_ion
: disj-atom "or" % or
| disj-atom "_Fallback" % nc
| conj-ion control-flow-ions % ctrl_flow
;

disj-atom % or_phrase_atom
: conj-atom % degenerate
| disj-ion conj-atom % atomize
;

phrase-stmt % phrase_stmt
: disj-atom ";" % base
| control-flow-molecule % ctrl_flow
| conj-ion control-flow-molecule % conj_ctrl_flow
| disj-ion control-flow-molecule % disj_ctrl_flow
;

control-flow-ions % ctrl_flow_ion
: control-flow-operator "or" % op_or
| control-flow-operator "_Fallback" % op_nc
| control-flow-operator identifier "or" % labelledop_or
| control-flow-operator identifier "_Fallback" % labelledop_nc
| "return" "or" % returnnull_or
| "return" "_Fallback" % returnnull_nc
| "return" expressions-list "or" % returnexpr_or
| "return" expressions-list "_Fallback" % returnexpr_nc
;

control-flow-molecule % ctrl_flow_molecule
: control-flow-operator ";" % op
| control-flow-operator identifier ";" % labelledop
| "return" ";" % returnnull
| "return" expressions-list ";" % returnexpr
;

control-flow-operator % flowctrlop
: "break" % break
| "continue" % continue
;

Statements

statement % stmt
: ";" % emptystmt
| identifier ":" statement % labelled
| phrase-stmt % phrase
| conditionals % cond
| while-loop % while
| do-while-loop % dowhile
| for-loop % for
| "{" statements-list "}" % brace
| declaration ";" % decl
;

Condition Statements

conditionals % condstmt
: predicated-clause % base
| predicated-clause "else" statement % else
;
predicated-clause % predclause
: "if" "(" expressions-list ")" statement % base
| predicated-clause "elif" "(" expressions-list ")" statement % genrule
;

Loops

while-loop % while
: "while" "(" expressions-list ")" statement % rule
;
do-while-loop % dowhile
: "do" "{" statements-list "}" "while" "(" expressions-list ")" ";" % rule
;
for-loop % for
: "for" "(" ";" ";" ")" statement % forever

| "for" "(" ";" ";" expressions-list ")" statement % iterated

| "for" "(" ";" expressions-list ";" ")" statement % conditioned

| "for" "(" ";" expressions-list ";"
                expressions-list ")" statement % controlled

| "for" "(" expressions-list ";" ";" ")" statement % initonly

| "for" "(" expressions-list ";" ";"
            expressions-list ")" statement % nocond

| "for" "(" expressions-list ";"
            expressions-list ";" ")" statement % noiter

| "for" "(" expressions-list ";"
            expressions-list ";"
            expressions-list ")" statement % classic

| "for" "(" declaration ";" ";" ")" statement % vardecl

| "for" "(" declaration ";" ";"
            expressions-list ")" statement % vardecl_nocond

| "for" "(" declaration ";"
            expressions-list ";" ")" statement % vardecl_noiter

| "for" "(" declaration ";"
            expressions-list ";"
            expressions-list ")" statement % vardecl_controlled
;

To execute a for loop, first evaluate the expressions-list or declaration before the first semicolon, then execute the loop by invoking the "execute the for loop once" recursive procedure described below.

To execute the for loop once, evaluate the expressions-list after the first semicolon; if it's true, then statement is evaluated, then the expressions-list after the second semicolon is evaluated, and the for loop is executed once again. For the purpose of "proceeding to the next iteration" as mentioned in continue, the expressions-list after the second semicolon is not considered part of the loop body, and is therefore always executed before proceeding to the next iteration.

The description here uses the word "once" to describe the semantics of the loop in terms of "functional recursion", where "functional" is in the sense of the "functional programming paradigm".
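The recursive description can be sketched in Python as follows. This is a model for illustration, not an implementation strategy: run_for and execute_once are hypothetical names, the four parts of the loop are passed as callables, and a continue is modelled by returning early from the body. Python does not eliminate tail calls, so the sketch only suits short loops.

```python
def run_for(init, cond, step, body):
    """Evaluate the part before the first semicolon, then start the loop."""
    init()
    execute_once(cond, step, body)


def execute_once(cond, step, body):
    """One round of the "execute the for loop once" procedure."""
    if not cond():
        return                        # condition is false: the loop ends
    body()                            # the statement
    step()                            # runs even if the body "continued"
    execute_once(cond, step, body)    # the loop is executed once again


state = {"j": 0, "seen": []}


def body():
    if state["j"] % 2 == 0:
        return                        # plays the role of `continue`
    state["seen"].append(state["j"])


run_for(
    init=lambda: state.update(j=0),
    cond=lambda: state["j"] < 6,
    step=lambda: state.update(j=state["j"] + 1),
    body=body,
)
print(state["seen"])
```

Because the step is outside the body, returning early from the body on even values of j still advances j, and the loop terminates.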

Statements List

statements-list % stmtlist
: statement % base
| statements-list statement % genrule
;

Declarations

Because the value of a variable that held an integer value may transition to null after being assigned the result of certain computations, the variable needs to hold type information; as such, variables are represented conceptually as "lvalue" native objects. (Actually, just value native objects, as their scope and key can be deduced from context.)

declaration % decl
: "decl" identifier % singledecl
| "decl" identifier "=" assign-expr % singledeclinit
| declaration "," identifier % declarelist1
| declaration "," identifier "=" assign-expr % declarelist2
;

Functions

function-declaration % funcdecl
: "subr" identifier arguments-list statement % subr
| "method" identifier arguments-list statement % method
;

arguments-list % arglist
: "(" ")" % empty
| arguments-begin ")" % some
;

arguments-begin % args
: "(" identifier % base
| arguments-begin "," identifier % genrule
;

Note: As of 2025-12-26, all concepts of type keywords, operand attributes, and annotation in general have been eliminated as unnecessary.

When the function body is emptystmt, the function-declaration declares a function; when it's brace, it defines a function. The this keyword MUST NOT appear in the function body of a subroutine.

When the end of the function body is reached without an explicit return phrase, a Morgoth null is implicitly returned.

The number of parameters MUST be consistent between all declarations and the definition of a function, and the order of the arguments in a function call MUST be consistent with what's expected by the parameters of the function. Furthermore, whether a function is a method or a subroutine MUST also be consistent. The names of the parameters may be changed in the source code of a program. Depending on the context, this may provide the benefit of both explanatory argument naming in declarations, and avoidance of identifier collision in the function definition when the argument is appropriately renamed.

Note: Before 2025-10-27, there were FFI methods. These were removed, because methods are attached to properties of objects, and their prototypes cannot be reliably determined unless all parameters are of a uniform type; only then could the number of arguments be determined. As of 2025-11-03, all FFI is removed, because it is impossible to determine the prototype of FFI functions when they're called from object properties.

Translation Unit Interface

A translation unit consists of a series of function declarations and definitions. Because the definition of objects occurs at run time, it's not possible to define data objects of static storage duration in cxing; this is recognized as unfortunate and accepted as a design decision.

A translation unit in cxing corresponds to a relocatable code object, or a file containing such information. We choose this definition to emphasize binary runtime portability; the word "translate/translation" doesn't require translation to occur - an implementation is allowed to interpret the source code and execute it directly when that can be achieved. The terms "translation unit" and "relocatable object" take their commonly accepted meanings in building programs and applications.

Translation Unit Source Code Syntax

The goal symbol of a source code text string is TU - the translation unit production. It consists of a series of entity declarations.

TU % TU
: entity-declaration % base
| TU entity-declaration % genrule
;

entity-declaration % entdecl
: "_Include" string-literal ";" % srcinc
| "extern" function-declaration % extern
| function-declaration % implicit
| "const" identifier constant ";" % constdef
;

There MUST NOT be more than 1 definition of a function.

By default, all entity declarations are internal to the translation unit. For a declaration to be visible in multiple translation units, it must be declared "external" with the extern keyword.

As a best practice, external declarations should be kept in "header" files, and included (explained shortly) in source code files. The recommended filename extension is .cxing for cxing source code files, and .hxing for headers (named after the Hongxing Yu village on the Changxing Island).

Source Code Inclusion

Source code inclusion is a limited form of reference to external definitions. This is not preprocessing, not importation, and not a substitute for linking. Source code inclusion is exclusively for sharing declarations between multiple source code files and translation units.

By default, header files are first searched for in a set of pre-defined paths. (These paths are typically hierarchically organized and implemented using a file system.) If the header isn't found in the pre-defined paths, then it's searched for relative to the path of the source code file. However, if the string literal naming the header file begins with ./ or ../, then it's first searched for relative to the path of the source code file, then in the pre-defined set of paths.
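The search order can be summarized with a small Python sketch. It is an illustration under assumptions: the function names and the example paths are hypothetical, paths are joined POSIX-style, and "relative to the path of the source code file" is taken to mean relative to the directory containing that file.

```python
import os.path
import posixpath


def header_candidates(name, source_dir, predefined_paths):
    """Return the locations to try, in order, for an included header name."""
    relative = posixpath.join(source_dir, name)
    predefined = [posixpath.join(p, name) for p in predefined_paths]
    if name.startswith("./") or name.startswith("../"):
        return [relative] + predefined    # source-relative location first
    return predefined + [relative]        # pre-defined paths first


def find_header(name, source_dir, predefined_paths, exists=os.path.isfile):
    """Return the first candidate that exists, or None if there is none."""
    for candidate in header_candidates(name, source_dir, predefined_paths):
        if exists(candidate):
            return candidate
    return None


# Hypothetical paths, for illustration only.
print(header_candidates("util.hxing", "/proj/src", ["/usr/include/cxing"]))
print(header_candidates("./util.hxing", "/proj/src", ["/usr/include/cxing"]))
```

The exists parameter is only there so the lookup can be exercised without touching a real file system.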

Constants Definition

The const keyword can be used to define symbolic constants. The type of the constant MUST be one of long, ulong, or double. Once the constant is defined, the identifier may be used later to substitute the defined value.

Language Semantics

Objects and Values

An object may have properties; properties may also be called members.

Note: The word "property" emphasizes the semantic value of the said component, while the word "member" emphasizes its identification. Both words may be used interchangeably consistent with the intended point of perspective.

The internals of an object are largely opaque to the language. The primary interface to objects is the functions that operate on them.

Note: Functions in compiled implementations follow platform ABI's calling convention. Because certain opaque object types (such as the string type) in the runtime may need to be used in functions compiled on different implementations, the consistency of their structure layout is essential.

A native object is a construct for describing the language. It has a fixed set of properties, and is copied by value; mutating a native object does not affect other copies of the object.

A value is a native object with the following properties:

  1. the value proper,
  2. a type,
  3. for an lvalue - which can be the left operand of an assignment expression - there are the following additional properties:
    1. a scope object - this can be a block, an object; for sharable types, this can also be the "global" scope,
    2. a key - this identifies/is the name of the lvalue under the scope.

Other native objects may be introduced in the future.

All values have a (possibly empty) set of type-associated properties that're immutable. These type-associated properties take priority over other properties. The behavior is UNSPECIFIED when these properties are written to.

Note: The data structure for value native objects is further defined to enable the interoperability of certain language features. Values are described this way to enable discussion of "lvalue"s; alternative implementations may use other conceptual models for lvalues should they see fit.

Object/Value Key Access

As described in 9.1. Objects and Values, objects have properties. The key used to access a value on an object is typically a string or an integer.

When the key used to access a property is an integer, there may be a mapping from the integer to a string defined by the implementation of the runtime. Portable applications SHOULD NOT create objects with mixed string and integer keys. All implementations of the runtime SHALL guarantee there's no collision between any key that is the valid spelling of an identifier and any integer between 0 and 10^10 inclusive.

Note: The limit was chosen for efficiency reasons. While implementing a number to string conversion would immediately solve the issue of collision between numerical and identifier keys, it's slightly inefficient. A second option would be to pad the integer word with bytes that can never be valid in identifiers; this would be the best of both worlds. Yet considering most applications won't be needing such big arrays, and those that do would probably go for the string type in the standard library, a limit is set so that plausible real-world applications and implementations can enjoy the efficiency enabled by such latitude.

To read a key from an object:

  0. if the object is null, it is returned as is, preserving uncasting information. (TODO: 2026-01-24, check back for inconsistencies)

  1. if the key refers to one of the type-associated properties:
    1. a native object results consisting of:
      • value-proper: the value of this property,
      • type: the type of this property.
  2. if the key is not one of the type-associated properties:
    1. if the key __get__ is one of the type-associated properties, then this method is used to retrieve the actual property:
      1. this method is called with the object as its this parameter,
      2. this method is called with the key as a val,
      3. its return value is augmented with the 'scope' and 'key' being the object and the key used to access this property, to yield an lvalue.
    2. if the key __get__ is not defined as one of the type-associated properties, then an lvalue being null augmented with 'scope' and 'key' being the object and the key used to access this property is returned.

Note: The return value from 2.1.3. may be null. The null resulting from step 2.2. shall be a Morgoth, because there exists no diagnostic information.
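The key-read steps above can be sketched in C as follows. This is only an illustrative model, not the runtime's actual data layout: the names struct value, struct lvalue, struct object, read_key, and demo_get are all hypothetical, a value is reduced to a long plus a null flag, and the uncasting information mentioned in step 0 is not modeled.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical, heavily simplified stand-ins for the native objects.
struct value { long proper; int is_null; };
struct lvalue { struct value value; const void *scope; const char *key; };

struct object;
typedef struct value (*get_fn)(const struct object *self, const char *key);

struct type_prop { const char *name; struct value value; };

struct object {
    const struct type_prop *type_props; // type-associated properties
    size_t n_type_props;
    get_fn get;                         // the __get__ method, or NULL
};

// A sample __get__: knows one key, "answer"; yields null otherwise.
struct value demo_get(const struct object *self, const char *key)
{
    struct value v = { 0, 1 };
    (void)self;
    if (strcmp(key, "answer") == 0) { v.proper = 42; v.is_null = 0; }
    return v;
}

struct lvalue read_key(const struct object *obj, const char *key)
{
    struct lvalue out = { { 0, 1 }, NULL, NULL };
    size_t i;

    // Step 0: reading from null yields null.
    if (obj == NULL) return out;

    // Step 1: type-associated properties take priority and
    // yield a plain value (no scope, no key).
    for (i = 0; i < obj->n_type_props; i++) {
        if (strcmp(obj->type_props[i].name, key) == 0) {
            out.value = obj->type_props[i].value;
            return out;
        }
    }

    // Step 2: the result is an lvalue carrying scope and key.
    out.scope = obj;
    out.key = key;
    if (obj->get != NULL)               // step 2.1: ask __get__
        out.value = obj->get(obj, key);
    return out;                         // step 2.2: value stays null
}
```

Note that in this model the null from step 2.2 and a null returned by __get__ in step 2.1 look the same; the specification distinguishes them, which a fuller model would have to represent.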

To write a key onto an object:

Note: Compound assignment is different from loading the values from both sides of the assignment operator, performing the computation, then storing the result into the key, as the latter performs the read on the lvalue twice.

When a key is being deleted from an object:

Note: Destruction of values and finalization of resources are further discussed in 13.3. Finalization and Garbage Collection.

Automatic Resource Management

Note: This section is added 2025-12-28, and explicitly defines when resource management methods are invoked.

As mentioned before, lvalues have scopes. Precisely, for an lvalue:

Values that're not lvalues are known as rvalues.

The following rules govern the occurrence of automatic resource management:

Subroutines and Methods

Both subroutines and methods are code that can be executed in the language; the distinction is that methods have an implicit this parameter while subroutines don't - for compiled implementations, this is significant, as it causes a difference in parameter passing under a given calling convention.

Subroutines and methods are distinct types, as such there's no restriction that subroutines have to be called directly through identifiers or that methods have to be identified through a member access.

Previously (before 2025-11-03), there had been FFI (foreign function interface) subroutines and functions. Because it's impossible to determine the prototype of functions called from properties of objects, it is unsafe to call FFI functions. On the same safety note, the calling convention of (non-FFI) subroutines and methods was changed to account for potentially missing parameters.

Note: In a previous revision, there was a note claiming that this is a pointer handle. The idea back then was that when the cxing runtime is implemented with SafeTypes2, certain APIs of the library can be used without modification. However, a better runtime implementation strategy was discovered, which resulted in the introduction of type-associated properties. And so the this parameter is received as a val in all (currently one) type(s) of methods. Still, to facilitate the correct passing of parameters, the distinction between methods and subroutines remains necessary. As of 2025-10-27, the ref argument type is removed entirely; further, as of 2025-12-26, operand types' annotations are eliminated altogether.

Types and Special Values

The long and ulong types

The long type is a signed 64-bit integer type with negative values having two's complement representation. The ulong type is an unsigned 64-bit integer type. Both types have sizes and alignments of 8 bytes.

Note: 32-bit and narrower integer types don't exist natively, primarily because of the year 2038 problem and issue with big files. However, respective type objects for smaller integers, as well as those for float/binary32 and other floating point types are defined in the standard library to interpret data structures in byte strings.

The keyword bool is used exclusively as an alias for the type long; there is no restriction that a bool can store only 0 or 1, it exists primarily for programmers to clarify their intentions.

The double type

The double type is the floating point number type. It should correspond to the IEEE-754 (a.k.a. ISO/IEC-60559) binary64 type - that is, it should have 1 sign bit, 11 exponent bits, and 52 mantissa bits. The type has a size and alignment of 8 bytes.
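As an illustration of the 1/11/52 bit split, the following C sketch pulls the three fields out of a double. It assumes the host C implementation's double is itself binary64 (true on mainstream platforms, but not guaranteed by C); the names binary64_fields and split_double are made up for this example.

```c
#include <stdint.h>
#include <string.h>

struct binary64_fields {
    unsigned sign;      // 1 bit
    unsigned exponent;  // 11 bits, biased by 1023
    uint64_t mantissa;  // 52 bits, without the implicit leading 1
};

struct binary64_fields split_double(double x)
{
    uint64_t bits;
    struct binary64_fields f;

    // memcpy is the portable way to reinterpret the bit pattern.
    memcpy(&bits, &x, sizeof bits);

    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.mantissa = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}
```

For instance, 1.0 has a biased exponent of 1023 and an all-zero mantissa field.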

The str type

The string type str is not a built-in type, instead, it's an opaque object type defined in the standard library. The string type has significance in the indirect member access operator in a postfix-expr postfix expression.

Each occurrence of (concatenation of) string literal creates a new string object.

The true and false special values

The special value true is equal to 1 in type long. The special value false is likewise equal to 0.

The null and NaN special values

The null special value results in certain error conditions. Accessing any properties (unless otherwise stated) results in null; calling null as if it's a function results in null.

There are 2 kinds of nulls

All nulls compare equal to each other barring uncasting.

The NaN special value represents exceptional condition in mathematical computation. NaN does not compare equal to any number, or to itself. Uncasting an NaN results in its bit pattern being re-interpreted as a long.
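The two NaN properties just stated can be observed in C, under the assumption that the host double is binary64. The function names equals_itself and uncast_double are hypothetical; in particular, uncast_double only mimics the bit re-interpretation described here and is not the language's _Uncast.

```c
#include <stdint.h>
#include <string.h>

// Returns 0 for NaN, because NaN never compares equal, even to itself.
int equals_itself(double x)
{
    return x == x;
}

// Re-interprets the bit pattern of a double as a signed 64-bit integer.
int64_t uncast_double(double x)
{
    int64_t l;
    memcpy(&l, &x, sizeof l);
    return l;
}
```

The integer obtained from a NaN has all 11 exponent bits set and a non-zero mantissa field; the exact value depends on which NaN the platform produced.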

Both null and NaN are considered nullish in coalescing operations.

See 12. Numerics and Maths for further discussion.

Implicit Type and Value Conversion

Values and/or their types may be converted when used under certain contexts:

The "implicit type and value conversion" applies to multiple operands in such a way that there's one common type (or special value) that is the same regardless of the order of the operands. This conversion is defined in terms of a binary operation that is associative and commutative, so that any binary expression operator that is associative and commutative preserves this property regardless of the types of the operands.

Under an integer context:

Under the floating point context:

Under arithmetic context:

The special value null is treated specially:

Operators shall document whether they evaluate the order of, or compute a value from, their operands. In general, operators that return a true/false predicate from arithmetic operands evaluate the order, while ones that compute a value evaluate to arithmetic types.

Note: The special value NaN always has type double.

Note: It was considered to have certain operations in integer context that involved floating points to have NaNs, but this was dropped for 2 simple reasons: 1st, the current conversion rule is much simpler written, and 2nd, there exist prior art with JavaScript.

Type Definition and Object Initialization Syntax

There's a simple syntax in cxing for creating compound objects and types:

decl Complex := namedtuple() { 're': double, 'im': double };
decl I := Complex() { 're': 0, 'im': 1 };
decl sockaddr := dict() { 'host': "example.net", 'port': 443 };

In the above scenario,

namedtuple, Complex, and dict are "type objects", with namedtuple being a sort of meta type object.

A type object contains a method property named __initset__ declared as follows:

method __initset__(key, value);

Note: The parameters of the __initset__ method property were changed from ref to val. For one, most usages would have keys and values as literals, so it doesn't make sense to have references to them. The other issue is that there hadn't been a way to signify the end of the list. This is now changed to use the setting of the existing __proto__ property to the type object to signify the end of the list. As of 2025-10-27, the ref argument type is removed completely; further, as of 2025-12-26, operand types' annotations are eliminated altogether.

objdef-start % objdefstart
: objdef-start-comma % comma
| objdef-start-nocomma % nocomma
;

objdef-start-comma % objdefstartcomma
: objdef-start-nocomma "," % genrule
;

objdef-start-nocomma % objdefstartnocomma
: postfix-expr "{" postfix-expr ":" assign-expr % base
| objdef-start-nocomma "," postfix-expr ":" assign-expr % genrule
;

object-notation % objdef
: postfix-expr "{" "}" % empty
| objdef-start "}" % some
| auto-index % array
;

The postfix-expr MUST NOT be inc or dec. Furthermore, if postfix-expr is degenerate, then the primary expression MUST NOT be const.

On encountering a postfix-expr that is a type object, the comma-delimited key-value pairs enclosed in the braces are taken and the __initset__ method is called on them in turn. The key is the value of the postfix expression on the left side of the colon, while the value is that of the assignment expression on the right side of the colon. After this completes, the __initset__ method is invoked with __proto__ as key and the value of postfix-expr as value to signify the end, and then the resulting value of postfix-expr becomes the value of the object-notation expression.

Note: As such, the property names __initset__ and __proto__ are RESERVED for the "Type Definition and Object Initialization Syntax".
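The calling sequence described above may be pictured with the following C sketch. It is a toy model only: values are plain long integers, the receiver merely records what it was given, and the names struct pair, struct recorder, initset, and object_notation are hypothetical and do not correspond to any runtime interface.

```c
#include <stddef.h>
#include <string.h>

struct pair { const char *key; long value; };

// Hypothetical receiver that records the __initset__ calls it gets.
struct recorder {
    const char *keys[8];
    long values[8];
    size_t n;
    int finished;       // set once the __proto__ call arrives
};

// Stand-in for the __initset__ method of a type object.
void initset(struct recorder *self, const char *key, long value)
{
    if (strcmp(key, "__proto__") == 0) {
        self->finished = 1;         // end-of-list marker
        return;
    }
    if (self->n < 8) {
        self->keys[self->n] = key;
        self->values[self->n] = value;
        self->n++;
    }
}

// Models evaluating `T() { k1: v1, k2: v2, ... }`.
void object_notation(struct recorder *obj, const struct pair *pairs, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)         // one call per key-value pair, in order
        initset(obj, pairs[i].key, pairs[i].value);

    // Final call: in the real language the value is the type object
    // (the value of postfix-expr); 0 is only a placeholder here.
    initset(obj, "__proto__", 0);
}
```

With the pairs 'host' and 'port' from the sockaddr example, the receiver sees two ordinary calls followed by the terminating __proto__ call.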

auto-index-start-comma % array_piece
: postfix-expr "[" assign-expr "," % base
| auto-index-start-comma assign-expr "," % genrule
;

auto-index % array
: auto-index-start-comma "]" % complete
| auto-index-start-comma assign-expr "]" % streamline
;

The array rule is syntactic sugar that invokes __initset__ with elements in the expressions-list as value and successive integer indices as key, starting with 0.

Numerics and Maths

Note: Much of this section is motivated by a desire to have a self-contained description of numerics in commodity computer systems, as well as an interpretation, explanation, and rationale of the standard text that's more useful in practice than the standard text itself.

Rounding

IEEE-754 specifies the following rounding modes:

The standard library provides facilities for setting and querying the rounding mode in the current thread. The presence of other rounding modes (e.g. roundTiesToAway, roundToOdd, etc.) is implementation-defined.

Exceptional Conditions

Infinity and NaNs are not numbers. It is the interpretation of @dannyniu that they exist in numerical computation strictly to serve as error recovery and reporting mechanism.

IEEE-754 specifies the following 5 exceptions:

The standard library provides facility for querying, clearing, and raising exceptions. Alternate exception handling attributes are implemented in the language as error-handling flow-control constructs, such as null-coalescing expression and phrases operators, as well as execution control functions.

Reproducibility and Robustness

Floating points have a fixed significand width as well as limited range(s) of exponents; as such, they're very similar to scientific notation, and they suffer from the same inaccuracy problems as any notation that truncates a large fraction of value digits. However, this does yield a favorable trade-off in terms of implementation (and to some extent, usage) efficiency.

IEEE-754 recommends that language standards provide a means to derive a sequence (a graph actually, if dependencies are taken into account) of computation in a way that is deterministic. Many C compilers provide options that make maths work faster using arithmetic associativity, commutativity, distributivity and other laws (e.g. fast-math options); cxing makes no provision that prevents this - people favoring efficiency and people favoring accuracy should both be the audience of this language.

The root cause of calculation errors is that the significand of a floating point datum is limited in width. This error is amplified in calculations. One way to quantify this error is in "unit(s) in the last place" - ULP. There are various definitions of ULP. Vendors of mathematical libraries may at their discretion document the error amplification behavior of their library routines for users to consult; framework and library standards may at their discretion specify requirements in terms of error amplification limits. Developers are reminded again to recognize, and evaluate at their discretion, the trade-off between accuracy and efficiency.
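To make ULP concrete, here is a C sketch of one of the several possible definitions: the gap between a value and the next larger representable double. It assumes binary64 doubles and is only meaningful for positive, finite inputs below the largest finite value; the name ulp_of is made up for this example.

```c
#include <stdint.h>
#include <string.h>

// Distance from a positive, finite x to the next larger double.
// For positive doubles, adding 1 to the bit pattern (viewed as an
// unsigned integer) steps to the adjacent representable value.
double ulp_of(double x)
{
    uint64_t bits;
    double next;

    memcpy(&bits, &x, sizeof bits);
    bits += 1;
    memcpy(&next, &bits, sizeof next);
    return next - x;
}
```

The gap grows with magnitude: it is 2^-52 at 1.0, and already 2.0 at 1e16, so at that magnitude adding 1.0 to a value cannot be represented exactly.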

Because of the existence of calculation errors, floating point data are recommended as an instrument of data exchange. In fact, earlier versions of the IEEE-754 standard distinguished between interchange formats and arithmetic formats. Because arithmetic and the format in which it's carried out are essentially black-box implementation details, the significance of arithmetic formats is no longer emphasized in IEEE-754.

The recommended methodology of arithmetic is to first derive a procedure of calculation that is a simplified version of the full algorithm, eliminating as much amplification of error as possible, then feed the input data elements into the algorithm to obtain the output data. The procedure so derived should take into account any exceptions that might occur.

For example, (a+b)(c+d) = ac+ad + bc+bd has 2 additions and 1 multiplication on the left-hand side and 3 additions and 4 multiplications on the right-hand side.

A program may first attempt to calculate the left-hand side, because it has less chance of error amplification. However, if the addition of c and d overflows but they're individually small enough that their multiplication with either a or b won't overflow, yet the sum of a and b underflows in a certain way that's catastrophic, then the whole expression may become NaN.

In this case, a fallback expression may then compute the right-hand side of the expression, possibly yielding a finite result, or at least one that arithmetically make sense (i.e. infinity).
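The fallback just described might look like the following in C. This is a sketch under assumptions: doubles are binary64, the function name product_of_sums is hypothetical, and the failure is detected simply by testing the primary result for NaN.

```c
#include <math.h>

// Computes (a+b)(c+d), preferring the short form and falling back
// to the expanded form when the short form yields NaN.
double product_of_sums(double a, double b, double c, double d)
{
    // Primary: 2 additions, 1 multiplication.
    double r = (a + b) * (c + d);

    // Fallback: 3 additions, 4 multiplications. Reached, for example,
    // when a+b collapses to 0 while c+d overflows to infinity,
    // since 0 * infinity is NaN.
    if (isnan(r))
        r = a * c + a * d + b * c + b * d;

    return r;
}
```

As a usage illustration, with a = 1e-200, b = -1e-200 and c = d = 1e308, the sum a+b is exactly 0 and c+d overflows to infinity, so the primary form gives NaN, while each of the four products in the fallback is around 1e108 in magnitude and their sum is finite.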

The result of computation carried out using such a "derived" procedure will certainly deviate from the result of a "complete" algorithm. Developers should recognize that robustness may be more important in some applications than they may expect. In the limited circumstances where an application is in reality less important, or is in fact a prototype, developers may at their careful discretion exercise less engineering effort when coding a numerical program.

Finally, it is recognized that a large existing body of sophisticated numerical programs is written using 3rd-party libraries, and/or using techniques that're under active research, not specified by, and beyond the scope of many standards. Developers requiring high numerical sophistication and robustness are encouraged to consult this research, and evaluate (again) the accuracy and efficiency requirements at their careful discretion.

Recommended Applications of Floating Points

The recommended applications of floating points in computer, are Computer Graphics, Signal Processing, Artificial Intelligence, etc.

Typical characteristics of these applications include:

Runtime Semantics

With the exception of resources and garbage collection, everything else in the entirety of this chapter is concerned with the interoperability of compiled implementations. Non-compiled implementations are nonetheless recommended to consult this chapter to maintain modal conceptual consistency. Care has been taken to ensure that this chapter is decoupled to the maximal extent from the language proper, and any entanglement is not intentionally desired.

While the features and the specification of the language are supposed to be stable, as a guiding policy, in the unlikely event where certain interfaces in the runtime posing efficiency problems are to be replaced with alternatives, deprecation periods are given in the current major version of the runtime (and thus the language) before removal in a future major version, should that happen; in the even more unlikely event where a certain interface exposes a vulnerability so fundamental that it necessitates its removal, the language along with its runtime is revised, a new version is released, and the vulnerable version is deprecated immediately. The versioning practice is in line with the recommendations of Semantic Versioning.

Binary Linking Compatibility

Dynamic libraries and applications linking with dynamic libraries programmed in cxing should not statically link with the cxing runtime. Unless no opaque objects are passed between translation units compiled by different implementations (which is unlikely), statically linking to different incompatible implementations of the runtime may result in undefined behavior when opaque objects and the functions that manipulate them are from different implementations.

The version of the runtime and the version of the language specification are coupled together to make it easy to determine which version of runtime should be used to obtain the features of relevant version of the language. If the standard library is to be provided, then the runtime should be provided as part of the standard library, the name of the linking library file should be the same for both the runtime and for when it's extended into/as standard library.

The recommended name for the library corresponding to version 0.5 of the specification is libcxing0.so.5 for systems using the UNIX System V ABI such as Linux, BSDs, and several commercial Unix distros. For the Darwin family of operating systems such as macOS, iOS, etc. the recommended name is libcxing0.5.dylib .

For some platforms such as Windows, vendors have greater control over the dynamic libraries bundled with the programs in an application. Therefore no particular recommendations are made for these platforms.

Calling Conventions and Foreign Function Interface

The types long and ulong are passed to functions as C types int64_t and uint64_t respectively; the type double is passed as the C type double.

The "value" and "lvalue" native objects are defined as the following C structure types:

enum types_enum : uint64_t {
    valtyp_null = 0,
    valtyp_long,
    valtyp_ulong,
    valtyp_double,

    // the opaque object type.
    valtyp_obj,

    // `proper.p` points to a `struct value_nativeobj`.
    // currently unused.
    valtyp_ref,

    // subroutines and methods.
    valtyp_subr = 6,
    valtyp_method,
    valtyp_ffisubr, // reserved as of 2025-11-03.
    valtyp_ffimethod, // reserved as of 2025-11-03.

    // 10 types so far.
};

struct value_nativeobj;
struct type_nativeobj;

struct value_nativeobj {
    union { double f; int64_t l; uint64_t u; void *p; } proper;
    union {
        const struct type_nativeobj *type;
        uint64_t pad; // zero-extend the type pointer to 64-bit on ILP32 ABIs.
    };
};

struct lvalue_nativeobj {
    struct value_nativeobj value;

    // The following fields are for lvalues:
    
    // 2026-01-01:
    // because different kinds of scopes need different accessors,
    // a mere pointer to the scope is not enough - it needs
    // accessor properties, therefore this is changed to 
    // a value native object.
    struct value_nativeobj scope;
    
    // 2026-01-01:
    // the reference implementation uses `s2data_t` from the SafeTypes2
    // library, other implementations may have a different choice,
    // barring binary compatibility and interoperability issues.
    void *key;
};

struct type_nativeobj {
    enum types_enum typeid;
    uint64_t n_entries;

    // There are `n_entries + 1` elements, last of which `type` being the only
    // `NULL` entry in the array.
    struct {
        const char *name;
        struct value_nativeobj *member;
    } static_members[];
};

As mentioned in language semantics, there are 2 types of nulls:

A function in cxing receives its arguments as a pointer to an array of value native objects, passed as the second argument in the respective C calling convention, with the first argument containing the number of actual arguments passed. Because cxing is a dynamically typed language, the actual number of passed arguments may be less (or more in certain cases) than the number of arguments expected as inferred from the declaration of the function. Implementations must anticipate this and generate Morgoth nulls as appropriate when these values are accessed.

As mentioned in 9.4. Subroutines and Methods, methods carry an implicit this parameter; it is passed as the initial argument (i.e. the element with index 0 in the array of value native objects). Subroutines, on the other hand, receive the first argument as the initial element in the arguments array directly.

The C prototype of cxing functions is:

struct value_nativeobj <func-ident>(int argn, struct value_nativeobj args[]);

Where <func-ident> is the identifier naming the function.
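As an illustration of this prototype, the sketch below shows one subroutine and one method that tolerate missing arguments. It is written under several assumptions that are not part of the specification: the structures are trimmed-down stand-ins for the ones defined earlier, null is represented by a static type object whose typeid is valtyp_null, no distinction is made between the kinds of null, and the names type_null, type_long, make_null, make_long, arg_or_null, add2, and negated are all hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

// Trimmed-down stand-ins for the structures defined earlier.
enum types_enum { valtyp_null = 0, valtyp_long = 1 };
struct type_nativeobj { enum types_enum typeid; };

struct value_nativeobj {
    union { double f; int64_t l; uint64_t u; void *p; } proper;
    union { const struct type_nativeobj *type; uint64_t pad; };
};

// Assumption: one static type object per kind, for illustration only.
static const struct type_nativeobj type_null = { valtyp_null };
static const struct type_nativeobj type_long = { valtyp_long };

static struct value_nativeobj make_null(void)
{
    struct value_nativeobj v;
    v.proper.u = 0;
    v.pad = 0;              // clear all 64 bits before storing the pointer
    v.type = &type_null;
    return v;
}

static struct value_nativeobj make_long(int64_t x)
{
    struct value_nativeobj v = make_null();
    v.proper.l = x;
    v.type = &type_long;
    return v;
}

// Yields args[i], or null when the caller passed fewer arguments.
static struct value_nativeobj arg_or_null(
    int argn, struct value_nativeobj args[], int i)
{
    return i < argn ? args[i] : make_null();
}

// Hypothetical subroutine add2(a, b): a is args[0], b is args[1].
struct value_nativeobj add2(int argn, struct value_nativeobj args[])
{
    struct value_nativeobj a = arg_or_null(argn, args, 0);
    struct value_nativeobj b = arg_or_null(argn, args, 1);

    if (a.type->typeid != valtyp_long || b.type->typeid != valtyp_long)
        return make_null();

    // Add as unsigned to avoid C's undefined signed overflow.
    return make_long((int64_t)(a.proper.u + b.proper.u));
}

// Hypothetical method negated(): the implicit `this` is args[0].
struct value_nativeobj negated(int argn, struct value_nativeobj args[])
{
    struct value_nativeobj self = arg_or_null(argn, args, 0);

    if (self.type->typeid != valtyp_long)
        return make_null();
    return make_long((int64_t)(0 - self.proper.u));
}
```

Both functions share the same C prototype; whether index 0 holds this or the first declared parameter is purely a matter of which kind of function is being called.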

Note: Before 2025-10-03, it was mistakenly said that the this parameter is received as a ref. This was in conflict with the spec developer's intent that opaque objects be passed as pointer handles. Since a better runtime implementation strategy was discovered, the passing of this and opaque object arguments is revised. See the note in 9.4. Subroutines and Methods. As of 2025-10-27, the ref argument type is removed completely.

The cxing language did away with foreign function interface as of Nov. 2025, and this aspect has been replaced entirely with reverse FFI - that is, instead of cxing invoking the foreign function, a foreign language exposes a cxing interface and invokes cxing functions in accordance with the cxing calling conventions.

Finalization and Garbage Collection

Resources are generically defined as what enables a program to run and function, and is associated with it. When a value is destroyed, the resources associated with it are finalized and released, which may lead to the resources being freed for reuse elsewhere.

Note: On a reference-counted implementation (which is conceptually prescribed), releasing an object "decreases" its reference count, and when the reference count reaches 0, the resources are "freed". Under implementation-defined circumstances, an object may be released by all, but still referenced somewhere (e.g. reference cycle), which requires garbage collection to fully "free" the object and its resources.

Editorial Note: Previously (before 2025-09-26), finalize and destroy were used interchangeably; now finalize refers to that of resources and destroy refers to that of values (i.e. the concept of value native objects).

subr cxing_gc();

The cxing_gc foreign function invokes the garbage collection process.

Note: In part because the runtime implementation needs to be informed of the destruction of values to finalize relevant resources, and more pressingly because of the benefit to the design of idiomatic standard library features, copying and destruction of values are now being defined. To define the concepts in terms of reference counts would mean depending on intrinsic implementation details, and would also introduce a circular dependency in the definition. In seeking an alternative, it was discovered that copying and destroying are paired concepts that must be described together, and this is the approach taken right now.

To copy a value means to preserve its existence in the event of its destruction, which causes the value to cease to exist; when a value is copied, the value and the copied value can both exist, and the destruction of either doesn't affect the existence of the other.

The __copy__ property is a method that copies its this argument and returns "the copy" as a val. The __final__ property is a method that releases the resources used by the value before the destruction of the value.

Although the __copy__ and __final__ properties are not required to be type-associated, because they manipulate resources that're opaque to the language, they're almost always implemented as type-associated.

Note: Primitive types such as long, ulong, and double may not need a __copy__ method - a runtime recognizing these sorts of types may copy them in any way that may be assumed reasonable according to common sense. For types without a __final__ method, it is assumed that there are no resources consumed by the value beyond what's already in the value native object structure.
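The pairing of copying and finalization can be illustrated with a small C sketch. It models a value that owns a heap buffer; copying duplicates the buffer so that either side can be finalized without affecting the other. This is just one way to satisfy the definition (a reference-counted scheme would satisfy it too), and the names struct blob, blob_copy, and blob_final are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

// A value that owns a resource: a heap-allocated byte buffer.
struct blob { unsigned char *data; size_t len; };

// Plays the role of __copy__: afterwards *out exists independently
// of *self. Returns 0 on success, -1 on allocation failure.
int blob_copy(const struct blob *self, struct blob *out)
{
    out->data = malloc(self->len ? self->len : 1);
    if (out->data == NULL) {
        out->len = 0;
        return -1;
    }
    if (self->len)
        memcpy(out->data, self->data, self->len);
    out->len = self->len;
    return 0;
}

// Plays the role of __final__: releases the resources of the value
// before the value itself is destroyed.
void blob_final(struct blob *self)
{
    free(self->data);
    self->data = NULL;
    self->len = 0;
}
```

After a copy, finalizing the original leaves the copy's bytes intact, which is exactly the property the definition of copying asks for.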

Standard Library

In the following sections, some special notations that're not part of the language are used for ease of presentation.

The meaning of such notation:

[Type(Base): Function1 | Function2 | ... | FunctionN] := { ... }

is as follows:

An object whose members are listed in the brace may be created by and/or returned from function(s) Function1 ... FunctionN.

The optional Type(Base): part specifies Type as name for the type of object returned by the said functions, with Base representing the 'base class' that Type inherits features and/or behaviors from.

Modules

The cxing language is composed of modules. Language syntax and semantics are specified in preceding chapters; along with the following chapters on mandatory standard libraries, these form what's colloquially known as "Module-0". Additional modules are optional, and should they exist, they specify interfaces related to particular functionality. Certain interfaces of a particular module may be specified in separate chapters if they're topically sparse.

For all library chapters in module-0, the following statement exists towards the beginning of relevant chapters:

This chapter forms an integral part of the language and its implementation is mandatory.

For library chapters pertaining to a particular module, the following statement exists towards the beginning of chapters making up the module:

This chapter forms an integral part of module X - should module X be implemented, this chapter along with any chapter constituting part of module X must be implemented in their entirety.

Certain modules may have dependencies on others, and the following statement may appear:

This module depends on module Y; should this module be implemented, module Y must also be implemented.

Library for the String Data Type

This chapter forms an integral part of the language and its implementation is mandatory.

str(obj) := {
  method len(),
  method trunc(newlength),
  method putc(c),
  method puts(s),
  method putfin(),
  method cmpwith(s2), // efficient byte-wise collation.
  method equals(s2), // constant-time, cryptography-safe.
  [method map(structlayout)] := {
    method __get__(k),
    method __set__(k, v),
    method unmap(),
  },
};

The string type str is a sequence of bytes. Some APIs may expect nul-terminated strings, and would ignore any byte after the first nul byte.

A string has a length that's reported by the len() function as a long, and can be altered using the trunc() function.

The putc() function can be used to append a byte whose integer value is specified by c to the end of the string; the puts() function can be used to append another string to the end. Both putc() and puts() may buffer the input on the working context of the string; such a buffer needs to be flushed using the putfin() function before the string is used in other places.

For trunc(), putc(), puts(), and putfin(), the object itself is returned on success, and null is returned on failure.

The cmpwith() function returns less than, equal to, or greater than 0 if the string is less than, the same as, or greater than s2. A strict prefix of a string is less than the string of which it's a prefix.
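The ordering rule for cmpwith() can be sketched in C as a byte-wise comparison over the common length, with the shorter string ordered first on a tie. The function name bytes_cmp and its pointer-plus-length parameters are assumptions for this example and say nothing about how str is actually represented.

```c
#include <stddef.h>
#include <string.h>

// Byte-wise collation: negative, zero, or positive, like memcmp.
// A strict prefix compares less than the longer string.
int bytes_cmp(const void *s1, size_t n1, const void *s2, size_t n2)
{
    size_t n = n1 < n2 ? n1 : n2;
    int c = n ? memcmp(s1, s2, n) : 0;

    if (c != 0)
        return c;

    // Common part is identical: the shorter one is the prefix.
    return (n1 > n2) - (n1 < n2);
}
```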

The equals() function returns true if the string equals s2 and false otherwise. If the 2 strings are of the same length, it is guaranteed that the comparison is done without cryptographically exploitable time side-channel.
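A common way to obtain the timing property described for equals() is to accumulate the differences of all bytes instead of returning at the first mismatch. The C sketch below shows that technique; the name bytes_equal and its parameters are assumptions, and a production implementation would typically take extra care (for example compiler barriers) to keep an optimizer from reintroducing an early exit.

```c
#include <stddef.h>

// Returns 1 if both byte strings are equal, 0 otherwise.
// For equal lengths, the loop always visits every byte, so the
// running time does not depend on where the first difference is.
int bytes_equal(const unsigned char *s1, size_t n1,
                const unsigned char *s2, size_t n2)
{
    unsigned char diff = 0;
    size_t i;

    // Lengths are compared openly; the guarantee only covers
    // strings of the same length.
    if (n1 != n2)
        return 0;

    for (i = 0; i < n1; i++)
        diff |= (unsigned char)(s1[i] ^ s2[i]);

    return diff == 0;
}
```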

The map() function creates an object that is a parsed representation of the underlying data structure. This object can be used to modify the memory backing of the data structure if the corresponding memory backing is writable. The memory backing is writable by default, and the circumstances under which it's not writable are implementation-defined.

The unmap() function unmaps the parsed representation, thus making it no longer usable, and returns true. The variable can then only be finalized (or overwritten, which would imply a finalization). The trunc() function cannot be called on the string unless there's no active mapping of the string.

Note: Previously, the unmap() function returned null. Because nullish values are reserved in cxing entirely as an error indicator, its return type is now changed to bool.

Note: Although the canonical way to access data behind a str object is to first map it to a structure type, it is anticipated that a common extension will exist in the wild allowing for "mutable" strings - where they implement the __get__ and the __set__ methods. This is not yet considered for standardization even though there's no compelling reason not to. For implementations that do provide this extension, the following requirements apply:

  1. The __get__ method shall return long for byte range 0-255 inclusive, and -1 on out of bound access.
  2. The __set__ method shall accept second argument of at least the long and the ulong type, and shall cast double to ulong by truncating fractions. The byte values shall be set by discarding all but lowest 8 bits of the byte (non-octet bytes are not considered for cxing).
  3. The application shall ensure the key is a non-negative integer index of type long or ulong, and the implementation may have undefined behavior if this requirement on the application is not met.
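The three requirements above can be sketched in C as follows. The function names bytes_get, bytes_set, and double_to_ulong are hypothetical, and two behaviors are assumptions of this sketch rather than requirements: bytes_set reports an out-of-bound index by returning -1, and double_to_ulong relies on C's conversion, which truncates toward zero but is undefined for NaN and for values outside the range of uint64_t.

```c
#include <stdint.h>
#include <stddef.h>

// Models __get__: the byte as 0..255, or -1 when out of bounds.
int64_t bytes_get(const unsigned char *buf, size_t len, uint64_t index)
{
    return index < len ? (int64_t)buf[index] : -1;
}

// Models __set__: keeps only the lowest 8 bits of the value.
// Returns 0 on success, -1 when out of bounds (an assumption here).
int bytes_set(unsigned char *buf, size_t len, uint64_t index, uint64_t value)
{
    if (index >= len)
        return -1;
    buf[index] = (unsigned char)(value & 0xFF);
    return 0;
}

// Models the double-to-ulong cast: the fraction is truncated.
uint64_t double_to_ulong(double v)
{
    return (uint64_t)v;
}
```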

Library for Describing Data Structure Layout

This chapter forms an integral part of the language and its implementation is mandatory.

decl char, byte; // signed and unsigned 8-bit,
decl short, ushort; // signed and unsigned 16-bit,
decl int, uint; // signed and unsigned 32-bit,
decl long, ulong; // signed and unsigned 64-bit,
decl half, float, double; // binary16, binary32, binary64.
// decl _Decimal32, _Decimal64; // not supported yet.
// decl huge, uhuge, quad, _Decimal128; // too large.

[subr struct()] := {
  method __initset__(key, value),
};

[subr packed()] := {
  method __initset__(key, value),
};

[subr union()] := {
  method __initset__(key, value),
};

The representations for char, byte, short, ushort, int, uint, long, ulong, half, float, and double are explained in the comments following their description; their alignments are the same as their size. These are known as primitive types.

All of these type objects have a method member called from, which performs explicit type and value conversion - unlike implicit type and value conversion, the resulting type are determined by the type object. The method takes one argument and converts it to a value representable in the destination type:

A struct_inst object represents an instance of structure that is suitabl for use in a call to the map() method of the str type, representing a structure with members laid out sequentially and suitably align. A packed_inst is similar, but with no alignment - all members are packed back-to-back. A union_inst creates a structure layout object with all members having the same start address at byte 0 and alignment of the strictestly-align member.
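The three layout rules can be compared side by side with a small offset calculator in C. The names struct member, round_up, layout_struct, layout_packed, and layout_union are hypothetical. One point is an assumption of this sketch and is not stated in the text: the total size of the aligned and union layouts is rounded up to the strictest member alignment, as C compilers do.

```c
#include <stddef.h>

struct member { size_t size; size_t align; };

static size_t round_up(size_t x, size_t a)
{
    return (x + a - 1) / a * a;
}

// struct(): members in sequence, each placed at a multiple of its
// alignment. Returns the total size.
size_t layout_struct(const struct member *m, size_t n, size_t *offsets)
{
    size_t off = 0, maxalign = 1, i;
    for (i = 0; i < n; i++) {
        off = round_up(off, m[i].align);
        offsets[i] = off;
        off += m[i].size;
        if (m[i].align > maxalign) maxalign = m[i].align;
    }
    return round_up(off, maxalign); // trailing padding: an assumption
}

// packed(): members back-to-back, no padding at all.
size_t layout_packed(const struct member *m, size_t n, size_t *offsets)
{
    size_t off = 0, i;
    for (i = 0; i < n; i++) {
        offsets[i] = off;
        off += m[i].size;
    }
    return off;
}

// union(): every member starts at byte 0.
size_t layout_union(const struct member *m, size_t n, size_t *offsets)
{
    size_t maxsize = 0, maxalign = 1, i;
    for (i = 0; i < n; i++) {
        offsets[i] = 0;
        if (m[i].size > maxsize) maxsize = m[i].size;
        if (m[i].align > maxalign) maxalign = m[i].align;
    }
    return round_up(maxsize, maxalign);
}
```

For members sized like byte, uint, and ushort (in that order), the aligned layout places them at offsets 0, 4, and 8, the packed layout at 0, 1, and 5, and the union layout places all of them at 0.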

Each object of type struct_inst, packed_inst, or union_inst is a type object. They're initialized with members using the syntax described in 11. Type Definition and Object Initialization Syntax, and are created using the struct(), packed(), and union() factory functions respectively.

Primitive types and structure layout objects may be array-accessed to create array types of the respective types.

For example:

decl AesBlock = union() { 'b': byte[16], 'w': uint[4] };
decl Aes128Key = AesBlock[11];

The variable AesBlock holds a structure layout object of 128 bits, and Aes128Key holds an array type of 11 such blocks, enough for the 11 round keys of an AES-128 cipher.
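
For readers more familiar with C, the same layout might be written as below. This is a comparison sketch only; it assumes that byte and uint correspond to uint8_t and uint32_t, as the declaration comments earlier in this chapter suggest.

```c
#include <stdint.h>

/* C counterpart of the AesBlock layout: both members start at byte 0. */
typedef union {
    uint8_t  b[16];
    uint32_t w[4];
} AesBlock;

/* Array type of 11 blocks, one per AES-128 round key. */
typedef AesBlock Aes128Key[11];

_Static_assert(sizeof(AesBlock) == 16, "AesBlock is 128 bits");
_Static_assert(sizeof(Aes128Key) == 176, "11 round keys of 16 bytes each");
```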

Dynamic Data Structure Types

This chapter forms an integral part of the language and its implementation is mandatory.

[subr dict()] := {
  method __get__(k),
  method __set__(k, v),
  method __copy__(),
  method __final__(),
  method __unset__(k),
  method __initset__(k, v),
  [method __keys__()] := {
    method __get__(k),
    method __copy__(),
    method __final__(),
  },
}

The function dict creates a dictionary, also known in the literature as an associative array, or a hash table (from the implementation's perspective). The semantics of __get__, __set__, __copy__, __final__, and __unset__ are as described in 9.2. Object/Value Key Access. The member __initset__ SHALL NOT be a type-associated property.

The __keys__() method retrieves an immutable snapshot of the keys present on the dictionary, at the time of the snapshot, and returns an object consisting of the type-associated method properties __get__(), __copy__(), and __final__().

The __get__() method may be used to retrieve length, which indicates the number of keys in the snapshot, as well as the keys themselves, indexed 0 through length-1. The order of the keys is unspecified.

Type Reflection

subr isnull(x);
subr islong(x);
subr isulong(x);
subr isdouble(x);
subr _Uncast(x);

The functions isnull, islong, isulong, and isdouble determine whether the value is the special value null, of type long, of type ulong, or of type double respectively.

The function _Uncast performs uncasting of nulls - an operation whose semantic is described in 10. Types and Special Values.

TODO 2025-12-26: decide what to do with non-null arguments for uncasting.

Library for Floating Point Environment

Rounding Mode

To Be Changed: The exact form of the following functionality is being redesigned, and will change over time.

subr fpmode(mode);

Returns the currently active rounding mode. If mode is one of the supported modes, then the current rounding mode is set to the specified mode. The value -1 is guaranteed to not be any supported mode.

The following modes are supported:

The support for other modes is unspecified.

The encoding of modes is as follows:

The next bits are as follows:

Such encoding is chosen to cater to possible future extensions. Not all possible rounding modes offer numerical analysis merit; as such, some of the combinations are not valid on some implementations.

Floating Point Exceptions

To Be Changed: The exact form of the following functionality is being redesigned, and will change over time.

// Tests for exceptions
subr fptestinval(); // **invalid**
subr fptestpole(); // **division-by-zero**
subr fptestoverf(); // **overflow**
subr fptestunderf(); // **underflow**
subr fptestinexact(); // **inexact**

// Clears exceptions
subr fpclearinval(); // **invalid**
subr fpclearpole(); // **division-by-zero**
subr fpclearoverf(); // **overflow**
subr fpclearunderf(); // **underflow**
subr fpclearinexact(); // **inexact**

// Sets exceptions
subr fpsetinval(); // **invalid**
subr fpsetpole(); // **division-by-zero**
subr fpsetoverf(); // **overflow**
subr fpsetunderf(); // **underflow**
subr fpsetinexact(); // **inexact**

// Exceptions state.
subr fpexcepts(excepts);

The fptest*, fpclear*, and fpset* functions test, clear, and set the corresponding floating point exceptions in the current thread.

The fpexcepts function returns the current exception flags. If excepts is a valid flag, then the exception flags in the current thread will be set to it; otherwise, they are left unchanged. The value 0 is guaranteed to be a valid flag meaning all exceptions are clear; the value -1 is guaranteed to be an invalid flag. The validity of other flag values is UNSPECIFIED. When the implementation is hosted by a C implementation, the encoding of excepts is exactly that of the FE_* macros, with the clear intention of minimizing unnecessary duplicate enumerations as much as possible.

Regex

This chapter forms an integral part of "The Regex Module" - should this module be implemented, this chapter along with any chapter constituting part of "The Regex Module" must be implemented in their entirety.

[RegExp: bre_comp(regex, cflags) | ere_comp(regex, cflags)] := {
  // An opaque object representing a compiled regular expression.
  method split(subject);
  method match(subject);
  method capture(subject);
  method replace(subject, replacement, limit);
};

The bre_comp() and ere_comp() functions compile a regular expression based on the "Basic Regular Expression" and "Extended Regular Expression" syntax specified by POSIX, respectively. Under the C/POSIX locale, all regex features up to POSIX-2017 are mandatory.

The cflags are expressed as radix-64 digits, whose correspondence with POSIX compile flag constants is as follows:

The split() method splits the subject string into a 0-base-indexed array of strings. The match() method determines whether the subject string can be matched by the regular expression.

The capture() method matches the subject string, puts the matched subexpressions in an array (starting from index 1) and the (entire) matched portion of the subject string in the 0th element of the said array, then returns the array.
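
The indexing convention of capture() mirrors that of the POSIX regexec() match array. As a rough illustration, the C sketch below extracts one group from an Extended Regular Expression match; capture_group and MAX_GROUPS are hypothetical names chosen for this example, and no cflags handling is shown.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: match an ERE against subject and copy capture
   group idx (0 = whole match) into out. Returns the copied length, or -1
   if the pattern does not compile, nothing matches, the group did not
   participate in the match, or out is too small. */
static int capture_group(const char *pattern, const char *subject,
                         size_t idx, char *out, size_t outsz)
{
    enum { MAX_GROUPS = 10 };
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    int result = -1;

    if (idx >= MAX_GROUPS || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, MAX_GROUPS, m, 0) == 0 && m[idx].rm_so != -1) {
        size_t len = (size_t)(m[idx].rm_eo - m[idx].rm_so);
        if (len < outsz) {
            memcpy(out, subject + m[idx].rm_so, len);
            out[len] = '\0';
            result = (int)len;
        }
    }
    regfree(&re);
    return result;
}
```

Matching ([0-9]+)-([0-9]+) against "tel 555-1234" yields "555-1234" for index 0, "555" for index 1, and "1234" for index 2.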

The replace() method replaces up to limit occurrences of the substring matching the regex with replacement. Each occurrence of $<n> in replacement, where <n> is a single decimal digit, is replaced with the n-th subexpression in the regex. If <n> is 0, then it's replaced with the whole matched portion of the subject string. If limit is -1, then all occurrences shall be replaced.
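
The $<n> expansion step can be sketched separately from the matching itself. The C function below is a hypothetical helper, not part of the module: it expands one replacement template against the offsets produced by a single regexec() call. Copying a '$' that is not followed by a digit literally, and expanding an absent group to nothing, are assumptions.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Expand "$<n>" references in tmpl using the match offsets in m (nmatch
   entries, m[0] being the whole match). Writes a NUL-terminated result
   into out and returns its length, or -1 if out is too small. */
static int expand_replacement(const char *tmpl, const char *subject,
                              const regmatch_t *m, size_t nmatch,
                              char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0)
        return -1;
    for (const char *p = tmpl; *p != '\0'; p++) {
        const char *src = p;   /* default: copy one literal character */
        size_t len = 1;
        if (p[0] == '$' && p[1] >= '0' && p[1] <= '9') {
            size_t n = (size_t)(p[1] - '0');
            p++;               /* consume the digit as well */
            if (n < nmatch && m[n].rm_so != -1) {
                src = subject + m[n].rm_so;
                len = (size_t)(m[n].rm_eo - m[n].rm_so);
            } else {
                len = 0;       /* absent group: expands to nothing */
            }
        }
        if (used + len >= outsz)
            return -1;         /* always leave room for the terminator */
        memcpy(out + used, src, len);
        used += len;
    }
    out[used] = '\0';
    return (int)used;
}
```

With the pattern ([0-9]+)-([0-9]+) matched against "2025-12", the template "$2/$1 [$0]" expands to "12/2025 [2025-12]".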

Library for Multi-Threading

This chapter forms an integral part of "The Multi-Threading Module" - should this module be implemented, this chapter along with any chapter constituting part of "The Multi-Threading Module" must be implemented in their entirety.

Exclusive and Sharable Objects and Mutices (Mutex)

// - Sharable objects may be used across threads
// - Exclusive objects have more efficient implementations than
//   sharable objects, but the behavior is undefined when used
//   in multiple threads.

[subr mutex(v)] := {
  method __copy__(),
  method __final__(),
  [method acquire()] := {
    method __get__(),
    method __set__(),
    method __copy__(),
    method __final__(),
  },
}

The mutex() function creates a mutex which is a sharable object that can be used across threads. The argument v will be an exclusive object protected by the mutex.

The mutex protects its own internal state during __copy__ and __final__, which makes it a sharable object.

Note: If implemented using reference counting, the __copy__ and __final__ methods of the mutex lock the underlying mutex before changing the count, and unlock it afterwards.

The acquire() method of a mutex returns a "gift" object that can be used for accessing v - when the function returns, it is guaranteed that the thread in which it returns is the only thread holding the value protected by the mutex, and that until the gift object goes out of scope, there should be no other thread simultaneously using the value.

Note: The "gift" object is so named because the exclusive gift is wrapped under a mutex and protected by it before being revealed to the acquiring thread.

The __get__() and the __set__() methods are used to access the object protected by the mutex. When they're called with the string v as their key argument, they respectively return and set the object protected by the mutex; on all other values, they return null. Note that the object loses the protection of the mutex if it does not go out of scope when the gift object does.

The __copy__() and __final__() properties increment and decrement, respectively, a conceptual counter - this counter is initially set to 1 by acquire() and by any future functions that may be defined fulfilling a similar role; when it reaches 0, the mutex is 'unlocked', allowing other threads to acquire the value for use.

Note: A typical implementation of acquire() may lock a mutex, set the conceptual counter to 1, then create and return a native value object. A typical implementation of the __copy__() method may be as simple as just incrementing the conceptual counter. A typical implementation of the __final__() method may decrement the counter and, when it reaches 0, unlock the mutex.

Note: The conceptual counter is distinct from the reference count of any potential resources used by the value protected by the mutex and the mutex itself.
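
The "typical implementation" described in the notes above might look like the following C sketch on top of POSIX threads. The gift_t type and the gift_* function names are hypothetical; the sketch also assumes that the gift object is only ever used by the thread that acquired it, so the counter itself needs no further protection.

```c
#include <pthread.h>

/* Hypothetical gift object: the lock it holds plus the conceptual counter. */
typedef struct {
    pthread_mutex_t *lock;
    unsigned count;
} gift_t;

/* acquire(): lock the mutex and start the conceptual counter at 1. */
static gift_t gift_acquire(pthread_mutex_t *lock)
{
    gift_t g;
    pthread_mutex_lock(lock);
    g.lock = lock;
    g.count = 1;
    return g;
}

/* __copy__(): one more holder of the gift. */
static void gift_copy(gift_t *g)
{
    g->count++;
}

/* __final__(): one fewer holder; unlock once the last holder is gone. */
static void gift_final(gift_t *g)
{
    if (--g->count == 0)
        pthread_mutex_unlock(g->lock);
}
```

After one acquire and one copy, the mutex stays locked through the first gift_final call and is released only by the second.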

Condition Variables

[subr condvar(mtx)] := {
  method __copy__(),
  method __final__(),
  method wait(),
  method broadcast(),
  method signal(),
}

The condvar() function creates a condition variable. It monitors a condition associated with the states protected by the mutex identified by mtx.

Note: Condition variables are created associated with a mutex up front so that potential implementations using reference count can protect that counter with the mutex just like mutex instances. It is strongly advised that implementations use actual atomic reference counts where available if they were to use reference counting for resource management.

The wait() method of a condition variable instance does the following:

  1. unlocks the mutex mtx specified in the creation argument,
  2. blocks the calling thread,
  3. wakes up when and if the condition variable is signalled, and
  4. returns "a tail call of" mutex acquire -

all in one single atomic step.

The broadcast() method of a condition variable signals the condition variable and wakes up all threads that are waiting on it. The signal() method signals the condition variable and wakes up an unspecified subset of threads blocked on the condition variable - this subset shall not be empty if there are threads waiting on the condition variable, and this method should typically be more efficient than broadcast() when there's only 1 waiting thread.
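
For comparison, the wait/signal protocol corresponds closely to POSIX condition variables. The C sketch below hands one value from a producer thread to a waiting consumer; the mailbox_t type and the function names are hypothetical, and re-checking the predicate in a loop is the usual POSIX idiom rather than a requirement stated in this chapter.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state; ready and value are protected by lock. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready_cv;
    int ready;
    int value;
} mailbox_t;

/* Producer thread: publish a value, then signal the waiter. */
static void *producer(void *arg)
{
    mailbox_t *mb = arg;
    pthread_mutex_lock(&mb->lock);
    mb->value = 42;
    mb->ready = 1;
    pthread_cond_signal(&mb->ready_cv);
    pthread_mutex_unlock(&mb->lock);
    return NULL;
}

/* Consumer: wait until the producer has published, then read the value.
   Returns -1 if the producer thread could not be created. */
static int receive_value(void)
{
    mailbox_t mb;
    pthread_t t;
    int v = -1;

    pthread_mutex_init(&mb.lock, NULL);
    pthread_cond_init(&mb.ready_cv, NULL);
    mb.ready = 0;
    mb.value = 0;

    if (pthread_create(&t, NULL, producer, &mb) == 0) {
        pthread_mutex_lock(&mb.lock);
        while (!mb.ready)   /* re-check: wakeups may be spurious */
            pthread_cond_wait(&mb.ready_cv, &mb.lock); /* unlock, block, relock */
        v = mb.value;
        pthread_mutex_unlock(&mb.lock);
        pthread_join(t, NULL);
    }
    pthread_cond_destroy(&mb.ready_cv);
    pthread_mutex_destroy(&mb.lock);
    return v;
}
```

Because the consumer holds the lock whenever it inspects ready, the handoff works whether the producer runs before or after the consumer starts waiting.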

Thread Management

[subr thrd_create(thrd_entry, thrd_param) | subr thrd_self()] := {
  method join();
  method detach();
  method equals(thrd_hnd t2);
}
subr thrd_exit();

The thrd_create() function creates a thread with the thrd_entry as its entry point, and thrd_param as its first and only argument. thrd_entry MUST be a subroutine. Its return type is null. On success, a thrd_hnd thread handle is returned, otherwise, null is returned.

The thrd_self() function returns the thread handle corresponding to the current thread.

The thrd_exit() function causes the current thread to immediately terminate.

The join() method of a thrd_hnd blocks the calling thread until the thread referred to by the thread handle terminates. The first such call on a non-detached thread is supposed to succeed - implementations shall document the underlying platform API behavior for it; subsequent calls may not necessarily succeed. The detach() method of a thread handle detaches a thread, after which the thread may no longer be joined or detached again. The return values of these 2 methods are implementation-defined.

The equals() method returns true if the thread handle t2 refers to the same thread as the thread handle on which the method is called, and false otherwise.

The thrd_hnd shall be sharable across threads. The existence of a thread handle does not imply that of the thread.

Note: The thread management facility is bare minimum for three reasons. First, it is directly implementable using existing standard APIs. Second, the thread handle type thrd_hnd carries the least complexity, enabling its sharing across threads - although it's not explicitly specified as a sharable type, it shall behave as such. Third, the usage flexibility makes higher-level constructions such as asynchronously completing subroutines, coroutines, single-apartment proxy objects, etc. readily implementable in terms of the minimal API.

Note: The thread handle type may be implemented as sharable by virtue of it being immutable. A technique of implementing it as sharable is documented at https://langdev.stackexchange.com/a/4633/1388.

Library for I/O

This chapter forms an integral part of "The Input/Output Module" - should "The Input/Output Module" be implemented, this chapter along with any chapter constituting part of "The Input/Output Module" must be implemented in their entirety.

For the purpose of this chapter, the following definitions from the POSIX standard apply:

Additionally, a file handle is anything that can be used to operate on files. One file may have several file handles. This chapter defines several types of objects that are file handles.

When a file is operated on from separate handles, the behavior is undefined.

Note: For example, in C, when standard input is being read through a FILE * handle, and buffering is enabled, the subsequent file position of the file descriptor (if implemented on top of one) is undefined - this can cause issues when one program subsequently loads another (e.g. using one of the exec functions) and the loaded program proceeds from an unexpected file position. This is among the few undefined behaviors in cxing, and we choose to not define its behavior due to its usage being arcane and lacking practicality.

When a directory entry is created as a result of calling one of the functions that access the filesystem, barring security hardening by specific implementations of this module, and even though this is not a recommended practice, the called function should not place access restrictions beyond what's already placed by system defaults.

Note: As an example of what the previous paragraph means, function calls such as mkdir, mkfifo, open, etc. should use the most liberal permission on the created file - i.e. 0o777 for directories and 0o666 for non-executable files according to POSIX, with the 'file mode creation mask' (i.e. umask) clearing excess permissions as the said 'system default'. The previous paragraph is normative only to the extent that it does not forbid current and evolving security best practices.

Simple Input/Output

subr input();
subr print(s);

The input() function is a subroutine that reads a line from the standard input, stripping a single trailing line-feed \n byte, then if there is one, a trailing carriage-return \r byte, then returns the resulting string. On EOF a blessed null that uncasts to 0 is returned; on error, a blessed null that uncasts to an implementation-defined status code is returned.
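
The line-ending rule can be sketched in C as follows. The helper name strip_line_ending is hypothetical, and the sketch takes one reading of the paragraph above: the carriage-return check is made whether or not a line-feed was actually removed.

```c
#include <stddef.h>

/* Remove one trailing "\n" and then, if present, one trailing "\r".
   NUL-terminates the buffer in place and returns the new length.
   The buffer must have room for s[len] (the terminator position). */
static size_t strip_line_ending(char *s, size_t len)
{
    if (len > 0 && s[len - 1] == '\n')
        len--;
    if (len > 0 && s[len - 1] == '\r')
        len--;
    s[len] = '\0';
    return len;
}
```

Only a single line ending is removed, so an input of "abc\n\n" keeps its first line-feed.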

TODO: This implementation-defined status code is expected to be that of the errno number. Details of this part are being decided.

The print() function is a subroutine that writes the string argument s to the standard output, followed by a single line-feed \n byte. On success, the number of bytes successfully written is returned. On failure, a blessed null that uncasts to an implementation-defined status code is returned.

Generic File

GenericFile(obj) := {
  method read(len),
  method write(s),
  method close(),
  method flush(),
  method setsync(b),
}

A GenericFile is the base type for file handle objects.

Its read method reads at most len bytes of data and returns it. On EOF, it returns an empty string; on error, it returns a blessed null that uncasts to an implementation-defined status code.

Its write method writes the string s to the file, and returns the number of bytes actually written. On error, it returns a blessed null that uncasts to an implementation-defined status code.

Its close method closes the file - any buffered content will be committed, any resource consumed for operating the file will be released, and any further use of the file handle is invalid and results in error in an undefined way.

For any file, there may be several layers of buffering, two of which are defined here (the rest are given acknowledgement).

  1. The user-space buffering, which is committed by calling the flush method,
  2. The system buffering, which can be disabled (or enabled) by calling setsync with true (or false).

The act of "committing" makes it more likely that future access to the data would succeed, such as by writing data permanently to the disk. Further buffering, such as that done by routers and switches for network sockets, is out of the control of the program, and to some extent, the system.

Regular Files

subr open(path, mode);
RegularFile(GenericFile) := {
  method lseek(offset, whence),
}

The open function is a subroutine that opens a file named by the path argument, under the mode specified by the mode argument. The file to open doesn't have to be a regular file, any type of file supported by the implementation may be opened (e.g. FIFO, but not sockets).

The mode is made up of one of the following 4 major options:

and modified by any combination of the following minor options:

The lseek method adds offset to the position indicated by whence, and returns the resulting file position:

Unidirectional Communication

The types of files in this section are required to support communicating in one direction; voluntary support for bidirectional communication is not required.

subr mkfifo(path);
subr pipe();

The mkfifo function creates a FIFO - i.e. a pipe with a filesystem name. On success, it returns path; on failure, it returns a blessed null that uncasts to an implementation-defined status code.

The pipe function creates an anonymous pipe, and returns an object with 2 members:

Both of which are file handles. On failure, it returns a blessed null that uncasts to an implementation-defined status code.
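
On POSIX systems, an anonymous pipe of this kind is what the pipe() system call provides. The C sketch below writes a short message into one end and reads it back from the other within a single process; pipe_roundtrip is a hypothetical helper, and the message is assumed to be small enough to fit in the pipe buffer so the write does not block.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write msg into a fresh anonymous pipe and read it back into out.
   Returns the number of bytes read, or -1 on failure. */
static ssize_t pipe_roundtrip(const char *msg, char *out, size_t outsz)
{
    int fds[2];                 /* fds[0]: reading end, fds[1]: writing end */
    ssize_t n = -1;
    size_t len = strlen(msg);

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], msg, len) == (ssize_t)len) {
        close(fds[1]);          /* reader sees EOF after the data */
        fds[1] = -1;
        n = read(fds[0], out, outsz);
    }
    if (fds[1] != -1)
        close(fds[1]);
    close(fds[0]);
    return n;
}
```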

Filesystem Operations

subr rename(old, new);
subr remove(path);

The function rename renames the old directory entry to the new name. On success, new is returned, otherwise, a blessed null that uncasts to an implementation-defined status code is returned.

The function remove causes the directory entry path to be no longer accessible. On success, it returns 0, otherwise, a blessed null that uncasts to an implementation-defined status code is returned.

subr mkdir(path);
[subr opendir(path)] := {
  method readdir(),
  method rewinddir(),
  method closedir(),
}

The mkdir function creates a directory reachable at path. On success, path is returned, otherwise, a blessed null that uncasts to an implementation-defined status code is returned.

The opendir function opens a directory to enumerate its entries. On success, a directory handle is returned, otherwise, a blessed null that uncasts to an implementation-defined status code is returned.

The readdir method returns a string naming the directory entry at the current directory position, and advances that position. The directory position of a directory handle is an opaque internal concept of the directory handle. The rewinddir method resets the directory position to the state it was in when the directory was opened and before any call to readdir was made.

The closedir method releases any resource used by the directory handle. Any further use of the directory handle is invalid and results in error in an undefined way.
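
These methods are modelled on the POSIX directory stream functions of the same names. As a rough illustration, the C sketch below enumerates a directory once and counts its entries; count_entries is a hypothetical helper, and whether the entries "." and ".." are reported is left to the underlying platform.

```c
#include <dirent.h>
#include <stddef.h>

/* Count the entries of a directory, or return -1 if it cannot be opened. */
static long count_entries(const char *path)
{
    DIR *d = opendir(path);
    long n = 0;

    if (d == NULL)
        return -1;
    while (readdir(d) != NULL)  /* each call advances the directory position */
        n++;
    closedir(d);                /* the handle must not be used after this */
    return n;
}
```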

Library for Process Management

This chapter forms an integral part of "The Process Management Module" - should "The Process Management Module" be implemented, this chapter along with any chapter constituting part of "The Process Management Module" must be implemented in their entirety.

This module depends on "The Input/Output Module": should this module be implemented, "The Input/Output Module" must also be implemented.

[subr CmdInterp()] := {
  method Argv(v),
  method Envp(v),
  method ObtainPipeForStdin(),
  method ObtainPipeForStdout(),
  method ObtainPipeForStderr(),
  method SetSourceForStdin(fp),
  method SetDestForStdout(fp),
  method SetDestForStderr(fp),
  method SetCwd(path),
  [method Exec()] := {
    method __get__(k),
    method Wait(),
    method Terminate(),
    method Kill(),
    method Stop(),
    method Continue(),
  },
};

The CmdInterp function creates a preparation context used for executing a program.

The Argv method passes the argument v as an integer-keyed object consisting of a set of strings as the "argument vector" (i.e. the argv parameter to the C main function) to the context.

The Envp method passes the argument v as a string-keyed object consisting of a set of strings as the "environment variables" (i.e. available through the getenv function in C) to the context.

The ObtainPipeFor* methods create pipes and attach the appropriate reading or writing end to the standard input/output/error of the child process, closing the unused end in the respective process.

The SetSourceForStdin method sets the file handle fp as the reading source for standard input of the new process. The SetDestFor* methods set fp as the writing destination for standard output and standard error respectively.

The SetCwd method sets the initial value for the current working directory for the new process.

On success, the Argv, Envp, ObtainPipeFor*, and Set{Source,Dest}For* methods return the preparation context, allowing successive operations to be chained. On error, a blessed null that uncasts to an implementation-defined status code is returned.

The Exec method executes the program and returns a process handle; on error, a blessed null that uncasts to an implementation-defined status code is returned.

The __get__ method of the process handle is used to retrieve a few non-type-associated properties:

The Wait method blocks the calling thread until the process referred to by the process handle terminates, and returns its exit status.

The Terminate method terminates the process referred to by the process handle. The Kill method serves a similar function, but does so more forcibly, without giving the process a chance to do any cleanup.

The Stop method and the Continue method stop (i.e. pause) and continue the execution of the process referred to by the process handle.

Identifier Namespace

The goal of this section is to avoid ambiguity of identifiers in the global namespace - i.e. avoiding the same identifier with conflicting meanings.

To this end, "commonly-used" refers to the attribute of an entity where it's used so frequently that having a verbose spelling would hamper the readability of the code.

When an identifier consists of multiple words, the following terms are defined:

Reserved Identifiers

Identifiers in the global namespace that begin with an underscore followed by an uppercase letter are reserved for standardization by the language.

Identifiers which consist of fewer than 10 lowercase letters or digits are potentially reserved for standardization by the language, as keywords or as "commonly-used" library functions or objects. Although the use of the word "potentially" signifies that the reservation is not uncompromising, 3rd-party library vendors should nonetheless refrain from defining such terse identifiers in the global namespace.
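
A library vendor could check candidate names against these two rules mechanically. The C sketch below is one possible reading of them; the function names are hypothetical, and it assumes that "lowercase letters or digits" means ASCII characters only, so an identifier containing an underscore is never "potentially reserved".

```c
#include <ctype.h>
#include <string.h>

/* Reserved: begins with an underscore followed by an uppercase letter. */
static int is_reserved(const char *id)
{
    return id[0] == '_' && isupper((unsigned char)id[1]) != 0;
}

/* Potentially reserved: fewer than 10 characters, all of them lowercase
   letters or digits. The empty string is rejected as a non-identifier. */
static int is_potentially_reserved(const char *id)
{
    size_t n = strlen(id);

    if (n == 0 || n >= 10)
        return 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)id[i];
        if (!islower(c) && !isdigit(c))
            return 0;
    }
    return 1;
}
```

Under this reading, print and isdouble are potentially reserved, _Uncast is reserved, and thrd_create and CmdInterp are neither.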

Conventions for Identifiers