This is a draft specification of Ion 1.1, a new minor version of the Ion serialization format.

Status

This document is a working draft and is subject to change.

This document presents the formal specification for the Ion 1.1 data format. It is not intended to be used as a user guide or as a cookbook, but as a reference to the syntax and semantics of the Ion data format and its logical data model.

What's New in Ion 1.1

This section gives a high-level overview of what is new and different in Ion 1.1 compared to Ion 1.0, from an implementer's perspective.

Motivation

Ion 1.1 has been designed to address some of the trade-offs in Ion 1.0 to make it suitable for a wider range of applications, giving greater representational choice and expressive power. Some applications want to optimize writes over reads, or are constrained by the writer in some way (e.g. it's prohibitively expensive to buffer an entire value before writing). Ion 1.1 now makes both length prefixing of containers and the interning of symbol tokens independently optional, granting such writers greater flexibility. Data density is another motivation. Certain encodings (e.g., timestamps, integers) have been made more compact and efficient. More significantly, macros now enable applications to have very flexible interning of their data's structure. In aggregate, data transcoded from Ion 1.0 to Ion 1.1 should be more compact and more efficient to both read and write.

Backwards compatibility

Ion 1.0 and Ion 1.1 share the same data model. Any data that can be represented in Ion 1.0 can also be represented with full fidelity in Ion 1.1 and vice-versa. This means that it is always possible to convert data from one version to the other without risk of data loss.

Ion 1.1 readers should be able to understand both Ion 1.0 and Ion 1.1 data.

The text encoding grammar of Ion 1.1 is a superset of Ion 1.0's text encoding grammar. Any Ion 1.0 text data can also be parsed by an Ion 1.1 text parser.

note

Because Ion 1.1 has a different system symbol table, symbol IDs in an Ion 1.0 stream do not always refer to the same text as the same symbol ID in an Ion 1.1 stream. For example: in an Ion 1.0 stream, $4 is always the text "name". However, $4 may or may not be "name" in an Ion 1.1 stream. It may instead be user symbol 4 if the user has chosen not to export the system symbols.

Ion 1.1's binary encoding is substantially different from Ion 1.0's binary encoding. Many changes have been made to make values more compact, faster to read and/or faster to write. Ion 1.0's type descriptors have been supplanted by Ion 1.1's more general opcodes, which have been organized to prioritize the most commonly used encodings and make leveraging macros as inexpensive as possible.

In both text and binary Ion 1.1, the Ion Version Marker syntax is compatible with Ion 1.0's version marker syntax.

This means that an Ion 1.0-only reader can correctly identify when a stream uses Ion 1.1 (allowing it to report an error), and an Ion 1.1 reader can correctly "downshift" to expecting Ion 1.0 data when it encounters a stream using Ion 1.0.

Two streams using different Ion versions can be safely concatenated together provided that they are both text or both binary. A concatenated stream containing both Ion 1.0 and Ion 1.1 can only be fully read by a reader that supports Ion 1.1.

Upgrading an existing application to Ion 1.1 often requires few or no code changes, as APIs typically operate at the data model level ("write an integer") rather than at the encoding level ("write 0x64 followed by four little-endian bytes"). However, taking full advantage of macros after upgrading typically requires additional development time.

Macros, templates, and encoding expressions

Ion 1.1 introduces a new primitive called an encoding expression (E-expression). In text syntax, these expressions look similar to S-expressions, but they are not part of the data model. Instead, each one is evaluated into zero or more Ion values (called a stream), which enables a compact representation of Ion data. An E-expression represents the invocation of either a system-defined or a user-defined macro. Its arguments correspond to the formal parameters of the macro's definition, and each argument is itself an E-expression, a value literal, or a container constructor (list, sexp, or struct syntax containing E-expressions). The resulting stream is then expanded into the resulting Ion data model.

Top-level E-expressions

At the top level, the stream becomes individual top-level values. Consider for illustrative purposes an E-expression (:values 1 2 3) that evaluates to the stream 1, 2, 3 and (:none) that evaluates to the empty stream. In the following examples, values and none are the names of the macros being invoked, and each encoding is equivalent to the data it evaluates to.

// Encoding
a (:values 1 2 3) b (:none) c

// Evaluates to
a 1 2 3 b c

E-expressions in lists or S-expressions

Within a list or S-expression, the stream becomes additional child elements in the collection.

E-expressions in lists

// Encoding
[a, (:values 1 2 3), b, (:none), c]

// Evaluates to
[a, 1, 2, 3, b, c]

E-expressions in S-expressions

// Encoding
(a (:values 1 2 3) b (:none) c)

// Evaluates to
(a 1 2 3 b c)
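The splicing rule for sequences can be modeled outside of Ion. The following Python sketch is an illustration only and is not part of the specification: `EExp` and `splice` are hypothetical names, and an E-expression is represented simply by the stream (a Python list) that it has already been evaluated to.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class EExp:
    """Stand-in for an E-expression that has already been evaluated.

    `stream` holds the zero or more values the expression produced.
    """
    stream: List[Any] = field(default_factory=list)


def splice(elements: List[Any]) -> List[Any]:
    """Replace each EExp in a sequence with the values of its stream."""
    result: List[Any] = []
    for element in elements:
        if isinstance(element, EExp):
            result.extend(element.stream)  # zero or more child elements
        else:
            result.append(element)         # ordinary value, kept as-is
    return result


values_1_2_3 = EExp([1, 2, 3])  # models (:values 1 2 3)
none = EExp([])                 # models (:none)

print(splice(["a", values_1_2_3, "b", none, "c"]))
# ['a', 1, 2, 3, 'b', 'c']
```

An empty stream contributes no elements at all, which is why the position occupied by (:none) simply disappears from the result.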

E-expressions in structs

Within a struct, an E-expression can appear in two positions. In the field name position (where no value is specified), the resulting stream must contain only structs, and each field of those structs becomes a field of the enclosing struct. In the value position, each value in the resulting stream becomes a field with the field name that precedes the E-expression; an empty stream elides the field altogether. In the following example, (:make_struct { c: 5 }) evaluates to a single struct {c: 5}, and (:make_field d 6) evaluates to a single struct {d: 6}.

// Encoding
{
  a: (:values 1 2 3),
  b: 4,
  (:make_struct { c: 5 }),
  (:make_field d 6),
  e: (:none)
}

// Evaluates to
{
  a: 1,
  a: 2,
  a: 3,
  b: 4,
  c: 5,
  d: 6
}
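The two struct positions can be modeled with a short Python sketch. This is a hedged illustration rather than a reference implementation: `EExp` and `expand_struct` are hypothetical names, a struct is modeled as a list of (name, value) pairs so that repeated field names survive, and an entry whose name is `None` stands for an E-expression in the field name position.

```python
from typing import Any, List, Optional, Tuple

Field = Tuple[str, Any]


class EExp:
    """Stand-in for an already-evaluated E-expression and its stream."""

    def __init__(self, stream: List[Any]):
        self.stream = list(stream)


def expand_struct(entries: List[Tuple[Optional[str], Any]]) -> List[Field]:
    """Expand E-expressions found in a struct's entries."""
    fields: List[Field] = []
    for name, value in entries:
        if name is None:
            # Field name position: every value in the stream must be a
            # struct (here: a list of pairs); its fields are merged in.
            if not isinstance(value, EExp):
                raise TypeError("only an E-expression may omit a field name")
            for struct in value.stream:
                if not isinstance(struct, list):
                    raise TypeError("stream must contain only structs")
                fields.extend(struct)
        elif isinstance(value, EExp):
            # Value position: one field per value; none for an empty stream.
            fields.extend((name, item) for item in value.stream)
        else:
            fields.append((name, value))
    return fields


encoding = [
    ("a", EExp([1, 2, 3])),       # a: (:values 1 2 3)
    ("b", 4),                     # b: 4
    (None, EExp([[("c", 5)]])),   # (:make_struct { c: 5 })
    (None, EExp([[("d", 6)]])),   # (:make_field d 6)
    ("e", EExp([])),              # e: (:none)
]
print(expand_struct(encoding))
# [('a', 1), ('a', 2), ('a', 3), ('b', 4), ('c', 5), ('d', 6)]
```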

Macro definitions

Macros can be defined by a user either directly in a default module within an encoding directive or in a module defined externally (i.e., shared module). A macro has a name which must be unique in a module or it may have no name.

Ion 1.1 defines a list of system macros that are built into the module named $ion. Unlike the system symbol table, which is always installed and accessible in the local symbol table, the system macros are always accessible to E-expressions but are not installed in the local macro table by default.

In Ion binary, macros are always addressed in E-expressions by integer macro address. For user macros this is the offset in the local macro table. System macros may be addressed by the system macro address using a specific encoding opcode. In Ion text, macros may be addressed by the offset in the local macro table (mirroring binary), by name, or by qualifying the macro name/offset with the module name in the encoding context. An E-expression can only refer to macros installed in the local macro table or a macro from the system module. In text, an E-expression referring to a system macro that is not installed in the local macro table must use a qualified name with the $ion module name.

For illustrative purposes, let's consider the module named foo that has a macro named bar at offset 5 installed at the beginning of the local macro table.

E-expression name resolution

// allowed if there are no other macros named 'bar' 
(:bar)

// fully qualified by module; always allowed
(:foo::bar)

// by local macro table offset
(:5)

// In text, system macros are always addressable by name.
// In binary, system macros may be invoked using a separate
// opcode.
(:$ion::none)
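The text-mode resolution rules above can be approximated by the following Python sketch. It is a simplified model under stated assumptions: `resolve`, the `(module, name)` table layout, the placeholder macro names `m0` to `m4`, and the two-entry system macro list are all hypothetical, and qualified offsets are not modeled.

```python
from typing import List, Tuple, Union

MacroEntry = Tuple[str, str]  # (module name, macro name)

SYSTEM_MODULE = "$ion"
SYSTEM_MACROS = ["none", "values"]  # illustrative subset, not the full list


def resolve(ref: Union[int, str], table: List[MacroEntry]) -> MacroEntry:
    """Resolve a text E-expression macro reference against a local table."""
    if isinstance(ref, int):
        # By offset in the local macro table, mirroring binary.
        if 0 <= ref < len(table):
            return table[ref]
        raise LookupError(f"no macro at offset {ref}")
    module, separator, name = ref.rpartition("::")
    if separator:
        if module == SYSTEM_MODULE:
            # System macros are always reachable through the $ion qualifier.
            if name in SYSTEM_MACROS:
                return (SYSTEM_MODULE, name)
            raise LookupError(f"no system macro named {name}")
        matches = [entry for entry in table if entry == (module, name)]
    else:
        # An unqualified name only searches the local macro table.
        matches = [entry for entry in table if entry[1] == name]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f"no macro matches {ref}")
    raise LookupError(f"{ref} is ambiguous; qualify it with a module name")


# Module foo installed first, with bar at offset 5.
table = [("foo", f"m{i}") for i in range(5)] + [("foo", "bar")]

print(resolve("bar", table))         # ('foo', 'bar')
print(resolve("foo::bar", table))    # ('foo', 'bar')
print(resolve(5, table))             # ('foo', 'bar')
print(resolve("$ion::none", table))  # ('$ion', 'none')
```

In this model an unqualified `none` fails to resolve because the system macros are not installed in the local macro table, while `$ion::none` always succeeds.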

Template definition language

User-defined macros are defined by their parameters and a template, which together determine how they are invoked and what stream of data they evaluate to. This template is defined using a domain-specific Ion macro definition language with S-expressions. A template defines a list of zero or more parameters that it can accept. Each of these parameters has its own cardinality of expression arguments, which can be specified as exactly one, zero or one, zero or more, or one or more. Furthermore, the template defines what type of argument can be accepted by each of these parameters:

"Tagged" values, whose encodings always begin with an opcode.
"Tagless" values, whose encodings do not begin with an opcode and are therefore both more compact and less flexible (for example, flex_int, int32, float16).
Specific macro shaped arguments to allow for structural composition of macros and efficient encoding in binary.

The macro definition includes a template body that defines how the macro is expanded. In the language, system macros, macros defined in previously defined modules in the encoding context, and macros defined previously in the current module are available to be invoked with (.name ...) syntax where name is the macro to be invoked. Certain names in the expression syntax are reserved for special forms (for example, literal and if_none). When a macro name is shadowed by a special form, or is ambiguous with respect to all macros visible, it can always be qualified with (.module::name ...) syntax where module is the name of the module and name is the offset or name of the macro. Referring to a previously defined macro name within a module may be qualified with (.name ...) syntax.

Modules

Ion 1.0 uses symbol tables to group together related text values. In order to also accommodate macros, Ion 1.1 introduces modules. A module is a named organizational unit that contains:

An exported symbol table, a list of text values used to compactly encode symbol tokens like field names, annotations, and symbol values.
An exported macro table, a list of macro definitions used to compactly encode complete values or partially populated containers.
An unexported nested modules map, a set of unique module names and their associated module definitions.

While Ion 1.0 does not have modules, it is reasonable to think of Ion 1.0's local symbol table as a module that only has symbols, and whose macro table and nested modules map are permanently empty.

Modules can be imported from the catalog (they subsume shared symbol tables) or defined locally.

Directives

Directives modify the encoding context. Syntactically, a directive is a top-level s-expression annotated with $ion. Its first child value is an operation name. The operation determines what changes will be made to the encoding context and which clauses may legally follow.

$ion::
(operation_name
    (clause_1 /*...*/)
    (clause_2 /*...*/)
    /*...*/
    (clause_N /*...*/))

In Ion v1.1, there are three supported directive operations:

Shared Modules

Ion 1.1 extends the concept of a shared symbol table to be a shared module. An Ion 1.0 shared symbol table is a shared module with no macro definitions. Ion 1.1 introduces a new schema for the convention of serializing shared modules in Ion. An Ion 1.1 implementation should support catalogs that contain both Ion 1.0 shared symbol tables and Ion 1.1 shared modules.

System Symbol Table Changes

The system symbol table in Ion 1.1 replaces the Ion 1.0 symbol table with new symbols. However, the system symbols are not required to be in the symbol table—they are always available to use.

Text syntax changes

Ion 1.1 text must use the $ion_1_1 version marker at the top-level of the data stream or document.

The only syntax change for the text format is the introduction of encoding expression (E-expression) syntax, which allows for the invocation of macros in the data stream. This syntax is grammatically similar to S-expressions, except that these expressions are opened with (: and closed with ). For example, (:a 1 2) would expand the macro named a with the arguments 1 and 2. This syntax is allowed anywhere an Ion value is allowed, and may also appear in the field name position of a struct. See the Macros, templates, and encoding expressions section for details.

Binary encoding changes

Ion 1.1 binary encoding reorganizes the type descriptors to support compact E-expressions, to make certain encodings more compact, and to make certain lower-priority encodings marginally less compact (for greater detail, see Type encoding changes). The IVM for this encoding is the octet sequence 0xE0 0x01 0x01 0xEA.

Inlined symbol tokens

In binary Ion 1.0, symbol values, field names, and annotations are required to be encoded using a symbol ID in the local symbol table. For some use cases (e.g. RPC or small, independent values where the symbol table overhead cannot be amortized) this creates a burden on the writer and may not actually be efficient for an application. Ion 1.1 introduces optional binary syntax for encoding inline UTF-8 sequences for these tokens, which gives an encoder flexibility in whether and when to add a given text value to the symbol table.

Ion text requires no change for this feature as it already had inline symbol tokens without using the local symbol table. Ion text also has compatible syntax for representing the local symbol table and encoding of symbol tokens with their position in the table (i.e., the $id syntax).

See FlexSym documentation for greater detail.

Delimited containers

In Ion 1.0, all data is length prefixed. While this is good for optimizing the reading of data, it requires an Ion encoder to buffer any data in memory to calculate the data's length. Ion 1.1 introduces optional binary syntax to allow containers to be encoded with an end marker instead of a length prefix.

See the relevant list, sexp, and struct delimited encoding sections for greater detail.
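The buffering trade-off between the two container forms can be seen in a toy framing. The Python sketch below is deliberately not Ion's binary format: the start markers `b"L"` and `b"D"` and the fixed 4-byte length are made up for illustration, the end byte 0xF0 is borrowed from the delimited list and S-expression form, and each child is assumed to be an already-encoded, self-delimiting value.

```python
import io
from typing import BinaryIO, Iterable

END_MARKER = b"\xF0"     # end-of-container byte
START_PREFIXED = b"L"    # made-up start marker, NOT an Ion opcode
START_DELIMITED = b"D"   # made-up start marker, NOT an Ion opcode


def write_length_prefixed(out: BinaryIO, children: Iterable[bytes]) -> None:
    """Write a container whose byte length precedes its body."""
    body = b"".join(children)  # the whole body must be buffered first
    out.write(START_PREFIXED)
    out.write(len(body).to_bytes(4, "little"))  # toy fixed-width length
    out.write(body)


def write_delimited(out: BinaryIO, children: Iterable[bytes]) -> None:
    """Write a container that is closed by an end marker."""
    out.write(START_DELIMITED)
    for child in children:     # each child is written as soon as it arrives
        out.write(child)
    out.write(END_MARKER)


def produce_children():
    """A generator standing in for values that arrive one at a time."""
    yield b"\x01"
    yield b"\x02"


prefixed, delimited = io.BytesIO(), io.BytesIO()
write_length_prefixed(prefixed, produce_children())
write_delimited(delimited, produce_children())
print(prefixed.getvalue().hex())   # 4c020000000102
print(delimited.getvalue().hex())  # 440102f0
```

The length-prefixed writer cannot emit anything after its start marker until every child is known, whereas the delimited writer never holds more than one child at a time. A reader pays the opposite cost, since it cannot skip a delimited container without scanning it.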

Low-level binary encoding changes

Ion 1.0's VarUInt and VarInt encoding primitives used big-endian byte order and used the high bit of each byte to indicate whether it was the final byte in the encoding. VarInt used an additional bit in the first byte to represent the integer's sign. Ion 1.1 replaces these primitives with more optimized versions called FlexUInt and FlexInt.
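For comparison with the newer primitives, the Ion 1.0 layout described above can be sketched in Python. The function names are hypothetical; the sketch assumes 7 payload bits per byte for VarUInt, and for VarInt a sign bit at position 6 of the first byte, leaving 6 magnitude bits there.

```python
def var_uint_encode(value: int) -> bytes:
    """Ion 1.0 VarUInt: big-endian 7-bit groups, high bit set on the last byte."""
    if value < 0:
        raise ValueError("VarUInt cannot encode negative values")
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()       # most significant group first
    groups[-1] |= 0x80     # end flag marks the final byte
    return bytes(groups)


def var_int_encode(value: int) -> bytes:
    """Ion 1.0 VarInt: like VarUInt, with a sign bit in the first byte."""
    magnitude = abs(value)
    groups = []
    while magnitude.bit_length() > 6:   # first byte only has 6 magnitude bits
        groups.append(magnitude & 0x7F)
        magnitude >>= 7
    groups.append(magnitude | (0x40 if value < 0 else 0x00))
    groups.reverse()
    groups[-1] |= 0x80
    return bytes(groups)


print(var_uint_encode(127).hex())  # ff
print(var_uint_encode(128).hex())  # 0180
print(var_int_encode(-1).hex())    # c1
print(var_int_encode(64).hex())    # 00c0
```

A reader of these encodings has to test the high bit of every byte to find the end, which is the per-byte inspection that the Flex primitives are designed to avoid.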

FlexUInt and FlexInt use little-endian byte order, avoiding the need for reordering on common architectures like x86, aarch64, and RISC-V.

Rather than using a bit in each byte to indicate the width of the encoding, FlexUInt and FlexInt front-load the continuation bits. In most cases, this means that these bits all fit in the first byte of the representation, allowing a reader to determine the complete size of the encoding without having to inspect each byte individually.
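A minimal Python sketch of this idea follows. It assumes the layout in which an N-byte FlexUInt has N-1 zero bits followed by a single 1 bit in its least significant positions, with the value stored in the remaining bits in little-endian order; that layout is not spelled out in this section, so treat the FlexUInt section of the specification as normative. The function names are hypothetical.

```python
from typing import Tuple


def flex_uint_encode(value: int) -> bytes:
    """Encode a non-negative integer as a FlexUInt (assumed layout)."""
    if value < 0:
        raise ValueError("FlexUInt cannot encode negative values")
    bits = max(value.bit_length(), 1)
    size = (bits + 6) // 7                        # 7 payload bits per byte
    encoded = (value << size) | (1 << (size - 1))  # front-loaded size bits
    return encoded.to_bytes(size, "little")


def flex_uint_decode(data: bytes) -> Tuple[int, int]:
    """Return (value, bytes consumed) for a FlexUInt at the start of data."""
    zeros = 0
    for byte in data:
        if byte == 0:
            zeros += 8      # rare: the size bits spill past this byte
            continue
        zeros += (byte & -byte).bit_length() - 1  # trailing zeros in byte
        break
    else:
        raise ValueError("no terminal 1 bit found")
    size = zeros + 1
    if len(data) < size:
        raise ValueError("truncated FlexUInt")
    return int.from_bytes(data[:size], "little") >> size, size


print(flex_uint_encode(14).hex())            # 1d
print(flex_uint_encode(729).hex())           # 660b
print(flex_uint_decode(bytes([0x66, 0x0B])))  # (729, 2)
```

For encodings of up to eight bytes, the decoder learns the total size from the first byte alone, then reads the whole value with a single little-endian load and shift.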

Finally, FlexInt does not use a separate bit to indicate its value's sign. Instead, it uses two's complement representation, allowing it to share much of the same structure and parsing logic as its unsigned counterpart. Benchmarks have shown that in aggregate, these encoding changes are between 1.25x and 3x faster than Ion 1.0's VarUInt and VarInt encodings depending on the host architecture.
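The shared structure can be illustrated with a signed variant. This Python sketch assumes that a FlexInt uses N-1 zero bits followed by a 1 bit as its size prefix in the least significant positions, and stores the value in the remaining 7N bits as a little-endian two's complement number. The function names are hypothetical, and the decoder only handles encodings whose size bits fit in the first byte.

```python
from typing import Tuple


def flex_int_encode(value: int) -> bytes:
    """Encode a signed integer as a FlexInt (assumed layout)."""
    size = 1
    # Find the smallest size whose 7*size payload bits can hold the value.
    while not -(1 << (7 * size - 1)) <= value < (1 << (7 * size - 1)):
        size += 1
    payload = value & ((1 << (7 * size)) - 1)   # two's complement bits
    encoded = (payload << size) | (1 << (size - 1))
    return encoded.to_bytes(size, "little")


def flex_int_decode(data: bytes) -> Tuple[int, int]:
    """Return (value, bytes consumed); sizes above 8 bytes are not handled."""
    first = data[0]
    if first == 0:
        raise ValueError("size bits beyond the first byte are not modeled")
    size = (first & -first).bit_length()        # trailing zeros + 1
    raw = int.from_bytes(data[:size], "little") >> size
    payload_bits = 7 * size
    if raw >= 1 << (payload_bits - 1):          # sign bit set: negative
        raw -= 1 << payload_bits
    return raw, size


print(flex_int_encode(14).hex())    # 1d
print(flex_int_encode(-14).hex())   # e5
print(flex_int_encode(-729).hex())  # 9ef4
print(flex_int_decode(bytes([0x9E, 0xF4])))  # (-729, 2)
```

Only the final sign-extension step differs from the unsigned case; finding the size and loading the bytes are the same.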

Ion 1.1 supplants Ion 1.0's Int encoding primitive with a new encoding called FixedInt, which uses two's complement notation instead of sign-and-magnitude. A corresponding FixedUInt primitive has also been introduced; its encoding is nearly the same as Ion 1.0's UInt primitive, save that UInt is big endian where FixedUInt is little endian.
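The difference between the old and new fixed-width primitives can be shown with Python's built-in integer conversions. The function names below are hypothetical, and the sketch assumes the Ion 1.0 Int layout is a big-endian magnitude with the sign stored in the most significant bit of the first byte.

```python
def fixed_int(value: int, size: int) -> bytes:
    """Ion 1.1 FixedInt: little-endian two's complement."""
    return value.to_bytes(size, "little", signed=True)


def fixed_uint(value: int, size: int) -> bytes:
    """Ion 1.1 FixedUInt: little-endian unsigned."""
    return value.to_bytes(size, "little", signed=False)


def ion_1_0_int(value: int, size: int) -> bytes:
    """Ion 1.0 Int: big-endian sign-and-magnitude."""
    magnitude = abs(value)
    if magnitude.bit_length() > 8 * size - 1:
        raise OverflowError("magnitude does not fit beside the sign bit")
    encoded = bytearray(magnitude.to_bytes(size, "big"))
    if value < 0:
        encoded[0] |= 0x80  # sign bit is the high bit of the first byte
    return bytes(encoded)


def ion_1_0_uint(value: int, size: int) -> bytes:
    """Ion 1.0 UInt: big-endian unsigned."""
    return value.to_bytes(size, "big", signed=False)


print(fixed_int(-2, 2).hex())       # feff
print(ion_1_0_int(-2, 2).hex())     # 8002
print(fixed_uint(258, 2).hex())     # 0201
print(ion_1_0_uint(258, 2).hex())   # 0102
```

One consequence of the switch is that sign-and-magnitude has a distinct bit pattern for negative zero (0x80 in one byte), while two's complement has none.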

A new primitive encoding type, FlexSym, has been introduced to flexibly encode symbol IDs and symbol tokens with inline text.

tip

FlexSym makes it possible for a writer to emit any Ion value as binary without requiring a symbol table. This is generally less efficient when working with multiple values but there are use cases where it is convenient.

Type encoding changes

All Ion types use the new low-level encoding primitives described in the previous section. Ion 1.0's type descriptors have been supplanted by Ion 1.1's more general opcodes, which have been organized to prioritize the most commonly used encodings and make leveraging macros as inexpensive as possible.

Typed null values are now encoded in two bytes using the 0xEB opcode.

Symbol IDs that require more than two bytes no longer have dedicated type descriptors; the 65,537th and later symbols defined in a stream each take an extra byte to represent in the stream.

Lists and S-expressions have two encodings: a length-prefixed encoding and a new delimited form that ends with the 0xF0 opcode.

Struct values have the option of encoding their field names as a FlexSym, enabling them to write field name text inline instead of adding all names to the symbol table. There is now also a delimited form.

Similarly, symbol values now also have the option of encoding their symbol text inline.

Annotation sequences are a prefix to the value they decorate, and no longer have an outer length container. They are now encoded with one of the six opcodes 0xE4 through 0xE9.

Opcodes 0xE4 through 0xE6 indicate one or more annotations encoded as symbol addresses.
Opcodes 0xE7 through 0xE9 indicate one or more annotations encoded as a FlexSym.

The 0xE6 encoding is similar to how Ion 1.0 annotations are encoded with the exception that there is no outer length in addition to the annotations sequence length.

Integers now use a FixedInt sub-field instead of the Ion 1.0 encoding which used sign-and-magnitude (with two opcodes).

Decimals are structurally identical to their Ion 1.0 counterpart with the exception of the negative zero coefficient. The Ion 1.1 FlexInt encoding is two's complement, so negative zero cannot be encoded directly with it. Instead, an opcode is allocated specifically for encoding decimals with a negative zero coefficient.

Timestamps no longer encode their sub-field components as octet-aligned fields.

The Ion 1.1 format uses a packed bit encoding and has a biased form (encoding the year field as an offset from 1970) to make common timestamp encodings easily fit in a 64-bit word for microsecond and nanosecond precision (with UTC offset or unknown UTC offset). Benchmarks have shown this new encoding to be 40% smaller, 59% faster to encode and 21% faster to decode in-range timestamps. A non-biased, arbitrary length timestamp with packed bit encoding is defined for uncommon cases.

Encoding expressions in binary

See the binary E-expressions documentation to learn more about how e-expressions are encoded in binary.

Macros

Like other self-describing formats, Ion 1.0 makes it possible to write a stream with truly arbitrary content—no formal schema required. However, in practice all applications have a de facto schema, with each stream sharing large amounts of predictable structure and recurring values. This means that Ion readers and writers often spend substantial resources processing undifferentiated data.

Consider this example excerpt from a webserver's log file:

{
  method: GET,
  statusCode: 200,
  status: "OK",
  protocol: https,
  clientIp: ip_addr::"192.168.1.100",
  resource: "index.html"
}
{
  method: GET,
  statusCode: 200,
  status: "OK",
  protocol: https,
  clientIp: ip_addr::"192.168.1.100",
  resource: "images/funny.jpg"
}
{
  method: GET,
  statusCode: 200,
  status: "OK",
  protocol: https,
  clientIp: ip_addr::"192.168.1.101",
  resource: "index.html"
}

Macros allow users to define fill-in-the-blank templates for their data. This enables applications to focus on encoding and decoding the parts of the data that are distinctive, eliding the work needed to encode the boilerplate.

Using this macro definition:

(macro getOk (clientIp resource)
  {
    method: GET,
    statusCode: 200,
    status: "OK",
    protocol: https,
    clientIp: (.annotate "ip_addr" (%clientIp)),
    resource: (%resource)
  })

The same webserver log file could be written like this:

(:getOk "192.168.1.100" "index.html")
(:getOk "192.168.1.100" "images/funny.jpg")
(:getOk "192.168.1.101" "index.html")

Macros are an encoding-level concern, and their use in the data stream is invisible to consuming applications. For writers, macros are always optional—a writer can always elect to write their data using value literals instead.

For a guided walkthrough of what macros can do, see Macros by example.

Macros by example

Before getting into the technical details of Ion’s macro and module system, it will help to be more familiar with the use of macros. We’ll step through increasingly sophisticated use cases, some admittedly synthetic for illustrative purposes, with the intent of teaching the core concepts and moving parts without getting into the weeds of more formal specification.

Ion macros are defined using a domain-specific language that is in turn expressed via the Ion data model. That is, macro definitions are Ion data, and use Ion features like S-expressions and symbols to represent code in a Lisp-like fashion. In this document, the fundamental construct we explore is the macro definition, denoted using an S-expression of the form (macro name …) where macro is a keyword and name must be a symbol denoting the macro's name.

NOTE: S-expressions of that shape only declare macros when they occur in the context of an encoding module. We will completely ignore modules for now, and the examples below omit this context to keep things simple.

Constants

The most basic macro is a constant:

(macro pi            // name
  ()                 // signature
  3.141592653589793) // template

This declaration defines a macro named pi. The () is the macro’s signature, in this case a trivial one that declares no parameters. The 3.141592653589793 is a similarly trivial template, an expression in Ion 1.1's domain-specific language for defining macro functions. This macro accepts no arguments and always returns a constant value.

To use pi in an Ion document, we write an encoding expression or E-expression:

$ion_1_1
(:pi)

The syntax (:pi) looks a lot like an S-expression. It’s not, though, since colons cannot appear unquoted in that context. Ion 1.1 makes use of syntax that is not valid in Ion 1.0—specifically, the (: digraph—to denote E-expressions. Those characters must be followed by a reference to a macro, and we say that the E-expression is an invocation of the macro. Here, (:pi) is an invocation of the macro named pi.

That document is equivalent to the following, in the sense that they denote the same data:

$ion_1_1
3.141592653589793

The process by which the Ion implementation turns the former document into the latter is called macro expansion or just expansion. This happens transparently to Ion-consuming applications: the stream of values is the same in both cases. The documents have the same content, encoded in two different ways. It’s reasonable to think of (:pi) as a custom encoding for 3.141592653589793, and the notation’s similarity to S-expressions leads us to the term “encoding expression” (or "e-expression").

note

Any Ion 1.1 document with macros can be fully expanded into an equivalent Ion 1.0 document.

We can streamline future examples with a couple of conventions. First, assume that any E-expression is occurring within an Ion 1.1 document; second, we use the relation notation, ⇒, to mean “expands to”. So we can say:

(:pi) ⇒ 3.141592653589793

Parameters and variable expansion

Most macros are not constant—they accept inputs that determine their results.

(macro passthrough
  (x)   // signature
  (%x)  // template
)

This macro has a signature that declares a parameter called x, and it therefore requires one argument to be passed in when it is invoked. This creates a variable (i.e. named data) called x that can be referred to within the context of the template.

note

We are careful to distinguish between the views from “inside” and “outside” the macro: parameters are the names used by a macro’s implementation to refer to its expansion-time inputs, while arguments are the data provided to a macro at the point of invocation. In other words, we have “formal” parameters and “actual” arguments.

The body of this macro is our first non-trivial template, an expression in Ion’s new domain-specific language for defining macro functions. This template definition language (TDL) treats Ion scalar values as literals, giving the decimal in pi’s template its intended meaning.

In this example, the template expression (%x) is a variable expansion in the form (%variable_name). During macro evaluation, variable expansions are replaced by the contents of the referenced variable. Because this macro's template is an expansion of its only parameter, x, invoking the macro will produce the same value it was given as an argument.

(:passthrough 1)         ⇒ 1
(:passthrough "foo")     ⇒ "foo"
(:passthrough [a, b, c]) ⇒ [a, b, c]

Simple Templates

Here's a more realistic macro:

(macro price
  (a c)                             // signature
  { amount: (%a), currency: (%c) }) // template

This macro has a signature that declares two parameters named a and c. It therefore accepts two arguments when invoked.

(:price 99 USD) ⇒ { amount: 99, currency: USD }

Template expressions that are structs are interpreted almost literally. The field names are literal, which is why the amount and currency field names show up as-is in the expansion, but the field “values” are arbitrary expressions. We call these almost-literal forms quasi-literals.

The template definition language also treats lists quasi-literally, and every element inside the list is an expression. Here’s a silly macro to illustrate:

(macro two_item_list (a b) [(%a), (%b)])

(:two_item_list foo bar) ⇒ [foo, bar]

E-expressions can accept other e-expressions as arguments. For example:

(:two_item_list (:price 99 USD) foo)
//              └──────┬──────┘
//                     └─── passing another e-expression as an argument

Expansion happens from the "inside out". The outer e-expression receives the results from the expansion of the inner e-expression.

(:two_item_list (:price 99 USD) foo)

  // First, the inner invocation of `price` is expanded...
  ⇒ (:two_item_list {amount: 99, currency: USD} foo)

  // ...and then the outer invocation of `two_item_list` is expanded.
  ⇒ [{amount: 99, currency: USD}, foo]

Invoking Macros from Templates

Templates are able to invoke other macros. In TDL, an s-expression starting with a . and an identifier is an operator invocation, where operators are either macros or special forms, which we'll explore later.

(macro website_url
  (path)
  (.make_string "https://www.amazon.com/" (%path)))

This macro's template is an s-expression beginning with .make_string, so it is an invocation of a macro called make_string. make_string is a system macro (a built-in function) which concatenates its arguments to produce a single string.

(:website_url "gp/cart") ⇒ "https://www.amazon.com/gp/cart"

In TDL, it is legal for a macro invocation to appear anywhere that a value could appear. In the following example, an invocation of make_string is being passed as an argument to an invocation of website_url.

(macro detail_page_url
  (asin)
  (.website_url (.make_string "dp/" (%asin))))

(:detail_page_url "B08KTZ8249") ⇒ "https://www.amazon.com/dp/B08KTZ8249"
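To make the evaluation order concrete, here is a small Python model of these two macros. It is only a sketch: `Var`, `Call`, `Macro`, `evaluate`, and `invoke` are hypothetical names, every expression evaluates to a stream modeled as a Python list, and make_string is hard-coded as the only system macro.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Var:
    """Models a variable expansion such as (%path)."""
    name: str


@dataclass
class Call:
    """Models a TDL macro invocation such as (.make_string ...)."""
    macro: str
    args: List[Any]


@dataclass
class Macro:
    params: List[str]
    template: Any


MACROS: Dict[str, Macro] = {
    "website_url": Macro(
        ["path"],
        Call("make_string", ["https://www.amazon.com/", Var("path")]),
    ),
    "detail_page_url": Macro(
        ["asin"],
        Call("website_url", [Call("make_string", ["dp/", Var("asin")])]),
    ),
}


def evaluate(expr: Any, env: Dict[str, List[Any]]) -> List[Any]:
    """Evaluate a template expression to a stream of values."""
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Call):
        # Arguments are evaluated first, then passed to the invoked macro.
        return invoke(expr.macro, [evaluate(arg, env) for arg in expr.args])
    return [expr]  # anything else is treated as a literal value


def invoke(name: str, arg_streams: List[List[Any]]) -> List[Any]:
    """Invoke a macro with one stream per argument expression."""
    if name == "make_string":
        return ["".join(text for stream in arg_streams for text in stream)]
    macro = MACROS[name]
    if len(arg_streams) != len(macro.params):
        raise TypeError(f"'{name}' expects {len(macro.params)} arguments")
    return evaluate(macro.template, dict(zip(macro.params, arg_streams)))


print(invoke("detail_page_url", [["B08KTZ8249"]]))
# ['https://www.amazon.com/dp/B08KTZ8249']
```

Each macro gets its own environment that maps parameter names to argument streams, so `asin` is not visible inside the template of `website_url`.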

note

This may not look like much of an improvement, but the full string

"https://www.amazon.com/dp/B08KTZ8249"

takes 38 bytes to encode while the macro invocation

(:detail_page_url "B08KTZ8249")

takes as few as 12 bytes in binary Ion. While text Ion spells out the macro name to be human-friendly, the binary Ion encoding uses the macro's integer address instead. Here's an illustration:

(:1 "B08KTZ8249")

This makes the e-expression both more compact and faster to decode. Readers can also avoid the cost of repeatedly validating the UTF-8 bytes of substrings that are 'baked into' the macro definition.
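The byte counts quoted above can be reproduced with some quick arithmetic. The breakdown below is an assumption made for illustration, not a statement of the binary format: the long string is counted as one opcode byte plus one length byte plus its UTF-8 text, the short string as one opcode byte (with the length folded in) plus its text, and the macro address as a single byte.

```python
full_url = "https://www.amazon.com/dp/B08KTZ8249"
asin = "B08KTZ8249"

# Assumed layout: opcode byte + length byte + UTF-8 text.
full_string_size = 1 + 1 + len(full_url.encode("utf-8"))

# Assumed layout: macro address byte + string opcode byte + UTF-8 text.
e_expression_size = 1 + 1 + len(asin.encode("utf-8"))

print(full_string_size)   # 38
print(e_expression_size)  # 12
```

Under these assumptions, the saving comes entirely from the 26 bytes of URL prefix that live in the macro definition instead of in each value.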

E-expressions Versus S-expressions

We've now seen two ways to invoke macros, and their difference deserves thorough exploration.

An E-expression is an encoding artifact of a serialized Ion document. It has no intrinsic meaning other than the fact that it represents a macro invocation. The meaning of the document can only be determined by expanding the macro, passing the E-expression's arguments to the function defined by the macro. This all happens as the Ion document is parsed, transparent to the reader of the document. In casual terms, E-expressions are expanded away before the application sees the data.

Within the template definition language (TDL), you can define new macros in terms of other macros, and those invocations are written as S-expressions. Unlike E-expressions, TDL macro invocations are normal Ion data structures, consumed by the Ion system and interpreted as TDL. Further, TDL macro invocations only have meaning in the context of a macro definition, inside an encoding module, while E-expressions can occur anywhere in an Ion document.

These two invocation forms are syntactically aligned in their calling convention, but are distinct in context and "immediacy". E-expressions occur anywhere and are invoked immediately, as they are parsed. S-expression invocations occur only within macro definitions, and are only invoked if and when that code path is ever executed by invocation of the surrounding macro.

Rest Parameters

Sometimes we want a macro to accept an arbitrary number of arguments, in particular all the rest of them. The make_string macro is one of those, concatenating all of its arguments into a single string:

(:make_string)                 ⇒ ""
(:make_string "a")             ⇒ "a"
(:make_string "a" "b")         ⇒ "ab"
(:make_string "a" "b" "c")     ⇒ "abc"
(:make_string "a" "b" "c" "d") ⇒ "abcd"

To make this work, the declaration of make_string is effectively:

(macro make_string (parts*) /*...*/)

The * is a cardinality modifier. A parameter's cardinality dictates both the number of argument expressions it can accept and the number of values its expansion can produce.

In the examples so far, all parameters have had a cardinality of exactly-one, which is the default. The parts parameter has a cardinality of zero-or-more, meaning:

It can accept zero-or-more argument expressions.
When expanded, it will produce zero-or-more values.

When the final parameter in the macro signature is zero-or-more, "all of the rest" of the argument expressions will be passed to that parameter.

(:make_string)
//           └── 0 argument expressions passed to `parts`
(:make_string "a")
//            └┬┘
//             └── 1 argument expression passed to `parts`
(:make_string "a" "b" "c" "d")
//            └──────┬──────┘
//                   └── 4 argument expressions passed to `parts`

At this point our distinction between parameters and arguments becomes more apparent, since they are no longer one-to-one: this macro with one parameter can be invoked with one argument, or twenty, or none.

tip

To declare a final parameter that requires at least one rest-argument, use the + modifier.

Arguments and results are streams

The inputs to and results from a macro are modeled as streams of values. When a macro is invoked, each argument expression produces a stream of values, and within the macro definition, each parameter name refers to the corresponding stream, not to a specific value. The declared cardinality of a parameter constrains the number of elements produced by its stream, and is verified by the macro expansion system.

More generally, the results of all template expressions are streams. While most expressions produce a single value, various macros and special forms can produce zero or more values.

We have everything we need to illustrate this, via another system macro, values:

(macro values (vals*) (%vals))

(:values 1)           ⇒ 1
(:values 1 true null) ⇒ 1 true null
(:values)             ⇒ _nothing_

The values macro accepts any number of arguments and returns their values; it is effectively a multi-value identity function. We can use this to explore how streams combine in E-expressions.

Splicing in encoded data

At the top level, an e-expression's resulting values become top-level values.

(:values 1 2 3) ⇒ 1 2 3

When an E-expression appears within a list or S-expression, the resulting values are spliced into the surrounding container:

[first, (:values), last]          ⇒ [first, last]
[first, (:values "middle"), last] ⇒ [first, "middle", last]
(first (:values left right) last) ⇒ (first left right last)

This also applies wherever a tagged type can appear inside an E-expression:

(first (:values (:values left right) (:values)) last) ⇒ (first left right last)

Note that each argument expression always maps to one parameter, even when that expression returns too few or too many values.

(macro reverse (a b)
  [(%b), (%a)])

(:reverse (:values 5 USD))   ⇒ // Error: 'reverse' expects 2 arguments, given 1
(:reverse 5 (:values) USD)   ⇒ // Error: 'reverse' expects 2 arguments, given 3
(:reverse (:values 5 6) USD) ⇒ // Error: argument 'a' expects 1 value, given 2

In this example, the parameters expect exactly one argument, producing exactly one value. When the cardinality allows multiple values, then the argument result-streams are concatenated. We saw this (rather subtly) above in the nested use of values, but can also illustrate using the rest-parameter to make_string, which we'll expand here in steps:

(:make_string (:values) a (:values b (:values c) d) e)
//              ^^^^^^ next
  ⇒ (:make_string a (:values b (:values c) d) e)
//                               ^^^^^^ next
  ⇒ (:make_string a (:values b c d) e)
//                    ^^^^^^ next
  ⇒ (:make_string a b c d e)
  ⇒ "abcde"
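The concatenation of argument result-streams can be sketched in Python. This is only an illustrative model, not part of the specification: streams are modeled as plain lists, Ion symbols as Python strings, and the helper names `one`, `values`, and `make_string` are assumptions chosen to mirror the macros used above.

```python
from typing import Any, List

Stream = List[Any]  # assumption: a stream is modeled as an ordered Python list


def one(value: Any) -> Stream:
    """Wrap a single value as a singleton stream."""
    return [value]


def values(*args: Stream) -> Stream:
    """Model of `values`: concatenate the argument streams in order."""
    out: Stream = []
    for arg in args:
        out.extend(arg)
    return out


def make_string(*parts: Stream) -> Stream:
    """Model of `make_string`: join the text of every value in the rest-argument."""
    return ["".join(values(*parts))]


# (:make_string (:values) a (:values b (:values c) d) e)
result = make_string(
    values(),
    one("a"),
    values(one("b"), values(one("c")), one("d")),
    one("e"),
)
print(result)  # ['abcde']
```

The empty stream contributes nothing, and nested streams flatten into their parent, which is why the intermediate steps shown above all lead to the same final string.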

Splicing within sequences is straightforward, but structs are trickier due to their key/value nature. When used in field-value position, each result from a macro is bound to the field-name independently, leading to the field being repeated or even absent:

{ name: (:values) }          ⇒ { }
{ name: (:values v) }        ⇒ { name: v }
{ name: (:values v ann::w) } ⇒ { name: v, name: ann::w }

An E-expression can even be used in place of a key-value pair, in which case it must return structs, which are merged into the surrounding container:

{ a:1, (:values), z:3 }             ⇒ { a:1, z:3 }
{ a:1, (:values {}), z:3 }          ⇒ { a:1, z:3 }
{ a:1, (:values {b:2}), z:3 }       ⇒ { a:1, b:2, z:3 }
{ a:1, (:values {b:2} {z:3}), z:3 } ⇒ { a:1, b:2, z:3, z:3 }

{ a:1, (:values key "value") } ⇒ // Error: struct expected for splicing into struct
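The two struct splicing rules can be modeled with a short Python sketch. It is a hedged illustration only: the `Struct` class, `splice_field`, and `splice_structs` are hypothetical names, a struct is modeled as an ordered list of (name, value) pairs so that repeated field names survive, and annotated values are written as plain strings.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Field = Tuple[str, Any]


@dataclass
class Struct:
    # Ordered (name, value) pairs; repeated names are allowed.
    fields: List[Field] = field(default_factory=list)


def splice_field(name: str, stream: List[Any]) -> List[Field]:
    """Field-value position: bind the field name to each result independently."""
    return [(name, value) for value in stream]


def splice_structs(stream: List[Any]) -> List[Field]:
    """Key-value-pair position: every result must be a struct; merge their fields."""
    merged: List[Field] = []
    for value in stream:
        if not isinstance(value, Struct):
            raise TypeError("struct expected for splicing into struct")
        merged.extend(value.fields)
    return merged


# { name: (:values v ann::w) }
repeated = Struct(splice_field("name", ["v", "ann::w"]))

# { a:1, (:values {b:2} {z:3}), z:3 }
merged = Struct(
    [("a", 1)]
    + splice_structs([Struct([("b", 2)]), Struct([("z", 3)])])
    + [("z", 3)]
)
print(repeated.fields)  # [('name', 'v'), ('name', 'ann::w')]
print(merged.fields)    # [('a', 1), ('b', 2), ('z', 3), ('z', 3)]
```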

Splicing in template expressions

The preceding examples demonstrate splicing of E-expressions into encoded data, but similar stream-splicing occurs within the template language, making it trivial to convert a stream to a list:

(macro list_of (vals*) [ (%vals) ])
(macro clumsy_bag (elts*) { '': (%elts) })

(:list_of)   ⇒ []
(:clumsy_bag) ⇒ {}

(:list_of 1 2 3)    ⇒ [1, 2, 3]
(:clumsy_bag true 2) ⇒ {'':true, '':2}

Mapping templates over streams: `for`

Another way to produce a stream is via a mapping form. The for special form evaluates a template once for each value provided by a stream or streams. Each time, a local variable is created and bound to the next value on the stream.

(macro price (a c) { amount: (%a), currency: (%c) })

(macro prices (currency amounts*)
  (.for
    // Binding pairs
    [(amt (%amounts))]
    //└┬┘ └────┬───┘
    // │       └─── stream to map over
    // └─────────── variable name

    // Template
    (.price (%amt) (%currency))
  )
)

The first subform of for is a list of binding pairs: S-expressions each containing a variable name and a series of TDL expressions. Here, that TDL expression series is a single parameter expansion, so each individual value from the amounts stream is bound to the name amt before the price invocation is expanded.

(:prices GBP 10 9.99 12.)
  ⇒ {amount:10, currency:GBP} {amount:9.99, currency:GBP} {amount:12., currency:GBP}

More than one stream can be iterated in parallel, and iteration terminates when any stream becomes empty.

(macro zip (front* back*)
  (.for [(f (%front)),
        (b (%back))]
    [(%f), (%b)]))

(:zip (:values 1 2 3) (:values a b))
  ⇒ [1, a] [2, b]
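The mapping behavior of for, including termination at the shortest stream, resembles Python's built-in `zip`. The sketch below is an illustrative model under that assumption; `for_each`, `price`, `prices`, and `zip_macro` are hypothetical Python stand-ins for the forms and macros above, with streams modeled as lists and structs as dicts.

```python
from typing import Any, Callable, List

Stream = List[Any]


def for_each(streams: List[Stream], template: Callable[..., Stream]) -> Stream:
    """Expand `template` once per step, binding one value from each stream."""
    out: Stream = []
    for bound in zip(*streams):  # zip stops when any stream is exhausted
        out.extend(template(*bound))
    return out


def price(amount: Any, currency: Any) -> Stream:
    return [{"amount": amount, "currency": currency}]


def prices(currency: Any, *amounts: Any) -> Stream:
    # (.for [(amt (%amounts))] (.price (%amt) (%currency)))
    return for_each([list(amounts)], lambda amt: price(amt, currency))


def zip_macro(front: Stream, back: Stream) -> Stream:
    # (.for [(f (%front)), (b (%back))] [(%f), (%b)])
    return for_each([front, back], lambda f, b: [[f, b]])


print(prices("GBP", 10, 9.99))
# [{'amount': 10, 'currency': 'GBP'}, {'amount': 9.99, 'currency': 'GBP'}]
print(zip_macro([1, 2, 3], ["a", "b"]))
# [[1, 'a'], [2, 'b']]
```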

Empty streams: `none`

The empty stream is an important edge case that requires careful handling and communication. The built-in macro none accepts no values and produces an empty stream:

(macro list_of (items*) [(%items)])

(:list_of (:none)) ⇒ []
(:list_of 1 (:none) 2) ⇒ [1, 2]
[(:none)]   ⇒ []
{a:(:none)} ⇒ {}

When used as a macro argument, a none invocation (like any other expression) counts as one argument:

(:pi (:none)) ⇒ // Error: 'pi' expects 0 arguments, given 1

The none macro is equivalent to an empty expression group ((::)), but unlike an expression group, it is not limited to use as a macro argument. (:none) can appear anywhere an expression can appear.

tip

While (:none) and (:values) both produce the empty stream, the former is preferred for clarity of intent and terminology.

Cardinality

As described earlier, parameters are all streams of values, but the number of values can be controlled by the parameter's cardinality. So far we have seen the default exactly-one and the * (zero-or-more) cardinality modifiers, and in total there are four:

Modifier	Cardinality
`!`	`exactly-one` value
`?`	`zero-or-one` value
`+`	`one-or-more` values
`*`	`zero-or-more` values

Exactly-One

Many parameters expect exactly one value and thus have exactly-one cardinality. This is the default cardinality, but the ! modifier can be used for clarity.

This cardinality means that the parameter requires a stream producing a single value, so one might refer to them as singleton streams or just singletons colloquially.

Zero-or-One

A parameter with the modifier ? has zero-or-one cardinality, which is much like exactly-one cardinality, except the parameter accepts an empty-stream argument as a way to denote an absent parameter.

(macro temperature (degrees scale?)
  {
    degrees: (%degrees),
    scale: (%scale)
  })

Since the scale accepts the empty stream, we can pass it an empty expression group:

(:temperature 96 F)    ⇒ {degrees:96, scale:F}
(:temperature 283 (::)) ⇒ {degrees:283}

Note that the result’s scale field has disappeared because no value was provided. It would be more useful to fill in a default value, which we can achieve with the default system macro:

(macro temperature (degrees scale?)
  {
    degrees: (%degrees),
    scale: (.default (%scale) K)
  })

(:temperature 96 F)    ⇒ {degrees:96,  scale:F}
(:temperature 283 (::)) ⇒ {degrees:283, scale:K}

To refine things a bit further, trailing arguments that accept the empty stream can be omitted entirely:

(:temperature 283) ⇒ {degrees:283, scale:K}

tip

The default macro is implemented with the help of a special form that can detect the empty stream: if_none.
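A rough Python model of this relationship follows. The source states only that default is built with the help of if_none; the exact definition shown here is an assumption, as are the names `if_none`, `default`, and `temperature` used as Python functions. Branches are passed as callables so that only the selected one is expanded.

```python
from typing import Any, Callable, List, Optional, Tuple

Stream = List[Any]


def if_none(stream: Stream,
            true_branch: Callable[[], Stream],
            false_branch: Callable[[], Stream]) -> Stream:
    """Expand true_branch only when `stream` is empty, otherwise false_branch."""
    return true_branch() if len(stream) == 0 else false_branch()


def default(stream: Stream, fallback: Stream) -> Stream:
    """Assumed shape of `default`: use `fallback` when `stream` is empty."""
    return if_none(stream, lambda: fallback, lambda: stream)


def temperature(degrees: Any, scale: Optional[Stream] = None) -> List[Tuple[str, Any]]:
    # An elided trailing argument is treated like the empty group (::).
    scale_stream: Stream = [] if scale is None else scale
    fields = [("degrees", degrees)]
    fields += [("scale", value) for value in default(scale_stream, ["K"])]
    return fields


print(temperature(96, ["F"]))  # [('degrees', 96), ('scale', 'F')]
print(temperature(283, []))    # [('degrees', 283), ('scale', 'K')]
print(temperature(283))        # [('degrees', 283), ('scale', 'K')]
```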

Zero-or-More

A parameter with the modifier * has zero-or-more cardinality.

(macro prices (amount* currency)
  (.for [(amt (%amount))]
    (.price (%amt) (%currency))))

When * is on a non-final parameter, we cannot take “all the rest” of the arguments and must use a different calling convention to draw the boundaries of the stream. Instead, we need a single expression that produces the desired values:

(:prices (::) JPY)          ⇒ // empty stream
(:prices 54 CAD)           ⇒ {amount:54, currency:CAD}
(:prices (:: 10 9.99) GBP)  ⇒ {amount:10, currency:GBP} {amount:9.99, currency:GBP}

Here we use a non-empty expression group (:: /*...*/) to delimit the multiple elements of the amount stream.

One-or-More

A parameter with the modifier + has one-or-more cardinality, which works like * except:

+ parameters cannot accept the empty stream
When expanded, + parameters must produce at least one value

To continue using our prices example:

(macro prices (amount+ currency)
  (.for [(amt (%amount))]
    (.price (%amt) (%currency))))

(:prices (::) JPY)          ⇒ // Error: `+` parameter received the empty stream
(:prices 54 CAD)           ⇒ {amount:54, currency:CAD}
(:prices (:: 10 9.99) GBP)  ⇒ {amount:10, currency:GBP} {amount:9.99, currency:GBP}

On the final parameter, + collects the remaining (one or more) arguments:

(macro thanks (names+)
  (.make_string "Thank you to my Patreon supporters:\n"
    (.for [(name (%names))]
      (.make_string "  * " (%name) "\n"))))

(:thanks) ⇒ // Error: at least one value expected for + parameter

(:thanks Larry Curly Moe) ⇒
'''\
Thank you to my Patreon supporters:
  * Larry
  * Curly
  * Moe
'''

Expression Groups

The non-rest versions of multi-value parameters require some kind of delimiting syntax to contain the applicable sub-expressions. For the tagged-type parameters we have seen so far, you could use :values or some other macro to produce the stream, but that doesn't work for tagless types. The preferred syntax, supporting all argument types, is a special delimiting form called an expression group. Here is a macro to illustrate:

(macro prices
  (amount* currency)
  (.for [(amt (%amount))]
    (.price (%amt) (%currency))))

The parameter amount accepts any number of argument expressions. It's easy to provide exactly one:

(:prices 12.99 GBP) ⇒ {amount:12.99, currency:GBP}

To provide a non-singleton stream of values, use an expression group. Inside an E-expression, a group starts with (:: and ends with ):

(:prices (::) GBP)       ⇒ _nothing_
(:prices (:: 1) GBP)     ⇒ {amount:1, currency:GBP}
(:prices (:: 1 2 3) GBP) ⇒ {amount:1, currency:GBP}
                           {amount:2, currency:GBP}
                           {amount:3, currency:GBP}

Within the group, the invocation can have any number of expressions that align with the parameter's encoding. The macro parameter produces the results of those expressions, concatenated into a single stream, and the expander verifies that each value on that stream is accepted by the parameter’s declared encoding.

(:prices (:: 1 (:values 2 3) 4) GBP) ⇒ {amount:1, currency:GBP}
                                       {amount:2, currency:GBP}
                                       {amount:3, currency:GBP}
                                       {amount:4, currency:GBP}

Expression groups may only appear inside macro invocations where the corresponding parameter has ?, *, or + cardinality. There is no binary opcode for these constructs; the encoding uses a tagless format to keep things as dense as possible. As usual, the text format mirrors this constraint.

In TDL, an expression group is denoted using (.. and ). For example:

(macro foo (x*) { foo: (%x) })
(macro bar () (.foo (.. b a r))) // Argument to foo is 3 expressions in an expression group

(:bar) ⇒ { foo: b,
           foo: a,
           foo: r }

Optional Arguments

When a trailing parameter accepts the empty stream, an invocation can omit its corresponding argument expression, as long as no following parameter is being given an expression. We’ve seen this as applied to final * parameters, but it also applies to ? parameters:

(macro optionals (a* b? c! d* e? f*)
  [(%a), (%b), (%c), (%d), (%e), (%f)])

Since d, e, and f all accept the empty stream, they can be omitted by invokers. But c is required so a and b must always be present, at least as an empty group:

(:optionals (::) (::) "value for c") ⇒ ["value for c"]

Now c receives the string "value for c" while the other parameters are all empty. If we want to provide e, then we must also provide a group for d:

(:optionals (::) (::) "value for c" (::) "value for e")
  ⇒ ["value for c", "value for e"]

Tagless and fixed-width types

In Ion 1.0, the binary encoding of every value starts off with a “type tag”, an opcode that indicates the data-type of the next value and thus the interpretation of the following octets of data. In general, these tags also indicate whether the value has annotations, and whether it’s null.

These tags are necessary because the Ion data model allows values of any type to be used anywhere. Ion documents are not schema-constrained: nothing forces any part of the data to have a specific type or shape. We call Ion “self-describing” precisely because each value self-describes its type via a type tag.

If schema constraints are enforced through some mechanism outside the serializer/deserializer, the type tags are unnecessary and may add up to a non-trivial amount of wasted space. Furthermore, the overhead for each value also includes length information: encoding an octet of data takes two octets on the stream.

Ion 1.1 tries to mitigate this overhead in the binary format by allowing macro parameters to use more-constrained tagless types. These are subtypes of the concrete types, constrained such that type tags are not necessary in the binary form. In general this can shave 4-6 bits off each value, which can add up in aggregate. In the extreme, that octet of data can be encoded with no overhead at all.

The following tagless types are available:

Tagless type	Description
`flex_symbol`	Tagless symbol (SID or text)
`flex_string`	Tagless string
`flex_int`	Tagless, variable-width signed int
`flex_uint`	Tagless, variable-width unsigned int
`int8` `int16` `int32` `int64`	Fixed-width signed int
`uint8` `uint16` `uint32` `uint64`	Fixed-width unsigned int
`float16` `float32` `float64`	Fixed-width float

To define a tagless parameter, just declare one of the primitive types:

(macro point (flex_int::x flex_int::y)
  {x: (%x), y: (%y)})

(:point 3 17) ⇒ {x:3, y:17}

The tagless encoding has no real benefit here in text, as primitive types aim to improve the binary encoding.

This density comes at the cost of flexibility. Primitive types cannot be annotated or null, and arguments cannot be expressed using macros, like we’ve done before:

(:point null.int 17)   ⇒ // Error: primitive flex_int does not accept nulls
(:point a::3 17)       ⇒ // Error: primitive flex_int does not accept annotations
(:point (:values 1) 2) ⇒ // Error: cannot use macro for a primitive argument

While Ion text syntax doesn’t use tags—the types are built into the syntax—these errors ensure that a text E-expression may only express things that can also be expressed using an equivalent binary E-expression.

For the same reasons, supplying a (non-rest) tagless parameter with no value, or with more than one value, can only be expressed by using an expression group.

A subset of the primitive types are fixed-width: they are binary-encoded with no per-value overhead.

(macro byte_array
  (uint8::bytes*)
  [(%bytes)])

Invocations of this macro are encoded as a sequence of untagged octets, because the macro definition constrains the argument shape such that nothing else is acceptable. A text invocation is written using normal ints:

(:byte_array 0 1 2 3 4 5 6 7 8) ⇒ [0, 1, 2, 3, 4, 5, 6, 7, 8]
(:byte_array 9 -10 11)          ⇒ // Error: -10 is not a valid uint8
(:byte_array 256)               ⇒ // Error: 256 is not a valid uint8

As above, Ion text doesn’t have syntax specifically denoting “8-bit unsigned integers”, so to keep text and binary capabilities aligned, the parser rejects invocations where an argument value falls outside the range of the binary-only type.

Primitive types have inherent tradeoffs and require careful consideration, but in the right circumstances the density wins can be significant.

Macro Shapes

We can now introduce the final kind of input constraint, macro-shaped parameters. To understand the motivation, consider modeling a scatter-plot as a list of points:

[{x:3, y:17}, {x:395, y:23}, {x:15, y:48}, {x:2023, y:5}, …]

Lists like these exhibit a lot of repetition. Since we already have a point macro, we can eliminate a fair amount:

[(:point 3 17), (:point 395 23), (:point 15 48), (:point 2023 5), …]

This eliminates all the xs and ys, but leaves repeated macro invocations.

What we’d like is to eliminate the point calls and just write a stream of pairs, something like:

(:scatterplot (3 17) (395 23) (15 48) (2023 5) …)

We can achieve exactly that with a macro-shaped parameter, in which we use the point macro as an encoding:

(macro scatterplot (point::points*)
//                  ^^^^^
  [(%points)])

point is not one of the built-in encodings, so this is a reference to the macro of that name defined earlier.

(:scatterplot (3 17) (395 23) (15 48) (2023 5))
  ⇒
  [{x:3, y:17}, {x:395, y:23}, {x:15, y:48}, {x:2023, y:5}]

Each argument S-expression like (3 17) is implicitly an E-expression invoking the point macro. The argument mirrors the shape of the inner macro, without repeating its name. Further, expansion of the implied points happens automatically, so the overall behavior is just like the preceding variant and the points parameter produces a stream of structs.
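As a loose analogy, a macro-shaped parameter behaves like a function that receives bare argument tuples and applies the inner macro to each one itself. The Python sketch below illustrates only that idea; `point` and `scatterplot` are modeled as ordinary functions, structs as dicts, and the result stream as a list, all of which are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple


def point(x: int, y: int) -> Dict[str, int]:
    """Model of the `point` macro: build one struct from two ints."""
    return {"x": x, "y": y}


def scatterplot(*points: Tuple[int, int]) -> List[Any]:
    """Model of `scatterplot`: each argument is a bare (x, y) pair.

    The call to `point` is implied by the parameter's declared shape,
    so callers never write the macro name themselves.
    """
    expanded = [point(*args) for args in points]
    return [expanded]  # a stream holding a single list value


print(scatterplot((3, 17), (395, 23)))
# [[{'x': 3, 'y': 17}, {'x': 395, 'y': 23}]]
```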

The binary encoding of macro-shaped parameters is similarly tagless, eliding any opcodes mentioning point and just writing its arguments with minimal delimiting.

Macro types can be combined with cardinality modifiers, with invocations using groups as needed:

(macro scatterplot
  (point::points+ flex_string::x_label flex_string::y_label)
  { points: [(%points)], x_label: (%x_label), y_label: (%y_label) })

(:scatterplot (:: (3 17) (395 23) (15 48) (2023 5)) "hour" "widgets")
  ⇒
  {
    points: [{x:3, y:17}, {x:395, y:23}, {x:15, y:48}, {x:2023, y:5}],
    x_label: "hour",
    y_label: "widgets"
  }

As with other tagless parameters, you cannot replace a group with a macro invocation, and you cannot use a macro invocation as an element of an expression group:

(:scatterplot (:make_points 3 17 395 23 15 48 2023 5) "hour" "widgets")
  ⇒ // Error: Expression group expected, found :make_points

(:scatterplot (:: (3 17) (:make_points 395 23 15 48) (2023 5)) "hour" "widgets")
  ⇒ // Error: sexp expected with args for 'point', found :make_points

(:scatterplot (:: (3 17) (:point 395 23) (15 48) (2023 5)) "hour" "widgets")
  ⇒ // Error: sexp expected with args for 'point', found :point

This limitation mirrors the binary encoding, where both the expression group and the individual macro invocations are tagless and there's no way to express a macro invocation.

tip

The primary goal of macro-shaped arguments, and tagless types in general, is to increase density by tightly constraining the inputs.

Defining macros

A macro is defined using a macro clause within a module's macros clause.

Syntax

(macro name signature template)

Argument	Description
`name`	A unique name assigned to the macro. When constructing an anonymous macro, `null` is used in place of a name.
`signature`	An s-expression enumerating the parameters this macro accepts.
`template`	A template definition language (TDL) expression that can be evaluated to produce zero or more Ion values.

Example macro clause

//      ┌─── name
//      │     ┌─── signature
//     ┌┴┐ ┌──┴──┐
(macro foo (x y z)
  {           // ─┐
    x: (%x),  //  │
    y: (%y),  //  ├─ template
    z: (%z),  //  │
  }           // ─┘
)

Macro names

Syntactically, macro names are identifiers. Each macro name in a macro table must be unique.

In some circumstances, it may not make sense to name a macro. (For example, when the macro is generated automatically.) In such cases, authors must use null to indicate that the macro does not have a name. Anonymous macros can only be referenced by their address in the macro table.

Macro parameters

A parameter is a named stream of Ion values. The stream's contents are determined by the macro's invocation. A macro's parameters are declared in the macro signature.

Each parameter declaration has three elements:

A name
An optional encoding
An optional cardinality

Example parameter declaration

//     ┌─── encoding
//     │      ┌─── name
//     │      │┌─── cardinality
// ┌───┴───┐  ││
   flex_uint::x*

Parameter names

A parameter's name is an identifier. The name is required; any non-identifier (including null, quoted symbols, $0, or a non-symbol) found in parameter-name position will cause the reader to raise an error.

All of a macro's parameters must have unique names.

Parameter encodings

In binary Ion, the default encoding for all parameters is tagged. Each argument passed into the macro from the callsite is prefixed by an opcode (or "tag") that indicates the argument's type and length.

Parameters may choose to specify an alternative encoding to make the corresponding arguments' binary representation more compact and/or fixed width. These "tagless" encodings do not begin with an opcode, an arrangement which saves space but also limits the domain of values they can each represent. Arguments passed to tagless parameters cannot be null, cannot be annotated, and may have additional range restrictions.

When writing text Ion, the declared encoding does not affect how values are serialized. However, it does constrain the domain of values that the parameter will accept. When transcribing from text to binary, it must be possible to serialize all values passed as an argument using the parameter's declared encoding. This means that an argument passed to a parameter with a primitive encoding cannot be annotated and cannot be a null of any type. If an int or a float is being passed to a parameter with a fixed-width encoding, that value must fit within the range of values that can be represented by that width. For example, the value 256 cannot be passed to a parameter with an encoding of uint8 because a uint8 can only represent values in the range [0, 255].
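The range restriction on fixed-width integer encodings can be sketched as a simple validation step. This Python model is illustrative only: `check_fixed_width_int` is a hypothetical helper, Ion nulls are modeled as `None`, and the signed ranges assume a two's-complement interpretation of the stated widths (only the uint8 range is given explicitly above).

```python
from typing import Any, Dict, Tuple

# Inclusive (low, high) bounds per encoding name.
INT_RANGES: Dict[str, Tuple[int, int]] = {}
for bits in (8, 16, 32, 64):
    # Assumption: signed widths use two's-complement ranges.
    INT_RANGES[f"int{bits}"] = (-(1 << (bits - 1)), (1 << (bits - 1)) - 1)
    INT_RANGES[f"uint{bits}"] = (0, (1 << bits) - 1)


def check_fixed_width_int(encoding: str, value: Any) -> int:
    """Return `value` if it can be written with `encoding`, else raise."""
    if value is None:
        raise ValueError(f"primitive {encoding} does not accept nulls")
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{encoding} expects an int")
    low, high = INT_RANGES[encoding]
    if not low <= value <= high:
        raise ValueError(f"{value} is not a valid {encoding}")
    return value


print(INT_RANGES["uint8"])                 # (0, 255)
print(check_fixed_width_int("uint8", 255))  # 255
```

Calling `check_fixed_width_int("uint8", 256)` or `check_fixed_width_int("uint8", -10)` raises `ValueError` in this sketch, mirroring the errors a text parser is expected to report.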

To specify an encoding, the parameter name is annotated with a primitive encoding or a macro reference. Encoding types may be qualified with their module names for disambiguation when there is more than one macro with the given name that is in scope, or when a macro name shadows a primitive encoding.

Primitive encodings

The following primitive encodings are provided by the system module.

Tagless encodings	Description
`flex_int`	Variable-width, signed int
`flex_uint`	Variable-width, unsigned int
`int8` `int16` `int32` `int64`	Fixed-width, signed int
`uint8` `uint16` `uint32` `uint64`	Fixed-width, unsigned int
`float16` `float32` `float64`	Fixed-width float
`flex_symbol`	`FlexSym`-encoded SID or text
`flex_string`	Variable-width string

Parameter cardinalities

A parameter name may optionally be followed by a cardinality modifier. This is a sigil that indicates how many values the parameter expects the corresponding argument expression to produce when it is evaluated.

Modifier	Cardinality
`?`	zero-or-one value
`*`	zero-or-more values
`!`	exactly-one value
`+`	one-or-more values

If no modifier is specified, the parameter's cardinality will default to exactly-one. An exactly-one parameter will always expand to a stream containing a single value.

Parameters with a cardinality other than exactly-one are called variadic parameters.

If an argument expression expands to a number of values that the cardinality forbids, the reader must raise an error.
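The cardinality check amounts to comparing the length of the expanded stream against per-modifier bounds. The following Python sketch shows one way to express it; `BOUNDS` and `check_cardinality` are hypothetical names, and streams are modeled as lists.

```python
from typing import Any, Dict, List, Optional, Tuple

# (minimum, maximum) number of values; None means unbounded.
BOUNDS: Dict[str, Tuple[int, Optional[int]]] = {
    "!": (1, 1),     # exactly-one
    "?": (0, 1),     # zero-or-one
    "+": (1, None),  # one-or-more
    "*": (0, None),  # zero-or-more
}


def check_cardinality(param: str, modifier: str, stream: List[Any]) -> List[Any]:
    """Return `stream` unchanged if its length is allowed, else raise."""
    low, high = BOUNDS[modifier or "!"]  # no modifier defaults to exactly-one
    count = len(stream)
    if count < low or (high is not None and count > high):
        raise ValueError(
            f"parameter '{param}' ({modifier or '!'}) received {count} value(s)"
        )
    return stream


print(check_cardinality("scale", "?", []))        # []
print(check_cardinality("names", "+", ["Moe"]))   # ['Moe']
```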

When a parameter has a cardinality of zero-or-more or one-or-more, the arguments for that parameter are eligible to use rest argument syntax in the Ion text encoding.

Optional parameters

Parameters with a cardinality that can accept an empty expression group as an argument (? and *) are called optional parameters. In text Ion, their corresponding arguments can be elided from e-expressions and TDL macro invocations when they appear in tail position. When an argument is elided, it is treated as though an explicit empty group (::) had been passed in its place.

In contrast, parameters with a cardinality that cannot accept an empty group (! and +) are called required parameters. Required parameters can never be elided.

(:set_macros
    (foo (x y? z*) // `x` is required, `y` and `z` are optional
        [(%x), (%y), (%z)]
    )
)

// `z` is a populated expression group
(:foo 1 2 (:: 3 4 5)) => [1, 2, 3, 4, 5]

// `z` is an empty expression group
(:foo 1 2 (::))       => [1, 2]

// `z` has been elided
(:foo 1 2)            => [1, 2]

// `y` and `z` have been elided
(:foo 1)              => [1]

// `x` cannot be elided
(:foo)                => ERROR: missing required argument `x`

Optional parameters that are not in tail position cannot be elided, as this would cause them to appear in a position corresponding to a different argument.

(:set_macros
    (foo (x? y) // `x` is optional, `y` is required
        [(%x), (%y)]
    )
)

(:foo (::) 1) => [(::), 1] => [1]
(:foo 1)                   => ERROR: missing required argument `y`
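The elision rule can be modeled as positional binding followed by padding of trailing optional parameters. The Python sketch below is a simplified illustration: `bind_arguments` is a hypothetical helper, each written argument is assumed to have already been evaluated to a stream (a list), and rest-argument syntax is deliberately not modeled.

```python
from typing import Any, Dict, List, Sequence, Tuple

OPTIONAL = {"?", "*"}  # cardinalities that accept the empty group


def bind_arguments(signature: Sequence[Tuple[str, str]],
                   args: Sequence[List[Any]]) -> Dict[str, List[Any]]:
    """Bind argument streams to parameters by position.

    `signature` is a sequence of (name, modifier) pairs, with "!" for the default.
    """
    if len(args) > len(signature):
        raise ValueError(f"expected at most {len(signature)} arguments, given {len(args)}")
    bound: Dict[str, List[Any]] = {}
    for index, (name, modifier) in enumerate(signature):
        if index < len(args):
            bound[name] = args[index]
        elif modifier in OPTIONAL:
            bound[name] = []  # elided: treated as an explicit (::)
        else:
            raise ValueError(f"missing required argument `{name}`")
    return bound


foo = [("x", "!"), ("y", "?"), ("z", "*")]
print(bind_arguments(foo, [[1], [2]]))  # {'x': [1], 'y': [2], 'z': []}
print(bind_arguments(foo, [[1]]))       # {'x': [1], 'y': [], 'z': []}
```

Because binding is purely positional, `bind_arguments([("x", "?"), ("y", "!")], [[1]])` binds `1` to `x` and then fails on the missing `y`, which matches the error shown above.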

Macro signatures

A macro's signature is the ordered sequence of parameters which an invocation of that macro must define. Syntactically, the signature is an s-expression of parameter declarations.

Example macro signature

(w flex_uint::x* float16::y? z+)

Name	Encoding	Cardinality
`w`	`tagged`	`exactly-one`
`x`	`flex_uint`	`zero-or-more`
`y`	`float16`	`zero-or-one`
`z`	`tagged`	`one-or-more`
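To make the decomposition in the table concrete, here is a small Python sketch that splits a signature written as text into its three elements. It is a simplification, not a conforming parser: `parse_signature` and `Param` are hypothetical names, the identifier pattern is narrower than Ion's, and module-qualified encodings are not handled.

```python
import re
from typing import List, NamedTuple


class Param(NamedTuple):
    name: str
    encoding: str
    cardinality: str


# Optional `encoding::` prefix, a name, and an optional cardinality sigil.
_PARAM = re.compile(
    r"^(?:(?P<enc>[A-Za-z_][A-Za-z0-9_]*)::)?"
    r"(?P<name>[A-Za-z_][A-Za-z0-9_]*)"
    r"(?P<card>[!?*+])?$"
)
_CARDINALITY = {
    None: "exactly-one", "!": "exactly-one", "?": "zero-or-one",
    "*": "zero-or-more", "+": "one-or-more",
}


def parse_signature(text: str) -> List[Param]:
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("signature must be an s-expression")
    params: List[Param] = []
    seen = set()
    for token in text[1:-1].split():
        match = _PARAM.match(token)
        if match is None:
            raise ValueError(f"invalid parameter declaration: {token}")
        name = match.group("name")
        if name in seen:
            raise ValueError(f"duplicate parameter name: {name}")
        seen.add(name)
        params.append(
            Param(name, match.group("enc") or "tagged", _CARDINALITY[match.group("card")])
        )
    return params


for param in parse_signature("(w flex_uint::x* float16::y? z+)"):
    print(param)
```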

Template definition language (TDL)

The macro's template is a single Ion value that defines how a reader should expand invocations of the macro. Ion 1.1 introduces a template definition language (TDL) to express this process in terms of the macro's parameters. TDL is a small language with only a few constructs.

A TDL expression can be any of the following:

A literal Ion scalar
A macro invocation
A variable expansion
A quasi-literal Ion container
A special form

In terms of its encoding, TDL is "just Ion." As you shall see in the following sections, the constructs it introduces are written as s-expressions with a distinguishing leading value or values.

A grammar for TDL can be found in the Grammar chapter.

Ion scalars

Ion scalars are interpreted literally. These include values of any type except list, sexp, and struct. null values of any type—even null.list, null.sexp, and null.struct—are also interpreted literally.

Examples

These macros are constants; they take no parameters. When they are invoked, they expand to a stream of a single value: the Ion scalar acting as the template expression.

$ion::
(module _
  (macros
    (macro greeting () "hello")
    (macro birthday () 1996-10-11)
    // Annotations are also literal
    (macro price () USD::29.95)
  )
)

(:greeting) => "hello"
(:birthday) => 1996-10-11
(:price)    => USD::29.95

Macro invocations

Macro invocations call an existing macro. The invoked macro could be a system macro, a macro imported from a shared module, or a macro previously defined in the current scope.

Syntactically, a macro invocation is an s-expression whose first value is the operator . and whose second value is a macro reference.

Grammar

macro-invocation   ::= '(.' macro-ref macro-arg* ')'

macro-ref          ::= (module-name '::')? (macro-name | macro-address)

macro-arg          ::= expression | expression-group

macro-name         ::= ion-identifier

macro-address      ::= unsigned-ion-integer

expression-group   ::= '(..' expression* ')'

Invocation syntax illustration

// Invoking a macro defined in the same module by name.
(.macro_name              arg1 arg2 /*...*/ argN)

// Invoking a macro defined in another module by name.
(.module_name::macro_name arg1 arg2 /*...*/ argN)

// Invoking a macro defined in the same module by its address.
(.0              arg1 arg2 /*...*/ argN)

// Invoking a macro defined in a different module by its address.
(.module_name::0 arg1 arg2 /*...*/ argN)

// Passing more than one argument expression for a single parameter using an expression group
(.macro_name (.. expr1 expr2 /*...*/ exprN) )

Examples

$ion::
(module _
  (macros
    // Calls the system macro `values`, allowing it to produce a stream of three values.
    (macro nephews () (.values Huey Dewey Louie))

    // Calls a macro previously defined in this module, splicing its result
    // stream into a list.
    (macro list_of_nephews () [(.nephews)])
  )
)

(:nephews)         => Huey Dewey Louie
(:list_of_nephews) => [Huey, Dewey, Louie]

important

There are no forward references in TDL. If a macro definition includes an invocation of a name or address that is not already valid, the reader must raise an error.

$ion::
(module _
  (macros
    (macro list_of_nephews () [(.nephews)])
    //                          ^^^^^^^^
    // ERROR: Calls a macro that has not yet been defined in this module.
    (macro nephews () (.values Huey Dewey Louie))
  )
)
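One way to picture this rule is a macro table that resolves every invoked name at the moment a definition is added. The Python sketch below is an assumption-laden model: `MacroTable` and its `define` method are hypothetical, system macros are represented as preloaded names in the same table, and each definition simply lists the names it invokes.

```python
from typing import Iterable, List, Optional


class MacroTable:
    def __init__(self, preloaded: Iterable[str] = ()) -> None:
        # Position in this list is the macro's address; None marks an anonymous macro.
        self.names: List[Optional[str]] = list(preloaded)

    def define(self, name: Optional[str], invokes: Iterable[str] = ()) -> int:
        """Add a macro, resolving every invoked name against earlier entries only."""
        for ref in invokes:
            if ref not in self.names:
                raise LookupError(f"macro `{ref}` has not been defined yet")
        self.names.append(name)
        return len(self.names) - 1  # address of the new macro


table = MacroTable(preloaded=["values"])
table.define("nephews", invokes=["values"])
table.define("list_of_nephews", invokes=["nephews"])
print(table.names)  # ['values', 'nephews', 'list_of_nephews']
```

Defining `list_of_nephews` before `nephews` raises `LookupError` in this model, because the referenced name is not yet present when the definition is processed.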

Variable expansion

Templates can insert the contents of a macro parameter into their output by using a variable expansion, an s-expression whose first value is the operator % and whose second and final value is the variable name of the parameter to expand.

If the variable name does not match one of the declared macro parameters, the implementation must raise an error.

Grammar

variable-expansion ::= '(%' variable-name ')'

variable-name      ::= ion-identifier

Examples

$ion::
(module _
  (macros
    // Produces a stream that repeats the content of parameter `x` twice.
    (macro twice (x*) (.values (%x) (%x)))
  )
)

(:twice foo)     => foo foo
(:twice "hello") => "hello" "hello"
(:twice 1 2 3)   => 1 2 3 1 2 3

Quasi-literal Ion containers

When an Ion container appears in a template definition, it is interpreted almost literally.

Each nested value in the container is inspected.

If the value is an Ion scalar, it is added to the output as-is.
If the value is a variable expansion, the stream bound to that variable name is added to the output. The variable expansion literal (for example: (%name)) is discarded.
If the value is a macro invocation, the invocation is evaluated and the resulting stream is added to the output. The macro invocation literal (for example: (.name 1 2 3)) is discarded.
If the value is a container, the reader will recurse into the container and repeat this process.
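The inspection steps above can be sketched as a small recursive evaluator. This Python model is illustrative and incomplete: `Var`, `Struct`, and `expand` are hypothetical names, Ion lists are modeled as Python lists, structs as ordered (name, value) pairs, and macro invocations are omitted to keep the example short.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Var:
    """Stands in for a variable expansion such as (%name)."""
    name: str


@dataclass(frozen=True)
class Struct:
    """A struct modeled as ordered (field_name, value_or_template) pairs."""
    fields: Tuple[Tuple[str, Any], ...]


def expand(template: Any, env: Dict[str, List[Any]]) -> List[Any]:
    """Expand a template to a stream (list) of values."""
    if isinstance(template, Var):
        if template.name not in env:
            raise NameError(f"unknown variable: {template.name}")
        return list(env[template.name])
    if isinstance(template, list):
        items: List[Any] = []
        for child in template:
            items.extend(expand(child, env))  # splice each child's stream in place
        return [items]
    if isinstance(template, Struct):
        fields = []
        for name, child in template.fields:
            # One field per value; an empty stream yields no field at all.
            fields.extend((name, value) for value in expand(child, env))
        return [Struct(tuple(fields))]
    return [template]  # scalars are added as-is


bookend = [Var("x"), Var("y"), Var("x")]
print(expand(bookend, {"x": ["!"], "y": ["a", "b", "c"]}))
# [['!', 'a', 'b', 'c', '!']]
```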

Expansion within a sequence

When the container is a list or s-expression, the values in the nested expression's expansion are spliced into the sequence at the site of the expression. If the expansion was empty, no values are spliced into the container.

$ion::
(module _
  (macros
    (macro bookend_list (x y*) [(%x), (%y), (%x)])
    (macro bookend_sexp (x y*) ((%x) (%y) (%x)))
  )
)

(:bookend_list ! a b c) => ['!', a, b, c, '!']
(:bookend_sexp ! a b c) => (! a b c !)

(:bookend_sexp !) => (! !)

Expansion within a struct

When the container is a struct, the expansion of each field value is paired with the corresponding field name. If the expansion produces a single value, a single field with that name will be spliced into the parent struct. If the expansion produces multiple values, a field with that name will be created for each value and spliced into the parent struct. If the expansion was empty, no fields are spliced into the parent struct.

Examples

$ion::
(module _
  (macros
    (macro resident (id names*)
        {
            town: "Riverside",
            id: (.make_string "123-" (%id)),
            name: (%names)
        }
     )
  )
)

(:resident "abc" "Alice") =>
{
  town: "Riverside",
  id: "123-abc",
  name: "Alice"
}

(:resident "def" "John" "Jacob" "Jingleheimer" "Schmidt") =>
{
  town: "Riverside",
  id: "123-def",
  name: "John",
  name: "Jacob",
  name: "Jingleheimer",
  name: "Schmidt",
}

(:resident "ghi") =>
{
  town: "Riverside",
  id: "123-ghi",
}

Special forms

special-form       ::= '(.' ('$ion::')?  special-form-name expression* ')'

special-form-name  ::= 'for' | 'if_none' | 'if_some' | 'if_single' | 'if_multi' | 'parse_ion' | 'literal'

Special forms are similar to macro invocations, but they have their own expansion rules. See Special forms for the list of special forms and a description of each.

Special Forms

When a TDL expression is syntactically an S-expression and its first element is the symbol ., its next element must be a symbol that matches either one of the keywords denoting the special forms or the name of a previously-defined macro. The interpretation of the S-expression’s remaining elements depends on how the symbol resolves. In the case of macro invocations, the elements following the operator are arbitrary TDL expressions, but for special forms that is not always the case.

Special forms are "special" precisely because they cannot be expressed as macros and must therefore receive bespoke syntactic treatment. Since the elements of macro-invocation expressions are themselves expressions, when you want something to not be evaluated that way, it must be a special form. Argument expressions are passed to the special form without interpretation, and each special form has custom logic for interpreting its arguments.

// The argument being passed in is the _expansion_ of the foo macro
(.regular_macro (.foo))

// The argument being passed in is literally `( '.' 'foo' )`.
(.special_form (.foo))

These special forms are part of the template language itself, and most are not addressable outside TDL; the E-expression (:if_none foo bar baz) must necessarily refer to some user-defined macro named if_none, not to the special form of the same name. The only exception is parse_ion, which is explicitly included in the system macro table.

`literal`

(literal (values*) /* Not representable in TDL */)

The literal form is an identity function that accepts its arguments as literal values and then produces them without any evaluation. Both literal and values are identity functions, but they differ in regard to how their arguments are interpreted:

// When the arguments are values, literal produces the same result as values
(.literal 1 2 3) ⇒ 1 2 3
(.values 1 2 3)  ⇒ 1 2 3

// When the arguments are TDL macros or special forms, literal produces different results than values 
(.literal (.make_string "a" "b")) ⇒ (.make_string "a" "b")
(.values (.make_string "a" "b"))  ⇒ "ab"

// When the arguments are TDL expression groups, literal produces different results than values
(.literal (.. true false)) ⇒ ( .. true false)
(.values (.. true false))  ⇒ true false

// When the arguments are TDL variable expansions, literal produces different results than values
// Assuming that the variable x is bound to "Hello"
(.literal (%x)) ⇒ ( % x )
(.values (%x))  ⇒ "Hello"

`if_none`

The if_none special form accepts three arguments—stream, true_branch, and false_branch—each of which may be a single value or a stream of zero-to-many values.

The if_none form is if/then/else syntax testing stream emptiness. It has three sub-expressions, the first being a stream to check. If and only if that stream is empty (it produces no values), the second sub-expression is expanded. Otherwise, the third sub-expression is expanded. The expanded second or third sub-expression becomes the result that is produced by if_none.

note

Exactly one branch is expanded, because otherwise the empty stream might be used in a context that requires a value, resulting in an errant expansion error.

(macro temperature (degrees scale?) 
       {
         degrees: (%degrees),
         scale: (.if_none (%scale) K (%scale)),
       })

(:temperature 96 F)     ⇒ {degrees:96,  scale:F}
(:temperature 283 (::)) ⇒ {degrees:283, scale:K}

To refine things a bit further, trailing optional arguments can be omitted entirely:

(:temperature 283) ⇒ {degrees:283, scale:K}

tip

If you're using if_none to specify an expression to default to, you can use the default system macro to be more concise.

(macro temperature (degrees scale?)
    {
      degrees: (%degrees),
      scale: (.default (%scale) K),
    }
)

`if_some`

The if_some special form accepts three arguments—stream, true_branch, and false_branch—each of which may be a single value or a stream of zero-to-many values.

If stream evaluates to one or more values, it produces true_branch. Otherwise, it produces false_branch. Exactly one of true_branch and false_branch is evaluated. The stream expression must be expanded enough to determine whether it produces any values, but implementations are not required to fully expand the expression.

Example:

(macro foo (x)
       {
         foo: (.if_some (%x) [(%x)] null)
       })

(:foo (::))     => { foo: null }
(:foo 2)        => { foo: [2] }
(:foo (:: 2 3)) => { foo: [2, 3] }

The false_branch parameter may be elided, allowing if_some to serve as a map-if-not-none function.

Example:

(macro foo (x)
       {
         foo: (.if_some (%x) [(%x)])
       })

(:foo (::))     => { }
(:foo 2)        => { foo: [2] }
(:foo (:: 2 3)) => { foo: [2, 3] }

`if_single`

The if_single special form accepts three arguments—stream, true_branch, and false_branch—each of which may be a single value or a stream of zero-to-many values.

If stream evaluates to exactly one value, if_single produces the expansion of true_branch. Otherwise, it produces the expansion of false_branch. Exactly one of true_branch and false_branch is evaluated. The stream argument must be expanded enough to determine whether it produces exactly one value, but implementations are not required to fully expand the expression.

Example:

(macro foo (x)
       {
         foo: (.if_single (%x) (%x) [(%x)])
       })

(:foo (::))     => { foo: [] }
(:foo 2)        => { foo: 2 }
(:foo (:: 2 3)) => { foo: [2, 3] }

`if_multi`

The if_multi special form accepts three arguments—stream, true_branch, and false_branch—each of which may be a single value or a stream of zero-to-many values.

If stream evaluates to more than one value, it produces true_branch. Otherwise, it produces false_branch. Exactly one of true_branch and false_branch is evaluated. The stream argument must be expanded enough to determine whether it produces more than one value, but implementations are not required to fully expand the expression.

Example:

(macro foo (x)
       {
         foo: (.if_multi (%x) "many" "zero or one")
       })

(:foo (::))     => { foo: "zero or one" }
(:foo 2)        => { foo: "zero or one" }
(:foo (:: 2 3)) => { foo: "many" }
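The four cardinality tests share one idea: the stream only needs to be expanded far enough to tell zero, one, and more than one value apart, and only the selected branch is expanded. The following is a minimal Python sketch of that idea, not a normative implementation. The function names mirror the special forms, but modeling streams as Python iterables and branches as zero-argument callables is an assumption made for illustration.

```python
from itertools import count, islice


def cardinality(stream):
    """Classify a stream as 'none', 'single', or 'multi'.

    At most two values are pulled from the stream, so the stream is only
    expanded as far as the test requires.
    """
    pulled = len(list(islice(iter(stream), 2)))
    return ("none", "single", "multi")[pulled]


# Each branch is a zero-argument callable so that only the selected branch
# is ever expanded.
def if_none(stream, true_branch, false_branch):
    return true_branch() if cardinality(stream) == "none" else false_branch()


def if_some(stream, true_branch, false_branch):
    return true_branch() if cardinality(stream) != "none" else false_branch()


def if_single(stream, true_branch, false_branch):
    return true_branch() if cardinality(stream) == "single" else false_branch()


def if_multi(stream, true_branch, false_branch):
    return true_branch() if cardinality(stream) == "multi" else false_branch()


# An unbounded stream can still be classified without being exhausted.
print(cardinality(count()))
print(if_multi([2, 3], lambda: ["many"], lambda: ["zero or one"]))
```

Note that classifying a one-shot iterator consumes the values that were pulled; a real evaluator would need to buffer or re-expand them if the selected branch refers to the same stream.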

`for`

The for special form maps one or more streams to an output stream.

It accepts two arguments—stream_bindings and template. stream_bindings is a list or s-expression containing one or more s-expressions of the form (name expr0 expr1 ... exprN). The first value is a symbol to act as a variable name. The remaining expressions in the s-expression will be expanded and concatenated into a single stream; for each value in the stream, the for expansion will produce a copy of the template argument expression with any appearance of the variable replaced by the value.

For example:

(.for
  [(word                     // Variable name
   foo bar baz)]             // Values over which to iterate
  (.values (%word) (%word))) // Template expression; `(%word)` will be replaced
=>
foo foo bar bar baz baz

Multiple s-expressions can be specified. The streams will be iterated over in lockstep.

(.for
  ((x 1 2 3)   // for x in...
   (y 4 5 6))  // for y in...
  ((%x) (%y))) // Template; `(%x)` and `(%y)` will be replaced
=>
(1 4)
(2 5)
(3 6)

Iteration will end when the shortest stream is exhausted.

(.for
  [(x 1 2),    // for x in...
   (y 3 4 5)]  // for y in...
  ((%x) (%y))) // Template; `(%x)` and `(%y)` will be replaced
=>
(1 3)
(2 4)
// no more output, `x` is exhausted
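The lockstep iteration described above behaves like a zip over the bound streams. The following is a minimal Python sketch under that reading, not a normative implementation; `for_each` is a hypothetical name, each binding's expressions are assumed to be already expanded and concatenated into one list, and the template is modeled as a callable that receives the current variable bindings.

```python
def for_each(bindings, template):
    """Model of the `for` special form.

    bindings: list of (name, values) pairs, one per bound variable.
    template: callable taking a dict of {name: value} and returning the
              list of values produced by one copy of the template.
    """
    names = [name for name, _ in bindings]
    streams = [values for _, values in bindings]
    output = []
    # zip() stops as soon as the shortest stream is exhausted.
    for row in zip(*streams):
        env = dict(zip(names, row))
        output.extend(template(env))
    return output


# Models the (x 1 2) / (y 3 4 5) example above; tuples stand in for s-expressions.
print(for_each([("x", [1, 2]), ("y", [3, 4, 5])],
               lambda env: [(env["x"], env["y"])]))
```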

Names defined inside a for shadow names in the parent scope.

(macro triple (x)
  //           └─── Parameter `x` is declared here...
  (.for
  //    ...but the `for` expression introduces a
  //  ┌─── new variable of the same name here.
    ((x a b c))
    (%x)
  //  └─── This refers to the `for` expression's `x`, not the parameter.
  )
)
(:triple 1) // Argument `1` is ignored
=>
a b c

The for special form can only be invoked in the body of a template macro. It is not valid to use it as an E-expression.

`parse_ion`

Ion documents may be embedded in other Ion documents using the parse_ion form.

The parse_ion form accepts a single argument that must be a literal string or blob. It constructs a stream of values by parsing its argument as a single, self-contained Ion document.

The argument must be a literal value because macros are not allowed to contain recursive calls, and composing an embedded document from multiple expressions would make it possible to implement recursion in the macro system.

The data argument is evaluated in a clean environment that cannot read anything from the parent document. Allowing context to leak from the outer scope into the document being parsed would also enable recursion.

All values produced by the expansion of parse_ion are application values (i.e., it is as if they are all annotated with $ion_literal).

The IVM at the beginning of an Ion data stream is sufficient to identify whether it is text or binary, so text Ion can be embedded as a blob containing the UTF-8 encoded text.

Embedded text example:

(:parse_ion
    '''
    $ion_1_1
    $ion::(module _ (symbols ["foo", "bar"]))
    $1 $2
    '''
)
=> foo bar

Embedded binary example:

(:parse_ion {{ 4AEB6qNmb2+jYmFy }} )
=> foo bar

The parse_ion form has an address in the system macro table, making it the only special form that can be invoked as an E-expression.

For normative examples, see parse_ion in the Ion conformance test suite.

System Macros

Many of the system macros MAY be defined as template macros, and when possible, the specification includes a template. Templates are given here as normative examples, but system macros are not required to be implemented as template macros.

The macros that can be defined as templates are included as system macros because of their broad applicability, and so that Ion implementations can provide optimizations for these macros that run directly in the implementations' runtime environments rather than in the macro evaluator. For example, a macro such as add_symbols does not produce user values, so an Ion Reader could bypass evaluating the template and directly update the encoding context with the new symbols.

For normative examples, see system_macros in the Ion conformance test suite.

Stream Constructors

`none`

(macro none () (.values))

none accepts no values and produces nothing (an empty stream).

`values`

(macro values (v*) (%v))

This is, essentially, the identity function. It produces a stream from any number of arguments, concatenating the streams produced by the nested expressions. It can be used to aggregate multiple values or sub-streams to pass to a single argument, or to produce multiple results.

`default`

(macro default (expr* default_expr*)
    // If `expr` is empty...
    (.if_none (%expr)
        // then expand `default_expr` instead.
        (%default_expr)
        // If it wasn't empty, then expand `expr`.
        (%expr)
    )
)

default tests expr to determine whether it expands to the empty stream. If it does not, default will produce the expansion of expr. If it does, default will produce the expansion of default_expr instead.

`flatten`

(macro flatten (sequence*) /* Not representable in TDL */)

The flatten system macro constructs a stream from the content of one or more sequences.

Produces a stream with the contents of all the sequence values. Any annotations on the sequence values are discarded. Any non-sequence arguments will raise an error. Any null arguments will be ignored.

Examples:

(:flatten [a, b, c] (d e f))       => a b c d e f
(:flatten [[], null.list] foo::()) => [] null.list

The flatten macro can also be used to splice the content of one list or s-expression into another list or s-expression.

[1, 2, (:flatten [a, b]), 3, 4] => [1, 2, a, b, 3, 4]
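The behavior of flatten can be approximated in a few lines. The following is a minimal Python sketch, not a normative implementation. It assumes a simplified data model: an Ion list is a Python list, an s-expression is a tuple, any null argument is `None`, and annotations are not modeled at all.

```python
def flatten(*sequences):
    """Concatenate the contents of the given sequences into one stream."""
    out = []
    for seq in sequences:
        if seq is None:
            # Null arguments are ignored.
            continue
        if not isinstance(seq, (list, tuple)):
            raise TypeError(f"flatten expects sequences, got {seq!r}")
        # Only the top level is spliced; nested sequences stay intact.
        out.extend(seq)
    return out


print(flatten(["a", "b", "c"], ("d", "e", "f")))
print(flatten([[], None], ()))
```

Only top-level null arguments are skipped. A null nested inside a sequence is ordinary content and is passed through, which matches the second example above.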

`parse_ion`

parse_ion is a special form because (unlike macros) its argument must specifically be a literal value. However, because of its usefulness for embedding an Ion stream in another Ion stream, it has an address in the system macro table.

See Special forms: parse_ion.

Value Constructors

`annotate`

(macro annotate (ann* value) /* Not representable in TDL */)

Produces the value prefixed with the annotations ann. Each ann must be a non-null, unannotated string or symbol.

(:annotate (:: "a2") a1::true) => a2::a1::true

`make_string`

(macro make_string (content*) /* Not representable in TDL */)

Produces a non-null, unannotated string containing the concatenated content produced by the arguments. Nulls (of any type) are forbidden. Any annotations on the arguments are discarded.

`make_symbol`

(macro make_symbol (content*) /* Not representable in TDL */)

Produces a non-null, unannotated symbol containing the concatenated content produced by the arguments. Nulls (of any type) are forbidden. Any annotations on the arguments are discarded.

`make_blob`

(macro make_blob (lobs*) /* Not representable in TDL */)

Produces a non-null, unannotated blob containing the concatenated content produced by the arguments. Nulls (of any type) are forbidden. Any annotations on the arguments are discarded.

`make_list`

(macro make_list (sequences*) [ (.flatten (%sequences)) ])

Produces a non-null, unannotated list by concatenating the content of any number of non-null list or sexp inputs.

(:make_list)                  => []
(:make_list (1 2))            => [1, 2]
(:make_list (1 2) [3, 4])     => [1, 2, 3, 4]
(:make_list ((1 2)) [[3, 4]]) => [(1 2), [3, 4]]

`make_sexp`

(macro make_sexp (sequences*) ( (.flatten (%sequences)) ))

Produces a non-null, unannotated sexp by concatenating the content of any number of non-null list or sexp inputs.

(:make_sexp)                  => ()
(:make_sexp (1 2))            => (1 2)
(:make_sexp (1 2) [3, 4])     => (1 2 3 4)
(:make_sexp ((1 2)) [[3, 4]]) => ((1 2) [3, 4])

`make_struct`

(macro make_struct (structs*) /* Not representable in TDL */)

Produces a non-null, unannotated struct by combining the fields of any number of non-null structs.

(:make_struct)    => {}
(:make_struct
  {k1: 1, k2: 2}
  {k3: 3}
  {k4: 4})        => {k1:1, k2:2, k3:3, k4:4}

`make_field`

(macro make_field (field_name value) /* Not representable in TDL */)

Produces a non-null, unannotated, single-field struct using the given field name and value.

The field_name parameter may be (or evaluate to) any non-null text value, and the value parameter may be (or evaluate to) any single value.

This can be used to dynamically construct field names based on macro parameters.

Example:

(macro foo_struct (extra_name extra_value)
       (.make_struct
         {
           foo_a: 1,
           foo_b: 2,
         }
         (.make_field (.make_string "foo_" (%extra_name)) (%extra_value))
       ))

Then:

(:foo_struct c 3) => { foo_a: 1, foo_b: 2, foo_c: 3 }

`make_decimal`

(macro make_decimal (coefficient exponent) /* Not representable in TDL */)

This is no more compact than the regular binary encoding for decimals. However, it can be used in conjunction with other macros, for example, to represent fixed-point numbers.

Both coefficient and exponent must be (or evaluate to) a single integer value.

(macro usd (cents) (.annotate USD (.make_decimal (%cents) -2)))

(:usd 199) =>  USD::1.99

note

It is not possible to use make_decimal to construct any negative zero value because Ion integers do not have signed zero.
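The arithmetic behind make_decimal, including why negative zero is out of reach, can be illustrated with Python's `decimal` module. This is a minimal sketch rather than a normative implementation; it assumes Python's `Decimal` is an acceptable stand-in for an Ion decimal and builds the value exactly from a sign, digits, and exponent.

```python
from decimal import Decimal


def make_decimal(coefficient, exponent):
    """Return the decimal value coefficient * 10**exponent, built exactly."""
    for name, arg in (("coefficient", coefficient), ("exponent", exponent)):
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(arg, bool) or not isinstance(arg, int):
            raise TypeError(f"{name} must be an integer")
    # The sign comes from the integer coefficient. An integer zero has no
    # sign, so this function can never produce a negative zero.
    sign = 1 if coefficient < 0 else 0
    digits = tuple(int(d) for d in str(abs(coefficient)))
    return Decimal((sign, digits, exponent))


print(make_decimal(199, -2))   # the value behind (:usd 199)
print(make_decimal(-0, -2).is_signed())
```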

`make_timestamp`

(macro make_timestamp (year month? day? hour? minute? second? offset_minutes?) /* Not representable in TDL */)

Produces a non-null, unannotated timestamp at various levels of precision. When offset_minutes is absent, the result has unknown local offset; an offset_minutes of 0 denotes UTC.

The make_timestamp macro has rules that cannot be expressed in the macro signature because it must construct a valid Ion timestamp value.

The arguments to this macro may not be any null value. The evaluated argument for the year parameter must be an integer from 1 to 9999 inclusive. The evaluated argument for the month parameter, if present, must be an integer from 1 to 12 inclusive. The evaluated argument for the day parameter, if present, must be an integer that is a valid, 1-indexed day for the given month. The evaluated argument for the hour parameter, if present, must be an integer from 0 to 23 inclusive. The evaluated argument for the minute parameter, if present, must be an integer from 0 to 59 inclusive. The evaluated argument for the second parameter, if present, must be a decimal or integer value that is greater than or equal to zero and less than 60. The evaluated arguments for all other parameters, if present, must be integer values.

The offset_minutes and hour parameters may only be present if minute is present. Aside from offset_minutes, if any evaluated argument is present, the evaluated arguments for all parameters to the left must also be present. The precision of the constructed timestamp is determined by which parameters have non-empty arguments.

Example:

(macro ts_today 
       (uint8::hour uint8::minute uint32::seconds_millis)
       (.make_timestamp
         2022
         4
         28
         (%hour)
         (%minute)
         (.make_decimal (%seconds_millis) -3) 0))
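The argument rules for make_timestamp lend themselves to a small validation routine. The following is a minimal Python sketch of those rules, not a normative implementation. The name `timestamp_precision` is hypothetical, an absent argument is modeled as `None`, and Ion null values are not modeled.

```python
import calendar
from decimal import Decimal


def _check_int(name, value, low, high):
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{name} must be an integer")
    if not low <= value <= high:
        raise ValueError(f"{name} must be in {low}..{high}")


def timestamp_precision(year, month=None, day=None, hour=None, minute=None,
                        second=None, offset_minutes=None):
    """Validate the arguments and return the resulting precision."""
    fields = [("year", year), ("month", month), ("day", day),
              ("hour", hour), ("minute", minute), ("second", second)]

    # Apart from offset_minutes, a present argument requires every
    # argument to its left to be present as well.
    first_absent = None
    for name, value in fields:
        if value is None:
            first_absent = first_absent or name
        elif first_absent is not None:
            raise ValueError(f"{name} is present but {first_absent} is absent")

    # hour and offset_minutes are only allowed together with minute.
    if minute is None and (hour is not None or offset_minutes is not None):
        raise ValueError("hour and offset_minutes require minute")

    _check_int("year", year, 1, 9999)
    precision = "year"
    if month is not None:
        _check_int("month", month, 1, 12)
        precision = "month"
    if day is not None:
        days_in_month = calendar.monthrange(year, month)[1]
        _check_int("day", day, 1, days_in_month)
        precision = "day"
    if minute is not None:
        _check_int("hour", hour, 0, 23)
        _check_int("minute", minute, 0, 59)
        precision = "minute"
    if second is not None:
        if isinstance(second, bool) or not isinstance(second, (int, Decimal)):
            raise ValueError("second must be an integer or decimal")
        if not 0 <= second < 60:
            raise ValueError("second must be >= 0 and < 60")
        precision = "second"
    if offset_minutes is not None:
        if isinstance(offset_minutes, bool) or not isinstance(offset_minutes, int):
            raise ValueError("offset_minutes must be an integer")
    return precision


print(timestamp_precision(2022, 4, 28, 13, 5, Decimal("1.500"), 0))
```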

Encoding Utility Macros

`repeat`

The repeat system macro can be used for efficient run-length encoding.

(macro repeat (n! value*) /* Not representable in TDL */)

Produces a stream that repeats the specified value expression(s) n times.

The evaluated argument for n must be a non-null integer value that is equal to or greater than zero.

(:repeat 5 0)          => 0 0 0 0 0
(:repeat 2 true false) => true false true false
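As an illustration of the expansion, here is a minimal Python sketch of repeat. It is not a normative implementation; the stream of value expressions is modeled as already-expanded positional arguments.

```python
def repeat(n, *values):
    """Repeat the given values n times, in order."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(n, bool) or not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")
    return list(values) * n


print(repeat(5, 0))
print(repeat(2, True, False))
```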

`delta`

(macro delta (deltas*) /* Not representable in TDL */)

The delta system macro can be used for directed delta encoding. It produces a stream that is equal in length to the deltas argument, defined by the recurrence relation:

output₀ = delta₀
outputₙ₊₁ = outputₙ + deltaₙ₊₁

Example:

(:delta 1000 1 2 3 -4) => 1000 1001 1003 1006 1002
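The recurrence above is a running sum. A minimal Python sketch, assuming the deltas are already-expanded integers, can lean on `itertools.accumulate`; it is an illustration rather than a normative implementation.

```python
from itertools import accumulate


def delta(*deltas):
    """Return the running sums of the deltas: out[0] = d[0], out[n+1] = out[n] + d[n+1]."""
    return list(accumulate(deltas))


print(delta(1000, 1, 2, 3, -4))
```

The output always has the same length as the input, and an empty input produces an empty stream.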

`sum`

(macro sum (a b) /* Not representable in TDL */)

Produces the sum of two non-null integer arguments.

Example:

(:sum 1 2) => 3

`meta`

(macro meta (anything*) (.none))

The meta macro accepts any values and emits nothing. It allows writers to encode data that will not be surfaced to most readers. Readers can be configured to intercept calls to meta, allowing them to read the otherwise invisible data.

When transcribing from one format to another, writers should preserve invocations of meta when possible.

Example:

(:values
    (:meta {author: "Mike Smith", email: "mikesmith@example.com"})
    {foo:2,foo:1}
)
=>
{foo:2,foo:1}

Updating the Encoding Context

These macros are defined in terms of templates, but they are not necessarily implemented as template macros. Each of these macros produces only system values and may only be invoked where system values can occur (i.e. at the top level of a data stream). None of these macros may be invoked in TDL.