Encoding Expressions

note

This chapter focuses on the binary encoding of e-expressions. Macros by example explains what they are and how they are used.

E-expression with the address in the opcode

If the value of the opcode is less than 64 (0x40), it represents an E-expression invoking the macro at the corresponding address—-an offset within the local macro table.

Invocation of macro address 7

┌──── Opcode in 00-3F range indicates an e-expression
│     where the opcode value is the macro address
│
07
└── FixedUInt 7

Invocation of macro address 31

┌──── Opcode in 00-3F range indicates an e-expression
│     where the opcode value is the macro address
│
1F
└── FixedUInt 31

Note that the opcode alone tells us which macro is being invoked, but it does not supply enough information for the reader to parse any arguments that may follow. The parsing of arguments is described in detail in the section Macro calling conventions. (TODO: Link)

E-expressions with biased FixedUInt addresses

While E-expressions invoking macro addresses in the range [0, 63] can be encoded in a single byte using E-expressions with the address in the opcode, many applications will benefit from defining more than 64 macros. The 0x4_ and 0x5_ opcodes can be used to represent macro addresses up to 1,052,734. In both encodings, the address is biased by the total number of addresses with lower opcodes.

If the high nibble of the opcode is 0x4_, then a biased address follows as a 1-byte FixedUInt. For 0x4_, the bias is 256 * low_nibble + 64 (or (low_nibble << 8) + 64).

If the high nibble of the opcode is 0x5_, then a biased address follows as a 2-byte FixedUInt.

For 0x5_, the bias is 65536 * low_nibble + 4160 (or (low_nibble << 16) + 4160)

Invocation of macro address 841

┌──── Opcode in range 40-4F indicates a macro address with 1-byte FixedUInt address
│┌─── Low nibble 3 indicates bias of 832
││
43 09
   │
   └─── FixedUInt 9

Biased Address : 9
Bias : 832
Address : 841

Invocation of macro address 142918

┌──── Opcode in range 50-5F indicates a macro address with 2-byte FixedUInt address
│┌─── Low nibble 2 indicates bias of 135232
││
52 06 1E
   └─┬─┘
     └─── FixedUInt 7686

Biased Address : 7686
Bias : 135232
Address : 142918

Macro address range biases for 0x4_ and 0x5_

Low Nibble0x4_ Bias0x5_ Bias
0644160
132069696
2576135232
3832200768
41088266304
51344331840
61600397376
71856462912
82112528448
92368593984
A2624659520
B2880725056
C3136790592
D3392856128
E3648921664
F3904987200

E-expression with the address as a trailing FlexUInt

The opcode 0xF4 indicates an e-expression whose address is encoded as a trailing FlexUInt with no bias. This encoding is less compact for addresses that can be encoded using opcodes 0x5F and below, but it is the only encoding that can be used for macro addresses greater than 1,052,734.

Invocation of macro address 4
┌──── Opcode F4 indicates an e-expression with a trailing `FlexUInt` macro address
│
│
F4 09
   │
   └─── FlexUInt 4
Invocation of macro address 1_100_000
┌──── Opcode F4 indicates an e-expression with a trailing `FlexUInt` macro address
│
│
F4 04 47 86
   └──┬───┘
      └─── FlexUInt 1,100,000

System Macro Invocations

E-expressions that invoke a system macro can be encoded using the 0xEF opcode followed by a 1-byte FixedUInt representing an index in the system macro table.

Encoding of the system macro values
┌──── Opcode 0xEF indicates a system symbol or macro invocation
│  ┌─── FixedInt 1 indicates macro 1 from the system macro table
│  │
EF 01

In addition, system macros MAY be invoked using any of the 0x00-0x5F or 0xF4-0xF5 opcodes, provided that the macro being invoked has been given an address in user macro address space.

E-expression argument encoding

The example invocations in prior sections have demonstrated how to encode an invocation of the simplest form of macro--one with no parameters. This section explains how to encode macro invocations when they take parameters of different encodings and cardinalities.

To begin, we will examine how arguments are encoded when all of the macro's parameters use the tagged encoding and have a cardinality of exactly-one.

Tagged encoding

When a macro parameter does not specify an encoding (the parameter name is not annotated), arguments passed to that parameter use the 'tagged' encoding. The argument begins with a leading opcode that dictates how to interpret the bytes that follow.

This is the same encoding used for values in other Ion 1.1 contexts like lists, s-expressions, or at the top level.

Encoding a single exactly-one argument

A parameter with a cardinality of exactly-one expects its corresponding argument to be encoded as a single expression of the parameter's declared encoding. (The following section will explore the available encodings in greater depth; for now, our examples will be limited to parameters using the tagged encoding.)

When the macro has a single exactly-one parameter, the corresponding encoded argument follows the opcode and (if separate) the encoded address.

Example encoding of an e-expression with a tagged, exactly-one argument

Macro definition
(:set_macros
  (foo (x) /*...*/)
)
Text e-expression
(:foo 1)
Binary e-expression with the address in the opcode
┌──── Opcode 0x00 is less than 0x40; this is an e-expression invoking
│     the macro at address 0.
│    ┌─── Argument 'x': opcode 0x61 indicates a 1-byte integer (1)
│  ┌─┴─┐
00 61 01
Binary e-expression using a trailing FlexUInt address
┌──── Opcode F4: An e-expression with a trailing FlexUInt address
│  ┌──── FlexUInt 0: Macro address 0
│  │    ┌─── Argument 'x': opcode 0x61 indicates a 1-byte integer (1)
│  │  ┌─┴─┐
F4 01 61 01

Encoding multiple exactly-one arguments

If the macro has more than one parameter, a reader would iterate over the parameters declared in the macro signature from left to right. For each parameter, the reader would use the parameter's declared encoding to interpret the next bytes in the stream. When no more parameters remain, parsing of the e-expression's arguments is complete.

Example encoding of an e-expression with multiple tagged, exactly-one arguments

Macro definition
(:set_macros
  (foo (a b c) /*...*/)
)
Text e-expression
(:foo 1 2 3)
Binary e-expression with the address in the opcode
┌──── Opcode 0x00 is less than 0x40; this is an e-expression
│     invoking the macro at address 0.
│    ┌─── Argument 'a': opcode 0x61 indicates a 1-byte integer (1)
│    │     ┌─── Argument 'b': opcode 0x61 indicates a 1-byte integer (2)
│    │     │     ┌─── Argument 'c': opcode 0x61 indicates a 1-byte integer (3)
│  ┌─┴─┐ ┌─┴─┐ ┌─┴─┐
00 61 01 61 02 61 03
Binary e-expression using a trailing FlexUInt address
┌──── Opcode F4: An e-expression with a trailing FlexUInt address
│  ┌──── FlexUInt 0: Macro address 0
│  │    ┌─── Argument 'a': opcode 0x61 indicates a 1-byte integer (1)
│  │    │     ┌─── Argument 'b': opcode 0x61 indicates a 1-byte integer (2)
│  │    │     │     ┌─── Argument 'c': opcode 0x61 indicates a 1-byte integer (3)
│  │    │     │     │
│  │  ┌─┴─┐ ┌─┴─┐ ┌─┴─┐
F4 01 61 01 61 02 61 03

Tagless Encodings

In contrast to the tagged encoding, tagless encodings do not begin with an opcode. This means that they are potentially more compact than a tagged type, but are also less flexible. Because tagless encodings do not have an opcode, they cannot represent E-expressions, annotation sequences, or null values of any kind.

Tagless encodings are comprised of the primitive encodings and macro shapes.

Primitive encodings

Primitive encodings are self-delineating, either by having a statically known size in bytes or by including length information in their serialized form.

Ion typePrimitive encodingSize in bytesEncoding
intuint81FixedUInt
uint162
uint324
uint648
flex_uintvariableFlexUInt
int81FixedInt
int162
int324
int648
flex_intvariableFlexInt
floatfloat162Little-endian IEEE-754 half-precision float
float324Little-endian IEEE-754 single-precision float
float648Little-endian IEEE-754 double-precision float
symbolflex_symvariableFlexSym

Example encoding of an e-expression with primitive, exactly-one arguments

As first demonstrated in Encoding multiple exactly-one arguments, the bytes of the serialized arguments begin immediately after the e-expression's opcode and (if separate) the macro address. The reader iterates over the parameters in the macro signature in the order they are declared. For each parameter, the reader uses the parameter's declared encoding to interpret the next bytes in the stream. When no more parameters remain, parsing is complete.

Macro definition
(:set_macros
  (foo (flex_uint::a int8::b uint16::c) /*...*/)
)
Text e-expression
(:foo 1 2 3)
Binary e-expression with the address in the opcode
┌──── Opcode 0x00 is less than 0x40; this is an e-expression
│     invoking the macro at address 0.
│  ┌─── Argument 'a': FlexUInt 1
│  │  ┌─── Argument 'b': 1-byte FixedInt 2
│  │  │    ┌─── Argument 'c': 2-byte FixedUInt 3
│  │  │  ┌─┴─┐
00 03 02 03 00
Binary e-expression using a trailing FlexUInt address
┌──── Opcode F4: An e-expression with a trailing FlexUInt address
│  ┌──── FlexUInt 0: Macro address 0
│  │  ┌─── Argument 'a': FlexUInt 1
│  │  │  ┌─── Argument 'b': 1-byte FixedInt 2
│  │  │  │    ┌─── Argument 'c': 2-byte FixedUInt 3
│  │  │  │  ┌─┴─┐
F4 01 03 02 03 00

Macro shapes

The term macro shape describes a macro that is being used as the encoding of an E-expression argument. A parameter using a macro shape as its encoding is sometimes called a macro-shaped parameter. For example, consider the following two macro definitions.

The point2D macro takes two flex_int-encoded values as arguments.

(macro point2D (flex_int::x flex_int::y)
  {
    x: (%x),
    y: (%y),
  }
)

The line macro takes a pair of point2D invocations as arguments.

(macro line (point2D::start point2D::end)
  {
    start: (%start),
    end: (%end),
  }
)

Normally an e-expression would begin with an opcode and an address communicating what comes next. However, when we're reading the argument for a macro-shaped parameter, the macro being invoked is inferred from the parent macro signature instead. As such, there is no need to include an opcode or address.

┌──── Opcode 0x01 is less than 0x40; this is an e-expression
│     invoking the macro at address 1: `line`
│    ┌─── Argument $start: an implicit invocation of macro `point2D`
│    │     ┌─── Argument $end: an implicit invocation of macro `point2D`
│  ┌─┴─┐ ┌─┴─┐
00 03 05 07 09
   │  │  │  └────   $end/$y: FlexInt 4
   │  │  └───────   $end/$x: FlexInt 3
   │  └────────── $start/$y: FlexInt 2
   └───────────── $start/$x: FlexInt 1

Any macro can be used as a macro shape except for constants--macros which take zero parameters. Constants cannot be used as a macro shape because their serialized representation would be empty, making it impossible to encode them in expression groups. However, this limitation does not sacrifice any expressiveness; the desired constant can always be invoked directly in the body of the macro.

(:add_macros
  // Defines a constant 'hostname'
  (hostname () "abc123.us_west.example.com")

  (http_ok (hostname::server page)
  //           └── ERROR: cannot use a constant as a macro shape
     {
        server: (%server),
        page: (%page),
        message: OK,
        status: 200,
     }
  )

  (http_ok (page)
    {
      server: (.hostname),
      //           └── OK: invokes constant as needed
      page: (%page),
      message: OK,
      status: 200,
    }
  )
)

Encoding variadic arguments

The preceding sections have described how to (de)serialize the various parameter encodings, but these parameters have always had the same cardinality: exactly-one.

This section explains how to encode e-expressions invoking a macro whose signature contains variadic parameters--parameters with a cardinality of zero-or-one, zero-or-more, or one-or-more.

Argument Encoding Bitmap (AEB)

If a macro signature has one or more variadic parameters, then e-expressions invoking that macro will include an additional construct: the Argument Encoding Bitmap (AEB). This little-endian byte sequence precedes the first serialized argument and indicates how each argument corresponding to a variadic parameter has been encoded.

Each variadic parameter in the signature is assigned two bits in the AEB. This means that the reader can statically determine how many AEB bytes to expect in the e-expression by examining the signature.

Number of variadic parametersAEB byte length
00
1 to 41
5 to 82
9 to 123
Nceiling(N/4)

Bits in the AEB are assigned from least significant to most significant and correspond to the variadic parameters in the signature from left to right. This allows the reader to right-shift away the bits of each variadic parameter when its corresponding argument has been read.

Example SignatureAEB Layout
()<No variadics, no AEB>
(a b c)<No variadics, no AEB>
(a b c?)------cc
(a b* c?)----ccbb
(a+ b* c?)--ccbbaa
(a+ b c?)----ccaa
(a+ b* c? d*)ddccbbaa
(a+ b* c? d* e)ddccbbaa
(a+ b* c? d* e f?)ddccbbaa ------ff
(a+ b* c? d* e+ f?)ddccbbaa ----ffee

Each pair of bits in the AEB indicates what kind of expression to expect in the corresponding argument position.

Bit sequenceMeaning?*+
00An empty stream. No bytes are present in the corresponding argument position.
01A single expression of the declared encoding is present in the corresponding argument position.
10A expression group of the declared encoding is present in the corresponding argument position.
11Reserved. A bitmap entry with this bit sequence is illegal in Ion 1.1.

As noted in the table above:

  • An empty stream (00) cannot be used to encode an argument for a parameter with a cardinality of one-or-more.
  • An expression group (10) cannot be used to encode an argument for a parameter with a cardinality of zero-or-one.

Expression groups

This section describes the encoding of an expression group. For an explanation of what an expression group is and how to use it, see Expression groups.

An expression group begins with a FlexUInt. If the FlexUInt's value is:

  • greater than zero, then it represents the number of bytes used to encode the rest of the expression group. The reader should continue reading expressions of the declared encoding until that number of bytes has been consumed.
  • zero, then it indicates that this is a delimited expression group and the processing varies according to whether the declared encoding is tagged or tagless. If the encoding is:
    • tagged, then each expression in the group begins with an opcode. The reader must consume tagged expressions until it encounters a terminating END opcode (0xF0).
    • tagless, then the expression group is a delimited sequence of 'chunks' that each have a FlexUInt length prefix and a body comprised of one or more expressions of the declared encoding. The reader will continue reading chunks until it encounters a length prefix of FlexUInt 0 (0x01), indicating the end of the chunk sequence. Each chunk in the sequence must be self-contained; an expression of the declared encoding may not be split across multiple chunks. See Example encoding of tagless zero-or-more with delimited expression group for an illustration.

tip

While it is legal to write an empty expression group for zero-or-more parameters, it is always more efficient to set the parameter's AEB bits to 00 instead.

Example encoding of tagged zero-or-one with empty group

(:add_macros
  (foo (a?) /*...*/)
)
(:foo) // `a` is implicitly empty
┌──── Opcode 0x00 is less than 0x40; this is an e-expression
│     invoking the macro at address 0.
│  ┌──── AEB: 0b------aa
│  │     a=00, empty expression group
00 00

Example encoding of tagged zero-or-one with single expression

(:add_macros
  (foo (a?) /*...*/)
)
(:foo 1)
┌──── Opcode 0x00 is less than 0x40; this is an e-expression
│     invoking the macro at address 0.
│  ┌──── AEB: 0b------aa; a=01, single expression
│  │    ┌──── Argument 'a': opcode 0x61 indicates a 1-byte int (1)
│  │  ┌─┴─┐
00 01 61 01

Example encoding of tagged zero-or-more with empty group

(:add_macros
  (foo (a*) /*...*/)
)
(:foo) // `a` is implicitly empty
┌──── Opcode 0x00 is less than 0x40; this is an e-expression
│     invoking the macro at address 0.
│  ┌──── AEB: 0b------aa; a=00, empty expression group
│  │
00 00

Example encoding of tagged zero-or-more with single expression

(:add_macros
  (foo (a*) /*...*/)
)
(:foo 1)
┌──── Opcode 0x00 is less than 0x40; this is an e-expression
│     invoking the macro at address 0.
│  ┌──── AEB: 0b------aa; a=01, single expression
│  │    ┌──── Argument 'a': opcode 0x61 indicates a 1-byte int (1)
│  │  ┌─┴─┐
00 01 61 01

Example encoding of tagged zero-or-more with expression group

(:add_macros
  (foo (a*) /*...*/)
)
(:foo 1 2 3)
┌──── Opcode 0x00 is less than 0x40; this is an e-expression
│     invoking the macro at address 0.
│  ┌──── AEB: 0b------aa; a=10, expression group
│  │  ┌──── FlexUInt 6: 6-byte expression group
│  │  │    ┌──── Opcode 0x61 indicates a 1-byte int (1)
│  │  │    │     ┌──── Opcode 0x61 indicates a 1-byte int (2)
│  │  │    │     │     ┌─── Opcode 0x61 indicates a 1-byte int (3)
│  │  │  ┌─┴─┐ ┌─┴─┐ ┌─┴─┐
00 02 0D 61 01 61 02 61 03
         └───────┬───────┘
      6-byte expression group body

Example encoding of tagged zero-or-more with delimited expression group

(:add_macros
  (foo (a*) /*...*/)
)
(:foo 1 2 3)
┌──── Opcode 0x00 is less than 0x40; this is an e-expression
│     invoking the macro at address 0.
│  ┌──── AEB: 0b------aa; a=10, expression group
│  │  ┌──── FlexUInt 0: delimited expression group
│  │  │    ┌──── Opcode 0x61 indicates a 1-byte int (1)
│  │  │    │     ┌──── Opcode 0x61 indicates a 1-byte int (2)
│  │  │    │     │     ┌─── Opcode 0x61 indicates a 1-byte int (3)
│  │  │    │     │     │   ┌─── Opcode 0xF0 is delimited end
│  │  │  ┌─┴─┐ ┌─┴─┐ ┌─┴─┐ │
00 02 01 61 01 61 02 61 03 F0
         └───────┬───────┘
        expression group body

Example encoding of tagged one-or-more with single expression

(:add_macros
  (foo (a+) /*...*/)
)
(:foo 1)
┌──── Opcode 0x00 is less than 0x40; this is an e-expression
│     invoking the macro at address 0.
│  ┌──── AEB: 0b------aa; a=01, single expression
│  │  ┌──── Argument 'a': opcode 0x61 indicates a 1-byte int
│  │  │   1
00 01 61 01

Example encoding of tagged one-or-more with expression group

(:add_macros
  (foo (a+) /*...*/)
)
(:foo 1 2 3)
┌──── Opcode 0x00 is less than 0x40; this is an e-expression
│     invoking the macro at address 0.
│  ┌──── AEB: 0b------aa; a=10, expression group
│  │  ┌──── FlexUInt 6: 6-byte expression group
│  │  │  ┌──── Opcode 0x61 indicates a 1-byte int
│  │  │  │     ┌──── Opcode 0x61 indicates a 1-byte int
│  │  │  │     │     ┌─── Opcode 0x61 indicates a 1-byte int
│  │  │  │   1 │  2  │   3
00 02 0D 61 01 61 02 61 03
         └───────┬───────┘
      6-byte expression group body

Example encoding of tagged one-or-more with delimited expression group

(:add_macros
  (foo (a+) /*...*/)
)
(:foo 1 2 3)
┌──── Opcode 0x00 is less than 0x40; this is an e-expression
│     invoking the macro at address 0.
│  ┌──── AEB: 0b------aa; a=10, expression group
│  │  ┌──── FlexUInt 0: delimited expression group
│  │  │  ┌──── Opcode 0x61 indicates a 1-byte int
│  │  │  │     ┌──── Opcode 0x61 indicates a 1-byte int
│  │  │  │     │     ┌─── Opcode 0x61 indicates a 1-byte int
│  │  │  │     │     │      ┌─── Opcode 0xF0 is delimited end
│  │  │  │   1 │  2  │   3  │
00 02 01 61 01 61 02 61 03 F0
         └───────┬───────┘
        expression group body

Example encoding of tagless zero-or-more with expression group

(:add_macros
  (foo (uint8::a*) /*...*/)
)
(:foo 1 2 3)
┌──── Opcode 0x00 is less than 0x40; this is an e-expression
│     invoking the macro at address 0.
│  ┌──── AEB: 0b------aa; a=10, expression group
│  │  ┌──── FlexUInt 3: 3-byte expression group
│  │  │  ┌──── uint8 1
│  │  │  │  ┌──── uint8 2
│  │  │  │  │  ┌─── uint8 3
│  │  │  │  │  │
00 02 07 01 02 03
         └──┬───┘
   expression group body

Example encoding of tagless zero-or-more with delimited expression group

(:add_macros
  (foo (uint8::a*) /*...*/)
)
(:foo 1 2 3)
┌──── Opcode 0x00 is less than 0x40; this is an e-expression
│     invoking the macro at address 0.
│  ┌──── AEB: 0b------aa; a=10, expression group
│  │  ┌──── FlexUInt 0: Delimited expression group
│  │  │  ┌──── FlexUInt 3: 3-byte chunk of uint8 expressions
│  │  │  │            ┌──── FlexUInt 2: 2-byte chunk of uint8 expressions
│  │  │  │            │       ┌──── FlexUInt 0: End of group
│  │  │  │            │       │
00 02 01 07 01 02 03 05 04 05 01
            └──┬───┘    └─┬─┘
            chunk 1    chunk 2