Symbols

Symbols With Inline Text

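If the high nibble of the opcode is 0xA, the value is a symbol whose text follows the opcode. The low nibble gives the number of UTF-8 bytes that follow, so this form covers text of 0 to 15 bytes; opcode 0xA0 represents a symbol with empty text (''). Longer text uses opcode 0xF9, which is followed by a FlexUInt giving the length in bytes and then by the UTF-8 bytes themselves.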

0x8F 0x07 represents null.symbol.

Encoding of a symbol with empty text ('')
┌──── Opcode in range A0-AF indicates a symbol with inline text
│┌─── Low nibble 0 indicates that no UTF-8 bytes follow
A0
Encoding of a symbol with 14 bytes of inline text
┌──── Opcode in range A0-AF indicates a symbol with inline text
│┌─── Low nibble E indicates that 14 UTF-8 bytes follow
││  f  o  u  r  t  e  e  n     b  y  t  e  s
AE 66 6F 75 72 74 65 65 6E 20 62 79 74 65 73
   └──────────────────┬────────────────────┘
                 UTF-8 bytes
Encoding of a symbol with 24 bytes of inline text
┌──── Opcode F9 indicates a variable-length symbol with inline text
│  ┌─── Length: FlexUInt 24
│  │   v  a  r  i  a  b  l  e     l  e  n  g  t  h     e  n  c  o  d  i  n  g
F9 31 76 61 72 69 61 62 6C 65 20 6C 65 6E 67 74 68 20 65 6E 63 6F 64 69 6E 67
      └────────────────────────────────┬────────────────────────────────────┘
                                  UTF-8 bytes
Encoding of null.symbol
┌──── Opcode 0x8F indicates a typed null; a byte follows specifying the type
│  ┌─── Null type: symbol
│  │
8F 07
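
The inline-text encodings above can be reproduced with a short sketch. It assumes that a FlexUInt is stored little-endian with the value shifted left by the byte count and a single end-marker bit set below it, which matches the one example shown here (24 encodes as 0x31) but is defined elsewhere in the specification. The names `encodeFlexUInt`, `encodeInlineSymbol`, and `toHex` are hypothetical helpers, not part of any Ion library, and the choice to use the short 0xA_ form whenever the text fits in 15 bytes is an assumption of this sketch.

```javascript
// Assumed FlexUInt layout: n bytes, little-endian, value shifted left by n
// bits with bit (n - 1) set as the end marker. Intended for small values only.
function encodeFlexUInt(value) {
  // Find the smallest byte count n such that the value fits in 7 * n bits.
  let n = 1;
  while (value >= 2 ** (7 * n)) n++;
  let encoded = value * 2 ** n + 2 ** (n - 1);
  const bytes = [];
  for (let i = 0; i < n; i++) {
    bytes.push(encoded % 256);
    encoded = Math.floor(encoded / 256);
  }
  return bytes;
}

// Encode a symbol with inline text as an array of byte values.
function encodeInlineSymbol(text) {
  const utf8 = Array.from(new TextEncoder().encode(text));
  if (utf8.length <= 0xf) {
    // Short form: the low nibble of the opcode carries the byte count.
    return [0xa0 | utf8.length, ...utf8];
  }
  // Variable-length form: opcode F9, then the length as a FlexUInt.
  return [0xf9, ...encodeFlexUInt(utf8.length), ...utf8];
}

// Format bytes as upper-case hex pairs for comparison with the diagrams.
function toHex(bytes) {
  return bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

console.log(toHex(encodeInlineSymbol("")));               // A0
console.log(toHex(encodeInlineSymbol("fourteen bytes"))); // AE 66 6F 75 72 74 65 65 6E 20 62 79 74 65 73
```

The byte count, not the character count, selects the form, so text containing multi-byte UTF-8 characters can switch to the F9 form with fewer than 16 characters.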

Symbols With a Symbol Address

Symbol values whose text can be found in the local symbol table are encoded using opcodes 0x50 through 0x57.

The opcodes 0x50 through 0x57 share the same 5 most-significant bits. The 3 least-significant bits are used as the 3 least-significant bits of the symbol ID. The opcode is followed by a FlexUInt, which, once decoded, represents the most-significant bits of the symbol ID.

Recovering the symbol ID from the opcode and the FlexUInt is straightforward and can be done with either bitwise or arithmetic operations.

// Given an `opcode` and `flexUInt`...
let lsb = opcode & 0b111       // or opcode - 0x50
let msb = flexUInt << 3        // or flexUInt * 8
let symbolId = msb | lsb       // or msb + lsb

The reverse transformation is also simple:

// Given `symbolId`...
let opcode = 0x50 | (symbolId & 0b111)   // or 0x50 + (symbolId % 8)
let flexUInt = symbolId >>> 3            // or symbolId / 8, rounded down
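
As a quick check that the two transformations are inverses, the snippets above can be wrapped in functions and run over a range of symbol IDs. This is a minimal sketch: `splitSymbolId` and `joinSymbolId` are hypothetical names, `flexUInt` here means the decoded integer value rather than its encoded bytes, and because JavaScript bitwise operators work on 32-bit integers the sketch is only valid for symbol IDs below 2^31.

```javascript
// Split a symbol ID into the opcode and the value carried by the FlexUInt.
function splitSymbolId(symbolId) {
  return {
    opcode: 0x50 | (symbolId & 0b111), // low 3 bits go into the opcode
    flexUInt: symbolId >>> 3,          // remaining high bits
  };
}

// Rebuild the symbol ID from an opcode and a decoded FlexUInt value.
function joinSymbolId(opcode, flexUInt) {
  return (flexUInt << 3) | (opcode & 0b111);
}

// Round-trip a range of IDs; any mismatch would indicate a bug in the sketch.
for (let sid = 0; sid < 5000; sid++) {
  const { opcode, flexUInt } = splitSymbolId(sid);
  if (joinSymbolId(opcode, flexUInt) !== sid) {
    throw new Error(`round trip failed for symbol ID ${sid}`);
  }
}

console.log(splitSymbolId(10)); // { opcode: 82, flexUInt: 1 }, and 82 is 0x52
```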

The number of bytes required to encode symbol addresses is as follows:

SID Range                 Encoded size in bytes, including opcode
$0..$1023                 2
$1024..$131071            3
$131072..$16777215        4
$16777216..$2147483647    5

This table stops at $2147483647 (about 2 billion), but the encoding itself places no limit on the number of symbol IDs. In practice, most Ion implementations impose an upper bound on the number of symbols that depends on the implementation language and/or the underlying hardware.
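
The size boundaries in the table follow from the split described earlier: the opcode carries 3 bits of the symbol ID, and the FlexUInt carries the rest. The sketch below assumes that each FlexUInt byte contributes 7 payload bits, which is consistent with the ranges in the table but is defined elsewhere in the specification. `encodedSizeOfSymbolAddress` is a hypothetical name used only for illustration.

```javascript
// Return the total encoded size in bytes (opcode included) for a symbol ID.
function encodedSizeOfSymbolAddress(symbolId) {
  // The 3 low bits live in the opcode; only the high bits go in the FlexUInt.
  const highBits = Math.floor(symbolId / 8);
  // Assumption: each FlexUInt byte holds 7 bits of payload.
  let flexUIntBytes = 1;
  while (highBits >= 2 ** (7 * flexUIntBytes)) flexUIntBytes++;
  return 1 + flexUIntBytes;
}

console.log(encodedSizeOfSymbolAddress(1023)); // 2
console.log(encodedSizeOfSymbolAddress(1024)); // 3
```

Each additional FlexUInt byte therefore widens the addressable range by a factor of 128, giving upper bounds of 2^10 - 1, 2^17 - 1, 2^24 - 1, and 2^31 - 1 for sizes 2 through 5.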

Encoding of symbol with SID 1 ($ion)
┌──── Opcode 0x51 indicates a symbol with SID; low 3 bits = 1
│  ┌─── FlexUInt 0 represents the high bits (0 << 3 = 0)
│  │
51 01
Encoding of symbol with SID 10
┌──── Opcode 0x52 indicates a symbol with SID; low 3 bits = 2  
│  ┌─── FlexUInt 1 represents the high bits (1 << 3 = 8)
│  │
52 03
Encoding of symbol with SID 1000
┌──── Opcode 0x50 indicates a symbol with SID; low 3 bits = 0
│  ┌─── FlexUInt 125 represents the high bits (125 << 3 = 1000)
│  │
50 FB
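
Reading these encodings back can be sketched as follows. The FlexUInt decoding assumes that the position of the lowest set bit in the first byte gives the byte count and that the bytes are little-endian; this matches the single-byte values in the diagrams above but is defined elsewhere in the specification. `decodeFlexUInt` and `decodeSymbolAddress` are hypothetical names, and the sketch rejects FlexUInts longer than 6 bytes so that ordinary JavaScript numbers stay exact.

```javascript
// Decode a FlexUInt starting at `offset`; returns its value and byte length.
function decodeFlexUInt(bytes, offset) {
  const first = bytes[offset];
  // Assumption: (n - 1) zero bits followed by a one bit mark an n-byte FlexUInt.
  let n = 1;
  while (n <= 8 && ((first >> (n - 1)) & 1) === 0) n++;
  if (n > 6) throw new RangeError("FlexUInt too long (or missing) for this sketch");
  if (offset + n > bytes.length) throw new RangeError("truncated FlexUInt");
  // Assemble the little-endian bytes, then drop the n marker bits.
  let raw = 0;
  for (let i = n - 1; i >= 0; i--) raw = raw * 256 + bytes[offset + i];
  return { value: Math.floor(raw / 2 ** n), length: n };
}

// Decode a symbol address (opcodes 0x50 through 0x57) starting at `offset`.
function decodeSymbolAddress(bytes, offset = 0) {
  const opcode = bytes[offset];
  if (!(opcode >= 0x50 && opcode <= 0x57)) {
    throw new RangeError("not a symbol-address opcode");
  }
  const { value, length } = decodeFlexUInt(bytes, offset + 1);
  // High bits come from the FlexUInt, low 3 bits from the opcode.
  return { symbolId: value * 8 + (opcode - 0x50), length: 1 + length };
}

console.log(decodeSymbolAddress([0x52, 0x03])); // { symbolId: 10, length: 2 }
```

Returning the consumed length alongside the symbol ID lets a caller advance to the next value in the stream without re-parsing the FlexUInt.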