Implementation Considerations

This is a collection of thoughts and lessons gathered while creating and maintaining ion-schema-kotlin and ion-schema-rust. They are collected here in the hope that they will be useful to anyone creating a new implementation of Ion Schema or writing a related tool or library. This document is not authoritative, and implementers are not required to follow any of the suggestions presented here.

General Notes

Modeling the Ion Schema Language

The only model that the specification requires is the Ion Schema Language (ISL) itself. The original Kotlin implementation of Ion Schema operated only on the Ion representation of ISL. Consequently, the parsing and evaluation of a schema were tightly coupled, which made it more difficult to support multiple versions of ISL. Furthermore, programmatic manipulation of schemas was difficult because clients of the library had to manipulate the Ion DOM (or even Ion text) in order to construct or modify a schema. Therefore, it is highly recommended to implement at least one of ISL Data Objects or an internal AST.

ISL Data Objects

Internal Representation

Built-in types

It may be advantageous to hard-code only the smallest possible set of types, and allow the remaining built-in types to be defined using ISL. (One potentially useful strategy is to have an IonSchemaCoreTypesAuthority where the schema id is the Ion Schema Version Marker. That way, if the built-in types ever change, it’s easy for an implementation to load the right core types.)

As of Ion Schema 1.0 and 2.0, the hard-coded types must be document and all of the Ion types ($null, $bool, $int, $float, $decimal, $string, $symbol, $blob, $clob, $timestamp, $list, $sexp, and $struct).

The remaining built-in types can be defined as follows:

// The top type
type::{
  name: $any,
}

// The bottom type
type::{
  name: nothing,
  valid_values: [],
}

// Non-null variants of the Ion types; repeat similarly for each type
type::{
  name: bool,
  type: $bool,
  not: { valid_values: [null.bool] }
}

// Union types; repeat similarly for lob, $text, text, $number, number, any
type::{
  name: $lob,
  one_of: [ $blob, $clob ],
}

Caching Imports

It is possible to have an import graph like this (imports point downward, so A imports B and C, B imports D and E, and so on):

             A
            / \
           B   C
          / \ / 
         D   E
             |
             F
            /|\
           G H I

Here, there are two paths from A to E, and therefore to everything that E imports. We therefore need to cache each schema by its schemaId when we load it, even if it is not loaded directly by the user. If we don’t, we risk performing the same I/O twice (or more) whenever there is more than one path to a given dependency.
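
To make the cost concrete, here is a minimal sketch that simulates loading the graph above with and without a cache. The SchemaLoader class, its use_cache flag, and the dictionary standing in for a loaded schema are all hypothetical; appending to reads stands in for real I/O.

```python
# The import graph from the diagram above: schema id -> ids it imports.
IMPORTS = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["E"],
    "D": [],
    "E": ["F"],
    "F": ["G", "H", "I"],
    "G": [],
    "H": [],
    "I": [],
}


class SchemaLoader:
    def __init__(self, imports, use_cache=True):
        self.imports = imports
        self.use_cache = use_cache
        self.cache = {}   # schema id -> loaded schema
        self.reads = []   # one entry per simulated I/O operation

    def load(self, schema_id):
        if self.use_cache and schema_id in self.cache:
            return self.cache[schema_id]
        self.reads.append(schema_id)  # stands in for reading from storage
        schema = {"id": schema_id, "imports": []}
        if self.use_cache:
            # Cached for every schema, including transitive imports.
            self.cache[schema_id] = schema
        schema["imports"] = [self.load(dep) for dep in self.imports[schema_id]]
        return schema


cached = SchemaLoader(IMPORTS)
cached.load("A")
uncached = SchemaLoader(IMPORTS, use_cache=False)
uncached.load("A")
print(len(cached.reads), len(uncached.reads))  # prints "9 14"
```

Without the cache, E and everything below it (F, G, H, and I) is read once per path from A, which accounts for the five extra reads.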

Ranges

Ranges in Ion Schema can be confusing to implement. While there are many types of ranges that might appear similar, there are fundamentally two distinct types of ranges.

Discrete Range

Discrete (or integral) ranges are ranges over integers and integer-like things (such as timestamp precision, which is essentially an integer wrapped by an enum). These ranges are compared against some property of a value and are used for scale (ISL 1.0), exponent (ISL 2.0), precision, timestamp_precision, timestamp_offset, byte_length, container_length, codepoint_length, utf8_byte_length, and occurs.
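
A minimal sketch of a discrete range, assuming inclusive integer bounds where None represents an unbounded end (ISL min or max). The DiscreteRange class and the precision_rank helper are hypothetical names; the list of precisions follows the values accepted by the timestamp_precision constraint.

```python
from dataclasses import dataclass
from typing import Optional

# Timestamp precisions from coarsest to finest; the list index serves as
# the integer that a discrete range is compared against.
TIMESTAMP_PRECISIONS = [
    "year", "month", "day", "minute", "second",
    "millisecond", "microsecond", "nanosecond",
]


def precision_rank(name: str) -> int:
    return TIMESTAMP_PRECISIONS.index(name)


@dataclass(frozen=True)
class DiscreteRange:
    lower: Optional[int] = None  # None means unbounded below
    upper: Optional[int] = None  # None means unbounded above

    def contains(self, n: int) -> bool:
        if self.lower is not None and n < self.lower:
            return False
        if self.upper is not None and n > self.upper:
            return False
        return True


# codepoint_length: range::[1, 10], checked against a property of the value
codepoint_length = DiscreteRange(1, 10)
ok = codepoint_length.contains(len("hello"))

# timestamp_precision: range::[day, second], using the enum's integer rank
day_to_second = DiscreteRange(precision_rank("day"), precision_rank("second"))
```

The same class serves every constraint in the list above, because each one reduces some property of the value to an integer before the comparison.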

Value Range

Value ranges are ranges over Ion values. These ranges are compared against other Ion values, and are only used with the valid_values constraint. They are further subdivided into NumberRange and TimestampRange.
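
By contrast, a value range compares the value itself and needs to track whether each bound is exclusive. The following is a sketch only: the Bound and NumberRange classes here are hypothetical shapes, not the types from any existing implementation, and numbers are modeled with Python's Decimal for simplicity. A TimestampRange could presumably take the same shape with timestamp bounds.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Bound:
    value: Decimal
    exclusive: bool = False  # True when the bound is annotated exclusive::


@dataclass(frozen=True)
class NumberRange:
    lower: Optional[Bound] = None  # None means unbounded (ISL min)
    upper: Optional[Bound] = None  # None means unbounded (ISL max)

    def contains(self, x) -> bool:
        x = Decimal(x)
        if self.lower is not None:
            if x < self.lower.value:
                return False
            if self.lower.exclusive and x == self.lower.value:
                return False
        if self.upper is not None:
            if x > self.upper.value:
                return False
            if self.upper.exclusive and x == self.upper.value:
                return False
        return True


# valid_values: range::[0, exclusive::1]
unit_interval = NumberRange(Bound(Decimal(0)), Bound(Decimal(1), exclusive=True))
```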

The top type $any

In Ion Schema 2.0, every type implicitly starts out as the top type (i.e. type: $any). You should not actually add type: $any to the internal model of a type definition in your implementation. Doing so would be redundant, since the result is identical to a type definition with no constraints.
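
One way to see why the explicit constraint is unnecessary: if a type definition is modeled as a list of constraints that must all pass, an empty list already accepts every value. The TypeDefinition class below is a hypothetical model used only to illustrate this point, with plain Python callables standing in for constraints.

```python
class TypeDefinition:
    def __init__(self, name, constraints=()):
        self.name = name
        # Each constraint is a callable taking a value and returning a bool.
        self.constraints = list(constraints)

    def is_valid(self, value):
        # all() over an empty list is True, so a definition with no
        # constraints behaves as the top type without any special casing.
        return all(check(value) for check in self.constraints)


top = TypeDefinition("$any")
positive_int = TypeDefinition(
    "positive_int",
    [lambda v: isinstance(v, int), lambda v: v > 0],
)
```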