Escape Characters

Strings and Symbols

The Ion text format supports Unicode escape sequences only within quoted strings and symbols. Ion supports most of the escape sequences defined by C++, Java, and JSON.

The following sequences are allowed:

Unicode Code Point  Ion Escape  Meaning
U+0000              \0          NUL
U+0007              \a          alert BEL
U+0008              \b          backspace BS
U+0009              \t          horizontal tab HT
U+000A              \n          linefeed LF
U+000B              \v          vertical tab VT
U+000C              \f          form feed FF
U+000D              \r          carriage return CR
U+0022              \"          double quote
U+0027              \'          single quote
U+002F              \/          forward slash
U+003F              \?          question mark
U+005C              \\          backslash
nothing             \NL         escaped NL expands to nothing
U+00HH              \xHH        2-digit hexadecimal Unicode code point
U+HHHH              \uHHHH      4-digit hexadecimal Unicode code point
U+HHHHHHHH          \UHHHHHHHH  8-digit hexadecimal Unicode code point

Any other sequence following a backslash is an error.

Note that Ion does not support the following escape sequences:

  • Java's extended Unicode markers, e.g., "\uuuXXXX"
  • General octal escape sequences, \OOO
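
As an illustration of the rules above, here is a minimal Python sketch of a decoder for the body of a quoted string or symbol (the text between the quotes). The function name decode_ion_string_body and its error messages are hypothetical and not part of any Ion library. Treating CR and CR LF as newlines for the \NL escape is an assumption, and surrogate code points are not given any special handling.

```python
# Single-character escapes from the table above, mapped to their code points.
SIMPLE_ESCAPES = {
    "0": "\x00", "a": "\x07", "b": "\x08", "t": "\x09", "n": "\x0a",
    "v": "\x0b", "f": "\x0c", "r": "\x0d", '"': '"', "'": "'",
    "/": "/", "?": "?", "\\": "\\",
}
# Hexadecimal escapes and the exact number of digits each one requires.
HEX_ESCAPES = {"x": 2, "u": 4, "U": 8}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_ion_string_body(body: str) -> str:
    """Expand Ion escapes in a string body (hypothetical helper)."""
    out = []
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= n:
            raise ValueError("backslash at end of input")
        esc = body[i + 1]
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            i += 2
        elif esc in HEX_ESCAPES:
            width = HEX_ESCAPES[esc]
            digits = body[i + 2 : i + 2 + width]
            if len(digits) != width or any(d not in HEX_DIGITS for d in digits):
                raise ValueError(f"\\{esc} needs {width} hexadecimal digits")
            code_point = int(digits, 16)
            if code_point > 0x10FFFF:
                raise ValueError("code point is outside the Unicode range")
            out.append(chr(code_point))
            i += 2 + width
        elif esc == "\n":
            i += 2  # escaped NL expands to nothing
        elif esc == "\r":
            # Assumption: CR and CR LF also count as a newline here.
            i += 3 if body[i + 2 : i + 3] == "\n" else 2
        else:
            # Covers octal forms such as \101 and Java's \uuuXXXX, whose
            # extra "u" characters fail the hexadecimal digit check above.
            raise ValueError(f"unsupported escape: \\{esc}")
    return "".join(out)
```

Under these assumptions, decode_ion_string_body(r"\x41") yields the one-character string "A", while inputs such as r"\101" or r"\uuu0041" raise ValueError.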

Clobs

The rules for quoted strings within a clob are similar to those for the string type, with the following exceptions. Unicode newline characters in long strings and all verbatim ASCII characters are interpreted as their ASCII octet values. Non-printable ASCII and non-ASCII Unicode code points are not allowed unescaped in the string bodies. Furthermore, the following table describes the clob string escape sequences that have direct octet replacement in all clob strings.

Octet    Ion Escape  Meaning
0x00     \0          NUL
0x07     \a          alert BEL
0x08     \b          backspace BS
0x09     \t          horizontal tab HT
0x0A     \n          linefeed LF
0x0B     \v          vertical tab VT
0x0C     \f          form feed FF
0x0D     \r          carriage return CR
0x22     \"          double quote
0x27     \'          single quote
0x2F     \/          forward slash
0x3F     \?          question mark
0x5C     \\          backslash
0xHH     \xHH        2-digit hexadecimal octet
nothing  \NL         escaped NL expands to nothing

The clob escape \x must be followed by two hexadecimal digits. Note that clob does not support the \u and \U escapes since it represents an octet sequence and not a Unicode encoding.
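
The clob rules can be sketched the same way. The Python function below is a hypothetical helper, decode_ion_clob_body, not an API from any Ion library; it turns the body of a quoted clob string into octets. It rejects unescaped non-ASCII characters and the \u and \U escapes. For brevity it does not check for unescaped non-printable ASCII characters, and it assumes the escaped newline is a single LF.

```python
# Clob escapes from the table above, mapped to their octet values.
CLOB_SIMPLE_ESCAPES = {
    "0": 0x00, "a": 0x07, "b": 0x08, "t": 0x09, "n": 0x0A,
    "v": 0x0B, "f": 0x0C, "r": 0x0D, '"': 0x22, "'": 0x27,
    "/": 0x2F, "?": 0x3F, "\\": 0x5C,
}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_ion_clob_body(body: str) -> bytes:
    """Convert a clob string body to octets (hypothetical helper)."""
    out = bytearray()
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch != "\\":
            if ord(ch) > 0x7F:
                raise ValueError("non-ASCII characters must be escaped")
            out.append(ord(ch))  # verbatim ASCII becomes its octet value
            i += 1
            continue
        if i + 1 >= n:
            raise ValueError("backslash at end of input")
        esc = body[i + 1]
        if esc in CLOB_SIMPLE_ESCAPES:
            out.append(CLOB_SIMPLE_ESCAPES[esc])
            i += 2
        elif esc == "x":
            digits = body[i + 2 : i + 4]
            if len(digits) != 2 or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("\\x needs two hexadecimal digits")
            out.append(int(digits, 16))
            i += 4
        elif esc == "\n":
            i += 2  # escaped NL expands to nothing (assumes LF only)
        elif esc in "uU":
            raise ValueError("clob strings do not support \\u or \\U")
        else:
            raise ValueError(f"unsupported escape: \\{esc}")
    return bytes(out)
```

Returning bytes rather than str mirrors the point made above: a clob is an octet sequence, so \xHH yields one octet rather than a Unicode code point.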

It is important to note that clob is a binary type designed for binary values that are either text encoded in an ASCII-compatible code page or should be octet-editable by a human (escaped string syntax vs. base64-encoded data). Clearly, encodings that are not ASCII-based will not be very readable (e.g., the clob for the EBCDIC-encoded string representing "hello" could be denoted as {{ "\xc7\xc1%%?" }}).