ltdk
e223eba392

1 month ago  

src  1 month ago  
.editorconfig  5 months ago  
.gitignore  5 months ago  
CHANGELOG.md  5 months ago  
Cargo.toml  5 months ago  
LICENSE.md  5 months ago  
Makefile  5 months ago  
README.md  5 months ago  
rusttoolchain.toml  5 months ago  
rustfmt.toml  5 months ago 
README.md
ious
Integers of unknown sizes.
Summary
An IOUS (Integer Of Unknown Size) is a variablelength integer which is encoded as part of a data stream. This is contrasted with an IOKS (Integer Of Known Size), which has a known size and used to represent the actual integer stored within an IOUS. The plural forms of IOUS and IOKS are IOUSes and IOKSes, and the names are inspired from the Rodents Of Unusual Size (ROUSes) from The Princess Bride.
While the IOUS format is generic over all sorts of parameters, it's designed so that any one implementation of it will be very easy to implement and very fast to both encode and decode. Instead of using continuation markers like most variablelength formats, it stores all length information at the beginning of the stream, often entirely avoiding the need for branching at all.
Conceptually, an IOUS indicates its length with the number of disabled bits at the beginning of the stream. In the most common case, with an IOUS formed of 8bit units capped at 9 units of length + data, any integer between 8 and 128 bits can be decoded using four instructions:
 Count the leading zeroes of the first byte. (the length)
 Shift the first byte left by the length.
 Shift the first byte right by the length.
 Reverse the order of the bytes. (on littleendian architectures)
Although this simple case is relatively straightforward, code that operates under a completely generic form of IOUSes can be much more complicated, although any individual instance of the format can be similarly optimised to a very fast and simple set of instructions.
The IOKS format exists solely as a means to describe the endianness of an IOUS after decoding or before encoding. With bytesized units, it's equivalent to a bigendian representation, but the possibility for alternatively sized units makes the endianness trickier to explain, and simpler to just describe as a separate format.
Units
IOUSes and IOKSes are comprised of multiple units, which represent standard fixedlength integer types. The signedness of the units extends to the signedness of the integer data itself, meaning that signed units can be used to represent signed integers and unsigned units can be used to represent unsigned integers.
IOKSes are always stored in "bigendian" format with the most significant unit first, although the endianness within units themselves is left unspecified. Unit conversion is outside the scope of this spec.
Since IOUSes are intended to be represented on their own without any additional data, they must contain at least one unit of data even if that data is an unsigned zero. Technically, unsigned IOKSes can unambiguously represent an integer zero with zero units, although the spec will assume that IOKSes have at least one unit for simplicity.
Length bits
IOUSes are designed to be very similar to variablelengthquantities (VLQs), with the difference that all length information is stored at the beginning of the IOUS. Conversion from an IOUS to an IOKS thus simply requires replacing the length information with zeroes or signextended bits depending on the units used.
IOUSes also have an upper bound on the number of bits that can be used to represent the length, called the ceiling, which both bounds the total size of an IOUS in units and allows representing a maximumlength IOUS with one fewer bit than otherwise required. This is specific to IOUSes, and does not affect IOKSes, since they contain no length information.
The capacity of the IOUS is the number of bits of data that can be stored at its maximum length. The ceiling plus the capacity is the maximum length of the IOUS.
Layout
To describe the layout of an IOUS, we need two parameters: a unit type and a ceiling size. To make describing things easier, we will represent the unit type by its total length in bits (unit bits) and its signedness (signed or unsigned).
Although bits may internally be stored differently for units, we will describe the format as if the bits within individual units are stored in order of descending significance, even though this may not be the case. In practice, machine instructions will allow operating on units as if they actually are stored this way, e.g. via bitshifting, which makes this assumption reasonable.
IOUSes are split into two parts: length bits and data bits, in that order. The total number of bits must add up to a multiple of the number of unit bits, since we will be storing the IOUS as a sequence of units.
The length bits will always start with some number of disabled bits (that number can be zero). If this number of bits is less than the ceiling, it must be followed by an enabled terminating bit to indicate the end of the length. An IOUS is considered invalid if it does not have valid length bits, which can only happen if it has zero units, or is exclusively units filled with disabled bits that add up to less than the ceiling; the latter can occur if the ceiling is larger than the unit bits times the number of units in the IOUS.
The portion of the length bits without the terminating zero is called the length data.
We round up the number of bits in the length bits to a multiple of the unit bits to determine a whole number of length units, which may contain some nonlength bits. We call these bits extra data bits since they are not represented in the length data, and merely add to the total number of data bits available. If there are a nonzero number of extra data bits, then the unit containing those bits is called the extra data unit.
Immediately following the length units is some number of data units, equal to the number of bits in the length data. The extra data bits, followed by the data units, constitutes the data bits. If the IOUS does not actually contain enough units to account for the extra units, it is considered invalid.
Conversion
Converting from an IOUS to an IOKS requires excluding the length units and then overwriting the length bits in the extra data unit with either zeroes or a sign extension depending on the unit type. To convert back from an IOKS to an IOUS, however, you will need the space for the length units, so that you have room to write the length bits.
Loading an IOKS into a system register simply requires padding the IOKS out to the necessary length, then converting the order of the units to the native endianness.
Creating an IOKS requires determining the number of units required, then converting the native endianness back to bigendian units. Once you have an IOKS, you can then mask in the length bits to create an IOUS.
As mentioned in the summary, one benefit of IOUSes is the ability to write fast and simple code for specific cases, and as such, you may not want to use the fully generic version in everyday cases. However, the full pseudocode for the generic conversions is left below for reference.
IOUS to IOKS
 If the buffer is empty, terminate with error.
 Set dataunits to zero.
 Set lengthunits to zero.
 Loop the following:
 Set tempunit to the unit at index lengthunits.
 Set tempbits to the number of leading disabled bits in tempunit.
 Add tempbits to dataunits.
 Increment lengthunits.
 If dataunits is greater than the ceiling:
 Set dataunits to the ceiling.
 Set tempbits to the ceiling modulo the unit bits.
 If tempbits is zero, set extradatabits to zero. Otherwise, Set extradatabits to the unit bits minus tempbits.
 Terminate the loop.
 If tempbits is less than the unit bits:
 Set extradatabits to the unit bits minus tempbits minus one. The minus one is the terminating bit.
 Terminate the loop.
 If lengthunits is not less than the length of the buffer, terminate with error.
 Set pastend to lengthunits plus dataunits.
 If the length of the buffer is less than pastend, terminate with error.
 If extradatabits is nonzero:
 Decrement lengthunits.
 Set tempbits to the unit bits.
 Subtract extradatabits from tempbits.
 Set tempunit to the unit at index lengthunits.
 Leftshift tempunit by tempbits.
 Rightshift tempunit by tempbits. This will be a logical shift for unsigned units and an arithmetic shift for signed units.
 The IOKS begins at index lengthunits and terminates at index pastend.
 The IOUS terminates immediately after the IOKS.
IOKS to IOUS
Preparation
When converting an IOKS to an IOUS, you'll want to ensure that the IOKS is as small as possible.
For unsigned integers, this means that you'll want to remove all leading zero units and mark down the number of leading disabled bits as the "trimmable" bits.
For positive signed integers, you can perform the same conversion as unsigned integers, with one caveat: you can trim one fewer bits than you would otherwise, as this has to be retained as the sign bit.
For negative signed integers, you'll want to remove leading negativeone units and mark down the number of leading enabled bits, also being able to trim one fewer bits.
Conversion
 Set trimmablebits to the number of trimmable bits. This quantity is described in the previous section.
 Set lengthbits to the number of units in the IOKS.
 Set lengthunits to lengthbits divided by the unit bits, rounding down.
 Set extradatabits to negative lengthbits Euclideanmodulo the unit bits.
 If extradatabits is greater than the trimmable bits:
 Set dataunit to the first unit in the IOKS.
 Rightshift tempunit by the unit bits minus one. This will be a logical shift for unsigned units and an arithmetic shift for signed units.
 Insert dataunit at the beginning of the IOKS.
 Set lengthunit to negative one.
 Leftshift lengthunit by extradatabits minus one.
 Set dataunit to the first unit in the IOKS.
 If dataunit is negative:
 And the bits of dataunit with lengthunit.
 Leftshift lengthunit by one.
 Or the bits of dataunit with lengthunit.
 Replace the first unit of the IOKS with dataunit.
 Insert lengthunits negativeone units at the beginning of the IOKS.
License
Released to the public domain to the greatest extent possible via the CC0 License.