256t.org

What?

256t.org is a domain dedicated to being a public specification for a specific type of content-addressable storage. In this scheme, the last element of a URL path, up to 94 characters long, defines the content at that URL. At some point in the future, it may also become a public utility for publishing content using the scheme. However, that is currently beyond the scope of this site.

Why?

Why 256t.org? A simple standard for a generic content addressable store seems generally useful to me.

How?

Every 94-character path can be used to retrieve content that matches the length and hash it specifies. If no such content is available, a 404 is returned instead.
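A minimal server-side sketch of this lookup in Python follows. It assumes a hypothetical in-memory `store` dict that maps CIDs to content bytes, keyed here by an illustrative CID. `resolve` is a made-up name and is not part of the spec. The function takes the last path segment as the CID and answers 404 when nothing is stored under it.

```python
def resolve(path: str, store: dict) -> tuple:
    """Return (status, body) for a request path, treating its last segment as the CID."""
    # Hypothetical store: a plain dict mapping CID strings to content bytes.
    cid = path.rsplit("/", 1)[-1]
    if len(cid) > 94 or cid not in store:
        return 404, b""
    return 200, store[cid]

store = {"AAAAAAACaGk": b"hi"}
print(resolve("/AAAAAAACaGk", store))  # (200, b'hi')
print(resolve("/AAAAAAAA", store))     # (404, b'')
```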

A SHA-512 hash is 512 bits, or 64 bytes, long. Because each base64 character encodes 6 bits, 64 bytes fit in an 86-character base64 string:

(64 * 8) / 6 = 85.333..., rounded up to 86
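The same arithmetic can be checked with Python's standard `hashlib` and `base64` modules. Standard base64 pads 64 bytes out to 88 characters with a trailing `==`. Dropping that padding leaves the 86 characters counted above.

```python
import base64
import hashlib
import math

digest = hashlib.sha512(b"example").digest()  # any input gives a 64-byte digest
padded = base64.urlsafe_b64encode(digest)     # 88 characters, ending in "=="
unpadded = padded.rstrip(b"=")                # 86 characters

assert math.ceil(len(digest) * 8 / 6) == len(unpadded)
print(len(digest), len(padded), len(unpadded))  # 64 88 86
```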

Content of 64 bytes or less can be encoded directly in no more space than its hash would take. In such cases, the content itself should be base64 encoded, with a minimum of padding, and used in place of its hash.
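Here is a sketch of the small-content case in Python. It reads "a minimum of padding" as dropping base64's trailing `=` characters. That reading is an assumption, but it is also what keeps 64 bytes within 86 characters. `inline_payload` is a hypothetical helper name.

```python
import base64

def inline_payload(content: bytes) -> str:
    """Encode content of 64 bytes or less directly, as unpadded base64url (assumed)."""
    if len(content) > 64:
        raise ValueError("content longer than 64 bytes must be hashed instead")
    return base64.urlsafe_b64encode(content).decode("ascii").rstrip("=")

print(inline_payload(b"hello"))        # aGVsbG8
print(len(inline_payload(bytes(64))))  # 86
```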

An 8-character base64 string can store 48 bits (8 * 6), so the length prefix can describe content of up to 2^48 - 1 bytes, just under 256 T:

2^48 = 2^40 * 2^8
  = 2^10 * 2^10 * 2^10 * 2^10 * 2^8
  = 2^8  * 2^10 * 2^10 * 2^10 * 2^10
  = 256    K      M      G      T
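The text does not say how the length is laid out inside those 48 bits. The sketch below assumes a 6-byte big-endian unsigned integer, which encodes to exactly 8 base64url characters with no padding. `encode_length` and `decode_length` are hypothetical names.

```python
import base64

MAX_LENGTH = 2**48 - 1  # largest value 8 base64 characters (48 bits) can hold

def encode_length(n: int) -> str:
    """Encode a byte count as the 8-character prefix (assumed: 6-byte big-endian)."""
    if not 0 <= n <= MAX_LENGTH:
        raise ValueError("length does not fit in 48 bits")
    return base64.urlsafe_b64encode(n.to_bytes(6, "big")).decode("ascii")

def decode_length(prefix: str) -> int:
    """Inverse of encode_length; 8 characters decode to 6 bytes with no padding needed."""
    return int.from_bytes(base64.urlsafe_b64decode(prefix), "big")

print(encode_length(5))           # AAAAAAAF
print(encode_length(MAX_LENGTH))  # ________
```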

The following can be treated as true enough, despite not being strictly true:
- A 94-character path uniquely determines content. (This is completely true for content of 64 bytes or less.)
- The content is immutable. (It could be replaced by different content that still matches the description.)
- Content can be safely cached indefinitely.

Thus all HTTP metadata, such as headers and ETags, will indicate that the content can be cached indefinitely.
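The following Python sketch shows one way a server might express this. `Cache-Control: immutable` (RFC 8246) with a one-year `max-age` is a common convention for content that never changes. Using the CID itself as a strong ETag is an assumption, not something this text prescribes. `immutable_headers` is a hypothetical name.

```python
def immutable_headers(cid: str) -> dict:
    """HTTP response headers marking content addressed by `cid` as cacheable forever."""
    return {
        # One year is the usual "forever"; "immutable" tells caches not to revalidate.
        "Cache-Control": "public, max-age=31536000, immutable",
        # Assumption: the CID already names the exact bytes, so it can serve as a strong ETag.
        "ETag": f'"{cid}"',
    }

print(immutable_headers("AAAAAAACaGk")["ETag"])  # "AAAAAAACaGk"
```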

The 94-character content tag consists of an 8-character length prefix followed by an 86-character hash. For content of 64 bytes or less, the content itself takes the place of the hash.

        length of the content   hash of the content     the content itself
when    always                  length(content) > 64    length(content) <= 64
start   1                       9                       9
end     8                       94                      up to 94
length  8                       86                      0 to 86
format  Base64                  Base64                  Base64
info    length(content)         sha-512(content)        content
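Putting the table together gives the Python sketch of CID generation below. It assumes unpadded base64url throughout and a 6-byte big-endian length prefix. Neither detail is spelled out above. `compute_cid` is a hypothetical name rather than the API of any listed implementation.

```python
import base64
import hashlib

def b64url(data: bytes) -> str:
    """Unpadded base64url (assumed encoding for every part of the CID)."""
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")

def compute_cid(content: bytes) -> str:
    # Characters 1-8: content length, assumed to be a 6-byte big-endian integer.
    prefix = b64url(len(content).to_bytes(6, "big"))
    if len(content) <= 64:
        # Characters 9 onward: the content itself (0 to 86 characters).
        return prefix + b64url(content)
    # Characters 9-94: the SHA-512 digest of the content.
    return prefix + b64url(hashlib.sha512(content).digest())

print(compute_cid(b"hi"))             # AAAAAAACaGk
print(len(compute_cid(b"x" * 1000)))  # 94
```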

Base64

More specifically, the filename- and URL-safe variant of Base64, also known as base64url (RFC 4648, section 5).
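The difference matters because standard Base64 uses `+` and `/`, and a `/` inside a CID would split the URL path. base64url substitutes `-` and `_`, as Python's `base64` module shows:

```python
import base64

data = b"\xfb\xff"  # bits chosen to hit the last two alphabet positions (62 and 63)
print(base64.b64encode(data))          # b'+/8='  standard alphabet
print(base64.urlsafe_b64encode(data))  # b'-_8='  base64url alphabet
```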

CID

This base64url string of up to 94 characters, which identifies content, is referred to as a content identifier, or CID.
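Going the other way, a CID can be split back into its parts. This Python sketch assumes unpadded base64url and a 6-byte big-endian length prefix. `parse_cid` is hypothetical. For content of 64 bytes or less, the CID alone is enough to recover the content, with no server involved.

```python
import base64
from typing import Optional, Tuple

def _decode(s: str) -> bytes:
    # Assumption: CIDs omit '=' padding, so restore it before decoding.
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def parse_cid(cid: str) -> Tuple[int, Optional[bytes], Optional[bytes]]:
    """Split a CID into (length, inline content or None, SHA-512 digest or None)."""
    if not 8 <= len(cid) <= 94:
        raise ValueError("a CID is 8 to 94 characters long")
    length = int.from_bytes(_decode(cid[:8]), "big")  # assumed big-endian
    payload = _decode(cid[8:])
    if length <= 64:
        if len(payload) != length:
            raise ValueError("inline content does not match the length prefix")
        return length, payload, None
    if len(payload) != 64:
        raise ValueError("expected a 64-byte SHA-512 digest")
    return length, None, payload

print(parse_cid("AAAAAAACaGk"))  # (2, b'hi', None)
```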

When

Where

Any server could expose a base URL with contents that adhere to this spec. Alas, which servers host which content, and how to find them, is beyond the scope of this text. This server hosts a small set of CIDs.

Beyond the Scope of This Text

These things are beyond the scope of this text.

Tools

Storage and Deployment

For deploying content-addressed storage using CIDs, this repository includes tools for uploading to Cloudflare R2 (S3-compatible storage).

R2 Upload Script

The .github/scripts/r2_upload.py script uploads files to R2 with metadata-based CID verification.

See .github/scripts/README.md for detailed usage, examples, and a Cloudflare Worker snippet showing how to override the ETag header for clients.

What about Collisions?

There are a few different types of collisions that are important to distinguish between:
- accidental -- purely by chance
- adversarial -- someone tried to cause it
- existing -- the same CID has been produced from different content
- problem -- using the CID to get content returned the wrong content

The odds of a problem collision are quite low. There are two ways to minimize them:
- Always verify CID content. There are many implementations that do so, and it is easier to just lie than to engineer a collision.
- Reduce adversaries. If nobody is putting problem content where you might accept it, you are left with just accidents.
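Verifying is cheap. The Python sketch below decodes the CID and checks the received bytes against the length. It then checks them against either the inline content or the SHA-512 digest. It assumes unpadded base64url and a 6-byte big-endian length prefix, since the text does not pin down either detail. `verify` is a hypothetical name, not one of the listed implementations.

```python
import base64
import hashlib

def verify(cid: str, content: bytes) -> bool:
    """Check that content received for a CID really matches it."""
    def decode(s: str) -> bytes:
        # Assumption: CIDs omit '=' padding; restore it before decoding.
        return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

    if not 8 <= len(cid) <= 94:
        return False
    try:
        length = int.from_bytes(decode(cid[:8]), "big")  # assumed big-endian
        payload = decode(cid[8:])
    except ValueError:  # malformed input; binascii.Error subclasses ValueError
        return False
    if length != len(content):
        return False
    if length <= 64:
        return payload == content  # small content is embedded in the CID itself
    return payload == hashlib.sha512(content).digest()

print(verify("AAAAAAACaGk", b"hi"))  # True
print(verify("AAAAAAACaGk", b"ho"))  # False
```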

I'm comfortable just ignoring accidental problem collisions.

Implementations

The 256t.org specification has been implemented in multiple programming languages. Each implementation provides utilities to generate and verify content identifiers (CIDs).

Language      Code
Bash          bash
C             c
C++           cpp
C#            csharp
Clojure       clojure
CMake         cmake
Crystal       crystal
D             d
Dart          dart
Deno          deno
ECMAScript    ecmascript
Elixir        elixir
Elm           elm
Emacs Lisp    emacs-lisp
Erlang        erlang
F#            fsharp
Fortran       fortran
Go            go
Groovy        groovy
Haskell       haskell
Java          java
JavaScript    javascript
Julia         julia
Kotlin        kotlin
Lua           lua
Nim           nim
Node.js       node
OCaml         ocaml
Perl          perl
PHP           php
PowerShell    powershell
Prolog        prolog
Python        python
R             r
Racket        racket
Ruby          ruby
Rust          rust
Scala         scala
Swift         swift
Tcl           tcl
TypeScript    typescript
Unison        unison
Zig           zig