From a forum post: a set of bullet-point tips on creating a custom archive format.
– – – –
– Terminology: an archive is a directory (a dictionary or list of filenames with offsets to chunks) plus many data chunks (the files). Compression comes later.
– Directories and subdirectories are just longer filenames. Unlimited filename length means unlimited directory nesting.
– Padding so that each data chunk starts on a 4-byte boundary has performance advantages on many platforms once the data is in RAM.
– For random access, compress chunks individually. Compressing everything at once tends to save more space, but you trade off random access.
– If there’s room in RAM, cache the archive. At the very least, cache the directory/dictionary: that improves load times, since it no longer has to be read every time you request a file.
– If there’s more room in RAM, compress the archive as a whole, and cache the uncompressed copy after loading. But keep in mind that while decompressing you need slightly more free memory than the sum of the uncompressed size and the compressed size.
– If running off a CD/DVD, seek time sucks. If you can’t cache everything important or commonly used in RAM, replicating your archive on different parts of the disc can improve seek time. All it takes is a simple check of which sector the last file request landed on, and a little math to pick the copy closest to where the laser sits.
– Seek time on memory cards/flash memory is significantly better than on CDs/DVDs, but random access is still slower than sequential access.
– Prefer (if possible) ordering data on disk/in the archive in the same order it’s requested. That has a bigger impact with CDs/DVDs, but it’s worth considering for any archive format of your own.
– Storing all files from the same directory in the same general part of the archive can achieve the previous point, so long as each directory holds data of a single scope, such as “Level 1” or “Level 2”.
– Accessing common data first, then level-specific data, can save seek time if the common data is far away on disc.
– Storing streamed data together, or interleaving it, can save seek time (music, video, static geometry).
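
To make a few of these points concrete, here is a minimal sketch in Python of one possible layout: a directory of filenames with offsets, chunks padded to 4-byte boundaries, each chunk compressed on its own so single files can be pulled out, and the directory parsed once so it can be cached. Everything specific here is an assumption for illustration, not something from the original post: the binary layout, the use of zlib, and the function names (build_archive, read_directory, read_file, closest_replica) are all made up.

```python
import struct
import zlib

ALIGN = 4  # chunk starts are padded to 4-byte boundaries


def _pad(n, align=ALIGN):
    """Number of zero bytes needed to round n up to a multiple of align."""
    return -n % align


def build_archive(files):
    """Pack {name: bytes} into one blob: directory first, then chunks.

    Hypothetical little-endian layout:
      u32 entry count
      per entry: u16 name length, UTF-8 name, u32 offset, u32 packed size, u32 raw size
      chunks, each zlib-compressed individually and starting on a 4-byte boundary
    """
    names = [name.encode("utf-8") for name in files]
    packed = [zlib.compress(data) for data in files.values()]
    dir_size = 4 + sum(2 + len(n) + 12 for n in names)

    directory = [struct.pack("<I", len(names))]
    body = [b"\0" * _pad(dir_size)]          # align the first chunk
    offset = dir_size + _pad(dir_size)
    for name, raw, chunk in zip(names, files.values(), packed):
        directory.append(struct.pack("<H", len(name)) + name)
        directory.append(struct.pack("<III", offset, len(chunk), len(raw)))
        body.append(chunk + b"\0" * _pad(len(chunk)))
        offset += len(chunk) + _pad(len(chunk))
    return b"".join(directory + body)


def read_directory(blob):
    """Parse the directory once; cache the result as {name: (offset, packed, raw)}."""
    (count,) = struct.unpack_from("<I", blob, 0)
    pos = 4
    entries = {}
    for _ in range(count):
        (name_len,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        name = blob[pos:pos + name_len].decode("utf-8")
        pos += name_len
        entries[name] = struct.unpack_from("<III", blob, pos)
        pos += 12
    return entries


def read_file(blob, directory, name):
    """Random access: decompress only the requested chunk."""
    offset, packed_size, raw_size = directory[name]
    data = zlib.decompress(blob[offset:offset + packed_size])
    assert len(data) == raw_size
    return data


def closest_replica(replica_sectors, last_sector):
    """Pick the archive copy whose start sector is nearest the last sector read."""
    return min(replica_sectors, key=lambda s: abs(s - last_sector))
```

Subdirectories need no special handling here: a name like "level1/music/theme.ogg" is just a longer key in the directory. In a real loader the blob would be a file on disk and read_file would seek to the offset instead of slicing, but the cached directory would be used the same way.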