With the new filesystem remapping and sfile blobs we are introducing a NFS-like index server. Basically all sfile reading and writing is handled by a single process that maintains a consistent view of the repository.
All the following will be hidden under the fslayer code. Normal bk code has no idea that something interesting or special is ever happening to get to the filesystem.
Initially only sfiles will be handled this way, but eventually other state files will be managed by a central server. (pfiles, dfiles, etc…)
When a new bk process wants to read sfiles it needs to find a index server. A file ROOT/.bk/INDEX is consulted and it will contain an IP/PORT number of another bk process. If no INDEX is found or if the info is found to be stale then a new index server is spawned. That process will use sccs_lockfile(".bk/INDEX.lock") to acquire permissions to serve this repository. This process will continue to serve this repository until no new requests have been received for several minutes and then will write out state and exit.
The current INDEX is also put in the environment so that subprocesses do not need to read .bk/INDEX to know the INDEX.
Connections to the INDEX server are persistent and used for the duration of the process if possible. This means that current bkd code will NOT be used.
Properties of an INDEX server:
-
The server is a non-forking daemon that maintains connections to multiple clients simultaneously using select/poll. This allows all state updates to be synchronous.
-
Any updates to state are kept in memory and written to an unbuffered logfile at .bk/UPDATE.log. When exiting the disk is updated and UPDATE.log is deleted before dropping lock. If an UPDATE.log is found at startup then the previous process is assumed to have crashed and the log is replayed.
-
filesystem state is cached in memory to minimize filesystem access This process can grow in size, but the data will only be stored once per repository.
-
just simple files that are handled by the index server can now become locking
-
Larry’s keymap server can just be a directory that is managed by the index server and doesn’t actually go to disk.
-
only one INDEX server per nested collection
Commands from client to index server:
-
read-file <path> return: one of the following: a) the file on the disk were this data lives b) a file/offset/size where the data is embedded c) the actual file contents (small files) The index server decides which is used.
Also returns info to build stat struct
Note: the data is not always returned, but the remote client reads the data directly. This means that the structures returned need to be "relatively" stable and don't get deleted except at controlled times.
This API is used for open() and stat()
-
unlink <path> return: status
-
rename <path> <path> return: status
-
chmod <path> <mode> return: status
-
getdir <dir> returns: list of files
NOTE: we could include stat info here because the index server probably has that information in memory. But to be useful the getdir() and walkdir() calls used by bk need to be modified.
-
mkdir <dir> return: status
-
rmdir <dir> return: status
-
update <path> (with data inline) return: status
Write new data to filesystem, where the data is actually sent to the index server directly. Usually small.
-
updatefs <path> <tmpfile> return: status
Write new data to filesystem where the data is stored in a tmpfile in ROOT/.bk. Filename should probably contain HOST.PID. The index server will rename or copy that data as needed.
Issues
-
client A calls read pathA and we return a path where the data can be found. Then client B can unlink or overwrite pathA. Since we don’t know when A will actually read the data we can’t change the file we returned to that user. It would almost be better if we allowed each client that is writing data to to write a blob directly then just tell the index server about that.