Intro to binary weave

With the binary sfile format we also enable BWEAVE in the ChangeSet file.

With BWEAVE the ChangeSet weave section of the s.ChangeSet file is empty, instead each cset delta gets a WEAVE() offset where the weave is stored in the heap.

WEAVE_INDEX(s, d) == offset in heap for the weave part of this delta

List of rootkeys

The rootkeys included in the ChangeSet weave occur multiple times so they are only stored once in the heapfile. This is done using a linked list of rootkeys in the heap. The field s→rkeyHead is loaded from the sccs_perfile() section of the s.ChangeSet file and points at the most recently added rootkey in the file. The rootkeys are stored in the heap as a u32 binary offset to the next key followed by the actual null terminated key. As follows:

<u32 nextkey><rootkey>\0

The last key in the list has a 0 for the nextkey pointer.

When a new file is to be added for a commit, list is walked to find the existing rootkey. See the HEAPFILE for more information on how this information is optimized.

BWEAVEv2

The original binary weave format is known as BWEAVEv2 and includes a repository feature bit. The heap at that offset contains at list of rootkey/deltakey pairs, the rootkeys are stored as offsets to the heap so they can be deduplicated. The deltakeys are stored null-terminated directly in the heap. The end of the list is marked by a 0 rootkey offset.

<rk1 off><dk1 ...>\0
<rk2 off><dk2 ...>\0
<0 u32>

The rootkey offset in the weave points at the 'nextkey' field of the rootkey linked list. So the actual rootkey is at HEAP(s, off+4).

This format shipped in bk-6 .

BWEAVEv3

In v3 the format of the binary weave changes. Now both the rootkeys and the delta keys are stored loose in the heap and the actual weave data is just rk/dk pairs written as 32-bit offsets.

<rk1 off><dk1 off>
<rk2 off><dk2 off>
<0 u32>			# no deltakey here

In this version both offsets point directly at the keys and so for the rootkey the offset is right after the 'nextkey' field.

This is larger than BWEAVEv2 as it uses 4 more byte per line in the weave. It also means that when reading the entire weave we stride two regions of memory instead of just one. But in the case where the cset weave needs to be scanned for certain rootkeys, the size of the data to be traversed is significantly smaller.

The difference between BWEAVEv2 and BWEAVEv3 can be see by looking at the heap usage on the ChangeSet file on the linux-2.6 repository.

BWEAVEv2

sfile: 11.4M->31.8M
heap1: 87.1M->207M
    cludes:   2.19M  1.1%  25694  89.3
  comments:    122M 59.1% 297952 429.4
     weave:   75.4M 36.5% 281343 281.1
    random:      17  0.0%      1  17.0
  userhost:    919K  0.4%  28768  32.7
  pathname:      10  0.0%      1  10.0
      zone:     147  0.0%     21   7.0
  csetFile:      78  0.0%      1  78.0
  symnames:   1.86K  0.0%    192   9.9
  rootkeys:   5.46M  2.6%  54890 104.2
  uniqhash:    512K  0.2%
    unused:     130  0.0%
  table1:   4.55M
  table2:   27.3M
 symlist:   2.29K

BWEAVEv3

sfile: 11.3M->31.8M
heap1: 93.0M->210M
    cludes:   2.19M  1.0%  25694  89.3
  comments:    122M 58.0% 297952 429.4
     weave:   8.61M  4.1% 281343  32.1
    random:      17  0.0%      1  17.0
  userhost:    919K  0.4%  28768  32.7
  pathname:      10  0.0%      1  10.0
      zone:     147  0.0%     21   7.0
  csetFile:      78  0.0%      1  78.0
 deltakeys:   70.8M 33.7% 988490  75.2
  symnames:   1.86K  0.0%    192   9.9
  rootkeys:   5.46M  2.6%  54890 104.2
  uniqhash:    512K  0.2%
    unused:       0  0.0%
  table1:   4.55M
  table2:   27.3M
 symlist:   2.29K

Note	The 75M weave was replaced with a 8.6M weave and a 71M table of delta keys.

Upgrading to BWEAVEv3

For traditional ascii and BWEAVEv2, sfiles the upgrade path in identical. Clone with --upgrade will write a BKFILE s.ChangeSet file and enable BWEAVEv3.

To revert to an ascii repository bk clone --compat continues to work. To revert to a BWEAVEv2 repository the user would need to use --compat with the new bk and then --upgrade with bk-6.x.

Implementation details

For ascii ChangeSet files the existing ascii weave is walked unconditionally at the end of sccs_init(). Then the weave is built on the fly into the in-memory copy of the heap. When writing an ascii ChangeSet weave the sccs_nextdata() function produces the old ascii weave on the fly. This means the BitKeeper is actually slower when reading ascii repositories with big ChangeSet files. This is most noticeable when running commands that read information from the delta table of the ChangeSet file, like in the GUI tools.

The reading and write v2 vs. v3 is hidden in the RKDKOFF macro on the read side and weave_set() function on the write side.

If the format that is read differs from the format written, then weave_cvt() will be called to generate v3 and weave_cvt2() will be called to generate v2.

Storing 1.0 markers in the weave

Commands like rset can be made faster if the cset weave indicated when a new file was created. Currently you have to walk the entire weave to know that a given rootkey will not be used again.

As part of the BWEAVEv3 code, the weave encoding can also include markers like this:

<u32 rkoff><u32 dkoff>   # normal rootkey/deltakey part for 1.1
<u32 rkoff><u32 0>	  # 0 means this rootkey won't appear again

So the last time each rootkey appears in the weave we have an additional weave line that mark this. This always code looking for information about a certain file to stop processing the weave when no more history for that file appears.

This information is added to the weave by commit, takepatch and repack. At the moment the information is advisory. The marker isn’t always present, but if it is present it is correct. Check verifies that.