# Sweepstore Header Structure
The Sweepstore file format uses a structured header to manage file metadata and concurrency control. The header consists of three main parts: the static header, the concurrency header, and dynamic worker tickets.
## Static Header (Bytes 0-28)
The static header contains basic file information and pointers.
| Offset | Size | Field | Type | Description |
|--------|------|-------|------|-------------|
| 0 | 4 bytes | Magic Number | String | File identifier, must be "SWPT" |
| 4 | 12 bytes | Version | String | Version string (UTF-8): a leading space followed by up to 11 characters, padded with spaces to fill 12 bytes |
| 16 | 8 bytes | Address Table Pointer | int64 | Pointer to the address table location |
| 24 | 4 bytes | Free List Count | int32 | Number of entries in the free list |
| 28 | 1 byte | Is Free List Lifted | bool | Flag indicating if free list is lifted (0=false, 1=true) |

**Total Size:** 29 bytes
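
A minimal Python sketch of decoding the static header with the standard `struct` module follows. It assumes the little-endian, unpadded layout described under Implementation Notes. `StaticHeader`, `parse_static_header` and the format-string constant are illustrative names, not part of Sweepstore.

```python
import struct
from typing import NamedTuple

# Little-endian ("<") with no alignment padding:
# 4s magic | 12s version | q int64 pointer | i int32 count | ? bool flag
STATIC_HEADER_FORMAT = "<4s12sqi?"
STATIC_HEADER_SIZE = struct.calcsize(STATIC_HEADER_FORMAT)  # 29 bytes


class StaticHeader(NamedTuple):
    magic: str
    version: str
    address_table_pointer: int
    free_list_count: int
    is_free_list_lifted: bool


def parse_static_header(buf: bytes) -> StaticHeader:
    """Decode bytes 0-28 of a Sweepstore file (hypothetical helper)."""
    magic, version, pointer, count, lifted = struct.unpack_from(
        STATIC_HEADER_FORMAT, buf, 0
    )
    if magic != b"SWPT":
        raise ValueError(f"not a Sweepstore file: magic={magic!r}")
    return StaticHeader(
        magic=magic.decode("ascii"),
        # Drop the leading space and the space padding around the version text.
        version=version.decode("utf-8").strip(" "),
        address_table_pointer=pointer,
        free_list_count=count,
        is_free_list_lifted=lifted,
    )


# Example: a freshly initialised header (null pointer = -1, empty free list).
raw = struct.pack(STATIC_HEADER_FORMAT, b"SWPT", b" undefined".ljust(12), -1, 0, False)
header = parse_static_header(raw)
print(header.version, header.address_table_pointer)  # undefined -1
```

Using one `struct` format string keeps the byte layout in a single place, so `struct.calcsize` can double-check the 29-byte total.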
## Concurrency Header (Bytes 29-45)
The concurrency header coordinates concurrent access between the master process and its workers.
| Offset | Size | Field | Type | Description |
|--------|------|-------|------|-------------|
| 29 | 8 bytes | Master Identifier | int64 | Unique identifier for the master process |
| 37 | 4 bytes | Master Heartbeat | int32 | Heartbeat counter for the master process |
| 41 | 4 bytes | Number of Workers | int32 | Total number of concurrent worker tickets |
| 45 | 1 byte | Is Read Allowed | bool | Flag indicating if read operations are allowed (0=false, 1=true) |

**Total Size:** 17 bytes
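
The concurrency fields can be read the same way, starting at offset 29. The Python sketch below works on an in-memory buffer only. It does no file I/O or locking, so it says nothing about how Sweepstore makes these updates safe across processes. `parse_concurrency_header` and `set_read_allowed` are hypothetical helper names.

```python
import struct
from typing import NamedTuple

CONCURRENCY_HEADER_OFFSET = 29
IS_READ_ALLOWED_OFFSET = 45
# q int64 master id | i int32 heartbeat | i int32 worker count | ? bool read flag
CONCURRENCY_HEADER_FORMAT = "<qii?"


class ConcurrencyHeader(NamedTuple):
    master_identifier: int
    master_heartbeat: int
    number_of_workers: int
    is_read_allowed: bool


def parse_concurrency_header(buf: bytes) -> ConcurrencyHeader:
    """Decode bytes 29-45 of a Sweepstore file (hypothetical helper)."""
    fields = struct.unpack_from(CONCURRENCY_HEADER_FORMAT, buf, CONCURRENCY_HEADER_OFFSET)
    return ConcurrencyHeader(*fields)


def set_read_allowed(buf: bytearray, allowed: bool) -> None:
    """Overwrite only the one-byte Is Read Allowed flag, leaving other fields intact."""
    struct.pack_into("<?", buf, IS_READ_ALLOWED_OFFSET, allowed)


buf = bytearray(46)  # static + concurrency header, no tickets
struct.pack_into(CONCURRENCY_HEADER_FORMAT, buf, CONCURRENCY_HEADER_OFFSET, 7, 0, 4, False)
set_read_allowed(buf, True)
print(parse_concurrency_header(buf))
```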
## Worker Tickets (Starting at Byte 46)
The size of the worker ticket region depends on the number of workers specified in the concurrency header. Each individual ticket has a fixed size of 30 bytes.
**Base Offset Calculation:** `46 + (ticketIndex * 30)`
### Single Ticket Structure
| Relative Offset | Size | Field | Type | Description |
|-----------------|------|-------|------|-------------|
| 0 | 4 bytes | Identifier | int32 | Unique identifier for this worker |
| 4 | 4 bytes | Worker Heartbeat | int32 | Heartbeat counter for this worker |
| 8 | 1 byte | Ticket State | byte (enum) | Current state of the ticket (see SweepstoreTicketState) |
| 9 | 1 byte | Ticket Operation | byte (enum) | Current operation being performed (see SweepstoreTicketOperation) |
| 10 | 8 bytes | Key Hash | int64 | Hash of the key being operated on |
| 18 | 8 bytes | Write Pointer | int64 | Pointer to the write location |
| 26 | 4 bytes | Write Size | int32 | Size of the write operation |

**Ticket Size:** 30 bytes
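
A Python sketch of addressing and reading/writing a single ticket follows. It applies the base offset formula `46 + (ticketIndex * 30)` and packs the seven fields without padding. `WorkerTicket`, `read_ticket` and `write_ticket` are illustrative names. Bounds checking against the stored worker count is left out.

```python
import struct
from typing import NamedTuple

TICKETS_BASE_OFFSET = 46
TICKET_SIZE = 30
# i id | i heartbeat | B state | B operation | q key hash | q write pointer | i write size
TICKET_FORMAT = "<iiBBqqi"


class WorkerTicket(NamedTuple):
    identifier: int
    heartbeat: int
    state: int
    operation: int
    key_hash: int
    write_pointer: int
    write_size: int


def ticket_offset(index: int) -> int:
    """Absolute byte offset of ticket `index`: 46 + index * 30."""
    if index < 0:
        raise IndexError("ticket index must be non-negative")
    return TICKETS_BASE_OFFSET + index * TICKET_SIZE


def read_ticket(buf: bytes, index: int) -> WorkerTicket:
    return WorkerTicket(*struct.unpack_from(TICKET_FORMAT, buf, ticket_offset(index)))


def write_ticket(buf: bytearray, index: int, ticket: WorkerTicket) -> None:
    struct.pack_into(TICKET_FORMAT, buf, ticket_offset(index), *ticket)


buf = bytearray(TICKETS_BASE_OFFSET + 4 * TICKET_SIZE)  # room for 4 tickets
# State 1 = WAITING, operation 3 = WRITE.
write_ticket(buf, 2, WorkerTicket(42, 1, 1, 3, 0x1234, 4096, 64))
print(read_ticket(buf, 2).write_pointer)  # 4096
```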
## Enumerations
Enum fields are stored as single-byte integers. The tables below list the integer value of each enum member:
### SweepstoreTicketState (1 byte)
| Value | Name | Description |
|-------|------|-------------|
| 0 | IDLE | Ticket is idle and not performing any work |
| 1 | WAITING | Ticket is waiting for approval |
| 2 | APPROVED | Ticket has been approved to proceed |
| 3 | EXECUTING | Ticket is actively executing an operation |
| 4 | COMPLETED | Ticket has completed its operation |
### SweepstoreTicketOperation (1 byte)
| Value | Name | Description |
|-------|------|-------------|
| 0 | NONE | No operation assigned |
| 1 | READ | Read operation |
| 2 | MODIFY | Modify operation |
| 3 | WRITE | Write operation |
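
The enumerations map naturally onto Python's `IntEnum`, shown below. The class names mirror the names used in this document, but this Python representation is only a sketch. `encode_enum` and `decode_state` are hypothetical helpers. Decoding through the enum constructor rejects any byte that does not match a defined member.

```python
from enum import IntEnum


class SweepstoreTicketState(IntEnum):
    IDLE = 0
    WAITING = 1
    APPROVED = 2
    EXECUTING = 3
    COMPLETED = 4


class SweepstoreTicketOperation(IntEnum):
    NONE = 0
    READ = 1
    MODIFY = 2
    WRITE = 3


def encode_enum(value: IntEnum) -> bytes:
    """Store an enum member as a single unsigned byte."""
    return int(value).to_bytes(1, "little")


def decode_state(raw: int) -> SweepstoreTicketState:
    """Map a raw state byte back to its member; unknown bytes raise ValueError."""
    return SweepstoreTicketState(raw)


print(encode_enum(SweepstoreTicketOperation.MODIFY))  # b'\x02'
print(decode_state(3).name)  # EXECUTING
```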
## Total Header Size Calculation
The total header size depends on the number of workers:
```
Total Header Size = 46 + (numberOfWorkers * 30) bytes
```
For example:
- 4 workers: 46 + (4 × 30) = 166 bytes
- 8 workers: 46 + (8 × 30) = 286 bytes
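
The size formula can be computed directly, or derived from an existing header by first reading the int32 worker count at offset 41. The Python sketch below assumes little-endian storage. The function names are illustrative.

```python
import struct

STATIC_HEADER_SIZE = 29
CONCURRENCY_HEADER_SIZE = 17
TICKETS_BASE_OFFSET = STATIC_HEADER_SIZE + CONCURRENCY_HEADER_SIZE  # 46
TICKET_SIZE = 30
NUMBER_OF_WORKERS_OFFSET = 41


def total_header_size(number_of_workers: int) -> int:
    """46 + numberOfWorkers * 30."""
    if number_of_workers < 0:
        raise ValueError("number_of_workers must be non-negative")
    return TICKETS_BASE_OFFSET + number_of_workers * TICKET_SIZE


def header_size_from_bytes(buf: bytes) -> int:
    """Read the int32 worker count at offset 41, then compute the full header size."""
    (number_of_workers,) = struct.unpack_from("<i", buf, NUMBER_OF_WORKERS_OFFSET)
    return total_header_size(number_of_workers)


print(total_header_size(4), total_header_size(8))  # 166 286
```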
## Initialization
When initializing a new Sweepstore file using `initialiseSweepstoreHeader()`:
- Magic number is set to "SWPT"
- Version is set to "undefined"
- Address table pointer is set to the null pointer (-1)
- Free list count is set to 0
- Is free list lifted flag is set to false
- Master identifier and heartbeat are set to 0
- Number of workers is set according to the parameter (default: 4)
- Read allowed flag is set to false
- All worker tickets are initialized with identifier set to 0, heartbeat set to 0, IDLE state (0), and NONE operation (0)
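
The initialisation steps above can be sketched in Python as follows. `initialise_header` and `encode_version` are hypothetical stand-ins, not the actual `initialiseSweepstoreHeader()` implementation. Two details are assumptions, and both are marked in the code. First, the version padding is treated as trailing. Second, the ticket fields the list does not mention (key hash, write pointer, write size) are left at 0.

```python
import struct

STATIC_HEADER_FORMAT = "<4s12sqi?"
CONCURRENCY_HEADER_FORMAT = "<qii?"
NULL_POINTER = -1
VERSION_FIELD_SIZE = 12


def encode_version(version: str) -> bytes:
    """Leading space + version text, space-padded to 12 bytes (trailing padding assumed)."""
    raw = (" " + version).encode("utf-8")
    if len(raw) > VERSION_FIELD_SIZE:
        raise ValueError("version must encode to at most 11 bytes")
    return raw.ljust(VERSION_FIELD_SIZE, b" ")


def initialise_header(number_of_workers: int = 4) -> bytearray:
    """Hypothetical Python analogue of initialiseSweepstoreHeader()."""
    buf = bytearray(46 + number_of_workers * 30)  # zero-filled
    struct.pack_into(STATIC_HEADER_FORMAT, buf, 0,
                     b"SWPT", encode_version("undefined"), NULL_POINTER, 0, False)
    struct.pack_into(CONCURRENCY_HEADER_FORMAT, buf, 29,
                     0, 0, number_of_workers, False)
    # Tickets: the zero-filled buffer already encodes identifier 0, heartbeat 0,
    # IDLE (0) and NONE (0). Key hash, write pointer and write size are also
    # left at 0 here, which is an assumption (the spec does not state them).
    return buf


header = initialise_header()
print(len(header), bytes(header[:16]))  # 166 b'SWPT undefined  '
```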
## Implementation Notes
- All multi-byte integers are stored in little-endian byte order
- The version string is padded with spaces and prefixed with a space character
- Boolean values are stored as single bytes (0 or 1)
- Enum values are stored as single-byte integers using their index values (0, 1, 2, etc.)
- Pointers use int64 for addressing, with -1 representing a null pointer
- The header is designed for concurrent access with heartbeat-based liveness detection