Pipeline Architecture: Lab vs Simulation
The package provides two distinct processing pipelines designed for different stages of the research workflow.
Lab Mode (Post-Facto)
High-Precision Analysis & Validation
The Lab Mode pipeline is designed for batch analysis where the complete dataset is available. It aims for parity with the legacy R implementation.
Characteristics
Memory: Full dataset loaded into memory.
Processing: Acausal algorithms (e.g., filtfilt, centered moving averages).
Calibration: Batch computation from the full dataset.
Depth: Acausal interpolation of missing values.
Configuration
To enable Lab Mode, set the following in your YAML config:
pipeline:
  mode: "LAB"
  calibration:
    attachment_angle: "batch_compute"
    magnetometer: "batch_compute"
  depth:
    mode: "interpolate"
  true_integration: false  # Set to true to use real timestamps for track generation (physically accurate but differs from R)
Clock Drift & Jitter
Real hardware sensors often have clock jitter (variable sampling rates).
Legacy R implementations use Fixed Step Integration (assuming dt = 1/freq), which ignores this jitter. This produces smooth tracks but mathematically implies that the animal speeds up/slows down inversely to the clock jitter if analyzed against real time.
Default Behavior (R-Compatibility): Uses fixed dt. Matches legacy R output exactly. Velocity calculated from the position/time ratio will show jitter.
True Integration (true_integration: true): Uses actual time deltas (real_dt). Matches the physical path as timed by the sensor clock. Velocity is stable, but positions differ slightly from legacy R tracks.
Diagnostic Output:
Lab Mode now includes a ClockDrift column in the output, measuring the deviation between the sensor clock and an ideal clock:
Drift = Timestamp_Actual - (Start_Time + Index * 1/Freq)
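As a minimal sketch of the drift formula above, the following Python computes the per-sample drift and the actual time deltas. It assumes timestamps are floats in seconds; the function names clock_drift and real_dt are illustrative and are not taken from the package.

```python
def clock_drift(timestamps, freq_hz):
    """Drift (s) of each timestamp from an ideal clock ticking at freq_hz.

    Implements Drift = Timestamp_Actual - (Start_Time + Index * 1/Freq).
    Assumes timestamps are floats in seconds (an assumption of this sketch).
    """
    start_time = timestamps[0]
    return [t - (start_time + i / freq_hz) for i, t in enumerate(timestamps)]


def real_dt(timestamps):
    """Actual time deltas between consecutive samples, as used by true integration."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


if __name__ == "__main__":
    ts = [0.0, 0.0625, 0.13, 0.1875]  # third sample arrives 5 ms late
    print(clock_drift(ts, 16.0))
    print(real_dt(ts))
```

With a fixed dt the late sample is treated as if it arrived on time; with real_dt the longer and shorter intervals around it are integrated as measured.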
Simulation Mode (Real-Time)
On-Tag Algorithm Development
The Simulation Mode pipeline is designed for real-time processing where data arrives sample-by-sample. It uses fully causal algorithms with no lookahead.
Characteristics
Memory: Fixed memory footprint (O(1)).
Processing: Causal algorithms (e.g., lfilter, EMA).
Calibration: Online adaptive calibration or fixed parameters.
Depth: Real-time estimation (hold last value or predict).
Configuration
To enable Simulation Mode, set the following in your YAML config:
pipeline:
  mode: "SIMULATION"
  calibration:
    attachment_angle: "fixed"  # or "progressive" (future)
    magnetometer: "fixed"
  depth:
    mode: "realtime"
Streaming Algorithm (11-Step Architecture)
The StreamingProcessor implements a strictly causal, high-fidelity motion model reconciled with validated post-facto pipelines. Each step processes data as it arrives with O(1) memory footprint.
1. Input Acquisition
Raw sensor data is extracted from the input record. Accelerometer values arrive in 0.1g units (1 count = 0.1g). Magnetometer values are in raw ADC counts.
ax_m, ay_m, az_m = record["int aX"], record["int aY"], record["int aZ"]
depth_raw = record["Depth"]
2. Attachment Correction
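Locked calibration parameters for attachment roll and pitch are applied via rotation matrices (Rx then Ry). This aligns the sensor frame with the animal’s body frame.
\[ \mathbf{a}_{att} = R_y(\theta_{att}) \, R_x(\phi_{att}) \, \mathbf{a}_{m} \]
Where \(\phi_{att}\) is attachment roll, \(\theta_{att}\) is attachment pitch, \(\mathbf{a}_{m}\) is the measured acceleration vector, and \(\mathbf{a}_{att}\) is the attachment-corrected vector.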
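The attachment correction can be sketched in Python as below. The rotation order (Rx, then Ry) follows the text; the sign convention of each matrix (standard right-handed active rotations) is an assumption of this sketch and may differ from the package, as are the function names.

```python
import math


def rotate_x(v, angle_rad):
    """Rotate vector v about the x axis (right-handed, active; assumed convention)."""
    x, y, z = v
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (x, c * y - s * z, s * y + c * z)


def rotate_y(v, angle_rad):
    """Rotate vector v about the y axis (right-handed, active; assumed convention)."""
    x, y, z = v
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)


def attachment_correct(a_m, roll_deg, pitch_deg):
    """Apply Rx(roll) first, then Ry(pitch), to the measured acceleration."""
    rolled = rotate_x(a_m, math.radians(roll_deg))
    return rotate_y(rolled, math.radians(pitch_deg))


if __name__ == "__main__":
    # Locked angles taken from the configuration reference example.
    print(attachment_correct((0.0, 0.0, 10.0), 117.43, -5.99))
```

Because both matrices are pure rotations, the magnitude of the acceleration vector is unchanged by the correction, which is a quick sanity check for any implementation.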
3. Window Management (Causal Gsep)
A trailing moving average window of filt_len=48 samples (3 seconds at 16Hz) estimates the static (gravity) component. Dynamic acceleration is the residual.
static_x = sum(buffer_x) / filt_len
dyn_x = ax_att - static_x
odba_g = (abs(dyn_x) + abs(dyn_y) + abs(dyn_z)) / 10.0  # Convert 0.1 g counts to g
Warmup Behavior: During the first 48 samples, static = raw accel, dynamic = 0, ODBA = 0.
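A minimal sketch of the trailing-window separation, using a ring buffer and running sum per axis so each update is O(1). The class name CausalGsep is hypothetical, and treating exactly the first filt_len samples as warmup is this sketch's reading of the warmup rule above.

```python
from collections import deque


class CausalGsep:
    """Trailing moving-average gravity separation (illustrative sketch)."""

    def __init__(self, filt_len=48):
        self.filt_len = filt_len
        self.buffers = [deque(maxlen=filt_len) for _ in range(3)]
        self.sums = [0.0, 0.0, 0.0]
        self.n_seen = 0

    def update(self, ax, ay, az):
        """Return (static, dynamic, odba_g) for one sample in 0.1 g counts."""
        sample = (ax, ay, az)
        for i, value in enumerate(sample):
            buf = self.buffers[i]
            if len(buf) == self.filt_len:
                self.sums[i] -= buf[0]  # oldest value is about to be evicted
            buf.append(value)
            self.sums[i] += value
        self.n_seen += 1
        if self.n_seen <= self.filt_len:
            # Warmup: static = raw accel, dynamic = 0, ODBA = 0.
            return sample, (0.0, 0.0, 0.0), 0.0
        static = tuple(s / self.filt_len for s in self.sums)
        dynamic = tuple(v - s for v, s in zip(sample, static))
        odba_g = sum(abs(d) for d in dynamic) / 10.0  # 0.1 g counts to g
        return static, dynamic, odba_g


if __name__ == "__main__":
    gsep = CausalGsep()
    for _ in range(49):
        gsep.update(0.0, 0.0, 10.0)
    print(gsep.update(0.0, 0.0, 58.0))
```

A running sum accumulates floating-point rounding over very long streams; recomputing the sum from the buffer periodically is one way to bound that error.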
4. Dead Reckoning Timing
Time delta is computed from sensor timestamps (or fixed dt if clock_source: fixed_frequency).
5. Orientation (R-style pitchRoll2)
Pitch and roll are calculated from the static acceleration vector using R-compatible trigonometric formulas:
6. World Frame Transformation
Acceleration is rotated from body frame to world frame (NED convention) and gravity is removed:
7. High-Pass Filter (Bias Removal)
A 4th-order Butterworth high-pass filter (cutoff 0.05 Hz at 16 Hz sample rate) removes sensor bias and integration drift:
accel_z_filtered, zi = lfilter(highpass_b, highpass_a, [accel_z_no_gravity], zi=zi)
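The snippet above relies on coefficients and filter state created elsewhere. A hedged sketch of how they could be set up with SciPy follows; the order, cutoff, and sample rate come from the text, while the helper names make_state and filter_sample and the choice to seed the state from the first sample are assumptions of this sketch.

```python
from scipy.signal import butter, lfilter, lfilter_zi

FS_HZ = 16.0       # sample rate stated in the text
CUTOFF_HZ = 0.05   # high-pass cutoff stated in the text

# 4th-order Butterworth high-pass in transfer-function (b, a) form.
highpass_b, highpass_a = butter(4, CUTOFF_HZ, btype="highpass", fs=FS_HZ)


def make_state(first_sample):
    """Initial filter state for a signal that has been sitting at first_sample.

    Seeding the state this way avoids a large start-up transient
    (a design choice of this sketch, not confirmed by the source).
    """
    return lfilter_zi(highpass_b, highpass_a) * first_sample


def filter_sample(sample, zi):
    """Filter a single sample causally, carrying the state zi forward."""
    y, zi = lfilter(highpass_b, highpass_a, [sample], zi=zi)
    return float(y[0]), zi


if __name__ == "__main__":
    zi = make_state(0.3)
    for _ in range(5):
        out, zi = filter_sample(0.3, zi)
        print(out)
```

A constant bias is removed almost entirely, which is the purpose of this step. For higher orders or lower cutoffs, second-order sections (output="sos" with sosfilt) are numerically more robust than the (b, a) form.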
8. INS Depth Estimation (Kalman Filter)
A 2-state Kalman Filter (depth, vertical velocity) fuses:
Prediction: Vertical acceleration integration (16 Hz)
Update: Sparse pressure sensor depth measurements (~1 Hz)
Parameters: process_noise_depth=1e-4, process_noise_velocity=1e-3, measurement_noise=0.02
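The predict/update cycle can be sketched as a small pure-Python filter. The noise values are the parameters listed above; treating process noise as a per-step variance, measurement_noise as a variance, the unit initial covariance, and the class name DepthKalman are all assumptions of this sketch. Acceleration is taken as positive down in m/s², matching depth positive down.

```python
class DepthKalman:
    """2-state (depth, vertical velocity) Kalman filter sketch."""

    def __init__(self, q_depth=1e-4, q_velocity=1e-3, r_meas=0.02, depth0=0.0):
        self.depth = depth0
        self.velocity = 0.0
        # Covariance entries; unit initial uncertainty is an assumption.
        self.p00, self.p01, self.p10, self.p11 = 1.0, 0.0, 0.0, 1.0
        self.q_depth = q_depth
        self.q_velocity = q_velocity
        self.r_meas = r_meas

    def predict(self, accel_down, dt):
        """Integrate vertical acceleration over dt (runs at the 16 Hz rate)."""
        self.depth += self.velocity * dt + 0.5 * accel_down * dt * dt
        self.velocity += accel_down * dt
        p00, p01, p10, p11 = self.p00, self.p01, self.p10, self.p11
        # P = F P F^T + Q with F = [[1, dt], [0, 1]].
        self.p00 = p00 + dt * (p01 + p10) + dt * dt * p11 + self.q_depth
        self.p01 = p01 + dt * p11
        self.p10 = p10 + dt * p11
        self.p11 = p11 + self.q_velocity

    def update(self, measured_depth):
        """Correct with a pressure-sensor depth reading (H = [1, 0])."""
        s = self.p00 + self.r_meas
        k0, k1 = self.p00 / s, self.p10 / s
        residual = measured_depth - self.depth
        self.depth += k0 * residual
        self.velocity += k1 * residual
        p00, p01 = self.p00, self.p01
        self.p00 = (1.0 - k0) * p00
        self.p01 = (1.0 - k0) * p01
        self.p10 = self.p10 - k1 * p00
        self.p11 = self.p11 - k1 * p01


if __name__ == "__main__":
    kf = DepthKalman()
    for step in range(1, 16 * 10 + 1):
        kf.predict(0.0, 1.0 / 16.0)
        if step % 16 == 0:  # pressure reading roughly once per second
            kf.update(10.0)
    print(kf.depth, kf.velocity)
```

Between pressure readings the estimate coasts on integrated acceleration, which is why the high-pass filtering of the vertical channel in the previous step matters.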
9. Multi-Scale Depth Smoothing
Adaptive EMA blending based on activity level:
Low activity (ODBA < 0.15): Favor slow EMA (τ=45s)
High activity (ODBA > 0.20): Favor fast EMA (τ=3s)
Transition: Linear blend
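One way to realize this blend is sketched below. The thresholds and time constants come from the list above; deriving the EMA coefficient as alpha = 1 - exp(-dt/τ), seeding both EMAs with the first depth value, and the class name MultiScaleSmoother are assumptions of this sketch.

```python
import math


class MultiScaleSmoother:
    """Blend a slow and a fast EMA of depth according to ODBA (sketch)."""

    def __init__(self, dt=1.0 / 16.0, tau_slow=45.0, tau_fast=3.0,
                 odba_low=0.15, odba_high=0.20):
        # Assumed mapping from time constant to EMA coefficient.
        self.alpha_slow = 1.0 - math.exp(-dt / tau_slow)
        self.alpha_fast = 1.0 - math.exp(-dt / tau_fast)
        self.odba_low = odba_low
        self.odba_high = odba_high
        self.slow = None
        self.fast = None

    def blend_weight(self, odba):
        """0 below odba_low, 1 above odba_high, linear in between."""
        w = (odba - self.odba_low) / (self.odba_high - self.odba_low)
        return min(1.0, max(0.0, w))

    def update(self, depth, odba):
        """Advance both EMAs by one sample and return the blended depth."""
        if self.slow is None:
            self.slow = self.fast = depth
        else:
            self.slow += self.alpha_slow * (depth - self.slow)
            self.fast += self.alpha_fast * (depth - self.fast)
        w = self.blend_weight(odba)
        return (1.0 - w) * self.slow + w * self.fast


if __name__ == "__main__":
    smoother = MultiScaleSmoother()
    print(smoother.update(12.0, 0.05))
    print(smoother.update(12.5, 0.30))
```

Both EMAs are updated on every sample regardless of the weight, so switching between activity levels does not introduce a discontinuity from a stale average.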
10. Magnetometer & Heading
Tilt-compensated magnetic heading using locked hard-iron calibration:
mx_n = (mx_raw - offset_x) / sphere_radius
heading = atan2(-my_world, mx_world)
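The two lines above can be sketched as runnable Python. The tilt-compensation rotation itself is not reproduced here because its convention is not given in this section; the sketch covers only hard-iron normalization and the final heading. Wrapping the heading to [0, 360) degrees and the function names are assumptions.

```python
import math


def normalize_mag(raw, offset, sphere_radius):
    """Remove the hard-iron offset and scale to a unit sphere."""
    return tuple((r - o) / sphere_radius for r, o in zip(raw, offset))


def heading_deg(mx_world, my_world):
    """Heading from leveled (tilt-compensated) magnetometer components.

    Uses heading = atan2(-my_world, mx_world); the wrap to [0, 360)
    is an assumption of this sketch.
    """
    return math.degrees(math.atan2(-my_world, mx_world)) % 360.0


if __name__ == "__main__":
    # Locked calibration values from the configuration reference example.
    offset = (-21.15, -12.89, 30.27)
    radius = 56.51
    print(normalize_mag((10.0, -40.0, 35.0), offset, radius))
    print(heading_deg(0.0, -1.0))
```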
11. Dead Reckoning Integration
Track position is updated using heading and speed:
pseudo_x += speed * dt * cos(heading)
pseudo_y += speed * dt * sin(heading)
Speed model: constant (1.0 m/s) or odba_scaled.
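A sketch tying together the timing rule from step 4 and the integration above. The option names clock_source, fixed_frequency, sensor_time, constant, and odba_scaled appear in the configuration; the function names and the odba_scale parameter are hypothetical, and heading is assumed to be in radians.

```python
import math


def step_dt(clock_source, sampling_rate_hz, t_prev=None, t_now=None):
    """Time delta for one step, from the nominal rate or sensor timestamps."""
    if clock_source == "fixed_frequency":
        return 1.0 / sampling_rate_hz
    if clock_source == "sensor_time":
        return t_now - t_prev
    raise ValueError(f"unknown clock_source: {clock_source!r}")


def speed_m_s(speed_model, odba_g=0.0, constant_speed_m_s=1.0, odba_scale=1.0):
    """Speed under the chosen model; odba_scale is a hypothetical factor."""
    if speed_model == "constant":
        return constant_speed_m_s
    if speed_model == "odba_scaled":
        return odba_g * odba_scale
    raise ValueError(f"unknown speed_model: {speed_model!r}")


def integrate(pseudo_x, pseudo_y, heading_rad, speed, dt):
    """One dead-reckoning step; returns the updated (pseudo_x, pseudo_y)."""
    pseudo_x += speed * dt * math.cos(heading_rad)
    pseudo_y += speed * dt * math.sin(heading_rad)
    return pseudo_x, pseudo_y


if __name__ == "__main__":
    x, y = 0.0, 0.0
    for _ in range(16):  # one second at 16 Hz, heading 0
        dt = step_dt("fixed_frequency", 16.0)
        x, y = integrate(x, y, 0.0, speed_m_s("constant"), dt)
    print(x, y)
```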
11b. Terrain and Altitude Processing
To model real-world swimming kinematics in complex benthic zones, the simulator estimates geographic coordinates and queries an external elevation service:
GPS Estimation: If raw GPS coordinates (latitude/longitude) are absent, coordinates are estimated in real time using a flat-earth approximation integrated from the start_location and the dead-reckoning displacement (pseudo_x and pseudo_y).
Centralized Elevation Query: The processor queries the topobathysim service at the configured topobathysim_url using a caching client at Zoom 11 (50-meter resolution).
Benthic Telemetry: Calculates the animal’s height above the seafloor:
Altitude_Above_Seafloor = -Depth - Seafloor_Elevation
Where Depth is positive down and Seafloor_Elevation is the Mean Sea Level (MSL) altitude of the sea bottom (typically negative). For example, an animal at Depth = 50 m over a seafloor at -200 m MSL is 150 m above the bottom. The resulting Seafloor_Elevation and Altitude_Above_Seafloor are injected into the ZMQ stream.
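The coordinate estimate and the altitude calculation can be sketched as follows. The spherical Earth radius, the mapping of pseudo_x to north and pseudo_y to east (inferred from the cos/sin usage in step 11 with heading measured from north), and the function names are assumptions of this sketch. The elevation query itself is omitted because the service API is not described here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean spherical radius (assumed)


def estimate_lat_lon(start_location, pseudo_x, pseudo_y):
    """Flat-earth estimate of (lat, lon) in degrees from a local displacement.

    Assumes pseudo_x is metres north and pseudo_y is metres east of
    start_location; valid only for displacements small relative to Earth.
    """
    lat0, lon0 = start_location
    dlat = math.degrees(pseudo_x / EARTH_RADIUS_M)
    dlon = math.degrees(
        pseudo_y / (EARTH_RADIUS_M * math.cos(math.radians(lat0)))
    )
    return lat0 + dlat, lon0 + dlon


def altitude_above_seafloor(depth_m, seafloor_elevation_m):
    """Height above the bottom: depth is positive down, elevation is MSL."""
    return -depth_m - seafloor_elevation_m


if __name__ == "__main__":
    print(estimate_lat_lon([41.52, -70.67], 500.0, -250.0))
    print(altitude_above_seafloor(50.0, -200.0))
```

A negative result from altitude_above_seafloor means the estimated position is below the gridded seafloor, which at 50-meter resolution usually indicates track or bathymetry error rather than a physical state.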
Hardware-Specific Parallel Backends
To scale up to tens of thousands of parallel entities (e.g., Monte Carlo simulations or massive swarms), the simulator supports parallel GPU-resident backends:
NVIDIA Warp (CUDA)
Offloads Gsep ring buffers, running sums, and orientation calculations to NVIDIA GPUs using JIT-compiled CUDA kernels.
* Requirements: NVIDIA GPU, CUDA driver, and warp-lang package.
* Usage: Set backend: warp in the simulation configuration.
Apple MLX (Metal)
Specifically optimized for Apple Silicon (macOS) architectures. It leverages Apple’s MLX machine learning framework and the Apple Silicon GPU (Metal) using JIT-compiled execution graphs via mlx.core.compile.
* Unified Memory Advantage: Because the CPU and GPU share the same physical RAM space on Apple Silicon, data ingestion is zero-copy, avoiding the PCIe transfer bottlenecks typical of discrete GPU architectures.
* Performance: On an Apple M4 Max GPU (128 GB Unified Memory, 546 GB/s bandwidth), the MLX backend achieves over 30.7M Samples Per Second (SPS) for large swarms, outperforming vectorized NumPy by 3x.
* Requirements: Apple Silicon Mac and mlx package.
* Usage: Set backend: mlx in the simulation configuration.
Configuration Reference
Streaming Mode Configuration
The following YAML shows all configuration attributes (including global simulation parameters and individual entity-specific options) with their default values:
playback_speed: 1.0                  # Real-time speed multiplier
loop: true                           # Loop simulation playback
backend: null                        # cpu | warp | mlx | null (auto)
topobathysim_url: "http://garnet.localdomain:9595"  # Production tiled elevation service URL
entities:
  - sim_id: sword_causal             # Unique identifier for this entity
    tag_id: RED001                   # Biological tag ID (for metadata lookup)
    sampling_rate_hz: 16.0           # Sensor sampling rate in Hz
    clock_source: fixed_frequency    # fixed_frequency | sensor_time
    input_file: path/to/data.csv     # Path to input CSV file
    start_location: [41.52, -70.67]  # [latitude, longitude] starting origin
    calibration:
      attachment_angle_mode: fixed   # fixed | progressive (future)
      magnetometer_mode: fixed       # fixed | progressive (future)
      # Pre-computed attachment angles (required if mode=fixed)
      locked_attachment_roll_deg: 117.43
      locked_attachment_pitch_deg: -5.99
      # Pre-computed hard-iron calibration (required if mode=fixed)
      locked_mag_offset_x: -21.15
      locked_mag_offset_y: -12.89
      locked_mag_offset_z: 30.27
      locked_mag_sphere_radius: 56.51
    depth_estimation:
      algorithm: causal_sample_hold  # causal_sample_hold | acausal_interp
    dead_reckoning:
      speed_model: constant          # constant | odba_scaled
      constant_speed_m_s: 1.0        # Speed when model=constant
    ahrs:
      enabled: false                 # true = Madgwick filter (future)
    save_telemetry: true             # Write output CSV
Key Configuration Decisions
clock_source:
fixed_frequency: Use nominal dt = 1/sampling_rate. Matches R legacy behavior.
sensor_time: Use actual timestamp deltas. Physically accurate but differs from R.
calibration.attachment_angle_mode:
fixed: Use pre-computed angles from locked_attachment_*_deg.
progressive (future): Online variance-based calibration.
dead_reckoning.speed_model:
constant: Fixed speed (default 1.0 m/s). Simple, reproducible.
odba_scaled: Speed = ODBA × scaling factor. More realistic but noisier.