Development Log

Continuous Evolution

Tracking the progress of Haptic Vision beyond the initial 48-hour sprint.


Voice Control, Blind-First Features, and Data Tools

The rig now answers out loud. Hold the talk button, ask a question or give a command, and the response comes back from a model running entirely on the device. Nothing you say leaves the backpack.

This round also adds the everyday touches a daily user actually asks for: a step counter, a pocket mode that quiets the belt when you pick the camera up, directional cues, replies in the user's own language, and a far more natural voice option.

  • On-device voice commands and free-form questions about the live scene
  • Step counter, pocket mode, and compass-rose direction cues
  • Fifteen languages, plus an optional natural-sounding Piper voice
  • Offline tools to anonymize a session and auto-cut a highlight reel

Faster Face Search and SQL-Ready Sessions

Face recognition moved behind a vector index, so finding a known person stays quick even as the roster grows. The match results are identical to the old path; only the lookup got faster.
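As a point of reference, here is a minimal sketch of the exact linear scan that an approximate index is meant to speed up, and that could plausibly serve as the fallback when the index library is missing. The function names, the roster layout (name mapped to embedding), and the 0.6 threshold are all assumptions for illustration, not the project's actual code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_match(query, roster, threshold=0.6):
    """Scan every known face and return (name, score).

    roster is a hypothetical {name: embedding} mapping. The name is None
    when the best score falls below the (assumed) acceptance threshold.
    """
    best_name, best_score = None, -1.0
    for name, embedding in roster.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

The scan costs one comparison per enrolled face, which is why lookup time grows with the roster unless an index sits in front of it.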

Every session log now also writes to a sibling SQLite database, so we can run real queries over a run instead of parsing text by hand.
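A rough sketch of what mirroring log records into SQLite could look like, using only Python's standard library. The single `events` table, its columns, and the helper names are assumptions made for the example; the real schema is not described here.

```python
import json
import sqlite3

def open_mirror(path=":memory:"):
    """Open (or create) the mirror database with one generic events table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "ts REAL NOT NULL, kind TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn

def mirror_event(conn, ts, kind, payload):
    """Store one log record; the payload dict is kept as JSON text."""
    conn.execute(
        "INSERT INTO events (ts, kind, payload) VALUES (?, ?, ?)",
        (ts, kind, json.dumps(payload)),
    )

def count_by_kind(conn):
    """Example query: how many records of each kind a run produced."""
    rows = conn.execute(
        "SELECT kind, COUNT(*) FROM events GROUP BY kind ORDER BY kind"
    )
    return dict(rows.fetchall())
```

With records in a table, a question like "how many detections did this run log" becomes a single `GROUP BY` instead of a pass over a text file.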

  • Approximate-nearest-neighbor face search with the same match quality
  • Each session mirrored into a queryable SQLite database
  • Both features fall back cleanly when their optional libraries are absent

Voice Input and Knowing Where You Are

The first version of push-to-talk landed: hold the microphone, speak, and the system either runs the command or sends the question to the local model with the current view as context.

Inside a place you have walked before, the rig now keeps track of where you are along the recorded route, so "where am I" comes back grounded in the nearest landmark.

  • Hold-to-speak commands and questions
  • Continuous positioning within a saved place
  • Answers that name the closest landmark
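
The "where am I" answer can be sketched as a nearest-landmark lookup. This assumes the rig already has a 2D position estimate in the saved place's coordinate frame; the `Landmark` type, the function names, and the wording of the reply are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Landmark:
    name: str
    x: float  # meters, in the saved place's frame (assumed)
    y: float

def nearest_landmark(position, landmarks):
    """Return (landmark, distance_m) for the landmark closest to position."""
    px, py = position
    best = min(landmarks, key=lambda lm: math.hypot(lm.x - px, lm.y - py))
    return best, math.hypot(best.x - px, best.y - py)

def where_am_i(position, landmarks):
    """Build a short spoken answer grounded in the closest landmark."""
    landmark, distance = nearest_landmark(position, landmarks)
    return f"You are about {distance:.0f} meters from the {landmark.name}."
```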

Place-Aware Boot, Haptic Vocabulary, and Scene Reading

Saved places now carry a WiFi and Bluetooth fingerprint, so on startup the rig can recognize where it is and offer to navigate from there instead of from the original scan point.
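
One simple way to compare a boot-time scan against saved fingerprints is set overlap. The sketch below assumes a fingerprint is just the set of visible WiFi and Bluetooth identifiers and scores it with Jaccard similarity; the 0.5 cutoff and all names are illustrative, and the actual matching method may differ (for example, by weighting signal strength).

```python
def jaccard(a, b):
    """Share of identifiers common to both scans (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def recognize_place(scan, saved_places, min_score=0.5):
    """Return the best-matching saved place name, or None if nothing is close.

    saved_places is a hypothetical {place_name: set_of_identifiers} mapping.
    """
    best_name, best_score = None, 0.0
    for name, fingerprint in saved_places.items():
        score = jaccard(scan, fingerprint)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_score else None
```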

The belt also speaks its own language now: short taps for furniture, two-tap rhythms for doorways, rising patterns for stairs, and a sharp warning pattern for hazards. And one tap reads any sign, room number, or label in view out loud.

  • Recognizes a saved place from a single boot-time scan
  • Distinct vibration patterns for furniture, doorways, stairs, and warnings
  • On-demand scene reading for signs, room numbers, and labels
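
The haptic vocabulary lends itself to a small lookup table. Here is a sketch that follows the four categories described above; the `Pulse` type and every intensity and timing value are made-up placeholders, not the belt's real tuning.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pulse:
    intensity: float  # motor strength, 0.0 to 1.0
    on_ms: int        # vibration time
    off_ms: int       # pause before the next pulse

# Placeholder patterns: one short tap, a two-tap rhythm,
# a rising ramp, and a sharp repeated warning.
HAPTIC_VOCABULARY = {
    "furniture": (Pulse(0.5, 60, 0),),
    "doorway": (Pulse(0.6, 80, 80), Pulse(0.6, 80, 0)),
    "stairs": (Pulse(0.3, 100, 40), Pulse(0.6, 100, 40), Pulse(0.9, 100, 0)),
    "hazard": (Pulse(1.0, 200, 50), Pulse(1.0, 200, 50), Pulse(1.0, 200, 0)),
}

def pattern_duration_ms(category):
    """Total time one pattern occupies the belt, pauses included."""
    return sum(p.on_ms + p.off_ms for p in HAPTIC_VOCABULARY[category])
```

Keeping patterns as data rather than code makes it easy to retune a rhythm after user feedback without touching the driver.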

Desktop Diagnostics & Headless Recording

We've stabilized the desktop-to-Jetson bridge. The system now features high-frequency sensor diagnostics and drift correction that mirrors the field rig, making it easier to calibrate sensors from a workstation.

The system can also run fully "headless" (without a screen), recording high-fidelity promo footage and snapshots directly to disk for cleaner post-run analysis.

  • Real-time magnetometer drift correction
  • Headless promo capture and scene snapshots
  • High-frequency diagnostics and sensor liveness checks
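
The log does not spell out how drift is corrected. One common approach, shown here purely as an assumption, is a complementary filter: integrate the gyro for a smooth short-term heading, then pull it gently toward the magnetometer heading so the integration error cannot accumulate. The function names and the 0.98 blend factor are illustrative.

```python
def wrap_deg(angle):
    """Map any angle to the range [-180, 180) degrees."""
    return (angle + 180.0) % 360.0 - 180.0

def fuse_heading(prev_heading, gyro_rate_dps, dt, mag_heading, alpha=0.98):
    """One complementary-filter step; returns a heading in [0, 360).

    alpha close to 1.0 trusts the gyro; the remaining (1 - alpha) share
    nudges the estimate toward the magnetometer on every update.
    """
    predicted = prev_heading + gyro_rate_dps * dt
    # Use the shortest signed difference so 359 and 1 are 2 degrees apart.
    error = wrap_deg(mag_heading - predicted)
    return (predicted + (1.0 - alpha) * error) % 360.0
```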

Place Navigation Beta & Session Logging

Our biggest functional leap yet: Pre-scanned Place Navigation. The system now supports A* route planning through landmark-indexed environments, providing turn-by-turn haptic nudges and audio cues.
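
For readers unfamiliar with A*, here is a compact sketch over a landmark graph. It assumes each landmark has 2D coordinates and that edge cost is straight-line distance, which also serves as the heuristic; the graph layout and function name are illustrative rather than taken from the codebase.

```python
import heapq
import math

def a_star(graph, coords, start, goal):
    """Return the cheapest landmark path from start to goal, or None.

    graph:  {landmark: [neighbor, ...]}   (hypothetical adjacency list)
    coords: {landmark: (x, y)} in meters
    """
    def dist(a, b):
        (ax, ay), (bx, by) = coords[a], coords[b]
        return math.hypot(ax - bx, ay - by)

    open_heap = [(dist(start, goal), 0.0, start)]  # (estimate, cost so far, node)
    came_from = {}
    best_cost = {start: 0.0}
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node == goal:
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return path[::-1]
        if cost > best_cost.get(node, math.inf):
            continue  # stale heap entry; a cheaper route was already found
        for neighbor in graph.get(node, ()):
            new_cost = cost + dist(node, neighbor)
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                came_from[neighbor] = node
                estimate = new_cost + dist(neighbor, goal)
                heapq.heappush(open_heap, (estimate, new_cost, neighbor))
    return None
```

Each consecutive pair in the returned path is one leg of the route, which is the natural unit for a turn-by-turn cue.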

To support this, we've launched a comprehensive Session Logger that captures a full JSONL archive of every frame, YOLO detection, and sensor event, so navigation runs can be reconstructed after the fact.
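
JSONL means one JSON object per line, which keeps the log appendable and easy to stream. A minimal sketch of the idea follows; the `SessionLogger` interface and the `ts` and `kind` field names are assumptions, not the actual record format.

```python
import json
import time

class SessionLogger:
    """Append one JSON object per line to any writable text stream."""

    def __init__(self, stream):
        self._stream = stream

    def log(self, kind, **fields):
        # Every record carries a wall-clock timestamp and a kind tag.
        record = {"ts": time.time(), "kind": kind, **fields}
        self._stream.write(json.dumps(record) + "\n")

def read_session(stream):
    """Parse a JSONL stream back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]
```

Because each line is self-contained, a run that ends abruptly still leaves every completed line readable.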

  • Turn-by-turn haptic + audio navigation cues
  • Hazard clustering (merging multiple objects into one sentence)
  • Full-environment JSONL session archiving
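
Hazard clustering can be pictured as grouping detections before they reach the speech queue. The sketch below groups by direction and counts labels; the input shape, the "Caution:" phrasing, and the naive plural rule are all assumptions for illustration.

```python
from collections import Counter

def cluster_hazards(detections):
    """Merge (label, direction) detections into a single spoken sentence.

    Returns an empty string when there is nothing to announce.
    """
    if not detections:
        return ""
    by_direction = {}
    for label, direction in detections:
        by_direction.setdefault(direction, Counter())[label] += 1
    parts = []
    for direction, counts in by_direction.items():
        # Naive pluralization: append "s" when the count is not 1.
        items = ", ".join(
            f"{n} {label}" if n == 1 else f"{n} {label}s"
            for label, n in counts.items()
        )
        parts.append(f"{items} on your {direction}")
    return "Caution: " + "; ".join(parts) + "."
```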

Narrative Engine Upgrade: Gemma 4

We've migrated to a fine-tuned Gemma 4 model for our Narrative Engine, resulting in a 40% reduction in description latency and more natural spatial awareness.

This upgrade brings richer scene understanding and smoother audio narration, making the system feel more responsive and context-aware during real-world navigation.


Curb Detection & Audio Pre-emption

Safety is our top priority. We've implemented dedicated Curb and Drop-off detection using RANSAC ground-plane comparison, alongside an intelligent "Overhead Obstacle" system that guards against obstacles at the user's head height.
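
To illustrate the ground-plane idea, here is a standard-library sketch: fit a plane to nearby points with RANSAC, then check whether points further ahead sit well below it. It assumes 3D points in meters with z pointing up; the tolerance, the 10 cm drop threshold, and all function names are placeholders rather than the rig's actual parameters.

```python
import random
import statistics

def plane_from_points(p1, p2, p3):
    """Plane (a, b, c, d) through three points, unit normal oriented toward +z.

    Returns None when the points are (nearly) collinear.
    """
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-9:
        return None
    if nz < 0:
        norm = -norm  # flip so "above the ground" is a positive distance
    nx, ny, nz = nx / norm, ny / norm, nz / norm
    return nx, ny, nz, -(nx * p1[0] + ny * p1[1] + nz * p1[2])

def signed_distance(plane, p):
    """Height of point p above (positive) or below (negative) the plane."""
    a, b, c, d = plane
    return a * p[0] + b * p[1] + c * p[2] + d

def ransac_ground_plane(points, iterations=100, tolerance=0.02, seed=0):
    """Keep the candidate plane that the most points lie within tolerance of."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, -1
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        inliers = sum(abs(signed_distance(plane, p)) <= tolerance for p in points)
        if inliers > best_inliers:
            best_plane, best_inliers = plane, inliers
    return best_plane

def drop_ahead(plane, ahead_points, min_drop=0.10):
    """True when the surface ahead is typically more than min_drop below ground."""
    heights = [signed_distance(plane, p) for p in ahead_points]
    return statistics.median(heights) < -min_drop
```

Using the median height rather than the minimum keeps a single noisy depth reading from triggering an alert.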

The audio pipeline now supports "Speech Pre-emption," allowing critical safety alerts to immediately silence lower-priority narrations so the wearer can react to hazards in real time.

  • Curb and drop-off detection (fast-approach priority)
  • Overhead hazard detection with auto-calibration
  • Urgent audio alert pre-emption
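
Speech pre-emption can be sketched as a priority queue in which a more urgent message replaces whatever is currently being spoken. The class below only tracks state and does not drive any audio; the priority levels and method names are hypothetical.

```python
import heapq
import itertools

URGENT, NARRATION = 0, 10  # assumed levels; a lower number is more urgent

class SpeechQueue:
    def __init__(self):
        self._pending = []               # heap of (priority, order, text)
        self._order = itertools.count()  # tie-breaker keeps arrival order
        self.current = None              # (priority, text) being spoken
        self.interrupted = []            # texts that were cut off

    def say(self, text, priority):
        if self.current is None:
            self.current = (priority, text)
        elif priority < self.current[0]:
            # More urgent than the current speech: cut it off immediately.
            self.interrupted.append(self.current[1])
            self.current = (priority, text)
        else:
            heapq.heappush(self._pending, (priority, next(self._order), text))

    def finish_current(self):
        """Call when playback ends; promotes the most urgent waiting message."""
        if self._pending:
            priority, _, text = heapq.heappop(self._pending)
            self.current = (priority, text)
        else:
            self.current = None
```

In this sketch an interrupted narration is dropped rather than resumed, on the reasoning that a stale scene description is rarely worth repeating after a hazard alert.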

Modular Architecture & Sensor Suites

The system has been completely refactored from a single script into a modular, package-based architecture. This allows us to scale vision, hardware, and audio subsystems independently.

We've also fully integrated the ZED 2i sensor suite, enabling barometer-based floor detection, IMU fall detection, and high-resolution face recognition.
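
Barometer-based floor detection rests on the fact that pressure falls by roughly 0.12 hPa per meter of climb near sea level. A hedged sketch using the international barometric formula follows; the 3.0 m floor height and the function names are assumptions, and a real system would also need to filter out weather-driven pressure drift.

```python
def pressure_to_altitude_m(pressure_hpa, sea_level_hpa=1013.25):
    """International barometric formula, valid for the lower atmosphere."""
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))

def floors_climbed(start_hpa, current_hpa, floor_height_m=3.0):
    """Whole floors gained (positive) or lost (negative) since start_hpa."""
    climb_m = pressure_to_altitude_m(current_hpa) - pressure_to_altitude_m(start_hpa)
    return round(climb_m / floor_height_m)
```

Only the difference between two readings matters here, so the absolute sea-level reference has little effect on the floor count.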