I recently started building a calorie tracking app that accepts voice commands, making logging hands-free and user-friendly.
Understanding the Problem
Manually logging calories can be tedious. Many people already use voice assistants, so letting users say commands like “Add 500 calories” simplifies the process. The app calculates totals and compares them with daily goals, making it easier to stay on track without typing. In hindsight, I should have just let users say numbers; instead, I created a map of intents and calories so they can say, “Hey, add this hotdog.”
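For illustration, that intent-to-calories map boils down to a simple lookup. This is a minimal sketch; the food names, calorie values, and the `caloriesFor` helper are hypothetical, not from the actual app:

```typescript
// Hypothetical map from spoken food names to calorie counts.
const FOOD_CALORIES: Record<string, number> = {
  hotdog: 150,
  apple: 95,
  bagel: 245,
};

// Resolve a command like "Hey, add this hotdog" to a calorie amount.
function caloriesFor(food: string): number | undefined {
  return FOOD_CALORIES[food.toLowerCase()];
}

caloriesFor('hotdog'); // 150
```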
Inspiration from My Dad's Health Journey
My dad’s diagnosis led him to track health data and calories more closely. Watching him manually note everything inspired me to make a more interactive tool.
Choosing Picovoice and Rhino for Voice Interactions
- Wake Word Detection (Picovoice Porcupine): Listens for a trigger phrase (e.g., “Hey Add This”).
- Intent Inference (Rhino): Understands commands once the app “wakes up,” such as “Set my calorie goal to 2000.”
These two tools allowed me to integrate voice features without relying heavily on cloud-based services, keeping data processing on the device.
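To make the two stages concrete, here is a minimal wiring sketch using Picovoice's React Native SDK. The access key, model file paths, and callback bodies are placeholders, and the exact `PicovoiceManager` signature can vary between SDK versions:

```typescript
import { PicovoiceManager } from '@picovoice/picovoice-react-native';
import type { RhinoInference } from '@picovoice/rhino-react-native';

async function startListening(): Promise<PicovoiceManager> {
  const manager = await PicovoiceManager.create(
    'YOUR_ACCESS_KEY',   // from the Picovoice Console
    'hey-add-this.ppn',  // trained wake word model (placeholder path)
    () => {
      // Stage 1: wake word detected; the app "wakes up" here.
      console.log('Wake word detected, listening for a command...');
    },
    'calories.rhn',      // trained Rhino context model (placeholder path)
    (inference: RhinoInference) => {
      // Stage 2: Rhino inferred an intent from the follow-up command.
      if (inference.isUnderstood) {
        console.log(`Intent: ${inference.intent}`, inference.slots);
      }
    },
  );
  await manager.start(); // begins processing microphone audio on-device
  return manager;
}
```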
The Rhino Context YAML: Crafting Custom Commands
Setting up Rhino required defining a Context YAML:
```yaml
contexts:
  - name: "calories"
    intents:
      - intent: "CALCULATE_CALORIES"
        examples:
          - "How many calories did I eat today?"
          - "What is my total calorie count?"
      - intent: "SET_GOAL"
        examples:
          - "Set my calorie goal to 2000"
          - "I want to eat 2500 calories today"
      - intent: "GET_GOAL"
        examples:
          - "What is my calorie goal?"
      - intent: "COMPARE_TO_GOAL"
        examples:
          - "How many calories am I away from my goal?"
    slots:
      - name: "calories_amount"
        type: "NUMBER"
      - name: "date"
        type: "DATE"
```
Intents define the user’s request (e.g., “SET_GOAL”), while slots capture variables like the calorie amount. Training this model taught me the power of well-structured voice interactions.
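To show how intents and slots surface in code, here is a hedged sketch of a dispatcher. The `Inference` shape mirrors Rhino's inference result, the slot name matches the YAML above, and the state variables and responses are illustrative:

```typescript
// Mirrors the shape of a Rhino inference result.
interface Inference {
  isUnderstood: boolean;
  intent?: string;
  slots: Record<string, string>;
}

let dailyGoal = 2000; // calories
let totalToday = 0;   // calories logged so far

function handleInference(inference: Inference): void {
  if (!inference.isUnderstood) return;

  switch (inference.intent) {
    case 'SET_GOAL':
      // "calories_amount" matches the slot declared in the context YAML.
      dailyGoal = Number(inference.slots['calories_amount']);
      break;
    case 'CALCULATE_CALORIES':
      console.log(`You've eaten ${totalToday} calories today.`);
      break;
    case 'GET_GOAL':
      console.log(`Your calorie goal is ${dailyGoal}.`);
      break;
    case 'COMPARE_TO_GOAL':
      console.log(`${dailyGoal - totalToday} calories left to reach your goal.`);
      break;
  }
}
```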
Integrating with My App: Making It Practical
After training the model, I integrated it into a React Native app. Picovoice's SDK provided a straightforward way to detect the wake word and pass recognized commands into my app's state. The result: a hands-free calorie logging experience.
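As a sketch of that hand-off into app state (the hook name and state shape are mine, not the app's actual code), the inference callback from the `PicovoiceManager.create` call shown earlier might feed a React hook like this:

```typescript
import { useCallback, useState } from 'react';
import type { RhinoInference } from '@picovoice/rhino-react-native';

// Hypothetical hook that turns Rhino inferences into app state updates;
// pass `onInference` as the inference callback when creating the manager.
export function useCalorieState(initialGoal = 2000) {
  const [goal, setGoal] = useState(initialGoal);
  const [total, setTotal] = useState(0);

  const onInference = useCallback((inference: RhinoInference) => {
    if (!inference.isUnderstood) return;
    const amount = Number(inference.slots?.['calories_amount']);
    if (inference.intent === 'SET_GOAL' && !Number.isNaN(amount)) {
      setGoal(amount); // e.g., "Set my calorie goal to 2000"
    }
    // Other intents (CALCULATE_CALORIES, COMPARE_TO_GOAL, ...) read
    // `total` and `goal` and surface the answer in the UI.
  }, []);

  return { goal, total, setTotal, onInference };
}
```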
Lessons Learned
- Voice Interaction Adds Value: Users can track calories without manually typing.
- Rhino's Flexibility: Customizing intents and slots tailors the experience to my app's needs.
- Voice Processing Challenges: Fine-tuning models to handle diverse phrasings takes time.
I'm excited about the possibilities now that I’ve trained my own model. I’m considering connecting it to a centralized home automation hub like Home Assistant and integrating it with context-aware protocols for wake word detection and intent inference. It opens the door to building a highly customizable AI assistant that actually fits how I want to interact with it.