Building a Voice-Activated Calorie Tracking App with Pico Voice and Rhino

February 20, 2025

I recently started creating a calorie tracking app that accepts voice commands for hands-free, user-friendly interaction.


Understanding the Problem

Manually logging calories can be tedious. Many people already use voice assistants, so letting users say commands like “Add 500 calories” simplifies the process. The app calculates totals and compares them with daily goals, making it easier to stay on track without typing. In hindsight, I should have just let users say numbers; instead, I created a map of intents and calories so they can say, “Hey, add this hotdog.”
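To make that mapping concrete, here is a minimal sketch of the idea; the food names, calorie values, and function names are hypothetical placeholders, not the app's actual data:

// Hypothetical mapping from spoken food items to calorie values.
const FOOD_CALORIES: Record<string, number> = {
  hotdog: 150,
  banana: 105,
  "slice of pizza": 285,
};

const DAILY_GOAL = 2000; // calories
let totalToday = 0;

// Log a food item by name, if we know its calorie value.
function logFood(food: string): void {
  const calories = FOOD_CALORIES[food];
  if (calories !== undefined) {
    totalToday += calories;
  }
}

// How far the user is from today's goal.
function caloriesRemaining(): number {
  return DAILY_GOAL - totalToday;
}

logFood("hotdog");
console.log(`Remaining today: ${caloriesRemaining()} calories`); // Remaining today: 1850 calories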


Inspiration from My Dad's Health Journey

My dad’s diagnosis led him to track health data and calories more closely. Watching him manually note everything inspired me to make a more interactive tool.


Choosing Pico Voice and Rhino for Voice Interactions

  • Wake Word Detection (Pico Voice): Listens for a trigger phrase (e.g., “Hey Add This”).
  • Inference (Rhino): Understands commands once the app “wakes up,” like “Set my calorie goal to 2000.”

These two tools allowed me to integrate voice features without relying heavily on cloud-based services, keeping data processing on the device.
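Conceptually, the on-device flow is a small two-stage state machine: stay idle until the wake word is heard, then hand subsequent audio to intent inference. A minimal sketch of that pattern follows; the detectWakeWord and inferIntent stubs are placeholders standing in for the real engines, not Picovoice's actual API:

// Two-stage pipeline: idle until the wake word, then run intent inference.
type Stage = "WAITING_FOR_WAKE_WORD" | "LISTENING_FOR_COMMAND";

// Placeholder stubs standing in for the wake word and inference engines.
function detectWakeWord(frame: Int16Array): boolean {
  return false; // the real engine scores each audio frame against "Hey Add This"
}
function inferIntent(frame: Int16Array): { done: boolean; intent?: string } {
  return { done: false }; // the real engine accumulates frames until the command is finalized
}

let stage: Stage = "WAITING_FOR_WAKE_WORD";

// Called once per frame of microphone audio.
function onAudioFrame(frame: Int16Array): void {
  if (stage === "WAITING_FOR_WAKE_WORD") {
    if (detectWakeWord(frame)) {
      stage = "LISTENING_FOR_COMMAND";
    }
  } else {
    const result = inferIntent(frame);
    if (result.done) {
      console.log("Recognized intent:", result.intent);
      stage = "WAITING_FOR_WAKE_WORD"; // return to idle listening
    }
  }
}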


The Pico Voice Context YAML: Crafting Custom Commands

Setting up Rhino required defining a Context YAML:

contexts:
  - name: "calories"
    intents:
      - intent: "CALCULATE_CALORIES"
        examples:
          - "How many calories did I eat today?"
          - "What is my total calorie count?"
      - intent: "SET_GOAL"
        examples:
          - "Set my calorie goal to 2000"
          - "I want to eat 2500 calories today"
      - intent: "GET_GOAL"
        examples:
          - "What is my calorie goal?"
      - intent: "COMPARE_TO_GOAL"
        examples:
          - "How many calories am I away from my goal?"
    slots:
      - name: "calories_amount"
        type: "NUMBER"
      - name: "date"
        type: "DATE"

Intents define the user’s request (e.g., “SET_GOAL”), while slots capture variables like the calorie amount. Training this model taught me the power of well-structured voice interactions.
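Once a command is finalized, the app receives the matched intent plus any filled slots and routes it to the right piece of logic. Here is a minimal dispatch sketch, assuming an inference object shaped like { isUnderstood, intent, slots }; the handler and state variables are illustrative placeholders:

// Shape of a finalized inference, matching the intents and slots defined above.
interface Inference {
  isUnderstood: boolean;
  intent?: string;
  slots: Record<string, string>;
}

let calorieGoal = 2000;
let caloriesToday = 0;

// Route a finalized voice command to the matching piece of app logic.
function handleInference(inference: Inference): string {
  if (!inference.isUnderstood) {
    return "Sorry, I didn't catch that.";
  }
  switch (inference.intent) {
    case "SET_GOAL":
      calorieGoal = Number(inference.slots["calories_amount"]);
      return `Goal set to ${calorieGoal} calories.`;
    case "GET_GOAL":
      return `Your goal is ${calorieGoal} calories.`;
    case "CALCULATE_CALORIES":
      return `You've eaten ${caloriesToday} calories today.`;
    case "COMPARE_TO_GOAL":
      return `You are ${calorieGoal - caloriesToday} calories away from your goal.`;
    default:
      return "That command isn't supported yet.";
  }
}

// "I want to eat 2500 calories today" -> SET_GOAL with calories_amount = 2500
handleInference({ isUnderstood: true, intent: "SET_GOAL", slots: { calories_amount: "2500" } });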


Integrating with My App: Making it Practical

After training the model, I integrated it into a React Native app. Pico Voice’s SDK provided a straightforward way to detect the wake word, then pass recognized commands to my app’s state. The result: a hands-free calorie logging experience.
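For reference, the wiring looks roughly like the sketch below. It assumes the @picovoice/picovoice-react-native package's PicovoiceManager (create/start); the access key, file paths, and callbacks are placeholders rather than my production code:

import { PicovoiceManager } from '@picovoice/picovoice-react-native';

// Placeholder values: a Picovoice AccessKey plus the trained keyword and context files.
const ACCESS_KEY = 'YOUR_PICOVOICE_ACCESS_KEY';
const WAKE_WORD_PATH = 'hey-add-this.ppn'; // wake word file
const CONTEXT_PATH = 'calories.rhn';       // Rhino context file

async function startVoiceLogging(): Promise<void> {
  const manager = await PicovoiceManager.create(
    ACCESS_KEY,
    WAKE_WORD_PATH,
    () => {
      // Wake word detected; audio is now routed to Rhino for inference.
      console.log('Listening for a command...');
    },
    CONTEXT_PATH,
    (inference) => {
      // Finalized command; hand it off to app state (reducer, store, etc.).
      if (inference.isUnderstood) {
        console.log('Intent:', inference.intent, 'Slots:', inference.slots);
      }
    },
  );

  await manager.start(); // begins on-device audio processing
}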


Lessons Learned

  • Voice Interaction Adds Value: Users can track calories without manually typing.
  • Rhino’s Flexibility: Customizing intents and slots tailors the experience to my app’s needs.
  • Voice Processing Challenges: Fine-tuning models to handle diverse phrasings takes time.
  • Performance Matters: Reliable wake word detection and fast inference are crucial for smooth user interaction.


A Spark of Curiosity: Event-Driven Semi-Autonomous Systems

Bill Burnham (former CTO for U.S. Special Operations Command) points out that future enterprise IT must handle AI workloads. This aligns with my interest in event-driven, semi-autonomous systems that respond in real time. It’s an area I’m eager to explore further.


What's Next for the Project?

I plan to:

  • Add multi-day calorie tracking and progress charts.


Tying It All Together: From Training My First Model to Object Recognition AI

Building this voice-activated tracker was a lot of fun, and I’m already brainstorming new voice integration app ideas. The voice pipeline runs in the background once the app is opened, but I’m considering a headless process so it can work continuously. I’m also looking into service workers, which I used in a Next.js Time Calculator app to power a low-computation LLM.

Disclaimer: The views and opinions expressed in this blog post are solely my own and do not reflect the views of my employer, colleagues, or any affiliated organization.