Caption AI | Visual Guidance System


Design Manager with 1 direct report & 1 design contractor

Challenge: 3D Movements in 2D


How do you communicate and guide a user to make movements in 3-dimensional space, using visualization on a 2-dimensional screen?

One of the most challenging aspects of cardiac imaging is the fine motor skill required to make micro-adjustments with the ultrasound probe in order to capture the exact beam of the heart. While developing the Caption AI guidance software, I wanted to create visual cues that guide users to make the correct movements and obtain better-quality images. Internally, we referred to it as EchoGPS because it mimics the navigational guidance that helps a driver reach their destination.



Along with didactics from our in-house clinical specialists, I used a combination of research methods to acquaint myself with cardiac imaging technique:

  • Fly-on-the-wall Observations: I shadowed two sonographers at Stanford Hospital to understand their user journey and expertise

  • 1:1 Interviews: I asked nurses and nurse practitioners about their first-time experience using ultrasound to better understand a novice perspective

  • Hands-on Training: I put myself in the shoes of a beginner* and considered ways to accelerate the learning curve

  • Secondary Research: I studied textbooks and watched online videos aimed at medical training, shown at right

*And eventually became the most proficient non-clinical scanner in the office, championing the practice of dogfooding your own product!


The initial iteration of this concept was straightforward, using recognizable navigational arrows to communicate movement. The icons appeared one at a time with accompanying text, and each icon was placed at a different onscreen location to further differentiate them from one another.

Here's what we learned:

  • Familiar movements such as rotate clockwise and counterclockwise were straightforward and easily followed, while more complex, 3-dimensional movements such as "Tail up" and "Tail down" were more challenging to conceptualize.

  • The varied locations for icon placement were not intuitive, nor were they understood after explanation. This nuance was lost on novice users, who were already overwhelmed with scanning for the first time.

  • Some users relied on the prescriptive text only, while some identified as visual learners and found the icons useful, but hesitated when presented with unfamiliar 3D movements.

  • Throughout this process, I became acutely aware of the physical and emotional effort it takes to master a foreign technique in order to care for a patient at the bedside.

Early designs of prescriptive guidance icons during scanning


To align on goals of the project cross-functionally, my design team proposed the following principles for the redesign:

  • A user, both novice and trained, needs to inherently and immediately understand the desired movements communicated by the visual system.

  • The icons need to stand alone and be understood without any accompanying text.

  • The icons need to be minimal in style, without use of unnecessary elements like ornaments and shadows.

  • The icons need to be clearly differentiated from one another, have an appropriate level of contrast, and be readable from a distance.

  • The focus of the system needs to be on the physical movement desired, rather than the user or object making the movement.

  • The visual system needs to be scalable across various organs and probes.



Based on my design intuition and the work of developing the design principles, I believed that a more minimalist representation of the movements, one that did not depict the probe, would be the most understandable.

My hypothesis was that seeing an illustrated probe onscreen would distract users from making critical movements and demand an unnecessary cognitive leap. Although I was alone in this notion, I understood that it would take testing and user feedback to establish the right solution.

The validation test included 15 internal and external users with varying expertise in scanning. Users were shown both icon sets without text and asked to make the desired movements one by one on a dummy. The order in which the sets were shown was randomized. Observations were mapped to capture accuracy and hesitation for each individual desired movement.

Prior to the exercise, 80% of users expressed a preference for realistic icons, but after completing the test, all but one felt that the minimalistic set was more intuitive. The data supported this preference:

  • New Concept A | Representative

    • Accuracy: 83% 

    • Hesitation: 32%

  • New Concept B | Minimalistic

    • Accuracy: 96%

    • Hesitation: 14%

As a result, we implemented the minimalistic system (as seen left), which will scale across multiple scanning views, different types of probes, and different screen sizes. Post-implementation user testing with 6 additional medical students, 2 ED physicians, and 8 intensivists has shown consistent results for movement accuracy and usability.

  • Early Alignment: Sharing the design principles for the project and aligning on them cross-functionally was key to many discussions during the redesign. A shared understanding of the goals gave us a helpful baseline for assessing the validity of the explorations and kept the work moving forward.

  • Recognize Individual Contributor Strengths: One of the designers on my team has a background in industrial design and could iterate quickly through three-dimensional variations that required perspective drawing. This was a productive way of leveraging her strength during the generative ideation phase.

  • Understanding the User First-Hand: By becoming a novice scanner myself, I gained insight into the initial challenges and pain points in the experience, and brought empathy for the target audience into the design.


  • Combining Design Intuition + Data: Merely relying on my design intuition would not have been persuasive enough for making a decision on the final solution. Pairing an initial hypothesis with data from user testing was a powerful way of demonstrating the effectiveness of the proposed visual system.