As part of my internship with PayPal, I got a chance to create the PayPal action on Google Assistant. The idea was to demonstrate how Google Assistant users can use PayPal to make quick payments with voice commands. For this demo, the use case we focused on was peer-to-peer transactions. Here is a video with the demo in action:
Just to be clear, the demo was running live, which means that it was linked to my actual PayPal account: actual money was being sent and received, and my actual balance was being fetched. How I did it technically is described later in the post. First, let's look at the challenges of designing the action.
Food for thought
As you can see in the video, designing for a voice interface presents very interesting challenges. Many interaction patterns in screen-based UI have become so common that we forget the important function they perform subconsciously for the user. One such pattern that I highlight in the video is the display picture. Display pictures help users “recognize rather than recall” the recipient of the money they want to send. Voice UI has no such support for the user, which raises an important question: will the user be able to trust this medium to perform transactions at all? There are many such challenges. Some of them are:
- Onboarding is tricky. What is the first thing that you do when you download a new app or visit a new website? You poke around to see what's in it. This option to poke around is not available in voice UI. People treat voice UI like an alien object, unsure of what it can do or how to start communicating with it. It is like the early days of the internet, and designers have to find creative ways to solve this problem until someone solves it for everyone.
- Voice UI is an open slate, and users will always bump into the edges of your action’s limitations. You cannot prevent users from making a demand that your action cannot fulfill, so it becomes very important to be mindful not only of the use cases that you want to address but also of the ones you don’t. For example, in the PayPal action it was natural for people to say something like “send $5 to Ben and $10 to Jerry”. In the app the same user would naturally go through the flow twice to make these two transactions, but voice UI somehow gives them the expectation that they can do two transactions with one command. As designers we have to be mindful of this behavior and design our responses accordingly. But at the end of the day, technological limitations are huge and very real. You cannot design for all the edge cases, and there will be times when your action has no idea what the user is saying and ends up giving a generic error message. Without a proper error message, the user is left confused about what the problem is and the whole experience gets frustrating.
- You are making a personal assistant, and people want it to behave like one. There is something about the affordance of voice that makes users expect a very personal experience. For example, while user testing the PayPal action, one user gave the command “Send money to my daughter”. For someone talking to their personal assistant, this is a legitimate demand. The same is true when dealing with contacts. Users don’t want their assistant to get confused by multiple contacts with the same name every time. For example, if the user says “Message Ajay I’ll be 10 minutes late for dinner”, then as a personal assistant I should be able to deduce which “Ajay” the user is talking about in most cases, no matter how many people named Ajay are in the contact list. While this capability is a nice-to-have feature in screen-based UI, it becomes critical in voice UI, because there is no good way for the user to choose from a list of 10 items by voice.
- This brings me to the next point: the importance of the personal assistant evolving and becoming smarter with use. This is not specific to voice UI, but an AI-first approach to personal assistants is a must. Your users don’t want to go through the “50 First Dates” experience with your action. The interactions with your personal assistant have to become more familiar over time.
- All the basics remain the same. It is easy to forget that we are not changing human nature just by changing the UI to voice. So when you take your app from a screen-based UI to a voice UI, remember that the user is still looking for the same confirmations and micro-interactions. A very good example of this is “recognition rather than recall”. We all know this principle, and that's why we let users upload display pictures, so that other users looking for them can recognize their picture rather than recall their full name. This principle obviously holds for voice UI as well. For example, while user testing the PayPal action, I asked every user one question: would you trust this action to carry out large transactions? The answer was no, every time. That was because the action failed the “recognition rather than recall” principle. In the absence of a visual confirmation, users had no way to be absolutely sure that they had selected the right person to send their money to.
The technology behind building an action on Google is equally interesting. Understanding the limitations of NLP was fascinating. Working with the PayPal APIs was fun and gave me a chance to see how PayPal works on the backend.
I have written a detailed post on how to build:
This is basically how the webhook in my demo works:
- I need three variables to make the payment happen: amount, payee and funding instrument.
- API.AI is good at understanding the amount. 99% of the time it is able to understand the value and currency accurately.
- The payee can be set by saying a name, email address or phone number. When the user says a name, I fetch the list of all my PayPal contacts and look for that name in it. Handling multiple contacts with the same name is a gold mine for discussion on how to make the experience good. We used an AI-first approach in which we tried to guess which of the matching contacts the user was most likely to be sending money to in the current context. If the payee still could not be resolved, we asked the user to provide the phone number or email address.
- The funding instrument defaulted to the preferred payment method that the user had set in the app.
- The code was written in such a way that the user is allowed to change any detail at any point. This makes the whole experience very fluid and natural.
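The flow in the list above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the code from the demo: the names `Contact`, `PaymentDraft`, `resolve_payee` and `handle_turn` are hypothetical, the `past_payments` signal and the "at least twice as many past payments" rule are simple stand-ins for the context-based guess described above, and no real PayPal or API.AI calls are made.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Contact:
    name: str
    email: str
    phone: str               # digits only, for simplicity
    past_payments: int = 0   # hypothetical ranking signal (assumption)


@dataclass
class PaymentDraft:
    amount: Optional[float] = None
    currency: Optional[str] = None
    payee: Optional[Contact] = None
    funding_instrument: str = "preferred"  # default taken from the app


def resolve_payee(query: str, contacts: List[Contact]) -> Optional[Contact]:
    """Return one contact, or None when nothing matches or the match is ambiguous."""
    q = query.strip().lower()
    digits = "".join(ch for ch in q if ch.isdigit())
    # An email address or phone number identifies a contact directly.
    for c in contacts:
        if q == c.email.lower() or (digits and digits == c.phone):
            return c
    # Otherwise match on the full name or on any single part of it.
    matches = [c for c in contacts
               if q == c.name.lower() or q in c.name.lower().split()]
    if len(matches) == 1:
        return matches[0]
    if len(matches) > 1:
        ranked = sorted(matches, key=lambda c: c.past_payments, reverse=True)
        # Assumed rule: guess only when one candidate clearly stands out.
        if ranked[0].past_payments >= 2 * max(ranked[1].past_payments, 1):
            return ranked[0]
    return None


def handle_turn(draft: PaymentDraft, params: dict, contacts: List[Contact]) -> str:
    """Merge the parameters heard in this turn into the draft, then pick the next prompt."""
    # Any detail can be overwritten at any point in the conversation.
    if "amount" in params:
        draft.amount = float(params["amount"])
        draft.currency = params.get("currency", draft.currency or "USD")
    if "funding_instrument" in params:
        draft.funding_instrument = params["funding_instrument"]
    if "payee" in params:
        draft.payee = resolve_payee(params["payee"], contacts)
        if draft.payee is None:
            return "I could not tell who you mean. What is their email address or phone number?"
    if draft.amount is None:
        return "How much do you want to send?"
    if draft.payee is None:
        return "Who do you want to send it to?"
    return (f"Send {draft.amount:.2f} {draft.currency} to {draft.payee.name} "
            f"using your {draft.funding_instrument} payment method?")
```

Because every turn merges into the same draft, a user who first says "Ben" and then an amount gets a confirmation prompt, and can still say a different name afterwards to swap the payee without restarting the flow.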
The internship ended with a showcase where I got a chance to demo it to several users. This gave me a chance to collect some good and bad feedback.
- Security is one of the biggest concerns for users. What if someone walks into my home and transfers all my money to their account? This was a known concern. Since the aim of the demo was to show the capabilities of PayPal in this medium, security was not our main focus.
- People loved interacting with it. They were able to talk to the demo naturally and were amazed by the apt responses they got back.
- Onboarding was a concern. People didn’t know how to start talking, so I always had to do one small demo before they could interact with it.