TLDR: Even though I agree with Jared on many of his arguments (better don’t ever argue with Jared), as someone who spends most of his time working on early-stage products, which still have to learn about their growth potential, I think the NPS¹ is a metric you can work with. That said, you have to be careful about your expectations of the NPS and think critically about what the metric means. The NPS is not a perfect metric, though I am still looking for a better alternative. Please let me explain.
Think about your expectations
As always, if you want to find out whether something makes sense to use, it starts with your expectations. Reading Jared’s article, you might assume his expectations are pretty high. He uses examples where the NPS impacts bonus payments, and where people expect the NPS to give insights into customer loyalty as well as serve as an overall rating for UX.
In projects that I was lucky to be part of, we used the NPS with much lower expectations. We interpreted the NPS as a proxy for overall customer opinion and for the product’s word-of-mouth growth potential. We didn’t treat it as an absolute number like revenue or user retention metrics, but more like a prediction. We thought of it as a weather forecast: something that helped us adjust our plans based on the feedback we got from users.
“Better than nothing” never sounds exciting, but often it’s the best option available
In the startup world, talking to your customers is key, and you can’t do enough of it. The reality, though, is that in small teams time and resources for research are limited. And shipping simply feels better than finding out whether you are shipping the right things. So in the end teams often don’t track any qualitative data on a regular basis over a longer period of time. Having a lot of quantitative data also gives teams the feeling that they are doing fine. I don’t want to discuss the pros and cons of qualitative data at this point, though I think you are missing out if you are not trying to get at least some qualitative feedback on a regular basis.
“Better than nothing” never sounds exciting, but if you are working with limited resources on an early-stage product, it’s often the best option you have. Below are a few thoughts on what we used the NPS for, what it can help you with, and what it doesn’t help you with or what you should not do with it:
Let’s start with the things we used it for:
Just because we used the NPS only as a rough proxy doesn’t mean we didn’t have requirements and expectations of it:
- We wanted to get feedback on the growth / word of mouth potential of our product. (With focus on the overall usefulness and value of the product.)
- We wanted a tool/method that is easy to implement and low cost.
- We wanted a tool that is not in the way of the user (few questions, not annoying or uncomfortable to answer, a familiar concept for some users).
- We wanted to see a performance trend over time.
- We wanted to install a trigger that nudges users to give us open ended qualitative feedback.
- We wanted a “realtime feedback channel” to the team.
- We wanted a way to identify the users who already really like our product, even though they might not use it a lot yet.
On all the points above the NPS does the trick. Maybe not perfectly, but, again, it’s better than nothing. I want to pick three things from the list that, from my perspective, had the most positive impact on the projects:
– Having a “real time human feedback channel” for the full team:
We created a dedicated NPS chat channel. It posted every rating and every user comment immediately to our team chat. This sounds like a little thing, but you might underestimate how much you will love and learn from this channel. The fact that you get the feedback unfiltered and immediately makes it feel a bit more like “real human interaction”. It was interesting to see how quickly productive discussions about certain issues evolved based on the customer comments. And even if the occasional troll comes around, in the end it’s much more fun to laugh about such comments together.
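We never published that integration, but the idea is simple enough to sketch. Assuming a Slack-style incoming webhook (the URL below is a placeholder, and error handling is left out), a minimal version could look like this:

```python
import json
import urllib.request

# Placeholder webhook URL -- replace with your own chat tool's
# incoming-webhook endpoint (Slack, Mattermost, etc.).
WEBHOOK_URL = "https://hooks.example.com/services/T000/B000/XXXX"


def format_nps_message(score: int, comment: str = "") -> str:
    """Render a single NPS response as a chat message."""
    msg = f"New NPS rating: {score}/10"
    if comment:
        msg += f'\n> "{comment}"'
    return msg


def post_to_channel(score: int, comment: str = "") -> None:
    """Push one rating plus comment to the team chat, unfiltered."""
    payload = json.dumps({"text": format_nps_message(score, comment)}).encode()
    req = urllib.request.Request(
        WEBHOOK_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # fire-and-forget; add retries in production
```

Call `post_to_channel(score, comment)` from whatever handler receives the survey response, and every rating shows up in the team channel within seconds.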
– Getting a performance trend for how we are doing over time.
“Over time” is the important part here. For us the NPS was always just about how we are doing today or this week, compared to yesterday, last week, last month, etc.
That gave us important insights when we changed something in the product, but it also gave us interesting data for our marketing. It’s one thing to see conversion rates for a campaign or a press article; it’s something different to see how well the product resonates with each of the traffic sources. It became pretty clear early on that the NPS correlates strongly with which kind of source most of the traffic is coming from.
And yes, you might argue that you also get such data via a cohort analysis of retention data, though via the NPS you get it much faster, and more importantly, you get something that quantitative data can’t give you. There are many reasons why someone is not coming back to your product immediately but still kind of likes it. If a cohort shows positive NPS numbers but low retention, that might give you some clues about what to do next. Also, being able to segment all the people who think positively about your product for retargeting, community building, etc. is very useful.
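To illustrate the kind of segmentation this enables, here is a small sketch that computes the NPS per traffic source. The data and field names are made up for illustration; the buckets follow the standard definition (0–6 detractors, 7–8 passives, 9–10 promoters):

```python
from collections import defaultdict


def nps(scores):
    """Standard NPS: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        return 0.0
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


def nps_by_source(responses):
    """Group (traffic_source, score) pairs and compute an NPS per source."""
    groups = defaultdict(list)
    for source, score in responses:
        groups[source].append(score)
    return {source: nps(scores) for source, scores in groups.items()}


# Made-up example responses tagged with the traffic source they came from.
responses = [("press", 9), ("press", 10), ("press", 6),
             ("ads", 4), ("ads", 7), ("ads", 9)]
# press: 2 promoters, 1 detractor out of 3 -> about 33.3
# ads: 1 promoter, 1 detractor out of 3 -> 0.0
```

The same grouping works for any attribute you store alongside the rating, such as cohort week or acquisition campaign.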
– We wanted to install a trigger that nudges users to give us open-ended qualitative feedback:
We put feedback buttons everywhere, but unfortunately people mostly use them only when they need support or the product is broken. It turns out that the NPS method does a nice job of nudging users to give open-ended feedback. Behavioural science describes an effect, often called the “foot-in-the-door” technique: if you first ask people for a small favour, they are afterwards more willing to do you the bigger favour you were actually after. In the case of NPS you first give users a fairly simple thing to do, the rating, and afterwards you ask them for an overall comment. Since people already start thinking about your product while answering the effortless first question, they are less overwhelmed by an empty input field in the next step and more likely to write you a quick comment.
The things you should not use the NPS for:
– Using it as a team goal connected to financial rewards or bonuses
Jared mentions in his article a few flaws that make it obvious that NPS is not a fair goal for measuring the performance of a team, mainly because it’s fairly easy to game.
Also, as mentioned above, the score is impacted by whichever source currently brings the most traffic to your site, which may have nothing to do with your product team’s performance in the first place.
I have seen several times that people try to compare NPS between companies, even across different industries.
As far as I know, there is no standardized procedure for measuring an NPS. Given that companies track the NPS in different ways, I don’t think comparing the numbers makes sense, at least not if you can’t make sure you are in control of both measurements.
– Overall rating for UX
First of all, it depends on your definition of UX. But if you have something in mind that focuses on the experience and usability of the digital product, I don’t think the question used in the NPS survey qualifies it as a UX metric. The product might be beautifully designed and functional, with everything flawlessly executed, but if you simply don’t solve a problem for the user, or your business model has weaknesses, you might still not get a high NPS.
Secondly, the rating you get from the NPS is always in the context of what the user has experienced just now. That means if you ask someone right after onboarding, you will get feedback on the onboarding, not so much on the rest of the product.
– Customer Loyalty
That’s not what the NPS question asks. It is not about “will you use it again”. If you care about loyalty, you’d better look at your retention metrics.
– Be careful with your measurement process
Don’t compare your numbers after you have changed anything about the way you track. One thing that made me uneasy about the NPS is that no one tells you when, how, and whom you should ask your NPS questions. Which is obviously a big deal. Some companies ask via email, some during the user experience; some ask everyone, no matter whether they are recurring or first-time users. All these questions and many more have a huge impact on the results you will see.
In our projects we focused on first-time users, asking them after a certain number of screens and time on site. We tried to find a sweet spot where users had spent enough time on the site to be able to make a statement at all, while still capturing their first impression. We were totally aware that we wouldn’t get feedback from users who were already confused or didn’t like the product at all, because they had already left. That’s not a big deal, though, because we have those numbers via the usage data. So in our case it made sense to always look at the NPS numbers together with the bounce data. And a quick tip: any time you see a huge spike or drop in your NPS, first double-check whether the trigger for the NPS question is still in the same place as before.
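The exact thresholds depend on your product, but the trigger logic itself can be sketched in a few lines. The numbers below are illustrative, not the ones we used:

```python
# Illustrative thresholds -- tune them per product until you hit the
# sweet spot between "too early to judge" and "already bounced".
MIN_SCREENS = 3     # user has seen enough of the product
MIN_SECONDS = 120   # ...and spent enough time on site


def should_show_nps(screens_viewed: int, seconds_on_site: float,
                    is_first_time_user: bool, already_asked: bool) -> bool:
    """Show the NPS survey to first-time users once they have had
    enough exposure to form a first impression, and only once."""
    return (is_first_time_user
            and not already_asked
            and screens_viewed >= MIN_SCREENS
            and seconds_on_site >= MIN_SECONDS)
```

If you later change any of these thresholds or move the trigger, treat the numbers before and after the change as separate series, for the reasons above.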
Accepting some compromises
You might think now: all good, but after reading Jared’s article, NPS still seems ridiculous.
Well, I don’t want to disagree with Jared, but from the “better than nothing” perspective I still think I could work with it, for lack of a better alternative. Also, given our expectations, some of the arguments in Jared’s article are not as relevant for us as they might be for someone else.
Here are a few comments on some of Jared’s arguments:
“NPS uses an 11-point scale. That’s a large scale with a lot of numbers where the distinction isn’t clear. You and I could have the exact same experience, yet I’d give it a 7 and you’d give it a 6. Is there a meaningful difference?”
The scale is the biggest pain point. We had a huge discussion about it in the team as well, and in one project we even switched to a 6-point scale. However, I think the challenge here is an overall issue you have with any kind of survey answer on a scale: you and I could have the exact same experience and give it a different rating, no matter what the scale says, whether it’s a 3-point scale or an 11-point scale. The meaningful difference comes with a bigger sample size n.
I personally don’t think the 0-to-10 scale is a bad choice, because people internationally are pretty familiar with it. School grades, for example, are often even more familiar, but they differ between countries, so they don’t work if you ask people globally.
Though I still don’t get why 5 and 6 already count as detractors. In the end it doesn’t matter too much, see the next point.
“An average of 5 doesn’t sound good or bad. It’s neutral. However, -40 sounds awful. (Not as awful as -100, but still pretty bad.)”
That’s true, though we didn’t care about the “absolute number” so much anyway. It’s much more about whether you are better than yesterday.
“Small incremental improvements should result in small incremental score increases. Only large improvements should result in large score changes. Yet, for reasons nobody can explain, NPS doesn’t work that way.”
Another good point by Jared. Though I personally can live with this, because incremental improvements don’t work that way in real life either. The decision to recommend something, or to pull out your credit card, is a binary thing. It’s a “step function”: at a certain point you do it, and being merely close to that point doesn’t matter too much. Given that it is hard to track incremental differences in ratings anyway, because it’s a subjective rating by users, I think the value is in identifying the people who are “all in” and support you, and the people who are “pretty disappointed”.
“What if you made the product better enough for each to respond with an eight? 8, 8, 8, 8, 8, 8, 8, 8, 8, and 8.
The average is 8. Yet, NPS is now… 0.
Make that data set to be all nines: 9, 9, 9, 9, 9, 9, 9, 9, 9, and 9.
The average is 9. And miraculously, NPS is 100!
That’s 100% improvement over 8, so woohoo! You get your bonus, finally. By nudging that data just a little, you changed the NPS score from the middle of the scale to the highest possible score. Aren’t you a genius?”
That’s a funny argument, though this is again about the step-function issue, which you have in the real world as well. If on one day you have 15 users and all of them leave without buying anything (even though all of them were very close to doing so), and you fire your sales team because of zero revenue, and the next day, without much changing, 15 out of 15 users buy, then all you can learn is that you should not fire your sales team after day one. This is an issue of “n”; it doesn’t mean that revenue is a bad metric for rating the performance of a sales team.
The higher “n” is, the lower the probability that you get such an extreme case, and “close positive events” and “close negative events” will cancel each other out.
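For reference, the arithmetic behind Jared’s 8-versus-9 example is easy to reproduce with the standard NPS formula: promoters are 9–10, detractors 0–6, and the score is the percentage of promoters minus the percentage of detractors:

```python
def compute_nps(ratings):
    """NPS = % promoters (9-10) minus % detractors (0-6), as an integer."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return round(100 * (promoters - detractors) / len(ratings))


print(compute_nps([8] * 10))  # -> 0: all passives, none counted at all
print(compute_nps([9] * 10))  # -> 100: all promoters, the step function in action
```

A one-point shift in every answer moves the score from 0 to 100, which is exactly the step-function behaviour described above.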
“The best research questions are about past behavior, not future behavior. Asking a study participant “Will you try to live a healthy lifestyle?” or “Are you going to give up sugar?” or “Will you purchase this product?” requires they predict their future behavior. We are more interested in what they’ve done than what they’ll do. We’re interested in actual behavior, not a prediction of behavior.”
Again, a valid point. Though working with a digital product has the nice side effect that a lot of things that happened in the past are tracked anyway, so we don’t have to ask much about them.
If I want a forecast, though, then I need to ask about the future. For an early-stage product, getting some “forecast”/“prediction” data is helpful. It might help you course-correct earlier.
“Netflix paired it with another critical question. They asked all new subscribers “Were you referred to us by a friend or colleague?” Netflix saw a steady increase in new subscribers and growth when people said yes to these questions. And when people stopped saying yes, they saw cancellations increase and new subscriber acquisition slow down.”
This is an excellent method, and if you have a product that already has a big user base, that’s a very good question to ask.
Unfortunately, you can’t connect this data to the user experience of the person who made the referral (unless you have some sophisticated referral-code system). You lose the information about what might have worked so well for that user, or from which traffic source they came, that made them excited enough to make the recommendation.
“We’ve seen many participants rate a 0 because they couldn’t think of anyone to give the recommendation to. Others rated a 10 because they had friends who worked at the company.”
The latter is something you always have to deal with. You are never sure if someone’s mum is sneaking in and giving a review.
The former is actually valuable data from my perspective. It tells you about growth potential. If that person doesn’t know anyone they think this product is useful for, then that is relevant data. Mum’s data matters too, of course.
“Experience and NPS score rarely match
If United asked me to rate the service of a given flight on a scale from zero to ten, I’d rarely give it above a five. (Fives are the days when nobody gets beaten.)
Am I a loyal customer of United? If I were to honestly answer either the NPS question (future behavior) or the Netflix variant (past behavior), I’d rate United pretty high.”
NPS is not primarily about experience, I totally agree. It’s much more about the overall usefulness and value that someone gets out of the product. Is the NPS important for someone who builds an early-stage product? You bet. This product-market-fit information is often much more useful than UX data.
“We Can’t Reduce User Experience To A Single Number
This is the biggest flaw of NPS. It tries to achieve an outcome that can’t be achieved. It’s appealing to our management because it promises to solve a problem that can’t be solved so simply.
Customer experience is the sum total of all the interactions our customers have with our products, sites, employees, and the brand. Every sequence of interactions will differ for every customer.”
Amen. On second thought, though, I would love to add that it is only a flaw of NPS if you expect the NPS to do exactly that. Which is kind of the point of this article.
If you have a different set of expectations of the NPS, I think it’s still a pretty helpful method to use. Personally, I don’t think the initial goal of the NPS was to “reduce user experience to a single number”. It is, after all, called Net Promoter Score, not UX Score.
To wrap this up:
Any method or tool that is easy to implement, that gives you feedback you can connect to a reason, that gives you an easy way to communicate the results to the team, and, most importantly, that allows you to track development over time, has value.
It still means, though, that you have to analyse and interpret the information you get.
If you think there is a magic number that tells you the full story about how well you are doing overall, how loyal your customers will be, how much they like your product, and so on, then you are very likely wrong.
Therefore I totally agree with the concerns in Jared’s article, especially if you expect too much from the NPS.
If, though, you are working on an early-stage product, and you are looking for the same things I mentioned in the list of expectations above, I can still recommend the NPS.
(Please don’t get me wrong: if you have the resources for a dedicated user research and analytics team, you should have better methods than an NPS score. And if your executive team really wants just one number instead of a 3-slide report, your team might be able to create a custom metric shaped for your product and market.)
¹ The NPS was an idea introduced back in 2003 in the Harvard Business Review, and it became pretty popular in the startup world a few years ago. Here is the original article.