How We Cut Critical Issues in Half While Doubling WAU

Many times, our experiments at work aren’t designed well enough to measure results, or they don’t turn out the way we hoped. But sometimes you circle back to check on a series of small investments made over a long period in pursuit of one goal, only to find they were wildly successful. This is one of those stories: the practical steps my team took in 2020 to cut our critical issues in half while growing our weekly active users (WAU) by 150%.

There was no magic formula, just a series of smart decisions, consistency, and proactivity. I’ve identified five things that influenced our ability to deliver major stability gains during a period of high growth.

Shore up the weak parts of your application

We all know what those are. For us, that was telephony. We routinely got complaints about phone calls dropping, poor call quality, calls not connecting or disconnecting properly, and more. It had the potential to ruin an interview, and canceled interviews meant lost revenue.

There was no simple fix for these issues, just a good old-fashioned rewrite of our telephony code based on everything we’d learned in the two years since we originally wrote that part of the app. We had made the best decisions we could with the knowledge we had then; now we knew more and could do better.

We had tried fixes around the edges over the previous six months, but it became clear that we needed to invest in overhauling that part of our application to meet our customers’ expectations that “it just works” every time.

Train and equip front-line staff

We have a team of front-line ninjas who help customers with any technical issues that arise during their interviews. These exceptional humans bear the brunt of any bugs, and they are adept at helping customers deal with firewalls, browser plugins, browsers that are years out of date, people who have their own computer on mute but insist that everyone else’s sound is broken, and much more.

To improve and scale the relationship between the Product and Customer Success teams, we did three simple things:

  1. Improved communication about upcoming changes, recent releases, vendor outages, and more. This took the form of weekly meetings between the heads of our two teams, as well as internal product announcements and trainings for designated team leads, who were responsible for passing the changes on to their reports.
  2. Created a solid, clear escalation process for issues. By aligning on what defines a “critical issue,” we avoided the “boy who cried wolf” syndrome, in which the engineering team starts downplaying issue reports from the front-line teams after too many false alarms. We dialed in the definitions, communicated them, and created a separate place to report issues of lesser importance.
  3. Equipped support agents with workarounds for known issues. One of the best improvements we’ve made to our critical issue process is to state clearly (usually in Slack) what the issue is, how and when it occurs, who and how many people are impacted, any known workarounds, and an estimated resolution time. These things get continually updated as more information comes in, and our teams now know to look in one place for the latest information.

Build better partnerships with vendors

When a part of your ecosystem goes down, you also go down. We had been experiencing serious stability issues with one of our main subprocessors, to the point that I was considering switching to another vendor, despite that being akin to major surgery.

As a last-ditch effort before undergoing that change, our key leaders escalated our experience with our account manager, the regional sales director, the product director, and others on up the chain. It was unpleasant, but necessary, to share the pattern of issues we’d had with them and ask for a plan for improvement. They provided it, and have followed through, allowing them to retain our business and preventing us from having to make an unplanned infrastructure change.

What I learned through that experience is to be loud, be insistent, and be consistent with asking for what you need — and be ready to walk away if you aren’t getting what you need.

Review and improve infrastructure regularly

Our team meets weekly to review our infrastructure health, utilization, account activity, and more. We discuss available upgrades and the right time to undertake them. We identify processes that should be optimized, queries that should be refined, and anomalies that should be investigated.

Then we do the hardest part: writing the stories during the meeting and discussing their priority in the sprint planning meeting that immediately follows the infrastructure review. Doing so keeps the momentum going on necessary work, and I credit it as the reason our infrastructure could scale to more than double our previous usage without a hiccup.

Unify the reporting structure

I credit our ability to act on the vulnerabilities and opportunities we find in the infrastructure review meeting to our unified reporting structure. For almost two years now, product and engineering have both reported to me. As a result, there is less tension between the customer-facing work the product manager wants to get done in a sprint and the infrastructure work the development team feels is important.

It creates a virtuous cycle — when developers don’t feel that they have to fight to spend time on anything that doesn’t directly benefit a user, they are more thoughtful about prioritizing the truly important things and letting other things wait. It’s a mentality of possibility and abundance rather than scarcity and zero-sum. I’ve also seen it create a happier development team that feels respected and empowered.

I don’t think it matters whether the teams roll up under engineering or product. From what I’ve seen, companies are fairly evenly split between having a CTO or a CPO in the top role, as the CPO title gains momentum. Either way, bringing both organizations under the same umbrella leads to more aligned goals, incentives, and priorities.

The result

The result of all these efforts was that we cut our critical issues in half when comparing the first half of 2020 to the second half of 2020. In that same time, our weekly active users jumped 150% as lockdowns forced a digital migration. November and December were already historically our busiest months of the year, and the conditions of 2020 only amplified that. With the processes we have in place now, I feel confident that we are equipped for continued growth and scale without significant growing pains!

Originally published at https://ashleywali.com on January 28, 2021.

Product leader @discuss.io. Inspired by @adilwali, our 2 boys, & the beauty of the PNW. Passionate about gender equality, travel, & weekends at home.
