Applying HumanOps to on-call

Originally written for the StackPath blog.

One of the two core foundations of SaaS monitoring is alerting (the other being metric visualization and graphing). Alerting is designed to notify you when things go wrong in your data center, when there's a problem with your website performance, or when you're experiencing server downtime. More specifically, infrastructure monitoring and website monitoring are designed to notify you in such a way that you can respond and try to fix it. That often means waking people up, interrupting dinners, and taking people away from their families to deal with a problem.

When the very nature of a product deliberately has a negative impact on the quality of life of your customers, it is your responsibility as the vendor to consider how to mitigate that impact. Trying to understand how StackPath Monitoring impacts our customers through their on-call processes was why we started HumanOps.

So how do you apply HumanOps principles to (re)designing your approach to on-call?

HumanOps is made up of 4 key principles. These are explained in more detail in the What is HumanOps post, but essentially it boils down to:

  1. Humans build & operate systems that have a critical business impact.
  2. Humans require downtime. They get tired, get stressed, and need breaks.
  3. As a result, human wellbeing directly impacts system operations.
  4. As a result, human wellbeing has a direct impact on critical business systems.

These can be applied through considering some key questions about how on-call processes work.

How is on-call workload shared across team members?

It’s standard practice to have engineers be on-call for their own code. Doing so provides multiple incentives to ensure the code is properly instrumented for debugging, has appropriate documentation for colleagues to debug code they didn’t write, and, of course, to rapidly fix alerts which are impacting your own (or your colleagues’) on-call experience. If you’re being woken up by your own bad code, you want to get it fixed pretty quickly!

With the assumption that engineers support their own code, the next step is to share that responsibility fairly. This becomes easier as the team grows but even with just 2-3 people, you can have a reasonable cycle of on/off call. We found that 1-week cycles Tuesday – Tuesday work well. This is a long enough period to allow for a decent “off-call” time and has a whole working day buffer to discuss problems that might have occurred over the weekend.

You also want a formal handoff process so that the outgoing on-call engineer can summarize any known issues to the person taking over.
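To make the cadence concrete, here's a minimal sketch of a Tuesday-to-Tuesday rotation generator. The team names and start date are hypothetical, and the scheme where next week's primary shadows as this week's secondary is just one reasonable option:

```python
from datetime import date, timedelta

def oncall_rotation(team, start, weeks):
    """Yield (week_start, primary, secondary) for a Tuesday-to-Tuesday
    weekly cycle. Next week's primary acts as this week's secondary, so
    everyone gets a gentle handoff into the primary role."""
    # Roll forward to the first Tuesday on or after `start`
    # (Monday is weekday 0, so Tuesday is 1).
    offset = (1 - start.weekday()) % 7
    first_tuesday = start + timedelta(days=offset)
    for week in range(weeks):
        primary = team[week % len(team)]
        secondary = team[(week + 1) % len(team)]
        yield first_tuesday + timedelta(weeks=week), primary, secondary

# Hypothetical 3-person team starting in October 2018
for day, primary, secondary in oncall_rotation(
        ["alice", "bob", "carol"], date(2018, 10, 1), 4):
    print(day.isoformat(), "primary:", primary, "secondary:", secondary)
```

Printing the schedule a few weeks ahead also makes the formal handoff easy: both engineers know exactly which Tuesday the exchange happens.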

How do you define primary and secondary escalation responsibilities?

The concept of primary/secondary is a good way to think about on-call responders and the Service Level Agreement they commit to with each role.

The primary responder typically needs to be able to acknowledge an alert and start the first response process within a couple of minutes, which means being by an internet-connected computer at all times. This is not a 24/7 NOC, which is a different level of incident response.

Contrast this with a secondary who may be required to respond within 15-30 minutes. They are there as a backup in case the primary is suddenly unreachable or needs help, but not necessarily immediately available. This is an important distinction in smaller teams because it allows the secondary to go out for dinner or be on public transport/driving for a short period of time (i.e. they can live a relatively normal life!). You can then swap these responsibilities around as part of your weekly on-call cycle.

What are the expectations for working following an incident?

An alert which distracts you for 10 minutes early evening is very different from one which wakes you up at 3 a.m. and takes 2 hours to resolve, preventing you from going back to bed again because it’s now light outside.

In the former situation, you can still be productive at work the next day, but in the latter, you’re going to be very fatigued.

It’s unreasonable to expect on-call responders to be completely engaged the day after an incident. They need to have time to recover and shouldn’t feel pressured to turn up and be seen.

The best way I’ve seen to implement this is to have an automatic “day off” policy which is granted without any further approval, and leave it to the discretion of the employee to decide if they need a full day, work from home, or just to sleep in for the morning.

Recovery is necessary for personal health but also to avoid introducing human errors caused by fatigue. Do you really want someone who has been up all night dealing with an incident committing code into the product or logging into production systems?

This should be tracked as a separate category of “time off” in your calendar system so that you can measure the impact of major on-call incidents on your team.

It also applies if there is a daytime alert which takes up a significant amount of time during a weekend or holiday. The next work-day should be taken as vacation to make up for it.

Having the employee make the decision, but with it defaulting to “time off allowed” avoids pressure to come in to work regardless. Reducing the cultural peer pressure is more challenging, but managers should set the expectation that it is understood that you will take that time off, and make sure that everyone does.

How do you measure whether your on-call process is improving?

Metrics are key to HumanOps. You need to know how many alerts are being generated, what percentage happen out of hours, what your response times are, and whether certain people are dealing with a disproportionate number of alerts.

These metrics are used for two purposes:

  1. To review your on-call processes. Do you need to move schedules around for someone who might have had more of their fair share of alerts? Are people taking their recovery time off? Are people responding within the agreed SLAs? If not, why not?
  2. To review which issues should be escalated to engineering planning. If alerts are being caused by product issues they need to be prioritized for rapid fixes. Your engineers should be on-call so they will know what is impacting them, but management needs to buy into the idea that any issues that wake people up should be top priority to fix.

Eliminating all alerts is impossible, but you can certainly reduce them. You can then track performance over time. You’ll only know how you’re doing if you measure everything though!
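As a rough illustration, the kind of measurement involved is simple to sketch. The alert records and the 9:00-18:00 working-hours definition below are hypothetical; in practice you'd pull this from your paging tool's API:

```python
from collections import Counter
from datetime import datetime

# Hypothetical alert records: (responder, ISO timestamp, minutes to acknowledge)
alerts = [
    ("alice", "2018-10-02T03:14:00", 4),
    ("bob",   "2018-10-02T14:05:00", 1),
    ("alice", "2018-10-05T23:30:00", 7),
    ("alice", "2018-10-08T10:00:00", 2),
]

def out_of_hours(ts, start=9, end=18):
    """True if the alert fired on a weekend or outside the working day."""
    t = datetime.fromisoformat(ts)
    return t.weekday() >= 5 or not (start <= t.hour < end)

ooh = [a for a in alerts if out_of_hours(a[1])]
per_person = Counter(a[0] for a in alerts)       # is the load shared fairly?
mean_ack = sum(a[2] for a in alerts) / len(alerts)

print(f"out of hours: {len(ooh)}/{len(alerts)}")
print(f"per responder: {dict(per_person)}")
print(f"mean time to acknowledge: {mean_ack:.1f} min")
```

Even this toy dataset surfaces the two review questions above: one person is taking three of the four alerts, and half of them are happening out of hours.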

How are you implementing HumanOps?

We’re interested in hearing how different companies run their on-call so we can share the best ideas within the community. Let me know how you’re implementing the HumanOps principles. Also, we encourage you to come along to one of our HumanOps events to discuss with the community. Email me or mention me on Twitter @davidmytton.

Configuring for security, privacy and convenience

Balancing security, privacy and convenience is not easy. I’ve spent quite a lot of time figuring out how to configure my various computer systems with this goal in mind.

Computers are supposed to make our lives more convenient and you sometimes have to trade privacy for convenience e.g. Outlook processing emails to allow you to use Focused Inbox. AI is going to bring a lot of productivity improvements but I always prefer when that is processed on device, as with Siri Suggestions for things like when to leave for an event.

You also have to consider your adversary. There are reasonable steps you can take without seriously damaging convenience to provide safeguards against criminals and data profiling. But if you are trying to evade active government surveillance rather than just avoid being swept up in mass snooping, then things get significantly more difficult.

Targeted surveillance is, and should be, allowed (with appropriate legal safeguards). That is not what I’m trying to protect against here. Good security should be expected by all. Privacy is about having choice and control over your personal data.

Here’s how I approach it as of Oct 2018. I expect these practices to change over time. In no particular order:

  • Only use Apple mobile devices. They are the only company that builds privacy by design into their products. Their business model is to sell high-priced hardware, not to sell your data. They deliver regular software updates on 5-year lifecycles, unlike Android, where updates usually have to go through carriers (delayed by months, or forever). Buying direct from Google means giving up all your privacy. And the Apple model is to run as much computation as possible on-device, whereas Google is the opposite – all processing happens in their cloud environment, which is secure, but has no privacy.
  • Don’t get an Alexa device or Google Home. If you want a voice assistant, Apple’s HomePod with iOS 12 Shortcuts works very well.
  • iOS is the only secure OS that achieves the security, privacy and convenience balance. Any sensitive work should be restricted to iOS devices only. macOS is the next best option. If you don’t need convenience, use Tails.
  • Configure macOS and iOS for privacy. In particular, this means using full disk encryption and strong passwords.
  • Don’t use any Google services and be sure to pay for key services like email, calendar and file storage. If you’re not paying then your data is the product – you want a vendor who has a sustainable business model in selling the service/product itself, not your data. Running your own systems significantly reduces the security aspect of the balance, so it’s better to use either iCloud (if you don’t want your own domain), Microsoft Office365 (which is what I use) or Fastmail. For £10/m I get access to 1TB of OneDrive storage, Mail, Calendar and the full suite of Office products. I pay an extra £1.50/m on top of that for Advanced Threat Protection. Microsoft allows you to select the country where data is stored, has privacy by design and has a good record of defending against government access requests. The Outlook iOS app is actually very good but the Exchange protocol is supported by every client, so you have a good choice. Focused Inbox is great. Bigger corporates like Microsoft have significantly more resources to invest in security (which is why I prefer Office365 over Fastmail).
  • Unfortunately, Apple Maps is still rubbish compared to Google Maps. The two are generally comparable in major cities, so I prefer Apple Maps up until the last-mile destination directions, where Apple Maps is regularly inaccurate. At that point I switch to Google Maps on iOS.
  • Don’t store anything unencrypted on cloud storage providers that you would be concerned about leaking if someone gained access. Encrypt these files individually. You can use gpg on Mac but it’s not especially user friendly. I prefer Keybase but it still requires using the command line. These files will be inaccessible on mobile so you may want to consider using 1Password document storage instead, for small files (they have a total storage limit of 1GB). Office files can be password protected themselves, which uses local AES encryption.
  • Delete files you don’t need any more and aren’t required to keep for tax records. In particular, set your email to delete all messages after a period – the shorter the better. I delete all my emails after 1 year. Configure macOS Finder Preferences to remove items from the Trash automatically after 30 days.
  • Don’t send attachments via email. You might delete your emails after a time but the recipients probably don’t. Instead, share them using an expiring link to online cloud storage.
  • Use a password manager and 2 factor authentication. These are just security basics.
  • Don’t use Google Chrome. Only use Safari or Firefox. Configure your browser to auto clear your history on a retention period that allows convenience but also privacy. I set mine to clear after 1 week. I’ve never needed to go back any further. Be sure the “Prevent cross-site tracking” option is configured in Safari settings.
  • Set up DuckDuckGo for your search provider on macOS and iOS. I’ve not used Google search for years.
  • Buy 1Blocker X for iOS and 1Blocker for macOS (see a comparison of other options) to block trackers and ads in Safari.
  • Set up the Little Snitch outbound firewall and be sure you know which apps you’re approving for outbound internet access.
  • Set up Micro Snitch to be notified whenever your mic and camera are in use. Cover your device cameras as a backup.
  • Don’t use SMS – disable fallback in iOS settings. WhatsApp encryption is good, but all the metadata about who you are communicating with is shared with Facebook. Unfortunately, it has built up a considerable network effect, so it is necessary to use it to communicate in the Western world. Few people use Signal, which is the best option, so follow this guide to maximise WhatsApp privacy. iOS allows you to configure deleting iMessages after a period of time. I have mine set to delete after 30 days. You have to manually clear your WhatsApp conversations.
  • Don’t plug anything directly into any USB charging port in airports, hotels, or anywhere else. Use a USB data blocker adapter first.
  • Back up your files to cloud storage but only if they are encrypted locally first. Arq is a good tool to do this. Don’t use the same cloud storage as your main files e.g. I use OneDrive for my files and Amazon S3 for Arq backups.
  • Always use a VPN when connected to public wifi, or any network you don’t control, but don’t use a free VPN. This site has a good comparison but I use Encrypt.me on macOS and iOS. Encrypt.me is owned by StackPath, my current employer, so I know how all the internal infrastructure is set up i.e. we don’t log traffic. However, I also used it prior to joining StackPath and before Encrypt.me itself was acquired. Encrypt.me is a great consumer VPN but if you want more control and configuration options e.g. OpenVPN support, StrongVPN is another product from StackPath.
  • Change your DNS servers to use a privacy-first DNS provider, such as Cloudflare DNS. Do not use your default ISP DNS or Google DNS. If you have an OpenWRT router, configure it to use Cloudflare DNS over TLS because otherwise your ISP can still sniff your DNS requests.
  • Better yet, buy a router that allows you to configure DNS over TLS and connect to a VPN directly. I have a GL-AR750S configured to force all DNS through Cloudflare DNS over TLS, and it is permanently connected to StrongVPN. This means all connections from home are encrypted before they even hit my ISP. The only downside is having to disconnect the VPN when using BBC iPlayer, because it detects the VPN. My wifi uses MAC address whitelisting so only specific devices are allowed to connect.
  • Pay for Cifas protective registration and register your phone numbers on the TPS list.
  • Use Apple Pay wherever possible. The vendor doesn’t get access to any information about you and can only identify your payment information from a token specific to each transaction. This protects privacy and if the vendor is breached, your card details are safe. The usual contactless limit doesn’t apply to Apple Pay, which is limited only by your card limit.
  • Don’t buy Samsung TVs. There’s no need for any TV to connect to the internet, so don’t connect them in the first place. Use a dedicated device like an Apple TV for your TV interface; it has a better UI anyway.
  • Be mindful of sharing photos online directly from your phone. They usually embed the location of the photo in the EXIF data.
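If you want to strip that metadata before sharing, here's a minimal standard-library sketch of the idea: it walks a JPEG's marker segments and drops APP1, which is where EXIF (including GPS) data normally lives. A real tool should handle more formats and edge cases, but this shows how little is involved:

```python
def strip_exif(jpeg_bytes):
    """Return a copy of a JPEG with its APP1 (EXIF) segments removed.
    Walks the marker segments up to the start-of-scan and copies every
    segment except APP1 (0xFFE1); the compressed image data is untouched."""
    assert jpeg_bytes[:2] == b"\xff\xd8", "not a JPEG"
    out = bytearray(b"\xff\xd8")          # keep the SOI marker
    i = 2
    while i < len(jpeg_bytes):
        marker = jpeg_bytes[i:i + 2]
        if marker == b"\xff\xda":          # start of scan: copy the rest verbatim
            out += jpeg_bytes[i:]
            break
        # Segment layout: 2-byte marker + 2-byte length (length includes itself)
        length = int.from_bytes(jpeg_bytes[i + 2:i + 4], "big")
        if marker != b"\xff\xe1":          # keep everything except APP1/EXIF
            out += jpeg_bytes[i:i + 2 + length]
        i += 2 + length
    return bytes(out)

with_exif = open("photo.jpg", "rb").read() if False else None  # illustrative only
```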

Have I missed something? Let me know what else you’re doing.

Leaving the policing of the internet up to Google and Facebook

Consumers typically don’t want to pay for services that the internet has taught them should be “free”. Social networking, email, calendars, search, messaging…these are all “free” on a cash basis, but have a major cost to your privacy.

The best analogy I have heard to describe how these services work was in an episode of Sam Harris’ podcast with Jaron Lanier.

To paraphrase: imagine if, when you viewed an article on Wikipedia, it customised the content based on thousands of variables about you: where you are, who you are friends with, what websites you visit and how often, how old you are, your political views, what you read recently, what your recent purchases are, your credit rating, where you travel, what your job is, and many other things you have no idea about. You wouldn’t know the page had changed, or how it differed from anyone else’s. Or even whether any of the inferred characteristics were true.

That’s how Google and Facebook work, all in the name of showing ads.

I don’t have a problem with trading privacy for free services per se. The problem is the lack of transparency with how these systems work, and the resultant lack of understanding by those making the trade off (ToS;DR). For the market mechanism to work, you have to be well informed.

We’re starting to see this with how governments are trying to force the big platforms to police the content they host while leaving the details to the platforms themselves. Naturally, they are applying algorithms and technology to the problem, but how the rules are being applied is completely opaque. There’s no way to appeal. By design, the heuristics constantly change and there’s no way to understand how they have been applied.

Policing content is a problem that has been solved in the past through the development of Western legal systems and the rule of law. The separate powers of the state – executive, judiciary and legislature – counter-balance each other with various checks and stages to allow for change. It’s not perfect, but it has had hundreds of years of production deployment and new version releases!

What has changed is the scale. And the fact that governments are delegating the responsibility of the implementation to a small number of massive, private firms.

It’s certainly not that the government could do a better job at solving this. Indeed, they would likely make even more of a mess of it e.g. EU cookie notice laws. But private companies can’t be allowed to do it by themselves.

The solution requires open debate, evidence based review, a robust appeals system, transparency into decision making and the ability for the design to be changed over time. But it also needs to be mostly automated and done at internet-scale. Unfortunately, right now I’m not sure such a solution exists.

Regulation always favours the large incumbents, stifling innovation and freedom of expression. Perhaps it is time for the legislative process to adopt a more lightweight, agile process with a specific, long term goal that successive governments can work towards. There tends to be a preference for huge, wide-ranging regulatory schemes which try to do everything in one go. Instead, we should be making small changes, focusing on maximum transparency and taking the time to measure and iterate. The tech companies need to apply good engineering processes to how they are developing their social policy, in public.

But without any incentive to do so, we risk ending up with a Kafka-esque system that might achieve the goal at a macro level, but will have many unintended consequences.

A practical guide to HumanOps – what it is and how to get started

Originally written for the StackPath blog.

Humans are a critical part of operating systems at scale, yet we rarely pay much attention to them. Most of the time, energy, and investment goes into picking the right technologies, the right hardware, the right APIs. But what about the people actually building and scaling those systems?

In 2016, Server Density launched HumanOps. It started with an event in London to hear from some of the big names in tech about how they think about the teams running infrastructure.

How can you reach your high availability goals without a team that is able to build reliable systems, and respond when things go wrong? How does sleep and fatigue affect system uptime? System errors are tracked, but what about human error? Can it be measured, and mitigated?

With the acquisition of Server Density by StackPath, I am pleased that HumanOps now has a team dedicated to continuing to build the community. We’re open to anyone taking on responsibility for a local meetup but will also be running our own series of events in major cities around the world. The first of these kicked off this week in San Francisco.


What is HumanOps?

The problem

A superhero culture exists within technical systems operations.

Being woken up to fix problems, losing sleep to make an amazing fix live in production and then powering through a full day of work is considered to be heroic effort.

There is little consideration for the impact this approach has on health, family and long term well-being.

The aim

Running complex systems is difficult and there will sometimes be incidents that require heroic effort. But these should be rare, and there should be processes in place to minimise their occurrence, mitigating the effects when they do happen.

HumanOps events are about encouraging the discussion of ideas and best practices around how to look after the team who look after your systems.

It considers that the human aspects of designing high availability systems are just as important as the selection of technologies and architecture choices.

It’s about showing that mature businesses can’t afford to sacrifice their teams and how the best managed organisations achieve this.

If Etsy, Facebook, Spotify and the UK Government can do this, so can you.

How to implement HumanOps

The first step to implementing HumanOps is to understand and accept the key principles.

Key principles

  1. Humans build & operate systems that have critical business impact.
  2. Humans require downtime. They get tired, get stressed and need breaks.
  3. As a result, human wellbeing directly impacts system operations.
  4. As a result, human wellbeing has a direct impact on critical business systems.

HumanOps systems and processes follow from these principles.

HumanOps systems & processes

There are many areas of operations where HumanOps can be applied, but there are a few core areas which are worth starting with first. Each one of these could be a separate blog post so here are a series of questions to start thinking about your own process design.

  • On call
    This is where the most impact occurs. Being woken up to deal with a critical incident has a high impact, so it is important to design the on-call processes properly. Some key questions to ask: how is the workload shared across team members? How often is someone on-call and how long do they get off-call? What are the response time expectations for people at different escalation levels (e.g. do you have to stay at home by your computer or can you go out but with a longer response time?). Do you get time off after responding to an incident overnight? If so, is there any pressure to forgo that e.g. it should be automatic rather than requiring an active request. Do managers follow the same rules and set an example? Do you expect engineers to support their own code? Do you consider additional compensation for each on-call incident or is it baked into their standard employment contract? Do you prioritise bugs that wake people up?
  • Metrics
    You can’t improve something without measuring it. Critical out of hours incidents will happen, but they should be rare. Do you know your baseline alert level and whether that is improving? Do you have metrics about the number of alerts in general, number of alerts out of hours? Do you know if one person is dealing with a disproportionate number of alerts? Do you know which parts of the system are generating the most alerts? How long does it take for you to respond and then resolve incidents? How does this link to the business impact – revenue, user engagement, NPS? Are these metrics surfaced to the management team?
  • Documentation
    Only the smallest systems can be understood by a single person. This means writing and keeping documentation up to date needs to be a standard part of the development process. Runbooks should be linked to alerts to provide guidance on what alerts mean and how to debug them. Checklists must form a part of all human performed tasks to mitigate the risk of human error. How do you know when documentation is out of date? Who takes responsibility for updating it? How often do you test?
  • Alerts
    Most system operators know the pain of receiving too many alerts which are irrelevant and don’t contain enough information to resolve the problem. This is where linked documentation comes in but the goal should be that alerts don’t reach humans except as a last resort. Interrupting a human should only happen if only a human can resolve the problem. This means automating as much as possible and triggering alerts based on user-impacting system conditions, not just on component failures where the system can continue to operate. Are your alerts actionable? Do they contain enough information for the recipient to know what to do next? Are they specific enough to point to the failure without resulting in a flood if there is a major outage?
  • Simulation
    A large part of the stress of incidents is the uncertainty of the situation coupled with the knowledge that it is business / revenue impacting. Truly novel outages do happen, but much of the incident response process can be trained for. Knowing what you and each of your team members need to do, and when, will streamline response processes. Emergency response teams do this regularly because they know that major incidents are complex and difficult to coordinate ad-hoc. Everyone needs to know their role and what to do in advance. War-gaming scenarios to test all your systems, people and documentation helps to reveal weaknesses that can be solved when it doesn’t matter as much, and teaches the team that they can apply speed without haste. How is the incident initially triaged? What are the escalation processes? How does stakeholder communication work? What happens if your tools are down too, e.g. is your Slack war room hosted in the same AWS region as your core infrastructure?
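One way to make the documentation and alerts points concrete is a simple check, runnable in CI, that no alert can page a human without a linked runbook. The alert names and wiki URLs below are hypothetical stand-ins for whatever your alerting config actually contains:

```python
# Hypothetical alert definitions: every alert that can page a human
# should link a runbook explaining what it means and how to debug it.
ALERTS = {
    "disk_full":    {"runbook": "https://wiki.example.com/runbooks/disk-full"},
    "high_latency": {"runbook": "https://wiki.example.com/runbooks/high-latency"},
    "replica_lag":  {"runbook": None},   # not yet documented
}

def alerts_missing_runbooks(alerts):
    """Return the alert names that would page someone without guidance."""
    return sorted(name for name, cfg in alerts.items() if not cfg.get("runbook"))

missing = alerts_missing_runbooks(ALERTS)
if missing:
    # In CI you would fail the build here instead of just printing.
    print("alerts without runbooks:", ", ".join(missing))
```

Failing the build on a missing runbook is one way to answer the "who takes responsibility for updating it?" question: whoever adds the alert does.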

The idea behind HumanOps principles is to provide a framework for focusing on the human side of infrastructure.

What’s the point of spending all that time and money on fancy equipment if the people who actually operate it aren’t being looked after? Human wellbeing is not just a fluffy buzzword – it makes business sense too.

The idea behind HumanOps events is to share what works and what doesn’t, and to demonstrate that the best companies consider their human teams to be just as important as their high tech infrastructure.

Over the coming months I’ll be writing more about each of these topics and sharing the videos of other organisations explaining how they do it, too.

If you’re interested in attending, speaking or even running a HumanOps event near you, check out the website event listings and get in touch if there’s nothing nearby.

Should companies be required to publish security reviews?

I recently attended a cyber security conference about the current preparedness and future of cyber crime and security in the UK.

One of the audience members made a comment about how seriously businesses take their own security. He thought that, as with annual financial returns, business should be required to certify their own security credentials on an annual basis.

Many incidents of fraud occur not through cards being physically stolen, but through breaches in security at the shops we buy products from. The 2013 breach at Target is an example, the result of which might be that we decide not to shop there again.

Where we can make these consumer choices, the market is operating as it should. But it’s more challenging if the problem exists further down the chain. Perhaps the vendor used by the store for credit checking is the one that suffers a breach, such as at Equifax in 2017. Or more recently, the Ticketmaster incident, which was blamed on a third party component in their customer support system. How can consumers check several orders down into the supply chain?

Of course this is the idea behind one of the GDPR requirements to provide a list of all the third parties that data is being transferred to. But with companies like PayPal sharing data with hundreds of organisations, is it reasonable to expect consumers to check them all? Or any of them? And what would they actually check?

The Government already runs a certification programme called Cyber Essentials. If you want to sell into certain areas of government then you have to have a Cyber Essentials certification. Requiring vendors to certify helps with the government’s supply chain assurance at the same time as encouraging adoption of a UK standard.

But only around 10,000 businesses have certified in the 4 years the scheme has been operating. Is it a lack of awareness about the scheme or do customers and suppliers outside of government just not care? Maybe a combination of both.

As a consumer, you can’t easily assess security from the outside. You can only go on whether there have ever been any historical incidents and even then, that doesn’t tell you much about the state of their security today. So perhaps that audience member was onto something with requiring annual reporting?

There is also a power dynamic at work. The UK Government can mandate all of its suppliers comply with a particular certification because they all want to sell to government. But what if it were the other way around? Or swap the Government for another big organisation. Good luck requiring your suppliers to implement something similar if you’re just a small business.

It is impossible to have 100% security and breaches are inevitable, but as a customer you want to know that companies are taking basic steps to protect you – things like using strong passwords and keeping their systems up to date. It sounds simple, but one of the more interesting statistics from the conference I attended was that 80-90% of instances of cyber crime could be prevented by people having strong passwords and by keeping their computers and devices up to date. Surely these are basic security precautions all businesses should be expected to take.

Companies are already required to submit financial reports and annual statements about company details to Companies House. Would adding a security questionnaire to that return make a difference?

Voluntary compliance is often the first step because the companies that don’t provide the information are liable to be asked: why not? But then Cyber Essentials is already voluntary and not many businesses have certified. Maybe more would participate if it was free (there’s currently a £300 fee) and it just asked you about the current status, rather than requiring active steps to achieve a certification. Perhaps a grading system could indicate what level of security a business has in place which could show on the Companies House search record.

How many people would actually check this? Financial information about companies is already available but how often are returns checked before signing a contract? Suppliers sometimes run credit checks before offering credit terms but then there are multiple outcomes, such as the length of credit. A security check could only really have two outcomes – to do business, or not.

Last year, I wrote about how the supply side of the market was broken in relation to the security of consumer devices. Consumers should be able to expect product security just like they expect product safety. The good news is that they can indeed now expect this. In March this year, a report was released by the Department for Digital, Culture, Media and Sport alongside a new code of practice. Device manufacturers now have an incentive to build their products with security by design. If they don’t, the next step is regulation.

This is good for assurance of the security of consumer internet of things devices, but at what point does failing to use secure passwords and keep your systems up to date become negligence? Is the next step extending secure by design from internet of things devices to day-to-day general company administration?

A missed opportunity in recruiting

If you’ve ever applied for a job anywhere, you probably had a terrible experience.

Submitting an application into a black hole.

Waiting weeks without hearing anything. Maybe never hearing anything at all.

Vague instructions and trying to guess what the selection criteria are.

Delays getting an answer from early interviews.

Lack of any feedback if you get to later interviews.

More delays getting an offer…then, suddenly, time is of the essence and you must make a decision right now!

For most candidates at most companies, this is probably familiar. How does it make you feel about that company? They might be building awesome products, using the latest tech and working on a problem you really want to be part of. You start off with a great impression from their cool products, external marketing and great reputation, only to leave the process disappointed.

Recruiters are a waste of time – not only do they do a terrible job for their clients, but they usually add to the reputation damage inflicted by badly run processes. The companies themselves are just as bad, though: even after a recruiter hands the process over, the company could still run things properly, yet it rarely does.

Recruitment is odd in that it usually fails – the most common outcome for a candidate is rejection. That's by design: many more people interact with the company through the recruitment process than will ever be employed there.

So why not make them advocates? Or at least not detractors.

Even with the disappointment of not being selected for a job, the company can still leave the candidate with a positive impression.

A well run recruitment process should always send replies quickly and keep the candidate informed at all stages. The candidate should never have to chase for a response. It should be run quickly, with progression to the next stage happening over the course of days or within 1-2 weeks. Schedules sometimes don’t fit but with people being the most crucial aspect of the success of a business, making time for candidates should be a priority. And if a candidate dedicates time to the process, the least you can do is let them know why they weren’t successful in the end.

Every company uses a system to process applications. Communication should be built in; it can even be automated at the early stages. There is no excuse.

Why? Because the candidate might become a customer. They might tell their friends (who could be suitable candidates). They might apply for another position in the future.

Recruitment is another opportunity to build the company brand. To do some marketing. To enhance reputation and show off. It should be treated as such.

The SaaS conference marketing challenge

2009, when Server Density started, was very early in SaaS. Most software was still sold on-premise with licensing. Some well known products like Salesforce, Xero and GMail (G-Suite/Google Apps) were delivered SaaS-only but they were the minority.

This meant that the understanding of SaaS marketing was also very early. “Growth hacking” wasn’t a thing and a lot of marketing was still around AdWords and banner ads. Indeed, one of our more effective early campaigns was a banner ad on the newly launched Server Fault as part of the Stack Overflow community!

Content marketing was also new. I was able to build up a huge following over the years simply by writing good quality technical content that would appeal to my target audience. The Server Density blog was and remains the biggest source of traffic and leads to the product.

2018 is very different. We’ve reached saturation point for all of the above low-cost channels. You have to do them all but they are only a small part of the marketing mix.

The biggest component in SaaS marketing today is events and conferences. This has been growing over the last few years but attending, speaking at and sponsoring events is now a huge, if not the largest, aspect of SaaS marketing spend. You have to pay to play.

Regardless of who you’re targeting – from developers to small businesses and from startups to enterprise IT managers – being at conferences is a highly effective method of generating leads, and talking to your existing customers.

Potential customers use conferences to discover new vendors. It's the new way to search for products to evaluate. This surprised me when I was manning our Server Density booth – the number of potential users who come up and ask about your product as part of an evaluation they're starting. Or because they're interested in what's new. These are the kind of people you'd expect to hate any commercialisation – but that stereotype is outdated.

Existing customers are just as important. If you don’t have a stand, they’ll wonder why you’re not there. They want to see the vendor they picked with a huge presence and lots of marketing materials, and probably t-shirts and swag they can take home, too. It validates their past decision and is also another channel to market to them for cross selling new products or explaining new functionality. Conferences are a legitimate channel for customer success!

If you’re not at all the big industry events, you’re not being seen.

The challenge is that it is expensive.

The cost of sponsoring combined with travel, hotel and food for several team members is high, not to mention any marketing collateral, banners, swag and all the other booth materials. Just sponsoring for your logo to appear isn't sufficient – you have to have the booth table, too. And you need a good location with plenty of traffic. If you don't, your competitors will. That's not cheap.

This is hard for startups. You need a team of people working the conferences and managing the logistics not just a few times a year but a few times per month. The spend quickly ramps up. But the reasons are obvious – it’s difficult to match the lead volume and quality, because you can qualify and demo on the spot. This is why all your competitors are doing it, and it’s why you need to be doing it too.

It’s also a big reason why you can’t do SaaS without significant funding. Without it, you simply can’t compete with the spending levels needed to get the conference machine going.

Office productivity – where Google and Microsoft have an advantage over AWS

One of the lessons of the High Growth Handbook is that the most successful software companies start out with a single product, but soon shift to using their distribution advantage to offer a portfolio of products:

Startups tend to succeed by building a product that is so compelling and differentiated that it causes large numbers of customers to adopt it over an incumbent. This large customer base becomes a major asset for the company going forward. Products can be cross sold to these customers, and the company’s share of time or wallet can expand. Since focusing on product is what caused initial success, founders of breakout companies often think product development is their primary competency and asset. In reality, the distribution channel and customer base derived from their first product is now one of the biggest go-forward advantages and differentiators the company has.

This advantage is fairly clear when it comes to public cloud providers.

When AWS first launched, it began with basic infrastructure primitives: storage (S3) and compute (EC2). Over time, it has added a vast number of products into the ecosystem.

This is a classic enterprise model: if you buy one product in the suite, when you need something else you will look to the vendor you already have a contract with first. This is because it simplifies management interfaces, network configuration, security, support, billing and legal agreements.

AWS certainly has an advantage here – it has the biggest mindshare amongst developers. The ecosystem effects of people with the right technology experience are compelling. Google is competing hard, but AWS is ahead when it comes to the size of the portfolio.

Yet AWS has a weakness when it comes to the office productivity suite. This is already a massive lead generator for Microsoft and Azure, and it could become a big source of customers for Google too.

Microsoft has been leveraging its licensing advantage amongst the largest, enterprise customers who use their productivity products – Office, Exchange, Windows. For a long time, Azure was being pushed to be licensed as part of the deal. If you’re already using Microsoft products, it makes sense to consider Azure first.

Whilst Microsoft might have a good base within the enterprise, Google has a similar foothold within the technology community. Pretty much every startup uses G Suite for email, calendar, docs, etc. Most of these use AWS. But the improvements in Google Cloud Platform, and the security and identity products in particular, are making the G Suite to G Cloud cross-sell more compelling.

How does AWS compare? WorkMail and WorkDocs. Not particularly compelling products, and products which seem to have been neglected. I don't know anyone using either of these. Why would you?

This is one major area in which AWS is significantly behind.

The Microsoft / Azure demographic is quite different from those using AWS and Google, but as G Suite and GCP become more tightly integrated, it will become a big differentiator for them.

The Brexit startup opportunity

It might seem like Brexit is the only thing the Government is doing right now, but in the 2017-2019 Parliament so far, some 23 Bills have received Royal Assent, more than half of those in 2018.

Some of these bills have introduced big changes, such as the Data Protection Act and the Space Industry Act – the former implementing GDPR and the latter paving the way for the UK to enhance its position in the space industry through new launch capabilities.

However, Brexit is taking up a significant part of any policy discussions inside and out of government. Touching every possible area, it is the most important and challenging question of modern times, something which is unlikely to change any time soon. This presents an opportunity for new businesses.

I was recently at an investment forum where we saw 12 startups pitch for funding. The format was very similar to when I was pitching for an initial pre-seed investment into my own software as a service business in 2009: just a few minutes to explain the what, why and how of your idea. But what was different were the types of companies and their approaches to monetisation.

The old approach where the majority of companies focused only on user growth, dealing with revenue later, was gone. These were companies with real business models actually charging for the value their products deliver to the customer rather than relying on vague notions of maximising users and selling them to advertisers.

Everyone always looks at Google as the example of an amazing ad-driven business, and it is. But there are very few situations where you can mirror the user intent of actively searching for something right now. In that context, a relevant ad makes perfect sense. Or if you know so much about a user that you can predict what they might want whilst they browse a social network feed. But these opportunities are rare. Isn’t it actually easier (and better) to build something so useful your users want to give you money for it to continue to exist?

Not only that but most of the pitches were for businesses hoping to tackle what I like to call “real problems”: healthcare and mental health, cyber security, new takes on financial risk, insurance, and several others.

What stood out to me was how many of these startups were addressing challenges which actually attempt to solve some of the big problems in society today. Bringing the startup model of new, innovative thinking to areas which might typically have only been considered solvable by government or the charity sector.

With the public sector grappling with Brexit, it is encouraging to see the forces of competition, revenue and profit coming in to propose solutions to bigger issues than how many more clicks can we get on an ad.

Whilst Elon Musk is often held up as one of the few entrepreneurs tackling big challenges, if the small sample size of the investment forum I attended is anything to go by, there are actually many more. The tech industry shouldn’t just be associated with “eyeballs” or libertarian Silicon Valley culture – it should be about tackling the big problems. For me, this means cyber security, healthcare and space as the areas of biggest opportunity over the coming decade. All areas that were once exclusive to the public sector. What else might also benefit from this approach?

Everyone is asking whether there are any real opportunities in Brexit, for there are certainly obvious downsides. With the public sector busy dealing with the incredible difficulties of extracting ourselves from the EU, this is a unique time to be considering how startups can step up.

A basic startup employee security checklist

Unless you’re just starting a new business from scratch, it is difficult to force big security policy changes across everyone in the company.

There are lots of things you “should” be doing. Whether this is rolling out a new device management platform to ensure everyone has the latest software updates or moving everyone to use a single-sign-on platform for all company logins, if you don’t do it from day one then it simply takes time to change existing practices.

Various events might trigger a revamp of your approach to security. It might be a big customer asking for supply chain assurances, it might be trying to sell into a particular industry like finance or government, or it may even be a security incident.

Security is never “done”. Rolling out device management across all company computer equipment is a big, time consuming project. But there are small wins that employees can do that will set the organisation apart from most other businesses, because most companies are horribly insecure.

At Server Density, we used a simple checklist that everyone would verify every 6 months. Once the initial setup is done when an employee joins, it only takes a couple of minutes to verify. It addresses the basics of ensuring the doors are locked and doesn’t require any specialist knowledge for most steps. Here’s the checklist.

A basic startup employee security checklist

This is specific to the services we used at Server Density, so may need adjusting for your own environment.

  1. Have you enabled 2 factor auth on key accounts?
    1. Braintree.
    2. Google.
    3. Github.
    4. [… All key company services listed here]
  2. Do you have full disk encryption enabled?
  3. Are you storing any sensitive or important files locally e.g. customer lists, strategy documents, private keys?
    1. If so, are they actually local, or have they been placed into a cloud “dropbox” (e.g. Google Drive, Dropbox)?
    2. If they are in a cloud dropbox, ensure they are either removed (and deleted from the cloud service) or encrypted (use PGP).
    3. If you subsequently encrypt a previously plain text file, be sure the cloud service has not simply saved the encrypted copy as a new version – check that the old plain text version cannot be restored!
  4. Are you running the latest OS version?
  5. Do you have a strong OS password?
  6. Confirm the password activates on sleep / screensaver.
  7. Are you running the latest browser version?
    1. Be sure to restart Chrome regularly so it can apply updates.
    2. Enable click-to-play to prevent browser plugin vulnerabilities.
  8. Are you using a password manager e.g. 1Password?
    1. Do you have a strong master password?
    2. Is the master password different from your OS password?
    3. Are you using different passwords for every account?
  9. Do you have a passcode on your mobile device?
  10. Review your Google Account security
    1. If you set a backup email, make sure it also has multi factor authentication enabled.
    2. Install this Chrome Extension to protect against phishing on your Google account.
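Because the checklist is verified every 6 months, it helps to track who has outstanding items. As a minimal hypothetical sketch (not something from the original Server Density process – the item names below just paraphrase the checklist), responses could be recorded per employee and any failing or unanswered items flagged for follow-up:

```python
# Hypothetical sketch: record an employee's 6-monthly checklist
# responses and flag anything that needs follow-up.
# Item names paraphrase the checklist above; adjust for your own list.

CHECKLIST = [
    "2FA enabled on key accounts",
    "Full disk encryption enabled",
    "No unencrypted sensitive files in cloud sync folders",
    "Latest OS version installed",
    "Strong OS password, active on sleep/screensaver",
    "Latest browser version",
    "Password manager with strong, unique master password",
    "Passcode set on mobile device",
    "Google Account security reviewed",
]

def outstanding(responses):
    """Return checklist items that are unanswered or marked False."""
    return [item for item in CHECKLIST if not responses.get(item, False)]

# Example review: one item failing, one not yet answered.
answers = {item: True for item in CHECKLIST}
answers["Latest OS version installed"] = False
del answers["Passcode set on mobile device"]

for item in outstanding(answers):
    print("Follow up:", item)
```

Keeping the list in one place like this also makes the 6-monthly verification trivial to re-run: a clean pass is simply an empty `outstanding()` result for every employee.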