How to learn product management

For the last few months since the StackPath acquisition, I have been shedding all the administrative tasks of a CEO of a small startup and focusing more and more of my time on product.

This was initially scoped to integrating Server Density monitoring into the StackPath platform but has broadened to multiple products across the platform.

I am used to shifting between many different tasks and responsibilities so focusing entirely on product has been a new experience for me. As a result, I have been spending as much time as possible learning about what it means to do product management.

Learning something new is a great time to write about the experience. There are valuable insights that can be shared from a beginner mindset. Once you “know” something, you think about problems in a different way.

So this post is a collection of the resources I’ve found useful in learning about running product engineering ~6 months into the role.

Books for product managers

Product management podcasts

I have yet to find a good podcast that is just about product management, so here are some specific episodes from more general podcasts that I’d recommend listening to.

  • Masters of Scale: Marissa Mayer – I find this podcast series very difficult to listen to because it is incredibly over-produced, but this one made me do further research into how Google runs product management and so was valuable in that sense!
  • The A16Z podcast as a whole is worth listening to, but specifically related to product I would suggest High Growth in Product (and tech) which is a podcast interview with Elad Gil of the High Growth Handbook mentioned above. Also listen to The Basics of Growth Part 1 and Part 2.

Events for product managers

Generally I don’t find attending conferences to be a good use of time. The travel, disruption to routine, and low signal-to-noise ratio of talks mean I’d usually much rather watch the videos afterwards. However, I have found these to be worth the time:

  • Mind the Product Conference is the main conference but I attended only the Leadership Forum, which was worth it because of the small number of attendees.
  • Every industry has its own niche conference which is worth attending just to understand the overall landscape. For monitoring, it’s Monitorama. And for SaaS in general, it’s SaaStr. Be very picky and very specific.

Product management blogs

Where a specific article is useful but not the whole blog, it’s listed in the next section. These blogs are worth subscribing to in their entirety.

Articles for product managers

Product management videos and talks

  • Customer Obsession – from ProductTank San Francisco, this talk outlines: the balancing act of delighting customers in hard-to-copy margin-enhancing ways; how “customer obsession” helped Netflix to create a highly personalized experience; and the principles of customer obsession through a case study — “Should Netflix send a free trial reminder to its customers at the end of their four-week trial?”
  • Mastering the problem space for product/market fit – this is a framework covering the universal conditions and patterns that have to hold true to achieve product/market fit. Each layer in the pyramid is a key hypothesis that you need to get right in order to build the next layer and ultimately achieve product/market fit.

Good quotes on product management

Some select quotes from the linked content above that are worth highlighting by themselves.

In the 10+ years since AWS’s debut, Amazon has been systematically rebuilding each of its internal tools as an externally consumable service. A recent example is AWS’s Amazon Connect — a self-service, cloud-based contact center platform that is based on the same technology used in Amazon’s own call centers. Again, the “extra revenue” here is great — but the real value is in honing Amazon’s internal tools.

If Amazon Connect is a complete commercial failure, Amazon’s management will have a quantifiable indicator (revenue, or lack thereof) that suggests their internal tools are significantly lagging behind the competition. Amazon has replaced useless, time-intensive bureaucracy like internal surveys and audits with a feedback loop that generates cash when it works — and quickly identifies problems when it doesn’t. They say that money earned is a reasonable approximation of the value you’re creating for the world, and Amazon has figured out a way to measure its own value in dozens of previously invisible areas.

Why Amazon is eating the world

Perhaps most importantly, the product manager is the voice of the customer inside the business, and thus must be passionate about customers and the specific problems they’re trying to solve. This doesn’t mean the product manager should become a full-time researcher or a full-time designer, but they do need to make time for this important work. Getting out to talk to customers, testing the product, and getting feedback firsthand, as well as working closely with internal and external UX designers and researchers, are all part of this process.

Product Leadership

Many books emphasize the first two points—corporate strategy and culture setting. However, you will find that in practice you have little time in a high-growth, rapidly scaling company to think deeply about those points until you hire a strong executive team and manage your own time properly.

High Growth Handbook

There’s no point in defining what to build if you don’t know how it will get built. This doesn’t mean a product manager needs to be able to code, but understanding the technology stack — and most importantly, the level of effort involved — is crucial to making the right decisions.

Product Leadership

Another lesson that I learned from Brian Chesky—one way to think about when to upgrade executives—is that a really great executive is about six to twelve months ahead of the curve. They’re already planning for and acting on things that are going to be important six to twelve months in the future. A decent executive is delivering in real time, now to one to three months in advance.

High Growth Handbook

The trick to creating a great product team is to think of them as the product. This is not an objectification but rather a thought exercise. After all, they are the product that creates the product. Without them, there is no product. Amazing teams make amazing products. Seen from this perspective, the task of how to hire, onboard, train, and develop them becomes another product design problem. The approach that successful leaders take to creating great product is the same approach they take to creating great product teams.

Product Leadership

Often the hardest part of the communication is communicating the “why” behind the product road map, prioritization, and sequencing. Part of this will be creating a framework that establishes why some things are prioritized higher than others—and it’s important that all other functions buy into this framework.

High Growth Handbook

Out of the goals will come the specific features for development. Like a ripple effect with the vision at the center, the objectives or goals are generated and they in turn generate the features that support those goals. Never start with features. Even if your business or product is based on a “feature concept,” ask yourself what the bigger problem is and why it needs solving. Any feature shouldn’t be considered, prioritized, or delivered in a vacuum. Without a vision to guide the product creation, a project can quickly become a collection of cool solutions lacking a core problem to guide them. Features need to be directly tied to the product or organization’s strategic goals.

Product Leadership

For example, if you as the designer/manager discover that you as the worker can’t do something well, you need to fire yourself as the worker and get a good replacement

Principles: Life and Work

If you are not evolving your organizational design, it might be an indicator that your product strategy is getting stale. In our experience, most rigid organizational structures are built to create processes for predictability, not successful outcomes.

Product Leadership

As GV’s Ken Norton says, “I like to start with the problem. Smart people are very solution-oriented, so smart people like to talk about what the solution is going to look like to the problem. But successful people think about the problem. Before I talk about this product, or this feature, or this device I’m going to build, I must understand the problem at a deep level. Then success is easy to articulate, because you know what it’s like when that problem is solved.”

Product Leadership

“By-and-large” is the level at which you need to understand most things in order to make effective decisions. Whenever a big-picture “by-and-large” statement is made and someone replies “Not always,” my instinctual reaction is that we are probably about to dive into the weeds—i.e., into a discussion of the exceptions rather than the rule, and in the process we will lose sight of the rule.

Principles: Life and Work

How to hire engineers: the interview process

Originally written for the Seedcamp resources website.

Earlier this year, I wrote about the first step in hiring – how to source candidates. Once you have applications, then you need to evaluate them to decide who you might want to hire.

Regardless of how urgent the need is to fill the position, finding the right people, not just for the role today but for how your business will change in the future, is crucial to success. This post will take you through how to create a robust selection process for hiring engineers.

The goals of the process

You have to remember that you are still in a sales process. You are not just trying to match applications against your person spec but you are also trying to convince them to accept the offer you might make at the end. This means there are several goals to consider:

  1. Evaluate applications against what you are looking for in team members now, and in the future. You need to balance the requirements of the job today with an ability to adapt as the business changes. This is particularly important in early-stage startups. Past experience may be relevant to demonstrate ability to execute, but knowledge of specific technologies is probably not – the best engineers can learn new skills, languages, frameworks and systems.
  2. Continue to demonstrate why your business is a great place to work. This comes in multiple parts, the first of which is well before you even get applications. Building your profile and supporting website materials is important for getting applications in the first place. It is just as important that the interview process runs smoothly and that candidates always know where they stand, what they need to do next, and what the timeline is. You need to provide regular updates and fast responses. Their time must be valued more than your own, and you need to explain why they should join the company if you make them an offer. You can never take for granted that just because someone has applied to you, they will actually accept any offer.
  3. Build a diverse team. This is assisted by the design of the process but also requires you to have the appropriate HR policies in place e.g. flexible working, generous holiday allowances, clear maternity/paternity policies, etc. Thinking about this from the beginning and designing your processes to consider the challenges of diversity means you do not need to do things like positive discrimination, which I do not think is a good way to tackle the diversity problem in tech. The goal is to increase the diversity of the application pool and run an unbiased process to select the best candidates from that pool. Google has some useful guides on diversity in general and there are several good resources for working on gender diversity.

The basic foundation for running a good engineering interview process is valuing the time of the candidates. They likely have full time jobs and/or consulting gigs, so you cannot ask candidates to spend many hours on the phone, doing coding tasks or building projects. Of course they will need to give up some time to dedicate to the process but you should work hard to minimise it.

Step 1: Application

The usual application is a simple form which asks the candidate to submit their basic details, a CV/resume and a short cover letter explaining why they are interested in the job. The cover letter is the most important aspect and the only element that is actually examined at this stage.

In the job ad I include an instruction which asks the applicant to mention a keyword in their cover letter. If the keyword isn’t present then the application is instantly rejected. This is specifically to filter out mass, shotgun-type applications and to test for attention to detail.
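As a rough illustration, this keyword screen is trivial to automate. A minimal sketch, where the field names and the keyword itself are my own illustrative assumptions, not from any real job ad:

```python
# Hypothetical sketch: auto-reject applications whose cover letter is
# missing the keyword specified in the job ad. Filters out mass,
# shotgun-type applications and tests for attention to detail.
KEYWORD = "octopus"  # illustrative: the word the job ad asks applicants to include

def screen_application(application: dict) -> str:
    """Return 'review' if the cover letter mentions the keyword,
    otherwise 'reject'."""
    cover_letter = application.get("cover_letter", "")
    if KEYWORD.lower() in cover_letter.lower():
        return "review"
    return "reject"
```

Anything that fails the check can be rejected with an auto-generated response, with no human time spent.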

The best people will usually only ever apply to a small number of positions. You want to find people who take the time to consider the company and role well in advance of ever applying, which means reading the full job ad and description.

Where possible, this step should be automated. Collecting only the minimum amount of information (e.g. email and cover letter) means you can systematically ignore any other details of the application, such as the CV or name, which might introduce bias. Be aware of protected characteristics and things you cannot ask.

Just as college degrees are mostly irrelevant for engineering positions (unless you require some very specific scientific knowledge), some companies are now excluding CV submission entirely. This is worth considering as another way to remove potential bias. The only thing I find CVs useful for is researching interview questions in advance, but everything you need to know you can simply ask the candidate when you speak to them later.

Step 2: Writing exercise

I have found there is a good correlation between ability to write well and coding ability. Programming is all about clear and accurate communication, whether that is directly in code itself or communication about the project with real people!

I test this by requiring candidates to do a short writing exercise whereby they have an hour to research the answer to a particular question, and write up the response. The question should be relatively easy because the focus is on their written answer. You are simply looking for accurate spelling and grammar. Any mistakes should mean an instant rejection – if they are unable to write such a short piece without mistakes or proper proofreading then that indicates a lack of care and attention.

The task should take no more than an hour and you are not looking for technical accuracy of the response. This is purely an assessment of clear and accurate communication.

Step 3: Coding exercise

Designing a good coding exercise is tricky. It needs to be representative of the kind of skills you need for the role. It should allow the candidate to demonstrate a wide range of skills, from writing clear code to writing tests and documentation. And it should be straightforward to build in a short period of time – a couple of hours is ideal.

One of the more successful exercises I have used in the past is to ask the candidate to build a simple client for a public API. This tests many things such as working with real world systems, understanding credential management and dealing with network issues and error handling.
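To make the shape of such a submission concrete, here is a minimal Python sketch of the kind of client a candidate might produce. The endpoint, token handling, and class names are my own assumptions, not a prescribed solution:

```python
import json
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

# Illustrative sketch: a minimal client for a hypothetical JSON API,
# showing credential management and network error handling.

class ApiError(Exception):
    """Raised when the API is unreachable or returns an error status."""

class SimpleApiClient:
    def __init__(self, base_url, token, opener=urlopen):
        self.base_url = base_url.rstrip("/")
        self.token = token    # credential injected, never hard-coded
        self._open = opener   # injectable so tests need no real network

    def get(self, path):
        """Fetch a JSON resource, translating failures into ApiError."""
        req = Request(
            f"{self.base_url}/{path.lstrip('/')}",
            headers={"Authorization": f"Bearer {self.token}"},
        )
        try:
            with self._open(req, timeout=10) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except HTTPError as e:
            raise ApiError(f"HTTP {e.code} from API") from e
        except URLError as e:
            raise ApiError(f"network error: {e.reason}") from e
```

Note the injectable `opener`: a candidate who makes the network layer swappable has also made their client testable, which is exactly the kind of signal the exercise is meant to surface.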

Whatever you pick, you want the candidate to be able to create a self contained package or repository, with some basic installation and setup documentation so that you can evaluate both whether it works, and the implementation itself.

Before starting this, as an engineering team you need to create a list of objective criteria that you can score the exercise against. These can include things like checking the documentation is accurate, test coverage, code linting, etc. You can determine your own criteria but they should be as objective as possible so that each evaluator can compare their conclusions.

Once the candidate sends you their completed exercise, the code should be given to several of your engineers to evaluate. This should be done blind so the evaluators only see the code, and they do not discuss the details with each other. This gives you several independent evaluations and avoids any bias. Be sure to instruct the candidate not to include any identifying information in the package e.g. a GitHub URL or their name in an auto-generated copyright code comment.
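Combining the blind evaluations can itself be systematic. A small sketch, where the criteria names and pass threshold are illustrative assumptions rather than a recommended rubric:

```python
# Hypothetical sketch: average independent, blind scores from several
# evaluators against agreed objective criteria (scored 0-5 each).
CRITERIA = ["docs_accurate", "test_coverage", "lint_clean", "error_handling"]

def aggregate_scores(evaluations):
    """Average each criterion across all evaluators."""
    return {
        c: sum(e[c] for e in evaluations) / len(evaluations)
        for c in CRITERIA
    }

def passes(evaluations, threshold=3.0):
    """Candidate proceeds only if every averaged criterion meets the bar."""
    return all(score >= threshold
               for score in aggregate_scores(evaluations).values())
```

Requiring every criterion to clear the bar (rather than a single overall average) stops one strong area from masking a weak one, but you can choose whichever aggregation matches your team's priorities.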

Step 4: In-person pair programming

At this point you have done most of the evaluation and believe the candidate has the skills you’re looking for. The final stage is to evaluate actually working alongside you in a more realistic situation. For this, I prefer to meet candidates in person and have them work alongside their potential colleagues.

I have done this stage remotely in the past but have found that it is more effective to meet someone in person. You can then evaluate what they are like as a person. However, this is also the stage where there is most risk of bias. You can mitigate this by involving multiple people from your team so that one person doesn’t have a veto.

In the interests of speed and efficiency, I try to schedule all final interviews within the same week. This may not always be feasible, but I batch them as closely together as I can. This makes the best use of your team’s time and means that candidates get a response quickly.

You should cover all travel costs for the candidates, booking tickets for them rather than making them pay with reimbursement – they shouldn’t have to lend your company their own money! If they have to travel a long distance, offer overnight accommodation, transfers and food. Also ensure they have a direct contact who is available 24/7 in an emergency. You want candidates focused on the interview, not worrying about logistics.

Again, you need to determine what the best approach to evaluating their capabilities is. I have found that getting them to actually work on your codebase is a good way to see how they deal with an unfamiliar environment and start to learn a new system. You can ask them to fix a known bug, or introduce a simple bug into the code and work with them to fix it. You are not testing them on their knowledge, but on how they approach the problem. Whether or not they fix the problem isn’t important.

Remember that this continues to be a sales process. Take the time to introduce them to key members of the team, show them around the office and, if they’re not local, the area where they’ll be working. Be sure to show off and explain why you want them to join. This is the job of everyone on the team – multiple people telling them about the company is a lot better than just the hiring manager or CEO!

Step 5: The response

Anyone who gets past step 1 should receive a response to their application whether they are successful or not. One of the worst things about applying for a job is not knowing what the decision was.

The challenge with giving a negative result is that candidates will often ask for feedback and may argue with it. It is up to you whether you want to do this at all, but I usually offer detailed feedback only if a candidate reaches step 3 or 4. Failing step 2 is only for poor spelling/grammar, which you can build into an auto-generated response.

If you are going to make an offer, do it as quickly as possible. Include the key information about the compensation package, start date and anything else you need from the candidate. Be sure to review the legal requirements for a formal job offer first.

Don’t use exploding offers and don’t pressure the candidate. During the step 4 interview, you may want to ask them what their evaluation criteria are and whether they are looking elsewhere. Asking when they think they will be able to reply to you is probably fine, but don’t ask about salary expectations.

What not to do

You may notice that certain things are not present in the above process.

  • No questions about their background and experience. It is not necessary – you are evaluating them based on their skills and how they apply them today, not what they claim to have done in the past. That said, in step 4, you may want to ask a few questions about how they may have tackled similar problems in the past, or what interesting challenges they have solved if you are hiring for a very specific problem area. But really, you want to put as much time as possible into designing your coding exercises so they are representative of the problems the candidate would have to solve if they were working at your company. Let them demonstrate their ability, not talk about it.
  • No knowledge questions or puzzles. The ability to recall function definitions or solve theoretical problems is not particularly useful for evaluating whether someone can write good software.
  • No whiteboarding. You may want to use a whiteboard to explain specific system architecture but there is no place for actually coding on a whiteboard, on paper, or anywhere that isn’t a modern IDE or code editor of the candidate’s choice. Nobody codes in isolation without access to the internet to look things up. Everyone has their own preferred coding environment and the coding interview will likely place them in an unfamiliar setup without their usual shortcuts and window layout, so be sure to make allowances for this too.
  • No phone interviews. Again, get the candidate to demonstrate their ability through real tasks, not by explaining what they might do or have done.

Applying HumanOps to on-call

Originally written for the StackPath blog.

One of the two core foundations of SaaS monitoring is alerting (the other being metric visualization and graphing). Alerting is designed to notify you when things go wrong in your data center, when there’s a problem with your website performance, or when you’re experiencing server downtime. More specifically, infrastructure monitoring and website monitoring are designed to notify you in such a way that you can respond and try to fix it. That often means waking people up, interrupting dinners, and taking people away from their family to deal with a problem.

When the very nature of a product deliberately has a negative impact on the quality of life of your customers, it is your responsibility as the vendor to consider how to mitigate that impact. Trying to understand how StackPath Monitoring impacts our customers through their on-call processes was why we started HumanOps.

So how do you apply HumanOps principles to (re)designing your approach to on-call?

HumanOps is made up of 4 key principles. These are explained in more detail in the What is HumanOps post, but essentially it boils down to:

  1. Humans build & operate systems that have a critical business impact.
  2. Humans require downtime. They get tired, get stressed, and need breaks.
  3. As a result, human wellbeing directly impacts system operations.
  4. As a result, human wellbeing has a direct impact on critical business systems.

These can be applied through considering some key questions about how on-call processes work.

How is on-call workload shared across team members?

It’s standard practice to have engineers be on-call for their own code. Doing so provides multiple incentives to ensure the code is properly instrumented for debugging, has appropriate documentation for colleagues to debug code they didn’t write, and, of course, to rapidly fix alerts which are impacting your own (or your colleagues’) on-call experience. If you’re being woken up by your own bad code, you want to get it fixed pretty quickly!

With the assumption that engineers support their own code, the next step is to share that responsibility fairly. This becomes easier as the team grows but even with just 2-3 people, you can have a reasonable cycle of on/off call. We found that 1-week cycles running Tuesday to Tuesday work well. This is a long enough period to allow for a decent “off-call” time and has a whole working day buffer to discuss problems that might have occurred over the weekend.
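That weekly cadence is simple enough to compute directly. A minimal sketch, assuming a fixed team order and an agreed rotation start date (both of which are illustrative):

```python
from datetime import date

# Hypothetical sketch: who holds primary on-call in a one-week,
# Tuesday-to-Tuesday rotation. Team order and start date are examples.
ENGINEERS = ["alice", "bob", "carol"]
ROTATION_START = date(2018, 1, 2)  # a Tuesday

def on_call(today):
    """Return the primary on-call engineer for the week containing `today`."""
    weeks_elapsed = (today - ROTATION_START).days // 7
    return ENGINEERS[weeks_elapsed % len(ENGINEERS)]
```

Publishing the schedule from a deterministic function like this means everyone can see months ahead who is on call, which makes swaps and holiday planning far easier than an ad-hoc rota.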

You also want a formal handoff process so that the outgoing on-call engineer can summarize any known issues to the person taking over.

How do you define primary and secondary escalation responsibilities?

The concept of primary/secondary is a good way to think about on-call responders and the Service Level Agreement they commit to with each role.

The primary responder typically needs to be able to acknowledge an alert and start the first response process within a couple of minutes. It means they have to be by an internet-connected computer at all times. This is not a 24/7 NOC, which is a different level of incident response.

Contrast this with a secondary who may be required to respond within 15-30 minutes. They are there as a backup in case the primary is suddenly unreachable or needs help, but not necessarily immediately available. This is an important distinction in smaller teams because it allows the secondary to go out for dinner or be on public transport/driving for a short period of time (i.e. they can live a relatively normal life!). You can then swap these responsibilities around as part of your weekly on-call cycle.

What are the expectations for working following an incident?

An alert which distracts you for 10 minutes early evening is very different from one which wakes you up at 3 a.m. and takes 2 hours to resolve, preventing you from going back to bed again because it’s now light outside.

In the former situation, you can still be productive at work the next day, but in the latter, you’re going to be very fatigued.

It’s unreasonable to expect on-call responders to be completely engaged the day after an incident. They need to have time to recover and shouldn’t feel pressured to turn up and be seen.

The best way I’ve seen to implement this is to have an automatic “day off” policy which is granted without any further approval, and leave it to the discretion of the employee to decide if they need a full day, work from home, or just to sleep in for the morning.

Recovery is necessary for personal health but also to avoid introducing human errors caused by fatigue. Do you really want someone who has been up all night dealing with an incident committing code into the product or logging into production systems?

This should be tracked as a separate category of “time off” in your calendar system so that you can measure the impact of major on-call incidents on your team.

It also applies if there is a daytime alert which takes up a significant amount of time during a weekend or holiday. The next work-day should be taken as vacation to make up for it.

Having the employee make the decision, but with it defaulting to “time off allowed” avoids pressure to come in to work regardless. Reducing the cultural peer pressure is more challenging, but managers should set the expectation that it is understood that you will take that time off, and make sure that everyone does.

How do you measure whether your on-call process is improving?

Metrics are key to HumanOps. You need to know how many alerts are being generated, what percentage happen out of hours, what your response times are, and whether certain people are dealing with a disproportionate number of alerts.

These metrics are used for two purposes:

  1. To review your on-call processes. Do you need to move schedules around for someone who might have had more of their fair share of alerts? Are people taking their recovery time off? Are people responding within the agreed SLAs? If not, why not?
  2. To review which issues should be escalated to engineering planning. If alerts are being caused by product issues they need to be prioritized for rapid fixes. Your engineers should be on-call so they will know what is impacting them, but management needs to buy into the idea that any issues that wake people up should be top priority to fix.

Eliminating all alerts is impossible, but you can certainly reduce them. You can then track performance over time. You’ll only know how you’re doing if you measure everything though!
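The metrics above are straightforward to compute from raw alert records. A hypothetical sketch, where the record fields and the 09:00–17:00 “in hours” window are my own assumptions (a real version would also account for weekends and holidays):

```python
from datetime import datetime, time

# Sketch: summarize alert volume, out-of-hours share, mean response
# time, and per-person load from a list of alert records. Each alert
# has 'opened' (datetime), 'acknowledged' (datetime), 'responder' (str).
def on_call_metrics(alerts):
    out_of_hours = 0
    response_secs = []
    per_person = {}
    for a in alerts:
        opened = a["opened"]
        # Simplification: anything outside 09:00-17:00 counts as out of hours.
        if opened.time() < time(9, 0) or opened.time() >= time(17, 0):
            out_of_hours += 1
        response_secs.append((a["acknowledged"] - opened).total_seconds())
        per_person[a["responder"]] = per_person.get(a["responder"], 0) + 1
    return {
        "total": len(alerts),
        "pct_out_of_hours": 100 * out_of_hours / len(alerts),
        "mean_response_secs": sum(response_secs) / len(response_secs),
        "alerts_per_person": per_person,
    }
```

Reviewing a summary like this in each on-call handoff meeting makes the fairness and SLA questions above answerable with data rather than impressions.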

How are you implementing HumanOps?

We’re interested in hearing how different companies run their on-call so we can share the best ideas within the community. Let me know how you’re implementing the HumanOps principles. Also, we encourage you to come along to one of our HumanOps events to discuss with the community. Email me or mention me on Twitter @davidmytton.

Configuring for security, privacy and convenience

Balancing security, privacy and convenience is not easy. I’ve spent quite a lot of time figuring out how to configure my various computer systems with this goal in mind.

Computers are supposed to make our lives more convenient and you sometimes have to trade privacy for convenience e.g. Outlook processing emails to allow you to use Focused Inbox. AI is going to bring a lot of productivity improvements but I always prefer when that is processed on device, as with Siri Suggestions for things like when to leave for an event.

You also have to consider your adversary. There are reasonable steps you can take without seriously damaging convenience to provide safeguards against criminals and data profiling. But if you are trying to evade active government surveillance rather than just avoid being swept up in mass snooping, then things get significantly more difficult.

Targeted surveillance is, and should be, allowed (with appropriate legal safeguards). That is not what I’m trying to protect against here. Good security should be expected by all. Privacy is about having choice and control over your personal data.

Here’s how I approach it as of Oct 2018. I expect these practices to change over time. In no particular order:

  • Only use Apple mobile devices. They are the only company that builds privacy by design into their products. Their business model is to sell high-priced hardware, not to sell your data. They have 5-year lifecycles on software updates, delivered regularly, unlike Android, where updates go through carriers (usually delayed by months, or forever). Buying direct from Google means giving up all your privacy. And the Apple model is to run as much computation as possible on-device, whereas Google is the opposite – all processing happens in their cloud environment, which is secure but has no privacy.
  • Don’t get an Alexa device or Google Home. If you want a voice assistant, Apple’s HomePod with iOS 12 Shortcuts works very well.
  • iOS is the only secure OS that achieves the security, privacy and convenience balance. Any sensitive work should be restricted to iOS devices only. macOS is the next best option. If you don’t need convenience, use Tails.
  • Configure macOS and iOS for privacy. In particular, this means using full disk encryption and strong passwords.
  • Don’t use any Google services and be sure to pay for key services like email, calendar and file storage. If you’re not paying then your data is the product – you want a vendor who has a sustainable business model in selling the service/product itself, not your data. Running your own systems significantly reduces the security aspect of the balance, so it’s better to use either iCloud (if you don’t want your own domain), Microsoft Office365 (which is what I use) or Fastmail. For £10/m I get access to 1TB of OneDrive storage, Mail, Calendar and the full suite of Office products. I pay an extra £1.50/m on top of that for Advanced Threat Protection. Microsoft allows you to select the country where data is stored, has privacy by design and has a good record of defending against government access requests. The Outlook iOS app is actually very good but the Exchange protocol is supported by every client, so you have a good choice. Focused Inbox is great. Bigger corporates like Microsoft have significantly more resources to invest in security (which is why I prefer Office365 over Fastmail).
  • Unfortunately, Apple Maps is still rubbish compared to Google Maps. They’re generally comparable in major cities, so I prefer Apple Maps until the last-mile destination directions, where Apple Maps is regularly inaccurate. At that point I switch to Google Maps on iOS.
  • Don’t store anything unencrypted on cloud storage providers that you would be concerned about leaking if someone gained access. Encrypt these files individually. You can use gpg on Mac but it’s not especially user friendly. I prefer Keybase but it still requires using the command line. These files will be inaccessible on mobile so you may want to consider using 1Password document storage instead, for small files (they have a total storage limit of 1GB). Office files can be password protected themselves, which uses local AES encryption.
  • Delete files you don’t need any more and aren’t required to keep for tax records. In particular, set your email to delete all messages after a period – the shorter the better. I delete all my emails after 1 year. Configure macOS Finder Preferences to remove items from the Trash automatically after 30 days.
  • Don’t send attachments via email. You might delete your emails after a time but the recipients probably don’t. Instead, share them using an expiring link to online cloud storage.
  • Use a password manager and 2 factor authentication. These are just security basics.
  • Don’t use Google Chrome. Only use Safari or Firefox. Configure your browser to auto clear your history on a retention period that allows convenience but also privacy. I set mine to clear after 1 week. I’ve never needed to go back any further. Be sure the “Prevent cross-site tracking” option is configured in Safari settings.
  • Set up DuckDuckGo for your search provider on macOS and iOS. I’ve not used Google search for years.
  • Buy 1Blocker X for iOS and 1Blocker for macOS (see a comparison of other options) to block trackers and ads in Safari.
  • Set up Little Snitch outbound firewall and be sure you know which apps you’re approving outbound internet access for.
  • Set up Micro Snitch to be notified whenever your mic and camera are in use. Cover your device cameras as a backup.
  • Don’t use SMS – disable fallback in iOS settings. WhatsApp encryption is good, but all the metadata about who you are communicating with is shared with Facebook. Unfortunately, it has built up a considerable network effect, so it is necessary for communicating in the Western world. Few people use Signal, which is the best option, so follow this guide to maximise WhatsApp privacy. iOS allows you to configure deleting iMessages after a period of time; I have mine set to delete after 30 days. You have to clear your WhatsApp conversations manually.
  • Don’t plug anything directly into any USB charging port in airports, hotels, or anywhere else. Use a USB data blocker adapter first.
  • Back up your files to cloud storage but only if they are encrypted locally first. Arq is a good tool to do this. Don’t use the same cloud storage as your main files e.g. I use OneDrive for my files and Amazon S3 for Arq backups.
  • Always use a VPN when connected to public wifi, or any network you don’t control, but don’t use a free VPN. This site has a good comparison but I use on macOS and iOS. is owned by StackPath, my current employer, so I know how all the internal infrastructure is set up i.e. we don’t log traffic. However, I also used it prior to joining StackPath and before itself was acquired. is a great consumer VPN but if you want more control and configuration options e.g. OpenVPN support, StrongVPN is another product from StackPath.
  • Change your DNS servers to use a privacy-first DNS provider, such as Cloudflare DNS. Do not use your default ISP DNS or Google DNS. If you have an OpenWRT router, configure it to use Cloudflare DNS over TLS because otherwise your ISP can still sniff your DNS requests.
  • Better yet, buy a router that allows you to configure DNS over TLS and connect to a VPN directly. I have a GL-AR750S configured to force all DNS through Cloudflare DNS over TLS, and it is permanently connected to StrongVPN. This means all connections from home are encrypted before they even hit my ISP. The only downside is having to disconnect the VPN when using BBC iPlayer, because it detects the VPN. My wifi uses MAC address whitelisting so only specific devices are allowed to connect.
  • Pay for Cifas protective registration and register your phone numbers on the TPS list.
  • Use Apple Pay wherever possible. The vendor doesn’t get access to any information about you and can only identify your payment information from a token specific to each transaction. This protects privacy and if the vendor is breached, your card details are safe. The usual contactless limit doesn’t apply to Apple Pay, which is limited only by your card limit.
  • Don’t buy Samsung TVs. There’s no need for any TV to connect to the internet, so don’t connect them in the first place. Use a dedicated device like an Apple TV for your TV interface; it has a better UI anyway.
  • Be mindful of sharing photos online directly from your phone. They usually embed the location of the photo in the EXIF data.
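On that last point, the EXIF metadata (including GPS coordinates) lives in a JPEG’s APP1 segment, so stripping it before sharing is mechanical. As a rough illustration of what “removing EXIF” actually means, here is a minimal pure-Python sketch that drops APP1 segments from a JPEG. It is a deliberately simplified parser for illustration only – in practice a tool like exiftool or an image library is more robust:

```python
def strip_exif(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG byte string with APP1 (Exif/XMP) segments removed.

    Simplified parser: walks marker segments up to the start-of-scan
    marker, copying everything except APP1, then copies the compressed
    image data verbatim.
    """
    assert jpeg[:2] == b"\xff\xd8", "not a JPEG (missing SOI marker)"
    out = bytearray(jpeg[:2])  # keep the SOI marker
    i = 2
    while i < len(jpeg):
        if jpeg[i] != 0xFF:
            # Unexpected byte – copy the rest verbatim and stop parsing.
            out += jpeg[i:]
            break
        marker = jpeg[i + 1]
        if marker == 0xDA:  # SOS: compressed image data follows.
            out += jpeg[i:]
            break
        # Segment length is big-endian and includes its own two bytes.
        length = int.from_bytes(jpeg[i + 2:i + 4], "big")
        if marker != 0xE1:  # keep everything except APP1 (Exif/XMP)
            out += jpeg[i:i + 2 + length]
        i += 2 + length
    return bytes(out)
```

Reading a photo with `open("photo.jpg", "rb").read()`, passing it through `strip_exif` and writing the result back gives you a file with the location data gone while the image itself is untouched.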

Have I missed something? Let me know what else you’re doing.

Leaving the policing of the internet up to Google and Facebook

Consumers typically don’t want to pay for services that the internet has taught them should be “free”. Social networking, email, calendars, search, messaging…these are all “free” on a cash basis, but have a major cost to your privacy.

The best analogy I have heard to describe how these services work was in an episode of Sam Harris’ podcast with Jaron Lanier.

To paraphrase: imagine if when you viewed an article on Wikipedia, it customised the content based on thousands of variables about you based on such things as where you are, who you are friends with, what websites you visit and how often, how old you are, your political views, what you read recently, what your recent purchases are, your credit rating, where you travel, what your job is and many other things you have no idea about. You wouldn’t know the page had changed, or how it differed from anyone else. Or even if any of the inferred characteristics were true or not.

That’s how Google and Facebook work, all in the name of showing ads.

I don’t have a problem with trading privacy for free services per se. The problem is the lack of transparency with how these systems work, and the resultant lack of understanding by those making the trade off (ToS;DR). For the market mechanism to work, you have to be well informed.

We’re starting to see this with how governments are trying to force the big platforms to police the content they host, while leaving the details to the platforms themselves. Naturally, they are applying algorithms and technology to the problem, but how the rules are being applied is completely opaque. There’s no way to appeal. By design, the heuristics constantly change and there’s no way to understand how they have been applied.

Policing content is a problem that has been solved in the past through the development of Western legal systems and the rule of law. The separate powers of the state – government, judiciary and legislature – counter-balance each other with various checks and stages to allow for change. It’s not perfect, but it has had hundreds of years of production deployment and new version releases!

What has changed is the scale. And the fact that governments are delegating the responsibility of the implementation to a small number of massive, private firms.

It’s certainly not that the government could do a better job at solving this. Indeed, they would likely make even more of a mess of it e.g. EU cookie notice laws. But private companies can’t be allowed to do it by themselves.

The solution requires open debate, evidence based review, a robust appeals system, transparency into decision making and the ability for the design to be changed over time. But it also needs to be mostly automated and done at internet-scale. Unfortunately, right now I’m not sure such a solution exists.

Regulation always favours the large incumbents, stifling innovation and freedom of expression. Perhaps it is time for the legislative process to adopt a more lightweight, agile process with a specific, long term goal that successive governments can work towards. There tends to be a preference for huge, wide-ranging regulatory schemes which try to do everything in one go. Instead, we should be making small changes, focusing on maximum transparency and taking the time to measure and iterate. The tech companies need to apply good engineering processes to how they are developing their social policy, in public.

But without any incentive to do so, we risk ending up with a Kafka-esque system that might achieve the goal at a macro level, but will have many unintended consequences.

A practical guide to HumanOps – what it is and how to get started

Originally written for the StackPath blog.

Humans are a critical part of operating systems at scale, yet we rarely pay much attention to them. Most of the time, energy and investment goes into picking the right technologies, the right hardware, the right APIs. But what about the people actually building and scaling those systems?

In 2016, Server Density launched HumanOps. It started with an event in London to hear from some of the big names in tech about how they think about the teams running infrastructure.

How can you reach your high availability goals without a team that is able to build reliable systems, and respond when things go wrong? How do sleep and fatigue affect system uptime? System errors are tracked, but what about human error? Can it be measured, and mitigated?

With the acquisition of Server Density by StackPath, I am pleased that HumanOps now has a team dedicated to continuing to build the community. We’re open to anyone taking on responsibility for a local meetup but will also be running our own series of events in major cities around the world. The first of these kicked off this week in San Francisco.


What is HumanOps?

The problem

A superhero culture exists within technical systems operations.

Being woken up to fix problems, losing sleep to make an amazing fix live in production and then powering through a full day of work is considered to be heroic effort.

There is little consideration for the impact this approach has on health, family and long term well-being.

The aim

Running complex systems is difficult and there will sometimes be incidents that require heroic effort. But these should be rare, and there should be processes in place to minimise their occurrence, mitigating the effects when they do happen.

HumanOps events are about encouraging the discussion of ideas and best practices around how to look after the team who look after your systems.

It considers that the human aspects of designing high availability systems are just as important as the selection of technologies and architecture choices.

It’s about showing that mature businesses can’t afford to sacrifice their teams and how the best managed organisations achieve this.

If Etsy, Facebook, Spotify and the UK Government can do this, so can you.

How to implement HumanOps

The first step to implementing HumanOps is to understand and accept the key principles.

Key principles

  1. Humans build & operate systems that have critical business impact.
  2. Humans require downtime. They get tired, get stressed and need breaks.
  3. As a result, human wellbeing directly impacts system operations.
  4. As a result, human wellbeing has a direct impact on critical business systems.

HumanOps systems and processes follow from these principles.

HumanOps systems & processes

There are many areas of operations where HumanOps can be applied, but there are a few core areas which are worth starting with first. Each one of these could be a separate blog post so here are a series of questions to start thinking about your own process design.

  • On call
    This is where the most impact occurs. Being woken up to deal with a critical incident takes a heavy toll, so it is important to design on-call processes properly. Some key questions to ask: how is the workload shared across team members? How often is someone on-call, and how long do they get off-call? What are the response time expectations for people at different escalation levels (e.g. do you have to stay at home by your computer, or can you go out but with a longer response time)? Do you get time off after responding to an incident overnight? If so, is there any pressure to forgo it? (It should be automatic rather than requiring an active request.) Do managers follow the same rules and set an example? Do you expect engineers to support their own code? Do you consider additional compensation for each on-call incident, or is it baked into the standard employment contract? Do you prioritise bugs that wake people up?
  • Metrics
    You can’t improve something without measuring it. Critical out of hours incidents will happen, but they should be rare. Do you know your baseline alert level and whether that is improving? Do you have metrics about the number of alerts in general, number of alerts out of hours? Do you know if one person is dealing with a disproportionate number of alerts? Do you know which parts of the system are generating the most alerts? How long does it take for you to respond and then resolve incidents? How does this link to the business impact – revenue, user engagement, NPS? Are these metrics surfaced to the management team?
  • Documentation
    Only the smallest systems can be understood by a single person. This means writing and keeping documentation up to date needs to be a standard part of the development process. Runbooks should be linked to alerts to provide guidance on what alerts mean and how to debug them. Checklists must form a part of all human performed tasks to mitigate the risk of human error. How do you know when documentation is out of date? Who takes responsibility for updating it? How often do you test?
  • Alerts
    Most system operators know the pain of receiving too many alerts which are irrelevant and don’t contain enough information to resolve the problem. This is where linked documentation comes in but the goal should be that alerts don’t reach humans except as a last resort. Interrupting a human should only happen if only a human can resolve the problem. This means automating as much as possible and triggering alerts based on user-impacting system conditions, not just on component failures where the system can continue to operate. Are your alerts actionable? Do they contain enough information for the recipient to know what to do next? Are they specific enough to point to the failure without resulting in a flood if there is a major outage?
  • Simulation
    A large part of the stress of incidents is the uncertainty of the situation, coupled with the knowledge that it is business / revenue impacting. Truly novel outages do happen, but much of the incident response process can be trained. Knowing what you and each of your team members need to do, and when, will streamline response processes. Emergency response teams do this regularly because they know that major incidents are complex and difficult to coordinate ad-hoc. Everyone needs to know their role and what to do in advance. War gaming scenarios to test all your systems, people and documentation helps to reveal weaknesses that can be solved when it doesn’t matter as much, and teaches the team that they can act with speed but without haste. How is the incident initially triaged? What are the escalation processes? How does stakeholder communication work? What happens if your tools are down too, e.g. is your Slack war room hosted in the same AWS region as your core infrastructure?
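To make the metrics point above concrete, most of those questions can be answered from a simple incident log. Here is a minimal sketch with made-up data – the record fields, names and the 09:00–18:00 working window are assumptions for illustration, not a standard:

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical incident log: (responder, alert raised, incident resolved).
incidents = [
    ("alice", datetime(2018, 7, 2, 3, 15), datetime(2018, 7, 2, 4, 0)),
    ("bob",   datetime(2018, 7, 3, 14, 30), datetime(2018, 7, 3, 14, 50)),
    ("alice", datetime(2018, 7, 7, 23, 5),  datetime(2018, 7, 8, 0, 10)),
]

def out_of_hours(ts, start=9, end=18):
    """True if an alert fired outside 09:00-18:00, Monday to Friday."""
    return ts.weekday() >= 5 or not (start <= ts.hour < end)

# Is one person dealing with a disproportionate number of alerts?
alerts_per_person = Counter(who for who, _, _ in incidents)

# How many alerts hit people out of hours?
ooh_alerts = sum(1 for _, raised, _ in incidents if out_of_hours(raised))

# Mean time to resolve – a simple proxy for business impact.
mttr = sum((done - raised for _, raised, done in incidents),
           timedelta()) / len(incidents)
```

Surfacing even these three numbers to the management team regularly is enough to establish a baseline and see whether it is improving.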

The idea behind HumanOps principles is to provide a framework for focusing on the human side of infrastructure.

What’s the point of spending all that time and money on fancy equipment if the people who actually operate it aren’t being looked after? Human wellbeing is not just a fluffy buzzword – it makes business sense too.

The idea behind HumanOps events is to share what works and what doesn’t, and to demonstrate that the best companies consider their human teams to be just as important as their high tech infrastructure.

Over the coming months I’ll be writing more about each of these topics and sharing the videos of other organisations explaining how they do it, too.

If you’re interested in attending, speaking or even running a HumanOps event near you, check out the website event listings and get in touch if there’s nothing nearby.

Should companies be required to publish security reviews?

I recently attended a cyber security conference about the current preparedness and future of cyber crime and security in the UK.

One of the audience members made a comment about how seriously businesses take their own security. He thought that, as with annual financial returns, business should be required to certify their own security credentials on an annual basis.

Many incidents of fraud occur not through cards being physically stolen, but through breaches in security at the shops we buy products from. The 2013 breach at Target is an example, the result of which might be that we decide not to shop there again.

Where we can make these consumer choices, the market is operating as it should. But it’s more challenging if the problem exists further down the chain. Perhaps the vendor used by the store for credit checking is the one that suffers a breach, such as at Equifax in 2017. Or more recently, the Ticketmaster incident, which was blamed on a third party component in their customer support system. How can consumers check several orders down into the supply chain?

Of course this is the idea behind one of the GDPR requirements to provide a list of all the third parties that data is being transferred to. But with companies like PayPal sharing data with hundreds of organisations, is it reasonable to expect consumers to check them all? Or any of them? And what would they actually check?

The Government already runs a certification programme called Cyber Essentials. If you want to sell into certain areas of government then you have to have a Cyber Essentials certification. Requiring vendors to certify helps with the government’s supply chain assurance at the same time as encouraging adoption of a UK standard.

But only around 10,000 businesses have certified in the 4 years the scheme has been operating. Is it a lack of awareness about the scheme or do customers and suppliers outside of government just not care? Maybe a combination of both.

As a consumer, you can’t easily assess security from the outside. You can only go on whether there have ever been any historical incidents and even then, that doesn’t tell you much about the state of their security today. So perhaps that audience member was onto something with requiring annual reporting?

There is also a power dynamic at work. The UK Government can mandate all of its suppliers comply with a particular certification because they all want to sell to government. But what if it were the other way around? Or swap the Government for another big organisation. Good luck requiring your suppliers to implement something similar if you’re just a small business.

It is impossible to have 100% security and breaches are inevitable, but as a customer you want to know that companies are taking basic steps to protect you – things like using strong passwords and keeping their systems up to date. It sounds simple, but one of the more interesting statistics from the conference I attended was that 80-90% of instances of cyber crime could be prevented by people having strong passwords and by keeping their computers and devices up to date. Surely these are basic security precautions all businesses should be expected to take.

Companies are already required to submit financial reports and annual statements about company details to Companies House. Would adding a security questionnaire to that return make a difference?

Voluntary compliance is often the first step because the companies that don’t provide the information are liable to be asked: why not? But then Cyber Essentials is already voluntary and not many businesses have certified. Maybe more would participate if it was free (there’s currently a £300 fee) and it just asked you about the current status, rather than requiring active steps to achieve a certification. Perhaps a grading system could indicate what level of security a business has in place which could show on the Companies House search record.

How many people would actually check this? Financial information about companies is already available but how often are returns checked before signing a contract? Suppliers sometimes run credit checks before offering credit terms but then there are multiple outcomes, such as the length of credit. A security check could only really have two outcomes – to do business, or not.

Last year, I wrote about how the supply side of the market was broken in relation to the security of consumer devices. Consumers should be able to expect product security just like they expect product safety. The good news is that they can indeed now expect this. In March this year, a report was released by the Department for Digital, Culture, Media and Sport alongside a new code of practice. Device manufacturers now have an incentive to build their products with security by design. If they don’t, the next step is regulation.

This is good for assurance of the security of consumer internet of things devices, but at what point does not using a secure password and keeping your systems up to date become negligence? Is the next step extending secure by design from internet of things devices to day to day general company administration?

A missed opportunity in recruiting

If you’ve ever applied for a job anywhere, you probably had a terrible experience.

Submitting an application into a black hole.

Waiting weeks without hearing anything. Maybe never hearing anything at all.

Vague instructions and trying to guess what the selection criteria are.

Delays getting an answer from early interviews.

Lack of any feedback if you get to later interviews.

More delays getting an offer…then, suddenly, time is of the essence and you must make a decision right now!

For most candidates at most companies, this is probably familiar. How does it make you feel about that company? They might be building awesome products, using the latest tech and working on a problem you really want to be part of. You start off with a great impression from their cool products, external marketing and great reputation, only to leave the process disappointed.

Recruiters are a waste of time – not only do they do a terrible job for their clients, but they usually contribute to the reputation damage inflicted by badly run processes. The companies themselves are just as bad: even after a recruiter hands the process over, the company could still run things properly – yet it rarely does.

Recruitment is odd in that it usually ends in failure – the most common outcome for any given candidate is rejection. That’s by design. Many more people interact with the company through the recruitment process than will ever be employed there.

So why not make them advocates? Or at least not detractors.

Even with the disappointment of not being selected for a job, the company can still leave the candidate with a positive impression.

A well run recruitment process should always send replies quickly and keep the candidate informed at all stages. The candidate should never have to chase for a response. It should be run quickly, with progression to the next stage happening over the course of days or within 1-2 weeks. Schedules sometimes don’t fit but with people being the most crucial aspect of the success of a business, making time for candidates should be a priority. And if a candidate dedicates time to the process, the least you can do is let them know why they weren’t successful in the end.

Every company uses a system to process applications. Communication should be built in; it can even be automated at the early stages. There is no excuse.
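As a sketch of how little is required: any stage change in whatever applicant-tracking system you use can simply emit a templated message, so no candidate is ever left waiting silently. The stages and wording here are hypothetical, not taken from any particular ATS:

```python
# Hypothetical stage-change notifications for an applicant-tracking
# system. Every transition produces a message; silence is impossible.
TEMPLATES = {
    "received":  "Thanks {name} – we've received your application and will reply within 3 days.",
    "interview": "Hi {name} – we'd like to invite you to the next stage.",
    "rejected":  "Hi {name} – we won't be progressing your application because: {reason}",
}

def notify(name: str, stage: str, reason: str = "") -> str:
    """Return the message a stage change should trigger (a real system
    would send this by email rather than return it)."""
    return TEMPLATES[stage].format(name=name, reason=reason)
```

Even the rejection template forces a reason to be given – which is exactly the feedback most candidates never receive.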

Why? Because the candidate might become a customer. They might tell their friends (who could be suitable candidates). They might apply for another position in the future.

Recruitment is another opportunity to build the company brand. To do some marketing. To enhance reputation and show off. It should be treated as such.

The SaaS conference marketing challenge

2009, when Server Density started, was very early in SaaS. Most software was still sold on-premise with licensing. Some well known products like Salesforce, Xero and GMail (G-Suite/Google Apps) were delivered SaaS-only but they were the minority.

This meant that the understanding of SaaS marketing was also very early. “Growth hacking” wasn’t a thing and a lot of marketing was still around AdWords and banner ads. Indeed, one of our more effective early campaigns was a banner ad on the newly launched Server Fault as part of the Stack Overflow community!

Content marketing was also new. I was able to build up a huge following over the years simply by writing good quality technical content that would appeal to my target audience. The Server Density blog was and remains the biggest source of traffic and leads to the product.

2018 is very different. We’ve reached saturation point for all of the above low-cost channels. You have to do them all but they are only a small part of the marketing mix.

The biggest component in SaaS marketing today is events and conferences. This has been growing over the last few years but attending, speaking at and sponsoring events is now a huge, if not the largest, aspect of SaaS marketing spend. You have to pay to play.

Regardless of who you’re targeting – from developers to small businesses and from startups to enterprise IT managers – being at conferences is a highly effective method of generating leads, and talking to your existing customers.

Potential customers use conferences to discover new vendors. It’s the new way to search for products to evaluate. This surprised me when I was manning our Server Density booth – the number of potential users who come up and ask about your product as part of an evaluation they’re starting, or because they’re interested in what’s new. These are the kind of people you’d expect to hate any commercialisation – that stereotype is outdated.

Existing customers are just as important. If you don’t have a stand, they’ll wonder why you’re not there. They want to see the vendor they picked with a huge presence and lots of marketing materials, and probably t-shirts and swag they can take home, too. It validates their past decision and is also another channel to market to them for cross selling new products or explaining new functionality. Conferences are a legitimate channel for customer success!

If you’re not at all the big industry events, you’re not being seen.

The challenge is that it is expensive.

The cost of sponsoring combined with travel, hotels and food for several team members is high, not to mention any marketing collateral, banners, swag and all the other booth materials. Just sponsoring for your logo to appear isn’t sufficient – you have to have the booth table, too. And you need a good location with plenty of traffic; if you don’t, your competitors will. That’s not cheap.

This is hard for startups. You need a team of people working the conferences and managing the logistics not just a few times a year but a few times per month. The spend quickly ramps up. But the reasons are obvious – it’s difficult to match the lead volume and quality, because you can qualify and demo on the spot. This is why all your competitors are doing it, and it’s why you need to be doing it too.

It’s also a big reason why you can’t do SaaS without significant funding. Without it, you simply can’t compete with the spending levels needed to get the conference machine going.

Office productivity – where Google and Microsoft have an advantage over AWS

One of the lessons of the High Growth Handbook is that the most successful software companies start out with a single product, but soon shift to using their distribution advantage to offer a portfolio of products:

Startups tend to succeed by building a product that is so compelling and differentiated that it causes large number of customers to adopt it over an incumbent. This large customer base becomes a major asset for the company going forward. Products can be cross sold to these customers, and the company’s share of time or wallet can expand. Since focusing on product is what caused initial success, founders of breakout companies often think product development is their primary competency and asset. In reality, the distribution channel and customer base derived from their first product is now one of the biggest go-forward advantages and differentiators the company has.

This advantage is fairly clear when it comes to public cloud providers.

When AWS first launched, it began with basic infrastructure primitives: storage (S3) and compute (EC2). Over time, it has added a vast number of products into the ecosystem.

This is a classic enterprise model: if you buy one product in the suite, when you need something else you will look to the vendor you already have a contract with first. This is because it simplifies management interfaces, network configuration, security, support, billing and legal agreements.

AWS certainly has an advantage here – it has the biggest mindshare amongst developers. The ecosystem effects of people with the right technology experience are compelling. Google is competing hard, but AWS is ahead when it comes to the size of the portfolio.

Yet AWS has a weakness when it comes to the office productivity suite. This is already a massive lead generator for Microsoft and Azure, and it could become a big source of customers for Google too.

Microsoft has been leveraging its licensing advantage amongst the largest, enterprise customers who use their productivity products – Office, Exchange, Windows. For a long time, Azure was being pushed to be licensed as part of the deal. If you’re already using Microsoft products, it makes sense to consider Azure first.

Whilst Microsoft might have a good base within the enterprise, Google has a similar foothold within the technology community. Pretty much every startup uses G Suite for email, calendar, docs, etc. Most of these use AWS. But the improvements in Google Cloud Platform, and the security and identity products in particular, are making the G Suite to G Cloud cross-sell more compelling.

How does AWS compare? WorkMail and WorkDocs. Not particularly compelling products, and ones which seem to have been neglected. I don’t know anyone using either of them. Why would you?

This is one major area that AWS is significantly behind.

The Microsoft / Azure demographic is quite different from those using AWS and Google, but as G Suite and GCP become more tightly integrated, it will become a big differentiator for them.