Service Level Agreement

Date        Version  Changes
2022-01-21  v1.0.0

An SLA (service level agreement) is an agreement between a provider and a client about measurable metrics such as uptime and responsiveness, and about the responsibilities of each party. You can see a sample service level agreement here.

1.1 The SLA MUST be between a vendor and a paying customer.

1.2 The SLA MUST be reviewed by both legal and IT operations.

1.3 The SLA MUST include a service description that specifies which parts of the system are included in the agreement and which are out of scope. Third-party systems are most often out of scope.

2 Service Level Indicators

An SLI (service level indicator) is a defined quantitative measure of the level of service that is provided.

2.1 All metrics MUST be measurable, and it MUST be possible to generate reports for them automatically. Creating a report should not require manual steps.

2.2 Every metric MUST specify how it is measured

2.2.1 Use aggregates for time intervals or regions

2.2.2 Define how frequently it is measured

2.2.3 Define which kinds of requests are included in the measurement and which are excluded. Example, only measure successful requests.

2.2.4 Define where the measurement is done. Example, on the client or on the server.
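
The four points under 2.2 can be captured in a single record per metric. The sketch below is only an illustration: the class name MetricDefinition, its field names and the example values are hypothetical and not part of this guideline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical record of how one metric is measured (points 2.2.1 to 2.2.4)."""
    name: str
    aggregate: str          # 2.2.1: the aggregate and the interval or region it covers
    sample_interval_s: int  # 2.2.2: how frequently a sample is taken, in seconds
    included: str           # 2.2.3: which requests count towards the metric
    excluded: str           # 2.2.3: which requests are left out
    measured_at: str        # 2.2.4: "client" or "server"

    def __post_init__(self):
        # Reject definitions that leave the measurement ambiguous.
        if self.sample_interval_s <= 0:
            raise ValueError("sample_interval_s must be positive")
        if self.measured_at not in ("client", "server"):
            raise ValueError("measured_at must be 'client' or 'server'")


# Example values are assumptions chosen for illustration only.
latency = MetricDefinition(
    name="request_latency_ms",
    aggregate="95th percentile per 5 minutes, per region",
    sample_interval_s=60,
    included="successful requests",
    excluded="failed requests and health checks",
    measured_at="server",
)
```

Keeping the definition in one place makes it harder for the report and the agreement to drift apart.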

2.4 All metrics SHOULD be aggregated over a fixed time interval so that the measurement resets when the interval ends. Example, availability is measured over 1 month, and the measurement is reset at the start of the next month.
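
As a minimal sketch of a monthly reset, the function below groups availability probes by calendar month, so each month starts from zero. The function name monthly_availability and the sample format (timestamp, success flag) are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime


def monthly_availability(samples):
    """Return {(year, month): fraction of successful probes} for the given samples."""
    totals = defaultdict(lambda: [0, 0])  # (year, month) -> [successful, total]
    for timestamp, ok in samples:
        bucket = totals[(timestamp.year, timestamp.month)]
        if ok:
            bucket[0] += 1
        bucket[1] += 1
    return {month: ok / total for month, (ok, total) in totals.items()}


samples = [
    (datetime(2022, 1, 30, 12, 0), True),
    (datetime(2022, 1, 31, 12, 0), False),
    (datetime(2022, 2, 1, 12, 0), True),
    (datetime(2022, 2, 2, 12, 0), True),
]
print(monthly_availability(samples))  # {(2022, 1): 0.5, (2022, 2): 1.0}
```

The failed probe in January does not affect the February figure, which is the reset behaviour the requirement asks for.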

2.5 Request latency metrics SHOULD be limited to the 95th or 99th percentile to account for cold starts and statistical anomalies.
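
To illustrate why a percentile is more robust than a maximum, here is a small sketch using the nearest-rank method. The choice of nearest-rank is an assumption of this example (other percentile definitions interpolate between samples), and the latency values are made up.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# 98 normal requests, one slow request and one cold start (milliseconds).
latencies_ms = [120] * 98 + [900, 4000]

print(percentile(latencies_ms, 95))  # 120
print(percentile(latencies_ms, 99))  # 900
print(max(latencies_ms))             # 4000
```

The single cold start dominates the maximum but does not move the 95th or 99th percentile.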

2.6 An aggregation MUST be disregarded when there is too little data for it to be statistically viable.
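
One possible way to apply this rule is to return no value at all when the sample count is below a threshold. In the sketch below the function name and the default threshold of 100 requests are assumptions; the right threshold depends on the metric and should be agreed on in the SLA.

```python
def availability_or_none(successful, total, min_samples=100):
    """Return the success ratio, or None when there are too few samples to report."""
    if total < min_samples:
        return None  # disregard the aggregation instead of reporting a misleading figure
    return successful / total


print(availability_or_none(3, 4))        # None: 4 requests say little about availability
print(availability_or_none(995, 1000))   # 0.995
```

Returning None rather than 0 or 1 forces the report to show the interval as "not measured" instead of as a breach or a success.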

3 Service Level Objectives

An SLO (service level objective) is a target value or range of values for a specific service level that is measured by an SLI.

3.1 You SHOULD keep the number of objectives as low as possible.

3.2 All objectives MUST have a safety margin.
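
One common reading of a safety margin, assumed here for illustration, is that the level you aim for internally is stricter than the level you promise in the agreement. The sketch below compares the allowed downtime for two hypothetical targets over a 30-day interval; the numbers 99.9 and 99.5 are examples, not recommendations.

```python
def allowed_downtime_minutes(target_percent, days=30):
    """Minutes of downtime that still meet the availability target over the interval."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - target_percent) / 100


internal_target = 99.9  # hypothetical level the team aims for
promised_target = 99.5  # hypothetical level promised to the client

internal_budget = allowed_downtime_minutes(internal_target)
promised_budget = allowed_downtime_minutes(promised_target)

print(round(internal_budget, 1))                    # 43.2
print(round(promised_budget, 1))                    # 216.0
print(round(promised_budget - internal_budget, 1))  # 172.8 minutes of margin
```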

3.3 Availability MUST include the availability of dependent systems.
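
A rough sketch of how dependencies lower the achievable availability: if the service fails whenever any dependency fails, and failures are independent, the combined availability is the product of the individual ones. Both conditions are assumptions of this example, and the figures are made up.

```python
import math


def composite_availability(own, dependencies):
    """Availability of a service that needs every dependency to be up (independent failures)."""
    return own * math.prod(dependencies)


# Own service at 99.9 %, one dependency at 99.9 % and one at 99.95 %.
result = composite_availability(0.999, [0.999, 0.9995])
print(round(result * 100, 2))  # 99.75
```

The combined figure is lower than that of any single component, so an objective equal to your own availability alone cannot be met.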

3.4 Objectives MUST specify the usage limits within which they are valid. Example, response times are guaranteed up to a load of 100 requests/second.

4 Service Level Agreement

4.1 The SLA MUST specify Client Responsibilities. Example, being available and responsive during a live incident.

4.2 The SLA MUST specify the Vendor Responsibilities. Example, notifying the client of planned maintenance.

4.3 The SLA MUST specify exemptions from responsibilities. Example, force majeure.

4.4 The SLA SHOULD specify how to manage updates to the SLA. Example, e-mail with changes to the SLA 2 months ahead of implementation.

4.5 The SLA MUST specify how reparations to the Client are handled when the Vendor has not met the SLA.

4.6 Reparations MUST be reasonable for your business. Your clients don’t want you to go bankrupt.

5 Support

5.1 The SLA SHOULD specify how to get in contact when the system is not working.

5.2 The SLA MUST specify service hours.

5.3 The SLA SHOULD specify Time to First Response for incidents, measured in service hours.
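
Measuring in service hours means the clock only runs while support is open. The sketch below assumes service hours of Monday to Friday, 08:00 to 17:00, and naive datetimes in a single time zone; these values and the function name are assumptions for illustration.

```python
from datetime import datetime, time, timedelta

SERVICE_START = time(8, 0)          # assumed opening time
SERVICE_END = time(17, 0)           # assumed closing time
SERVICE_WEEKDAYS = {0, 1, 2, 3, 4}  # Monday to Friday


def service_time_between(start, end):
    """Return the part of the interval from start to end that falls inside service hours."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() in SERVICE_WEEKDAYS:
            window_start = datetime.combine(day, SERVICE_START)
            window_end = datetime.combine(day, SERVICE_END)
            overlap_start = max(start, window_start)
            overlap_end = min(end, window_end)
            if overlap_end > overlap_start:
                total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total


reported = datetime(2022, 1, 21, 16, 30)        # Friday afternoon
first_response = datetime(2022, 1, 24, 9, 0)    # Monday morning
print(service_time_between(reported, first_response))  # 1:30:00
```

The wall-clock time is more than two days, but only 30 minutes on Friday and one hour on Monday count towards Time to First Response.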

5.4 The SLA SHOULD specify Urgency and Impact levels for incidents, and handle minor incidents as ordinary support tickets.

5.5 The SLA SHOULD specify Time to Resolution as effective time. Time spent waiting on the customer is excluded.
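
Effective time can be sketched as the elapsed time minus the periods during which the incident was waiting on the customer. The function name and the input format are hypothetical, and the sketch assumes the waiting periods do not overlap each other.

```python
from datetime import datetime, timedelta


def effective_resolution_time(opened, resolved, waiting_periods):
    """Elapsed time from opened to resolved, minus time spent waiting on the customer."""
    waiting = timedelta()
    for wait_start, wait_end in waiting_periods:
        # Only count the part of the waiting period that lies inside the incident.
        clipped_start = max(wait_start, opened)
        clipped_end = min(wait_end, resolved)
        if clipped_end > clipped_start:
            waiting += clipped_end - clipped_start
    return (resolved - opened) - waiting


opened = datetime(2022, 1, 24, 9, 0)
resolved = datetime(2022, 1, 24, 15, 0)
waiting_on_customer = [(datetime(2022, 1, 24, 10, 0), datetime(2022, 1, 24, 12, 30))]

print(effective_resolution_time(opened, resolved, waiting_on_customer))  # 3:30:00
```

If Time to Resolution is also specified in service hours, the same subtraction would be applied to service time rather than wall-clock time.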