Zero downtime upgrades

DETAILS: Tier: Free, Premium, Ultimate Offering: Self-managed

It's possible to upgrade to a newer major, minor, or patch version of GitLab without having to take your GitLab instance offline. However, for this to work there are the following requirements:

You can only upgrade one minor release at a time. So from 13.1 to 13.2, not to 13.3. If you skip releases, database modifications may be run in the wrong sequence and leave the database schema in a broken state.
You have to use post-deployment migrations.
You are using PostgreSQL. Starting from GitLab 12.1, MySQL is not supported.
You have set up a multi-node GitLab instance. Cloud Native Hybrid installations do not support zero-downtime upgrades.

If you want to upgrade multiple releases or do not meet the other requirements, you must upgrade with downtime instead.

If you meet all the requirements above, follow these instructions in order. The steps differ depending on your deployment type:

Deployment type	Description
Gitaly or Gitaly Cluster	GitLab CE/EE using HA architecture for Gitaly or Gitaly Cluster
Multi-node / PostgreSQL HA	GitLab CE/EE using HA architecture for PostgreSQL
Multi-node / Redis HA	GitLab CE/EE using HA architecture for Redis
Geo	GitLab EE with Geo enabled
Multi-node / HA with Geo	GitLab CE/EE on multiple nodes

Each type of deployment requires that you hot reload the puma and sidekiq processes on all nodes running these services after you've upgraded. The reason for this is that those processes each load the GitLab Rails application which reads and loads the database schema into memory when starting up. Each of these processes must be reloaded (or restarted in the case of sidekiq) to re-read any database changes that have been made by post-deployment migrations.

Most of the time you can safely upgrade from a patch release to the next minor release if the patch release is not the latest. For example, upgrading from 14.1.1 to 14.2.0 should be safe even if 14.1.2 has been released. We do recommend you check the release posts of any releases between your current and target version just in case they include any migrations that may require you to upgrade one release at a time.

We also recommend you verify the version specific upgrading instructions relevant to your upgrade path.

Some releases may also include so-called "background migrations". These migrations are performed in the background by Sidekiq and are often used for migrating data. Background migrations are only added in the monthly releases.

Certain major/minor releases may require a set of background migrations to be finished. To guarantee this, such a release processes any remaining jobs before continuing the upgrading procedure. While this doesn't require downtime (if the above conditions are met), we require that you wait for background migrations to complete between each major/minor release upgrade. The time necessary to complete these migrations can be reduced by increasing the number of Sidekiq workers that can process jobs in the background_migration queue. To see the size of this queue, check for background migrations before upgrading.
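
Waiting for background migrations between upgrades can be scripted as a polling loop. The following is a minimal sketch, not a GitLab tool: wait_until_zero is a hypothetical helper, and the command that prints the number of remaining jobs is passed in by the caller because it depends on your GitLab version.

```shell
#!/bin/sh
# wait_until_zero MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND repeatedly until it prints 0.
# Returns 0 when the count reaches zero, 1 when attempts run out,
# and 2 when COMMAND itself fails.
wait_until_zero() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    # COMMAND is expected to print a single integer on stdout.
    remaining=$("$@") || return 2
    if [ "$remaining" -eq 0 ]; then
      return 0
    fi
    echo "remaining background migration jobs: $remaining" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}
```

As an assumption to verify against the background migrations documentation for your release, the count could come from a Rails runner call, for example: wait_until_zero 60 30 sudo gitlab-rails runner -e production 'puts Gitlab::BackgroundMigration.remaining'. Only proceed to the next major/minor release when the helper returns 0.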

As a guideline, any database smaller than 10 GB doesn't take too much time to upgrade; perhaps an hour at most per minor release. Larger databases however may require more time, but this is highly dependent on the size of the database and the migrations that are being performed.

To help explain this, let's look at some examples:

Example 1: You are running a large GitLab installation using version 13.4.2, which is the latest patch release of 13.4. When GitLab 13.5.0 is released, this installation can be safely upgraded to 13.5.0 without requiring downtime if the requirements mentioned above are met. You can also skip 13.5.0 and upgrade to 13.5.1 after it's released, but you cannot upgrade straight to 13.6.0; you have to first upgrade to a 13.5.Z release.

Example 2: You are running a large GitLab installation using version 13.4.2, which is the latest patch release of 13.4. GitLab 13.5 includes some background migrations, and 14.0 requires these to be completed (processing any remaining jobs for you). Skipping 13.5 is not possible without downtime, and due to the background migrations would require potentially hours of downtime depending on how long it takes for the background migrations to complete. To work around this you have to upgrade to 13.5.Z first, then wait at least a week before upgrading to 14.0.

Example 3: You use MySQL as the database for GitLab. Any upgrade to a new major/minor release requires downtime. If a release includes any background migrations this could potentially lead to hours of downtime, depending on the size of your database. To work around this you must use PostgreSQL and meet the other online upgrade requirements mentioned above.
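
The rule illustrated by Example 1 can be expressed as a small pre-flight check. The following is a minimal sketch, not part of GitLab: is_single_minor_step is a hypothetical helper name, and it assumes both versions are written as MAJOR.MINOR.PATCH and share the same major version. Upgrades that cross a major version are rejected here and left to the version-specific upgrading instructions.

```shell
#!/bin/sh
# is_single_minor_step CURRENT_VERSION TARGET_VERSION
# Hypothetical helper: succeeds when TARGET is in the same minor release as
# CURRENT or in the next minor release of the same major version.
# Assumption: versions look like 13.4.2; patch numbers are not compared.
is_single_minor_step() {
  cur_major=${1%%.*}
  cur_rest=${1#*.}
  cur_minor=${cur_rest%%.*}
  tgt_major=${2%%.*}
  tgt_rest=${2#*.}
  tgt_minor=${tgt_rest%%.*}

  # Crossing a major version is out of scope for this sketch.
  [ "$cur_major" -eq "$tgt_major" ] || return 1

  step=$((tgt_minor - cur_minor))
  # 0 = patch upgrade within the same minor, 1 = next minor release.
  [ "$step" -eq 0 ] || [ "$step" -eq 1 ]
}
```

For example, is_single_minor_step 13.4.2 13.5.1 succeeds, while is_single_minor_step 13.4.2 13.6.0 fails.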

Multi-node / HA deployment

WARNING: You can only upgrade one minor release at a time. So from 15.6 to 15.7, not to 15.8. If you attempt more than one minor release, the upgrade may fail.

Use a load balancer in front of web (Puma) nodes

With Puma, single node zero-downtime updates are no longer possible. To achieve HA with zero-downtime updates, at least two nodes are required to be used with a load balancer which distributes the connections properly across both nodes.

The load balancer in front of the application nodes must be configured to use the proper health check endpoints to determine whether the service is accepting traffic. For Puma, use the /-/readiness endpoint. For Sidekiq and other services, the /readiness endpoint can be used.

Upgrades on web (Puma) nodes must be done in a rolling manner, one after another, ensuring at least one node is always up to serve traffic. This is required to ensure zero-downtime.

Puma enters a blackout period as part of the upgrade, during which nodes continue to accept connections but mark their respective health check endpoints to be unhealthy. On seeing this, the load balancer should disconnect them gracefully.

Puma restarts only after completing all the currently-processing requests. This ensures data and service integrity. After the nodes have restarted, their health check endpoints are marked healthy again.
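
To see what the load balancer sees during a rolling upgrade, you can probe the Puma readiness endpoint yourself. This is a minimal sketch under stated assumptions: is_ready is a hypothetical helper, the base URL is whatever address your web node listens on, and the node is treated as healthy only when the endpoint answers with HTTP 200.

```shell
#!/bin/sh
# is_ready BASE_URL
# Hypothetical helper: returns 0 when GET BASE_URL/-/readiness answers with
# HTTP 200, and non-zero for any other status or a failed connection.
is_ready() {
  status=$(curl --silent --output /dev/null --max-time 5 \
    --write-out '%{http_code}' "$1/-/readiness") || return 1
  [ "$status" = "200" ]
}
```

For example, calling is_ready with the address of a web node (such as the placeholder http://gitlab-web-1.example.com) before and after gitlab-ctl hup puma should show the node leaving and rejoining the healthy pool. The endpoint may only answer requests from allowed IP addresses, so run the probe from a host the node trusts.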

To update an HA instance that uses a load balancer to the latest GitLab version, the nodes must be updated in the following order.

Select one application node as a deploy node and complete the following steps on it:
1. Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
```
sudo touch /etc/gitlab/skip-auto-reconfigure
```
2. Upgrade the GitLab package.
3. Get the regular migrations and latest code in place. Before running this step, the deploy node's /etc/gitlab/gitlab.rb configuration file must have gitlab_rails['auto_migrate'] = true to permit regular migrations.
```
sudo SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-ctl reconfigure
```
4. Ensure services use the latest code:
```
sudo gitlab-ctl hup puma
sudo gitlab-ctl restart sidekiq
```
Complete the following steps on the other Puma/Sidekiq nodes, one after another. Always ensure at least one such node is up and running, and connected to the load balancer, before proceeding to the next node.
1. Update the GitLab package and ensure a reconfigure is run as part of it. If not (due to /etc/gitlab/skip-auto-reconfigure file being present), run sudo gitlab-ctl reconfigure manually.
2. Ensure services use latest code:
```
sudo gitlab-ctl hup puma
sudo gitlab-ctl restart sidekiq
```
On the deploy node, run the post-deployment migrations:
```
sudo gitlab-rake db:migrate
```
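
The order of the steps above is easy to get wrong across several nodes, so it can help to print the plan before running anything. The following is a dry-run sketch only: print_rolling_plan is a hypothetical helper that prints the commands from this section per node and executes nothing. The node names are placeholders, and the package upgrade itself is shown as a note because the command depends on your operating system.

```shell
#!/bin/sh
# print_rolling_plan DEPLOY_NODE [OTHER_NODE...]
# Hypothetical helper: prints the rolling upgrade steps in order.
# It does not connect to any node or run any command.
print_rolling_plan() {
  deploy_node=$1
  shift
  # Deploy node: regular migrations only, post-deployment migrations skipped.
  echo "[$deploy_node] sudo touch /etc/gitlab/skip-auto-reconfigure"
  echo "[$deploy_node] (upgrade the GitLab package)"
  echo "[$deploy_node] sudo SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-ctl reconfigure"
  echo "[$deploy_node] sudo gitlab-ctl hup puma"
  echo "[$deploy_node] sudo gitlab-ctl restart sidekiq"
  # Remaining Puma/Sidekiq nodes, one after another.
  for node in "$@"; do
    echo "[$node] (upgrade the GitLab package)"
    echo "[$node] sudo gitlab-ctl reconfigure  # only if the package upgrade did not run it"
    echo "[$node] sudo gitlab-ctl hup puma"
    echo "[$node] sudo gitlab-ctl restart sidekiq"
  done
  # Back on the deploy node: post-deployment migrations run last.
  echo "[$deploy_node] sudo gitlab-rake db:migrate"
}
```

Calling print_rolling_plan app1 app2 app3 lists the deploy node steps first, then each remaining node in turn, and ends with the post-deployment migrations on the deploy node.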

Gitaly or Gitaly Cluster

Gitaly nodes can be located on their own server, either as part of a sharded setup, or as part of Gitaly Cluster.

Before you update the main GitLab application you must (in order):

Upgrade the Gitaly nodes that reside on separate servers.
Upgrade Praefect if using Gitaly Cluster.

Because of a known issue, Gitaly and Gitaly Cluster upgrades cause some downtime.

Upgrade Gitaly nodes

Upgrade the GitLab package on the Gitaly nodes one at a time to ensure access to Git repositories is maintained.

Upgrade Praefect

From the Praefect nodes, select one to be your Praefect deploy node. You install the new Omnibus package on the deploy node first and run database migrations.

On the Praefect deploy node:
1. Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
```
sudo touch /etc/gitlab/skip-auto-reconfigure
```
2. Ensure that praefect['auto_migrate'] = true is set in /etc/gitlab/gitlab.rb.
On all remaining Praefect nodes, ensure that praefect['auto_migrate'] = false is set in /etc/gitlab/gitlab.rb to prevent reconfigure from automatically running database migrations.
On the Praefect deploy node:
1. Upgrade the GitLab package.
2. To apply the Praefect database migrations and restart Praefect, run:
```
sudo gitlab-ctl reconfigure
```
On all remaining Praefect nodes:
1. Upgrade the GitLab package.
2. Ensure nodes are running the latest code:
```
sudo gitlab-ctl reconfigure
```

PostgreSQL

Pick a node to be the Deploy Node. It can be any application node, but it must be the same node throughout the process.

Deploy node

Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab.
```
sudo touch /etc/gitlab/skip-auto-reconfigure
```

All nodes including the Deploy node

To prevent reconfigure from automatically running database migrations, ensure that gitlab_rails['auto_migrate'] = false is set in /etc/gitlab/gitlab.rb.
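
Before upgrading, you may want to confirm the setting on every node rather than trust memory. The following is a minimal sketch, assuming the setting appears on a single uncommented line of the form key = value. setting_is is a hypothetical helper, and a plain text match like this does not evaluate Ruby, so settings assigned in other ways are not detected.

```shell
#!/bin/sh
# setting_is FILE KEY VALUE
# Hypothetical helper: returns 0 when FILE contains an uncommented line that
# mentions KEY literally and ends with "= VALUE" (a trailing comment is allowed).
setting_is() {
  grep -v '^[[:space:]]*#' "$1" | grep -F -- "$2" |
    grep -Eq "=[[:space:]]*$3[[:space:]]*(#.*)?\$"
}
```

For example, setting_is /etc/gitlab/gitlab.rb "gitlab_rails['auto_migrate']" false succeeds only when automatic migrations are disabled on that node. The same check works for praefect['auto_migrate'] on Praefect nodes.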

PostgreSQL only nodes

Upgrade the GitLab package.
Ensure nodes are running the latest code
```
sudo gitlab-ctl reconfigure
```

Deploy node

Upgrade the GitLab package.
If you're using PgBouncer:

You must bypass PgBouncer and connect directly to the database leader before running migrations.

Rails uses an advisory lock when attempting to run a migration to prevent concurrent migrations from running on the same database. These locks are not shared across transactions, resulting in ActiveRecord::ConcurrentMigrationError and other issues when running database migrations using PgBouncer in transaction pooling mode.

To find the leader node, run the following on a database node:
```
sudo gitlab-ctl patroni members
```
Then, in your gitlab.rb file on the deploy node, update gitlab_rails['db_host'] and gitlab_rails['db_port'] with the database leader's host and port.

To get the regular database migrations and latest code in place, run:
```
sudo gitlab-ctl reconfigure
sudo SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-rake db:migrate
```
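
For PgBouncer setups, the leader lookup described above can be scripted. This is a sketch under a loud assumption: it expects gitlab-ctl patroni members to print a pipe-delimited table whose columns are Member, Host, Role, and so on, in that order. Check the output on your own database node before relying on it. patroni_leader_host is a hypothetical helper name.

```shell
#!/bin/sh
# patroni_leader_host
# Hypothetical helper: reads the members table on stdin and prints the Host
# column of the first row whose Role column contains "Leader".
# Assumed row layout: | Member | Host | Role | State | TL | Lag in MB |
patroni_leader_host() {
  awk -F'|' '$4 ~ /Leader/ { gsub(/[[:space:]]/, "", $3); print $3; exit }'
}
```

On a database node, sudo gitlab-ctl patroni members | patroni_leader_host would then print the host to use for gitlab_rails['db_host'] on the deploy node.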

All nodes excluding the Deploy node

Upgrade the GitLab package.
Ensure nodes are running the latest code
```
sudo gitlab-ctl reconfigure
```

Deploy node

Run post-deployment database migrations on deploy node to complete the migrations with
```
sudo gitlab-rake db:migrate
```

For nodes that run Puma or Sidekiq

Hot reload puma and sidekiq services:
```
sudo gitlab-ctl hup puma
sudo gitlab-ctl restart sidekiq
```

If you're using PgBouncer:

Change your gitlab.rb to point back to PgBouncer and run:
```
sudo gitlab-ctl reconfigure
```

If you do not want to run zero downtime upgrades in the future, make sure you remove /etc/gitlab/skip-auto-reconfigure and revert setting gitlab_rails['auto_migrate'] = false in /etc/gitlab/gitlab.rb after you've completed these steps.

Redis HA (using Sentinel)

DETAILS: Tier: Premium, Ultimate Offering: Self-managed

Package upgrades may involve version updates to the bundled Redis service. On instances using Redis for scaling, upgrades must follow a proper order to ensure minimum downtime, as specified below. This document assumes the official guides are being followed to set up Redis HA.

In the application node

According to official Redis documentation, the easiest way to update an HA instance using Sentinel is to upgrade the secondaries one after the other, perform a manual failover from current primary (running old version) to a recently upgraded secondary (running a new version), and then upgrade the original primary. For this, we must know the address of the current Redis primary.

If your application node is running GitLab 12.7.0 or later, you can use the following command to get the address of the current Redis primary:
```
sudo gitlab-ctl get-redis-master
```
If your application node is running a version older than GitLab 12.7.0, you have to run the underlying redis-cli command (which get-redis-master command uses) to fetch information about the primary.
1. Get the address of one of the sentinel nodes specified as gitlab_rails['redis_sentinels'] in /etc/gitlab/gitlab.rb
2. Get the Redis primary name specified as redis['master_name'] in /etc/gitlab/gitlab.rb
3. Run the following command
```
sudo /opt/gitlab/embedded/bin/redis-cli -h <sentinel host> -p <sentinel port> SENTINEL get-master-addr-by-name <redis master name>
```
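
When redis-cli output is piped instead of shown in a terminal, the reply to SENTINEL get-master-addr-by-name arrives as two plain lines: the host, then the port. The sketch below joins them into one host:port string so the address is easy to record and compare later. format_primary_addr is a hypothetical helper, not part of GitLab or Redis.

```shell
#!/bin/sh
# format_primary_addr
# Hypothetical helper: reads two lines from stdin (host, then port) and
# prints "host:port". Fails when either line is missing.
format_primary_addr() {
  read -r host || return 1
  read -r port || return 1
  printf '%s:%s\n' "$host" "$port"
}
```

For example, piping the redis-cli command above into format_primary_addr prints a single line such as 10.0.0.5:6379 (an illustrative address), which you can note down before starting the upgrade.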

In the Redis secondary nodes

Set gitlab_rails['rake_cache_clear'] = false in gitlab.rb if you haven't already. Otherwise, you might receive the error Redis::CommandError: READONLY You can't write against a read only replica. during the reconfigure that runs after the new package is installed.
Install package for new version.
Run sudo gitlab-ctl reconfigure, if a reconfigure is not run as part of installation (due to /etc/gitlab/skip-auto-reconfigure file being present).
If reconfigure warns about a pending Redis/Sentinel restart, restart the corresponding service
```
sudo gitlab-ctl restart redis
sudo gitlab-ctl restart sentinel
```

In the Redis primary node

Before upgrading the Redis primary node, we must perform a failover so that one of the recently upgraded secondary nodes becomes the new primary. After the failover is complete, we can go ahead and upgrade the original primary node.

Stop Redis service in Redis primary node so that it fails over to a secondary node
```
sudo gitlab-ctl stop redis
```
Wait for failover to be complete. You can verify it by periodically checking details of the current Redis primary node (as mentioned above). If it starts reporting a new IP, failover is complete.
Start Redis again in that node, so that it starts following the current primary node.
```
sudo gitlab-ctl start redis
```
Install package corresponding to new version.
Run sudo gitlab-ctl reconfigure, if a reconfigure is not run as part of installation (due to /etc/gitlab/skip-auto-reconfigure file being present).
If reconfigure warns about a pending Redis/Sentinel restart, restart the corresponding service
```
sudo gitlab-ctl restart redis
sudo gitlab-ctl restart sentinel
```
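
The "wait for failover" step above can be automated by polling the primary address until it changes. The following is a minimal sketch, not a GitLab tool: wait_for_failover is a hypothetical helper, and the command that reports the current primary is supplied by the caller (for example, sudo gitlab-ctl get-redis-master). The helper only compares the command's output as text, so it makes no assumption about the output format.

```shell
#!/bin/sh
# wait_for_failover OLD_ADDR MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until its output is non-empty and differs
# from OLD_ADDR, then prints the new value and returns 0.
# Returns 1 when the output never changes within MAX_ATTEMPTS.
wait_for_failover() {
  old_addr=$1
  max_attempts=$2
  delay=$3
  shift 3
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    # A failing lookup is treated as "no answer yet" and retried.
    current=$("$@" 2>/dev/null) || current=""
    if [ -n "$current" ] && [ "$current" != "$old_addr" ]; then
      echo "$current"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}
```

Record the output of the lookup command before stopping Redis on the primary, pass that text as OLD_ADDR, and start Redis on the old primary again only after the helper reports a new address.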

Update the application node

Install the package for the new version and follow the regular package upgrade procedure.

Geo deployment

DETAILS: Tier: Premium, Ultimate Offering: Self-managed

WARNING: You can only upgrade one minor release at a time.

The order of steps is important. While following these steps, make sure you follow them in the right order, on the correct node.

Update the Geo primary site

Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
```
sudo touch /etc/gitlab/skip-auto-reconfigure
```
Edit /etc/gitlab/gitlab.rb and ensure the following is present:
```
gitlab_rails['auto_migrate'] = false
```
Reconfigure GitLab:
```
sudo gitlab-ctl reconfigure
```
Upgrade the GitLab package.
To get the database migrations and latest code in place, run:
```
sudo gitlab-ctl reconfigure
```
After the node is updated and reconfigure has finished successfully, run the regular migrations (post-deployment migrations are skipped at this stage):
```
sudo SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-rake db:migrate
```
Copy the /etc/gitlab/gitlab-secrets.json file from the primary site to the secondary site if they're different. The file must be the same on all of a site's nodes.
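
To decide whether the secrets file needs copying without moving it between sites first, you can compare digests. This is a minimal sketch, assuming sha256sum from GNU coreutils is available on your nodes. secrets_digest and secrets_match are hypothetical helper names.

```shell
#!/bin/sh
# secrets_digest FILE
# Hypothetical helper: prints the SHA-256 digest of FILE (digest only).
secrets_digest() {
  sha256sum "$1" | cut -d ' ' -f 1
}

# secrets_match DIGEST_A DIGEST_B
# Returns 0 only when both digests are non-empty and identical.
secrets_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}
```

Run secrets_digest /etc/gitlab/gitlab-secrets.json as a user that can read the file on one node of each site, then pass both digests to secrets_match. Copy the file from the primary site only when the comparison fails.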

Update the Geo secondary site

On each secondary node, execute the following:

Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab.
```
sudo touch /etc/gitlab/skip-auto-reconfigure
```
Edit /etc/gitlab/gitlab.rb and ensure the following is present:
```
gitlab_rails['auto_migrate'] = false
```
Reconfigure GitLab:
```
sudo gitlab-ctl reconfigure
```
Upgrade the GitLab package.
To get the database migrations and latest code in place, run:
```
sudo gitlab-ctl reconfigure
```
Run post-deployment database migrations, specific to the Geo database:
```
sudo gitlab-rake db:migrate:geo
```

Finalize the update

After all secondary nodes are updated, finalize the update on the primary node:

Run post-deployment database migrations
```
sudo gitlab-rake db:migrate
```
After the update is finalized on the primary node, hot reload puma and restart sidekiq and geo-logcursor services on all primary and secondary nodes:
```
sudo gitlab-ctl hup puma
sudo gitlab-ctl restart sidekiq
sudo gitlab-ctl restart geo-logcursor
```

After updating all nodes (both primary and all secondaries), check their status:

Verify Geo configuration and dependencies
```
sudo gitlab-rake gitlab:geo:check
```

Multi-node / HA deployment with Geo

DETAILS: Tier: Premium, Ultimate Offering: Self-managed

WARNING: You can only upgrade one minor release at a time. You must also start with the Gitaly cluster, updating Gitaly one node at a time. This ensures access to the Git repositories for the remainder of the upgrade process.

This section describes the steps required to upgrade a multi-node / HA deployment with Geo. Some steps must be performed on a particular node. This node is known as the "deploy node" and is noted through the following instructions.

Updates must be performed in the following order:

Update Geo primary multi-node deployment.
Update Geo secondary multi-node deployments.
Post-deployment migrations and checks.

Step 1: Choose a "deploy node" for each deployment

You now must choose:

One instance for use as the primary "deploy node" on the Geo primary multi-node deployment.
One instance for use as the secondary "deploy node" on each Geo secondary multi-node deployment.

Deploy nodes must be configured to be running Puma or Sidekiq or the geo-logcursor daemon. In order to avoid any downtime, they must not be in use during the update:

If running Puma, remove the deploy node from the load balancer.
If running Sidekiq, ensure the deploy node is not processing jobs:
```
sudo gitlab-ctl stop sidekiq
```
If running geo-logcursor daemon, ensure the deploy node is not processing events:
```
sudo gitlab-ctl stop geo-logcursor
```

For zero-downtime, Puma, Sidekiq, and geo-logcursor must be running on other nodes during the update.
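
Before upgrading the deploy node, you may want to confirm that a stopped service is really down. The sketch below assumes gitlab-ctl status prints runit-style status lines that begin with run: for a running service and down: for a stopped one. Verify this against the output on your own node. service_is_down is a hypothetical helper name.

```shell
#!/bin/sh
# service_is_down STATUS_LINE
# Hypothetical helper: returns 0 when a runit-style status line reports the
# service as stopped, that is, when the line starts with "down: ".
service_is_down() {
  case $1 in
    "down: "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, service_is_down "$(sudo gitlab-ctl status sidekiq)" succeeds after sudo gitlab-ctl stop sidekiq has taken effect, and the same check can be used for geo-logcursor.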

Step 2: Update the Geo primary multi-node deployment

On all primary nodes including the primary "deploy node"

Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab.

```
sudo touch /etc/gitlab/skip-auto-reconfigure
```

To prevent reconfigure from automatically running database migrations, ensure that gitlab_rails['auto_migrate'] = false is set in /etc/gitlab/gitlab.rb.
Ensure nodes are running the latest code
```
sudo gitlab-ctl reconfigure
```

On primary Gitaly only nodes

Upgrade the GitLab package.
Ensure nodes are running the latest code
```
sudo gitlab-ctl reconfigure
```

On the primary "deploy node"

Upgrade the GitLab package.
If you're using PgBouncer:

You must bypass PgBouncer and connect directly to the database leader before running migrations.

Rails uses an advisory lock when attempting to run a migration to prevent concurrent migrations from running on the same database. These locks are not shared across transactions, resulting in ActiveRecord::ConcurrentMigrationError and other issues when running database migrations using PgBouncer in transaction pooling mode.

To find the leader node, run the following on a database node:
```
sudo gitlab-ctl patroni members
```
Then, in your gitlab.rb file on the deploy node, update gitlab_rails['db_host'] and gitlab_rails['db_port'] with the database leader's host and port.

To get the regular database migrations and latest code in place, run:
```
sudo gitlab-ctl reconfigure
sudo SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-rake db:migrate
```

If this deploy node is used to serve requests or process jobs, then you may return it to service at this point.
- To serve requests, add the deploy node to the load balancer.
- To process Sidekiq jobs again, start Sidekiq:
```
sudo gitlab-ctl start sidekiq
```

On all primary nodes excluding the primary "deploy node"

Upgrade the GitLab package.
Ensure nodes are running the latest code
```
sudo gitlab-ctl reconfigure
```

For all primary nodes that run Puma or Sidekiq including the primary "deploy node"

Hot reload puma and sidekiq services:
```
sudo gitlab-ctl hup puma
sudo gitlab-ctl restart sidekiq
```

Copy the /etc/gitlab/gitlab-secrets.json file from the primary site to the secondary site if they're different. The file must be the same on all of a site's nodes.

Step 3: Update each Geo secondary multi-node deployment

Only proceed if you have successfully completed all steps on the Geo primary multi-node deployment.

On all secondary nodes including the secondary "deploy node"

Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab.

```
sudo touch /etc/gitlab/skip-auto-reconfigure
```

To prevent reconfigure from automatically running database migrations, ensure that geo_secondary['auto_migrate'] = false is set in /etc/gitlab/gitlab.rb.
Ensure nodes are running the latest code
```
sudo gitlab-ctl reconfigure
```

On secondary Gitaly only nodes

Upgrade the GitLab package.
Ensure nodes are running the latest code
```
sudo gitlab-ctl reconfigure
```

On the secondary "deploy node"

Upgrade the GitLab package.

To get the regular database migrations and latest code in place, run:
```
sudo gitlab-ctl reconfigure
sudo SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-rake db:migrate:geo
```

If this deploy node is used to serve requests or perform background processing, then you may return it to service at this point.
- To serve requests, add the deploy node to the load balancer.
- To process Sidekiq jobs again, start Sidekiq:
```
sudo gitlab-ctl start sidekiq
```
- To process Geo events again, start the geo-logcursor daemon:
```
sudo gitlab-ctl start geo-logcursor
```

On all secondary nodes excluding the secondary "deploy node"

Upgrade the GitLab package.
Ensure nodes are running the latest code
```
sudo gitlab-ctl reconfigure
```

For all secondary nodes that run Puma, Sidekiq, or the geo-logcursor daemon including the secondary "deploy node"

Hot reload puma, sidekiq and geo-logcursor services:
```
sudo gitlab-ctl hup puma
sudo gitlab-ctl restart sidekiq
sudo gitlab-ctl restart geo-logcursor
```

Step 4: Run post-deployment migrations and checks

On the primary "deploy node"

Run post-deployment database migrations:
```
sudo gitlab-rake db:migrate
```
Verify Geo configuration and dependencies
```
sudo gitlab-rake gitlab:geo:check
```
If you're using PgBouncer:

Change your gitlab.rb to point back to PgBouncer and run:
```
sudo gitlab-ctl reconfigure
```

On all secondary "deploy nodes"

Run post-deployment database migrations, specific to the Geo database:
```
sudo gitlab-rake db:migrate:geo
```
Verify Geo configuration and dependencies
```
sudo gitlab-rake gitlab:geo:check
```
Verify Geo status
```
sudo gitlab-rake geo:status
```