Implementation Guide
Airbyte Self-Managed Enterprise is in an early access stage for select priority users. Once you are qualified for a Self-Managed Enterprise license key, you can deploy Airbyte with the following instructions.
Airbyte Self-Managed Enterprise must be deployed using Kubernetes. This is to enable Airbyte's best performance and scale. The core Airbyte components (`server`, `webapp`, `workload-launcher`) run as deployments. The `workload-launcher` is responsible for managing connector-related pods (`check`, `discover`, `read`, `write`, `orchestrator`).
Prerequisites
Infrastructure Prerequisites
For a production-ready deployment of Self-Managed Enterprise, various infrastructure components are required. We recommend deploying to Amazon EKS or Google Kubernetes Engine. The following diagram illustrates a typical Airbyte deployment running on AWS:
Prior to deploying Self-Managed Enterprise, we recommend having each of the following infrastructure components ready to go. When possible, it's easiest to have all components running in the same VPC. The provided recommendations are for customers deploying to AWS:
Component | Recommendation |
---|---|
Kubernetes Cluster | Amazon EKS cluster running on EC2 instances in 2 or more availability zones on a minimum of 6 nodes. |
Ingress | Amazon ALB and a URL for users to access the Airbyte UI or make API requests. |
Object Storage | Amazon S3 bucket with two directories for log and state storage. |
Dedicated Database | Amazon RDS Postgres with at least one read replica. |
External Secrets Manager | Amazon Secrets Manager for storing connector secrets. |
A few notes on Kubernetes cluster provisioning for Airbyte Self-Managed Enterprise:
- We support Amazon Elastic Kubernetes Service (EKS) on EC2 or Google Kubernetes Engine (GKE) on Google Compute Engine (GCE). Improved support for Azure Kubernetes Service (AKS) is coming soon.
- We recommend running Airbyte on memory-optimized instances, such as M7i / M7g instance types.
- While we support GKE Autopilot, we do not support Amazon EKS on Fargate.
- We recommend running Airbyte on instances with at least 2 cores and 8 gigabytes of RAM.
We require you to install and configure the following Kubernetes tooling:
- Install `helm` by following these instructions.
- Install `kubectl` by following these instructions.
- Configure `kubectl` to connect to your cluster by using `kubectl config use-context my-cluster-name`.
We also require you to create a Kubernetes namespace for your Airbyte deployment:
kubectl create namespace airbyte
Configure Kubernetes Secrets
Sensitive credentials such as AWS access keys must be made available in Kubernetes Secrets during deployment. The Kubernetes secret store and secret keys are referenced in your `values.yaml` file. Ensure all required secrets are configured before deploying Airbyte Self-Managed Enterprise.
You may apply your Kubernetes secrets by applying the example manifests below to your cluster, or by using `kubectl` directly. If your Kubernetes cluster already has permissions to make requests to an external entity via an instance profile, credentials are not required. For example, if your Amazon EKS cluster has been assigned a sufficient AWS IAM role to make requests to AWS S3, you do not need to specify access keys.
Creating a Kubernetes Secret
While you can set the name of the secret to whatever you prefer, you will need to set that name in various places in your `values.yaml` file. For this reason we suggest that you keep the name `airbyte-config-secrets` unless you have a reason to change it.
airbyte-config-secrets
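As a minimal sketch, a manifest for this secret might look like the following. It assumes only the secret keys this guide refers to elsewhere (`license-key`, `instance-admin-email`, `instance-admin-password`, `client-id`, `client-secret`). All values are illustrative placeholders, and keys for database, object storage, or secret manager credentials are omitted because their names depend on how you configure those components.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: airbyte-config-secrets   # keep in sync with secretName in values.yaml
  namespace: airbyte             # the namespace created in the prerequisites
type: Opaque
stringData:
  # Self-Managed Enterprise license key (placeholder value)
  license-key: "REPLACE_WITH_LICENSE_KEY"

  # Instance admin credentials (placeholder values)
  instance-admin-email: "admin@company.example"
  instance-admin-password: "REPLACE_WITH_PASSWORD"

  # SSO OIDC credentials, only needed if SSO is enabled (placeholder values)
  client-id: "REPLACE_WITH_CLIENT_ID"
  client-secret: "REPLACE_WITH_CLIENT_SECRET"
```

If this were saved as `airbyte-config-secrets.yaml` (a hypothetical filename), it could be applied with `kubectl apply -f airbyte-config-secrets.yaml`.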
Installation Steps
Step 1: Add Airbyte Helm Repository
Follow these instructions to add the Airbyte Helm repository:
- Run `helm repo add airbyte https://airbytehq.github.io/helm-charts`, where `airbyte` is the name of the repository that will be indexed locally.
- Perform the repo indexing process, and ensure your Helm repository is up-to-date by running `helm repo update`.
- You can then browse all charts uploaded to your repository by running `helm search repo airbyte`.
Step 2: Configure your Deployment
- Inside your `airbyte` directory, create an empty `values.yaml` file.
- Paste the following into your newly created `values.yaml` file. This is required to deploy Airbyte Self-Managed Enterprise:
global:
  edition: enterprise
- To enable SSO authentication, add your instance admin details and SSO auth details to your `values.yaml` file, under `global`. See the following guide on how to collect this information for various IDPs, such as Okta and Azure Entra ID.
auth:
  instanceAdmin:
    firstName: ## First name of admin user.
    lastName: ## Last name of admin user.
  identityProvider:
    type: oidc
    secretName: airbyte-config-secrets ## Name of your Kubernetes secret.
    oidc:
      domain: ## e.g. company.example
      app-name: ## e.g. airbyte
      clientIdSecretKey: client-id
      clientSecretSecretKey: client-secret
- You must add the public-facing URL of your Airbyte instance to your `values.yaml` file, under `global`:
airbyteUrl: # e.g. https://airbyte.company.example
- Verify the configuration of your `values.yaml` so far. Ensure `license-key`, `instance-admin-email` and `instance-admin-password` are all available via Kubernetes Secrets (configured in prerequisites). It should appear as follows:
Sample initial values.yaml file
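Assembled from the fragments in the steps above, a `values.yaml` at this stage would look roughly like the following sketch. The nesting assumes `auth` and `airbyteUrl` sit directly under `global`, as the steps describe, and the names, domain, and URL are illustrative placeholders rather than required values.

```yaml
global:
  edition: enterprise

  # Public-facing URL of the Airbyte instance (placeholder)
  airbyteUrl: https://airbyte.company.example

  auth:
    instanceAdmin:
      firstName: Jane   # placeholder
      lastName: Doe     # placeholder
    identityProvider:
      type: oidc
      # Kubernetes secret holding client-id and client-secret
      secretName: airbyte-config-secrets
      oidc:
        domain: company.example   # placeholder
        app-name: airbyte         # placeholder
        clientIdSecretKey: client-id
        clientSecretSecretKey: client-secret
```

If you are not enabling SSO, the `identityProvider` block can be left out of this sketch.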
The following subsections help you customize your deployment to use an external database, log storage, dedicated ingress, and more. To skip this and deploy a minimal, local version of Self-Managed Enterprise, jump to Step 3.
Configuring the Airbyte Database
For Self-Managed Enterprise deployments, we recommend using a dedicated database instance (such as AWS RDS or GCP Cloud SQL) for better reliability and backups, instead of the default internal Postgres database (`airbyte/db`) that Airbyte spins up within the Kubernetes cluster.
We assume in the following that you've already configured a Postgres instance:
External database setup steps
Configuring External Logging
For Self-Managed Enterprise deployments, we recommend spinning up standalone log storage for additional reliability, using tools such as S3 and GCS, instead of using the default internal MinIO storage (`airbyte/minio`). It's then a common practice to configure additional log forwarding from external log storage into your observability tool.
External log storage setup steps
Configuring External Connector Secret Management
Airbyte's default behavior is to store encrypted connector secrets on your cluster as Kubernetes secrets. You may instead store connector secrets in an external secret manager such as AWS Secrets Manager, Google Secret Manager or HashiCorp Vault. Upon creating a new connector, secrets (e.g. OAuth tokens, database passwords) will be written to, then read from, the configured secrets manager.
Configuring external connector secret management
Configuring Ingress
To access the Airbyte UI, you will need to manually attach an ingress configuration to your deployment. The following is a slimmed-down definition of an ingress resource you could use for Self-Managed Enterprise:
Ingress configuration setup steps
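As a rough sketch, a minimal `networking.k8s.io/v1` Ingress that routes all traffic to the webapp could look like the following. The ingress name, ingress class, backend service name, and port are assumptions for illustration and are not taken from this guide. Confirm the actual service names and ports with `kubectl get services --namespace airbyte` before using it, and add any further paths your deployment needs.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: airbyte-enterprise-ingress   # hypothetical name
  namespace: airbyte
spec:
  # Assumption: replace with the class registered by your ingress controller
  ingressClassName: nginx
  rules:
    - host: airbyte.company.example  # placeholder; should match airbyteUrl
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                # Assumption: webapp service created by the Helm release;
                # verify the real name in your cluster
                name: airbyte-enterprise-airbyte-webapp-svc
                port:
                  number: 80         # assumption; verify the service port
```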
Once this is complete, ensure that the value of the `airbyteUrl` field in your `values.yaml` is configured to match the ingress URL.
You may configure ingress using a load balancer or an API Gateway. We do not currently support most service meshes (such as Istio). If you are having networking issues after fully deploying Airbyte, please verify that firewalls or missing permissions are not interfering with pod-to-pod communication. Please also verify that deployed pods have the right permissions to make requests to your external database.
Step 3: Deploy Self-Managed Enterprise
Install Airbyte Self-Managed Enterprise with Helm using the following command:
helm install \
--namespace airbyte \
--values ./values.yaml \
airbyte-enterprise \
airbyte/airbyte
To uninstall Self-Managed Enterprise, run `helm uninstall airbyte-enterprise --namespace airbyte`.
Updating Self-Managed Enterprise
Upgrade Airbyte Self-Managed Enterprise by:
- Running `helm repo update`. This pulls an up-to-date version of our Helm charts, which is tied to a version of the Airbyte platform.
- Re-installing Airbyte Self-Managed Enterprise:
helm upgrade \
--namespace airbyte \
--values ./values.yaml \
--install airbyte-enterprise \
airbyte/airbyte
Customizing your Deployment
In order to customize your deployment, you need to create an additional `values.yaml` file in your `airbyte` directory, and populate it with configuration override values. A thorough `values.yaml` example including many configurations can be found in the charts/airbyte folder of the Airbyte repository.
After specifying your own configuration, run the following command:
helm upgrade \
--namespace airbyte \
--values ./values.yaml \
--install airbyte-enterprise \
airbyte/airbyte
Customizing your Service Account
You may choose to use your own service account instead of the Airbyte default, `airbyte-sa`. This may allow for better audit trails and resource management specific to your organizational policies and requirements.
To do this, add the following to your `values.yaml`:
serviceAccount:
  name:
AWS Policies Appendix
Ensure your access key is tied to an IAM user or you are using a Role with the following policies.
AWS S3 Policy
The following policy allows the cluster to communicate with S3 storage:
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "s3:ListAllMyBuckets", "Resource": "*" },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::YOUR-S3-BUCKET-NAME"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::YOUR-S3-BUCKET-NAME/*"
    }
  ]
}
AWS Secrets Manager Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:CreateSecret",
        "secretsmanager:ListSecrets",
        "secretsmanager:DescribeSecret",
        "secretsmanager:TagResource",
        "secretsmanager:UpdateSecret"
      ],
      "Resource": [
        "*"
      ],
      "Condition": {
        "ForAllValues:StringEquals": {
          "secretsmanager:ResourceTag/AirbyteManaged": "true"
        }
      }
    }
  ]
}