A Universe Data Integration Tool - Airbyte
Published in:2025-11-18 | Category: Big Data
Words: 849 | Reading time: 5min

Recently, I have been exploring various data integration tools to streamline our data workflows. One tool that has caught my attention is Airbyte. It supports a wide range of data sources and destinations, making it a versatile choice for data engineers and analysts alike.

Airbyte is an open-source data integration platform that allows you to easily connect various data sources to your data warehouses
Airbyte
Airbyte

Key Features of Airbyte

  1. Open Source: Airbyte is open-source, allowing users to customize and extend its functionality according to their needs.
  2. Wide Range of Connectors: It supports over 600 sources and destinations, including databases, APIs, and file systems.
  3. Platform: Airbyte can be deployed on-premises or in the cloud, providing flexibility in how you manage your data infrastructure.
  4. Incremental Data Sync: It supports incremental data synchronization, reducing the load on source systems and improving efficiency.
  5. Custom Connectors: Users can create custom connectors using Airbyte’s connector development kit (CDK).

Airbyte Sources and Destinations

In Airbyte open source version, You can see it supports various sources and destinations.

Sources

  • Databases: MySQL, PostgreSQL, MongoDB, Oracle, SQL Server, etc.
  • APIs: Salesforce, Stripe, HubSpot, Google Analytics, etc.
  • File Systems: S3, Google Cloud Storage, FTP, etc.
Airbyte Sources
Airbyte Sources

Destinations

  • Data Warehouses: Snowflake, BigQuery, Redshift, etc.
  • Databases: PostgreSQL, MS SQL Server, etc.
  • Data Lakes: S3, GCS Data Lake, Azure Blob Storage etc.
Airbyte Destinations
Airbyte Destinations

Setting Up Airbyte Locally

You can choose Cloud deployment or Local deployment. Here, I will guide you through the local setup using Docker.

  1. Install Docker: Make sure you have Docker installed on your machine. You can download it from here.

  2. Install abctl:
    abctl is Airbyte’s command-line tool for deploying and managing Airbyte.

    1
    curl -LsfS https://get.airbyte.com | bash -

    When installation completes, you see abctl install succeeded.

  3. Run Airbyte:
    To make Airbyte accessible outside localhost, specify the –host flag and provide a fully qualified domain name for Airbyte’s host.

    1
    abctl local install --host example.com

    If you’re running on an insecure/non-HTTPS connection, turn off the secure cookies requirement.

    1
    abctl local install --host example.com --insecure-cookies

    For the detailed instructions, refer to the official documentation.

Once you’ve installed, you can see the airbyte docker containers is running:

Airbyte Docker Containers
Airbyte Docker Containers

Accessing the airbyte web app at http://<your-domain>:8000, you can start configuring your data sources and destinations. port 8000 is default port, but you can specify a different port using the --port flag during installation.

When first access it, you need to get the default password for login by running:

1
abctl local credentials
Airbyte Credentials
Airbyte Credentials

To set the own password, you can change it anytime through below command:

1
abctl local credentials --password YourStrongPasswordExample

Start Data Integration

Once logged in, you can start creating connections between your data sources and destinations. The Airbyte interface is intuitive, allowing you to configure sync settings, schedule data transfers.

For our project, we’re syncing the sentry data from Sentry (source) to Google BigQuery (destination). Since the Sentry only remains 90 days of data retention, we want to back up the data to BigQuery for long-term storage and analysis.

Setup Sentry as Source

In the source configuration, select Sentry from the list of available sources. You’ll need to provide your Sentry API token and organization details to connect. Once configured, make sure the the source connection is working by testing the connection.

Airbyte Sentry Source
Airbyte Sentry Source

Setup BigQuery as Destination

Next, select Google BigQuery as your destination. You’ll need to provide your GCP credentials and specify the dataset where you want to store the data. Test the connection to ensure everything is set up correctly.

Airbyte BigQuery Destination
Airbyte BigQuery Destination

Setup Connection

Finally, create a connection between the Sentry source and BigQuery destination. You can configure the sync frequency, data transformation options, and other settings as needed.

For the initial sync, you might want to do a full refresh to back up all existing data. After that, you can set up incremental syncs to keep your BigQuery dataset updated with new Sentry data.

Airbyte Connection Setup
Airbyte Connection Setup

Below is an example of the scheduling sync to run every hour:

Airbyte Schedule Sync
Airbyte Schedule Sync

Check the Sync Data

Once the connection is set up, you can monitor the sync process from the Airbyte dashboard. It provides detailed logs and metrics to help you track the status of your data transfers. You can also check your BigQuery dataset to verify that the data from Sentry is being populated correctly.

Below screenshot shows how many data count synced in BigQuery after the Airbyte sync job completed successfully.

Airbyte Sync Data in BigQuery
Airbyte Sync Data in BigQuery

Summary

Airbyte is a powerful and flexible data integration tool that simplifies the process of connecting various data sources to your data warehouses and lakes. Its open-source nature, wide range of connectors, and user-friendly interface make it an excellent choice for data engineers and analysts looking to streamline their data workflows. Whether you’re backing up Sentry data to BigQuery or integrating other data sources, Airbyte provides a robust solution for your data integration needs.

Hope you find this tool useful for your data integration tasks!

Prev:
Serveral Ways to Extracting PDF content to Markdown for LLM & RAG
Next:
Exploring Java 21 New Features