Build Multi-Step Pipelines in Workbooks
This article walks through building a complete multi-step data pipeline using Workbooks. Each step lives in its own table, and the tables pass data to each other using cross-table column references.
Coming soon: Workbooks are not yet available in all accounts. This article covers how they will work when the feature rolls out.
What a multi-step pipeline looks like
A pipeline breaks a complex workflow into stages:
| Step | Table | What it does |
|---|---|---|
| 1 | Scrape | Pulls raw prospects from LinkedIn or a data source |
| 2 | Enrich | Adds email, phone, company data |
| 3 | Validate | Checks email deliverability |
| 4 | Output | Pushes clean records to your CRM |
Each table takes input from the previous table via cross-table column references. Running the pipeline means running each table in sequence.
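As a mental model only, the four stages above can be sketched as functions that each take the previous stage's rows and return new rows. Everything here is hypothetical (the function names, the row fields, and the sample data are illustrative assumptions, not TexAu APIs); the real work happens inside Workbook tables.

```python
from typing import Callable

Row = dict[str, str]
Stage = Callable[[list[Row]], list[Row]]


def scrape(_: list[Row]) -> list[Row]:
    # Stand-in for the Scrape table: returns hard-coded sample prospects.
    return [
        {"linkedin_url": "https://www.linkedin.com/in/sample-a"},
        {"linkedin_url": "https://www.linkedin.com/in/sample-b"},
    ]


def enrich(rows: list[Row]) -> list[Row]:
    # Stand-in for the Enrich table: attaches a made-up email to each row.
    return [{**row, "email": f"person{i}@example.com"} for i, row in enumerate(rows)]


def validate(rows: list[Row]) -> list[Row]:
    # Stand-in for the Validate table: marks the second sample row invalid.
    return [
        {**row, "email_status": "invalid" if i == 1 else "valid"}
        for i, row in enumerate(rows)
    ]


def output(rows: list[Row]) -> list[Row]:
    # Stand-in for the Output table: keeps only rows that are safe to push.
    return [row for row in rows if row["email_status"] != "invalid"]


def run_pipeline(stages: list[Stage]) -> list[Row]:
    # Run each stage in sequence, feeding it the previous stage's rows.
    rows: list[Row] = []
    for stage in stages:
        rows = stage(rows)
    return rows


result = run_pipeline([scrape, enrich, validate, output])
```

Because each stage only sees what the previous one produced, a stage that runs too early or fails simply has nothing useful to pass on, which is why the order of tables matters.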
Build a pipeline
Step 1: Create the Workbook and first table
- Click Workbooks in the left sidebar.
- Click New Workbook and name it, e.g. Weekly Outreach Pipeline.
- In the first table tab, configure your data source.
  - For LinkedIn scraping: add a Data Source Column using an Apify search action.
  - For a CSV import: import your file as the starting data.
- Run the first table to populate it with data.
Step 2: Add the enrichment table
- Click + to add a new table tab. Name it Enrich.
- Click Add Column and select Cross-Table Reference.
- Choose the first table as the source and select the LinkedIn URL column.
- Configure the key match on LinkedIn URL.
- Add Action Columns for email and phone enrichment:
  - Find Work Email by LinkedIn URL (BetterEnrich)
  - Find Mobile by LinkedIn URL (BetterEnrich)
- Map the LinkedIn URL cross-table column as input for both action columns.
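One way to picture a cross-table reference with a key match is as a lookup join: for each row in the destination table, find the source row with the same key and copy one column across. The sketch below assumes that behavior for illustration; `pull_column` and the field names are hypothetical and do not describe TexAu's internals.

```python
Row = dict[str, str]


def pull_column(source: list[Row], dest: list[Row], key: str, column: str) -> list[Row]:
    # Index the source table by its key so each destination row is one lookup.
    lookup = {row[key]: row.get(column, "") for row in source if row.get(key)}
    # Copy the column across; rows with no matching key get a blank value.
    return [{**row, column: lookup.get(row.get(key, ""), "")} for row in dest]


url_a = "https://www.linkedin.com/in/sample-a"
url_c = "https://www.linkedin.com/in/sample-c"

enrich_rows = [{"linkedin_url": url_a, "email": "a@example.com"}]
validate_rows = [{"linkedin_url": url_a}, {"linkedin_url": url_c}]

pulled = pull_column(enrich_rows, validate_rows, key="linkedin_url", column="email")
```

In this sketch the second destination row ends up with a blank email because its key never appears in the source, which is the same symptom as a blank cross-table column.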
Step 3: Add the validation table
- Add another table tab. Name it Validate.
- Add a cross-table reference pulling the Email column from the Enrich table.
- Add an Action Column: Verify Email (BetterEnrich) or Identify Email Type (TexAu Utility).
- Map the email cross-table reference as input.
Step 4: Add the output table
- Add a final table tab. Name it Push to CRM.
- Add cross-table references pulling from both the Enrich and Validate tables:
  - Email (from Enrich)
  - Phone (from Enrich)
  - Email verification status (from Validate)
- Add an Action Column:
  - Create Contact (HubSpot) or Create Person (Pipedrive)
- Add a formula column to filter out invalid emails before pushing: IF(email_status = "invalid", "skip", "send").
- Configure the CRM action to only run on rows where the formula column equals "send".
Run the pipeline manually
- Open the Workbook.
- Start with the first table tab. Click Run All Rows.
- Wait for it to complete.
- Click the second table tab. Click Run All Rows.
- Continue through each table in order.
Run the pipeline on a schedule
- Open the Workbook settings.
- Click Schedule.
- Set the frequency (daily, weekly, etc.) and the start time.
- Select whether to run all tables in sequence or a specific table.
- Click Save Schedule.
When the Scheduled Job triggers, TexAu runs each table in the pipeline in the order they appear as tabs.
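The effect of tab order can be illustrated with a small sketch in which each table reads whatever earlier tables have already produced. The `run_in_tab_order` helper and the two toy tables are hypothetical, written only to show why a downstream tab placed before its source comes up empty.

```python
from typing import Callable

State = dict[str, list[str]]


def run_in_tab_order(tabs: list[str], tables: dict[str, Callable[[State], list[str]]]) -> State:
    # Run each table in the order its tab appears, storing its output by name.
    state: State = {}
    for name in tabs:
        state[name] = tables[name](state)
    return state


tables = {
    # Scrape needs no input and returns one sample URL.
    "Scrape": lambda state: ["https://www.linkedin.com/in/sample-a"],
    # Enrich reads whatever Scrape has produced so far (nothing, if it ran first).
    "Enrich": lambda state: [f"{url} (enriched)" for url in state.get("Scrape", [])],
}

correct = run_in_tab_order(["Scrape", "Enrich"], tables)
reversed_tabs = run_in_tab_order(["Enrich", "Scrape"], tables)
```

With the tabs reversed, Enrich runs against an empty Scrape result and produces no rows, even though Scrape itself still succeeds afterwards.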
Pipeline design tips
- Keep each table focused on one task. Do not combine scraping and enrichment in the same table.
- Use a key column consistently across all tables. LinkedIn URL or Email works well as the shared identifier.
- Add a formula column at the end of each table to flag incomplete or problem rows before they move to the next step.
- Name your tables clearly. Scrape, Enrich, Validate, Push is easier to navigate than Table 1, Table 2.
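The flag-column idea can be prototyped outside the Workbook before you commit to a formula. The sketch below mirrors the Step 4 formula IF(email_status = "invalid", "skip", "send") and adds a hypothetical completeness check; the function names and the choice of required columns are assumptions for illustration.

```python
def send_or_skip(email_status: str) -> str:
    # Mirrors IF(email_status = "invalid", "skip", "send").
    # Any status other than "invalid", including a blank one, yields "send".
    return "skip" if email_status == "invalid" else "send"


def flag_incomplete(row: dict[str, str], required: list[str]) -> str:
    # Lists required columns that are missing or contain only whitespace.
    missing = [col for col in required if not (row.get(col) or "").strip()]
    return "incomplete: " + ", ".join(missing) if missing else "ok"


sample = {"email": "a@example.com", "phone": " "}
flag = flag_incomplete(sample, ["email", "phone"])
```

Note that the Step 4 formula as written lets rows with a blank verification status through as "send", so a completeness flag like the second function may be worth pairing with it.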
Troubleshooting
A later table has blank cross-table columns even though the earlier table has data. Verify that the source table ran successfully and has data in the column you are referencing. Refresh the cross-table column by re-running the destination table.
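If you export both tables (for example as CSV) and want to see why a reference is blank, a quick check like the following can separate the two usual causes: the key is missing from the source table, or the key is present but the referenced column is empty. This is a hypothetical helper working on plain dictionaries, not a TexAu feature.

```python
Row = dict[str, str]


def diagnose_blank_references(
    source: list[Row], dest: list[Row], key: str, column: str
) -> dict[str, list[str]]:
    # Index source rows by key, ignoring rows whose key is blank.
    source_by_key = {row[key]: row for row in source if row.get(key)}
    report: dict[str, list[str]] = {"key_not_in_source": [], "source_value_blank": []}
    for row in dest:
        k = row.get(key, "")
        match = source_by_key.get(k)
        if match is None:
            # The destination key has no counterpart in the source table.
            report["key_not_in_source"].append(k)
        elif not match.get(column):
            # The key matches, but the source column itself is empty.
            report["source_value_blank"].append(k)
    return report


source_rows = [
    {"linkedin_url": "url-a", "email": "a@example.com"},
    {"linkedin_url": "url-b", "email": ""},
]
dest_rows = [{"linkedin_url": "url-a"}, {"linkedin_url": "url-b"}, {"linkedin_url": "url-c"}]

report = diagnose_blank_references(source_rows, dest_rows, "linkedin_url", "email")
```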
The pipeline runs out of order. In a manual run, tables execute in whatever order you trigger them; on a schedule, they execute in tab order. To change the scheduled order, drag the tabs into the sequence you want.
One table fails mid-way and corrupts downstream tables. Add a formula column at the end of each table to filter bad rows before the next table runs. This prevents incomplete data from propagating downstream.