LangSmith’s bulk data export functionality allows you to export your traces into an external destination. This can be useful if you want to analyze the data offline in a tool such as BigQuery, Snowflake, Redshift, Jupyter Notebooks, etc.
An export can be launched to target a specific LangSmith project and date range. Once a batch export is launched, the system will handle the orchestration and resilience of the export process.
Exporting your data may take some time depending on its size, and LangSmith limits how many of your exports can run at the same time. Bulk exports have a runtime timeout of 72 hours; refer to Automatic retry behavior for more details.
Destinations
Currently we support exporting to an S3 bucket or S3 API-compatible bucket that you provide. The data is exported in Parquet columnar format, which allows you to easily import it into other systems. The export contains the same data fields as the Run data format.
Exporting data
Destinations - providing an S3 bucket
To export LangSmith data, you will need to provide an S3 bucket where the data will be exported to.
The following information is needed for the export:
- Bucket Name: The name of the S3 bucket where the data will be exported to.
- Prefix: The root prefix within the bucket where the data will be exported to.
- S3 Region: The region of the bucket - this is needed for AWS S3 buckets.
- Endpoint URL: The endpoint URL for the S3 bucket - this is needed for S3 API compatible buckets.
- Access Key: The access key for the S3 bucket.
- Secret Key: The secret key for the S3 bucket.
- Include Bucket in Prefix (optional): Whether to include the bucket name as part of the path prefix. Defaults to false for new destinations or when the bucket name is already present in the path. Set to true for legacy compatibility or when using storage systems that require the bucket name in the path.
We support any S3-compatible bucket; for non-AWS buckets such as GCS or MinIO, you will need to provide the endpoint URL.
Preparing the destination
For self-hosted and EU region deployments: Update the LangSmith URL appropriately for self-hosted installations or organizations in the EU region in the requests below. For the EU region, use eu.api.smith.langchain.com.
Permissions required
Both the backend and queue services require write access to the destination bucket:
- The backend service attempts to write a test file to the destination bucket when the export destination is created. It will delete the test file if it has permission to do so (delete access is optional).
- The queue service is responsible for bulk export execution and uploading the files to the bucket.
AWS S3 permissions
The minimal AWS S3 permission policy relies on the following permissions:
- s3:PutObject (required): Allows writing Parquet files to the bucket.
- s3:DeleteObject (optional): Cleans up test files during destination creation. If this permission isn’t present, the file is left under the /tmp directory after destination creation.
- s3:GetObject (optional but recommended): Verifies file size after writing.
- s3:AbortMultipartUpload (optional but recommended): Avoids dangling multipart uploads.
Minimal IAM policy example:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::YOUR_BUCKET_NAME/*"
]
}
]
}
Recommended IAM policy example with additional permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:DeleteObject",
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::YOUR_BUCKET_NAME/*"
]
}
]
}
Google Cloud Storage (GCS) permissions
When using GCS with the S3-compatible XML API, the following IAM permissions are required:
- storage.objects.create (required): Allows writing files to the bucket.
- storage.objects.delete (optional): Cleans up test files during destination creation. If this permission isn’t present, the file is left under the /tmp directory after destination creation.
- storage.objects.get (optional but recommended): Verifies file size after writing.
These permissions can be granted through the “Storage Object Admin” predefined role or a custom role.
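The S3-compatible XML API authenticates with HMAC keys rather than your regular GCS OAuth credentials, so the access key and secret key you supply for the destination are typically an HMAC key pair for a service account that holds these permissions. A minimal sketch for creating one with the gcloud CLI (the service account email is a placeholder):
# Create an HMAC key pair for a service account that has the permissions above.
# The command prints an access key ID and secret; use them as the destination
# credentials. The service account email below is a placeholder.
gcloud storage hmac create export-writer@your-project.iam.gserviceaccount.com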
Create a destination
The following example demonstrates how to create a destination using cURL. Replace the placeholder values with your actual configuration details.
Note that credentials will be stored securely in an encrypted form in our system.
curl --request POST \
--url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: YOUR_API_KEY' \
--header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
--data '{
"destination_type": "s3",
"display_name": "My S3 Destination",
"config": {
"bucket_name": "your-s3-bucket-name",
"prefix": "root_folder_prefix",
"region": "your aws s3 region",
"endpoint_url": "your endpoint url for s3 compatible buckets",
"include_bucket_in_prefix": true
},
"credentials": {
"access_key_id": "YOUR_S3_ACCESS_KEY_ID",
"secret_access_key": "YOUR_S3_SECRET_ACCESS_KEY"
}
}'
Use the returned id to reference this destination in subsequent bulk export operations.
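For example, assuming the create response returns the new destination's id as a top-level field, you can capture it into a shell variable with jq; the destination.json file below is a placeholder holding the request body shown above:
# Capture the destination id from the create response (sketch; assumes a
# top-level "id" field in the response and the request body in destination.json).
DESTINATION_ID=$(curl --silent --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data @destination.json | jq -r '.id')
echo "Destination id: ${DESTINATION_ID}"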
If you receive an error while creating a destination, see debug destination errors for details on how to debug this.
Credentials configuration
Requires LangSmith Helm version >= 0.10.34 (application version >= 0.10.91)
We support the following additional credentials formats besides static access_key_id and secret_access_key:
- To use temporary credentials that include an AWS session token, additionally provide the credentials.session_token key when creating the bulk export destination (see the sketch after this list).
- (Self-hosted only): To use environment-based credentials such as with AWS IAM Roles for Service Accounts (IRSA), omit the credentials key from the request when creating the bulk export destination. In this case, the standard Boto3 credentials locations will be checked in the order defined by the library.
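A hedged sketch of a destination request that uses temporary credentials; it matches the AWS S3 example below except for the extra session_token field:
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "destination_type": "s3",
    "display_name": "My S3 Destination (temporary credentials)",
    "config": {
      "bucket_name": "my_bucket",
      "prefix": "data_exports",
      "region": "us-east-1"
    },
    "credentials": {
      "access_key_id": "YOUR_TEMPORARY_ACCESS_KEY_ID",
      "secret_access_key": "YOUR_TEMPORARY_SECRET_ACCESS_KEY",
      "session_token": "YOUR_AWS_SESSION_TOKEN"
    }
  }'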
AWS S3 bucket
For AWS S3, you can leave off the endpoint_url and supply the region that matches the region of your bucket.
curl --request POST \
--url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: YOUR_API_KEY' \
--header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
--data '{
"destination_type": "s3",
"display_name": "My AWS S3 Destination",
"config": {
"bucket_name": "my_bucket",
"prefix": "data_exports",
"region": "us-east-1"
},
"credentials": {
"access_key_id": "YOUR_S3_ACCESS_KEY_ID",
"secret_access_key": "YOUR_S3_SECRET_ACCESS_KEY"
}
}'
Google GCS XML S3 compatible bucket
When using a Google GCS bucket, you need to use the S3-compatible XML API and supply the endpoint_url, which is typically https://storage.googleapis.com.
Here is an example of the API request when using the GCS XML API which is compatible with S3:
curl --request POST \
--url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: YOUR_API_KEY' \
--header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
--data '{
"destination_type": "s3",
"display_name": "My GCS Destination",
"config": {
"bucket_name": "my_bucket",
"prefix": "data_exports",
"endpoint_url": "https://storage.googleapis.com"
"include_bucket_in_prefix": true
},
"credentials": {
"access_key_id": "YOUR_S3_ACCESS_KEY_ID",
"secret_access_key": "YOUR_S3_SECRET_ACCESS_KEY"
}
}'
See the Google documentation for more information.
Create an export job
To export data, you will need to create an export job. This job specifies the destination, the project, the date range, and an optional filter expression for the data to export. The filter expression narrows down the set of runs exported; if the filter field is not set, all runs are exported. Refer to our filter query language and examples to determine the correct filter expression for your export.
You can use the following cURL command to create the job:
curl --request POST \
--url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: YOUR_API_KEY' \
--header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
--data '{
"bulk_export_destination_id": "your_destination_id",
"session_id": "project_uuid",
"start_time": "2024-01-01T00:00:00Z",
"end_time": "2024-01-02T23:59:59Z",
"filter": "and(eq(run_type, \"llm\"), eq(name, \"ChatOpenAI\"), eq(input_key, \"messages.content\"), like(input_value, \"%messages.content%\"))",
"format_version": "v2_beta"
}'
The session_id is also known as the Tracing Project ID, which can be copied from the individual project view by clicking into the project in the Tracing Projects list.
Use the returned id to reference this export in subsequent bulk export operations.
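If you would rather look the project ID up programmatically than copy it from the UI, you can query the sessions endpoint; note that the name query parameter used here is an assumption, so confirm the exact parameters against the API reference for your deployment:
# Hedged sketch: look up a tracing project (session) ID by name.
# The "name" query parameter is an assumption; check the API reference.
curl --request GET \
  --url 'https://api.smith.langchain.com/api/v1/sessions?name=my-tracing-project' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID'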
Exportable fields
By default, bulk exports include the following fields for each run. You can optionally specify a subset of these fields when creating an export to reduce file size and export time. For more details, refer to Limiting exported fields.
The following fields are available for export:
Identifiers & hierarchy:
| Field | Description |
|---|---|
| id | Run ID |
| tenant_id | Workspace/tenant ID |
| session_id | Project/session ID |
| trace_id | Trace ID |
| parent_run_id | Parent run ID |
| parent_run_ids | List of all parent run IDs |
| reference_example_id | Reference to example if part of a dataset |
Basic metadata:
| Field | Description |
|---|---|
| name | Run name |
| run_type | Type of run (e.g., “chain”, “llm”, “tool”) |
| start_time | Start timestamp (UTC) |
| end_time | End timestamp (UTC) |
| status | Run status (e.g., “success”, “error”) |
| is_root | Whether this is a root-level run |
| dotted_order | Hierarchical ordering string |
| trace_tier | Trace tier/retention level |
Run data:
| Field | Description |
|---|---|
| inputs | Run inputs (JSON) |
| outputs | Run outputs (JSON) |
| error | Error message if failed |
| extra | Extra metadata (JSON) |
| events | Run events (JSON) |
Tags & feedback:
| Field | Description |
|---|---|
| tags | List of tags |
| feedback_stats | Feedback statistics (JSON) |
Token usage & costs:
| Field | Description |
|---|---|
| total_tokens | Total token count |
| prompt_tokens | Prompt token count |
| completion_tokens | Completion token count |
| total_cost | Total cost |
| prompt_cost | Prompt cost |
| completion_cost | Completion cost |
| first_token_time | Time to first token |
Limiting exported fields
You can improve bulk export speed and reduce row size by limiting which fields are included in the exported Parquet files using the export_fields parameter. When export_fields is provided, only the specified fields are exported as columns in the Parquet files. When export_fields is not provided, all exportable fields are included.
This is particularly useful when you want to exclude larger fields like inputs and outputs.
The following example creates an export job that only includes specific fields:
curl --request POST \
--url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: YOUR_API_KEY' \
--header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
--data '{
"bulk_export_destination_id": "your_destination_id",
"session_id": "project_uuid",
"start_time": "2024-01-01T00:00:00Z",
"end_time": "2024-01-02T23:59:59Z",
"export_fields": ["id", "name", "run_type", "start_time", "end_time", "status", "total_tokens", "total_cost"],
"format_version": "v2_beta"
}'
The export_fields parameter accepts an array of field names. Available fields include the Run data format fields as well as additional export-only fields; see Exportable fields above for the full list.
Performance tip: Excluding inputs and outputs from your export can significantly improve export performance and reduce file sizes, especially for large runs. Only include these fields if you need them for your analysis.
Scheduled exports
Requires LangSmith Helm version >= 0.10.42 (application version >= 0.10.109)
Scheduled exports collect runs periodically and export them to the configured destination.
To create a scheduled export, include interval_hours and remove end_time:
curl --request POST \
--url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: YOUR_API_KEY' \
--header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
--data '{
"bulk_export_destination_id": "your_destination_id",
"session_id": "project_uuid",
"start_time": "2024-01-01T00:00:00Z",
"filter": "and(eq(run_type, \"llm\"), eq(name, \"ChatOpenAI\"), eq(input_key, \"messages.content\"), like(input_value, \"%messages.content%\"))",
"interval_hours": 1,
"format_version": "v2_beta"
}'
You can also use export_fields with scheduled exports to limit which fields are exported:
curl --request POST \
--url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: YOUR_API_KEY' \
--header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
--data '{
"bulk_export_destination_id": "your_destination_id",
"session_id": "project_uuid",
"start_time": "2024-01-01T00:00:00Z",
"interval_hours": 1,
"export_fields": ["id", "name", "run_type", "start_time", "end_time", "status", "total_tokens", "total_cost"],
"format_version": "v2_beta"
}'
Details
- interval_hours must be between 1 hour and 168 hours (1 week) inclusive.
- For spawned exports, the first time range exported is start_time=(scheduled_export_start_time), end_time=(start_time + interval_hours). Then start_time=(previous_export_end_time), end_time=(this_export_start_time + interval_hours), and so on.
- end_time must be omitted for scheduled exports. end_time is still required for non-scheduled exports.
- Scheduled exports can be stopped by cancelling the export.
- Exports that have been spawned by a scheduled export have the source_bulk_export_id attribute filled.
- To stop spawned bulk exports, they must be canceled separately from the source scheduled bulk export; canceling the source bulk export does not cancel the spawned bulk exports.
- Spawned exports run at end_time + 10 minutes to account for any runs that are submitted with end_time in the recent past.
- format_version (optional): The format version to use for the Parquet files. "v2_beta" has (1) enhanced datatypes for the columns and (2) a Hive-compliant folder structure.
Example
If a scheduled bulk export is created with start_time=2025-07-16T00:00:00Z and interval_hours=6:
| Export | Start Time | End Time | Runs At |
|---|---|---|---|
| 1 | 2025-07-16T00:00:00Z | 2025-07-16T06:00:00Z | 2025-07-16T06:10:00Z |
| 2 | 2025-07-16T06:00:00Z | 2025-07-16T12:00:00Z | 2025-07-16T12:10:00Z |
| 3 | 2025-07-16T12:00:00Z | 2025-07-16T18:00:00Z | 2025-07-16T18:10:00Z |
Monitoring the export job
Monitor export status
To monitor the status of an export job, use the following cURL command:
curl --request GET \
--url 'https://api.smith.langchain.com/api/v1/bulk-exports/{export_id}' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: YOUR_API_KEY' \
--header 'X-Tenant-Id: YOUR_WORKSPACE_ID'
Replace {export_id} with the ID of the export you want to monitor. This command retrieves the current status of the specified export job.
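To wait for the export to finish, you can poll this endpoint; the sketch below assumes the response exposes a top-level status field with the values listed under Export status lifecycle (the exact casing may differ):
# Poll the export status every 60 seconds until it leaves a non-terminal state.
# Assumes a top-level "status" field; adjust the value checks to the casing
# your deployment actually returns.
EXPORT_ID=your_export_id
while true; do
  STATUS=$(curl --silent --request GET \
    --url "https://api.smith.langchain.com/api/v1/bulk-exports/${EXPORT_ID}" \
    --header 'X-API-Key: YOUR_API_KEY' \
    --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' | jq -r '.status')
  echo "Export status: ${STATUS}"
  if [ "${STATUS}" != "CREATED" ] && [ "${STATUS}" != "RUNNING" ]; then
    break
  fi
  sleep 60
done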
List runs for an export
An export is typically broken up into multiple runs, each corresponding to a specific date partition to export.
To list all runs associated with a specific export, use the following cURL command:
curl --request GET \
--url 'https://api.smith.langchain.com/api/v1/bulk-exports/{export_id}/runs' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: YOUR_API_KEY' \
--header 'X-Tenant-Id: YOUR_WORKSPACE_ID'
This command fetches all runs related to the specified export, providing details such as run ID, status, creation time, rows exported, etc.
List all exports
To retrieve a list of all export jobs, use the following cURL command:
curl --request GET \
--url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: YOUR_API_KEY' \
--header 'X-Tenant-Id: YOUR_WORKSPACE_ID'
This command returns a list of all export jobs along with their current statuses and creation timestamps.
Stop an export
To stop an existing export, use the following cURL command:
curl --request PATCH \
--url 'https://api.smith.langchain.com/api/v1/bulk-exports/{export_id}' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: YOUR_API_KEY' \
--header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
--data '{
"status": "Cancelled"
}'
Replace {export_id} with the ID of the export you wish to cancel. Note that a job cannot be restarted once it has been cancelled; you will need to create a new export job instead.
Partitioning scheme
Data will be exported into your bucket in the following Hive-partitioned format:
<bucket>/<prefix>/export_id=<export_id>/tenant_id=<tenant_id>/session_id=<session_id>/runs/year=<year>/month=<month>/day=<day>
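For example, once an export has completed, you can list the Parquet files it wrote with the AWS CLI; the bucket name, prefix, and export ID below are placeholders:
# List all objects written under a given export's partition tree.
aws s3 ls s3://my_bucket/data_exports/export_id=YOUR_EXPORT_ID/ --recursive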
Importing data into other systems
Importing Parquet data from S3 is supported by the majority of analytical systems. See below for documentation links:
BigQuery
To import your data into BigQuery, see Loading Data from Parquet and also
Hive Partitioned loads.
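As a hedged sketch, if the exported files are in (or have been copied to) a GCS bucket, a Hive-partitioned load might look like the following; the dataset, table, and GCS paths are placeholders:
# Load Hive-partitioned Parquet files from GCS into a BigQuery table (sketch).
# Dataset/table names and paths are placeholders; see the linked BigQuery docs.
bq load \
  --source_format=PARQUET \
  --hive_partitioning_mode=AUTO \
  --hive_partitioning_source_uri_prefix="gs://my_bucket/data_exports/export_id=YOUR_EXPORT_ID" \
  my_dataset.langsmith_runs \
  "gs://my_bucket/data_exports/export_id=YOUR_EXPORT_ID/*"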
Snowflake
You can load data into Snowflake from S3 by following the Load from Cloud Document.
Redshift
You can COPY Parquet data from S3 into Amazon Redshift by following the AWS COPY command documentation.
ClickHouse
You can directly query data in S3 / Parquet format in ClickHouse. As an example, if using GCS, you can query the data as follows:
SELECT count(distinct id) FROM s3('https://storage.googleapis.com/<bucket>/<prefix>/export_id=<export_id>/**',
'access_key_id', 'access_secret', 'Parquet')
See the ClickHouse S3 Integration Documentation for more information.
DuckDB
You can query the data from S3 in-memory with SQL using DuckDB. See S3 import Documentation.
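As a hedged sketch using the DuckDB CLI with the httpfs extension (bucket, prefix, region, and credentials are placeholders; adjust the S3 settings for your provider):
# Query the exported Parquet files directly from S3 in an in-memory DuckDB session.
duckdb <<'SQL'
INSTALL httpfs;
LOAD httpfs;
SET s3_region='us-east-1';
SET s3_access_key_id='YOUR_S3_ACCESS_KEY_ID';
SET s3_secret_access_key='YOUR_S3_SECRET_ACCESS_KEY';
SELECT count(DISTINCT id)
FROM read_parquet('s3://my_bucket/data_exports/export_id=YOUR_EXPORT_ID/**/*.parquet');
SQL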
Error handling
Debugging destination errors
The destinations API endpoint will validate that the destination and credentials are valid and that write access to the bucket is present.
If you receive an error and would like to debug it, you can use the AWS CLI to test connectivity to the bucket. You should be able to write a file with the CLI using the same data that you supplied to the destinations API above.
AWS S3:
aws configure
# set the same access key credentials and region as you used for the destination
> AWS Access Key ID: <access_key_id>
> AWS Secret Access Key: <secret_access_key>
> Default region name [us-east-1]: <region>
# List buckets
aws s3 ls /
# test write permissions
touch ./test.txt
aws s3 cp ./test.txt s3://<bucket-name>/tmp/test.txt
GCS Compatible Buckets:
You will need to supply the endpoint_url with the --endpoint-url option. For GCS, the endpoint_url is typically https://storage.googleapis.com:
aws configure
# set the same access key credentials and region as you used for the destination
> AWS Access Key ID: <access_key_id>
> AWS Secret Access Key: <secret_access_key>
> Default region name [us-east-1]: <region>
# List buckets
aws s3 --endpoint-url=<endpoint_url> ls /
# test write permissions
touch ./test.txt
aws s3 --endpoint-url=<endpoint_url> cp ./test.txt s3://<bucket-name>/tmp/test.txt
Monitoring runs
You can monitor your runs using the List Runs API. If a run hits a known error, it will be added to the errors field of the run.
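For example, to pull just the recorded errors out of the List Runs response, a hedged jq sketch (the response shape and field names are assumptions; inspect the raw response to confirm):
# Hypothetical sketch: show the errors recorded for each run of an export.
# The field names (.id, .status, .errors) are assumptions.
EXPORT_ID=your_export_id
curl --silent --request GET \
  --url "https://api.smith.langchain.com/api/v1/bulk-exports/${EXPORT_ID}/runs" \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  | jq '.[] | {id: .id, status: .status, errors: .errors}'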
Common errors
Here are some common errors:
| Error | Description |
|---|---|
| Access denied | The blob store credentials or bucket are not valid. This error occurs when the provided access key and secret key combination doesn’t have the necessary permissions to access the specified bucket or perform the required operations. |
| Bucket is not valid | The specified blob store bucket is not valid. This error is thrown when the bucket doesn’t exist or there is not enough access to perform writes on the bucket. |
| Key ID you provided does not exist | The blob store credentials provided are not valid. This error occurs when the access key ID used for authentication is not a valid key. |
| Invalid endpoint | The endpoint_url provided is invalid. This error is raised when the specified endpoint is an invalid endpoint. Only S3 compatible endpoints are supported, for example https://storage.googleapis.com for GCS, https://play.min.io for minio, etc. If using AWS, you should omit the endpoint_url. |
Failure modes and retry policy
LangSmith bulk exports handle transient failures and infrastructure issues automatically to ensure resilience.
Each bulk export is divided into multiple runs, where each run processes data for a specific date partition (typically organized by day). Runs are processed independently, which enables:
- Parallel processing of different time periods.
- Independent retry logic for each run.
- Resumption from specific checkpoints if interrupted.
Each run (date range) in your export has its own failure handling and retry budget. If a run fails after exhausting all retries, the entire export is marked as FAILED.
Automatic retry behavior
Export jobs automatically retry transient failures with the following behavior:
- Maximum retry attempts: 20 retries per run (subject to change).
- Retry delay: 30 seconds between attempts (fixed, no exponential backoff).
- Run timeout: 4 hours maximum per run.
- Overall workflow timeout: 72 hours for the entire export.
Failure scenarios
| Failure type | Cause | Automatic retry? | Action required |
|---|---|---|---|
| Infrastructure interruption | Deployments, server restarts, worker crashes | Yes, automatically requeued with remaining retries. | None, jobs resume automatically. |
| Run timeout | Single run exceeds 4-hour limit | Yes, retried up to 20 times (subject to change). | If persistent, narrow date range, add filters, or limit the exported fields. |
| Workflow timeout | Entire export exceeds 72 hours | No | Reduce export scope (date range, filters) or break into smaller exports. |
| Storage/destination errors | Invalid credentials, missing bucket, permission issues | No | Fix destination configuration and create new export. |
| Destination deleted | Bucket removed during export | No | Recreate destination and restart export. |
| Terminal processing errors | Data serialization issues, resource exhaustion | Yes, retried up to 20 times (subject to change). | Check run error details; may require investigation. |
Any single run failure (after all retries are exhausted) causes the entire export to fail.
Export status lifecycle
Exports can have the following statuses:
| Status | Description |
|---|---|
| CREATED | Export has been created but not yet started processing. |
| RUNNING | Export is actively processing runs. |
| COMPLETED | All runs successfully exported. |
| FAILED | One or more runs failed after exhausting retries. |
| CANCELLED | Export was manually cancelled by user. |
| TIMEDOUT | Export exceeded the 72-hour workflow timeout. |
Individual runs can have the same possible statuses: CREATED, RUNNING, COMPLETED, FAILED, CANCELLED, or TIMEDOUT.
Concurrency and rate limits
To ensure system stability, exports are subject to the following limits:
- Maximum concurrent runs per export: 45
- Maximum concurrent exports per workspace: 15
If you have multiple exports running, new run jobs will queue until capacity becomes available.
Progress tracking and resumability
The export system maintains detailed progress metadata for each run:
- Latest cursor position in the data stream.
- Number of rows exported.
- List of Parquet files written.
This progress tracking enables:
- Graceful resumption: If a run is interrupted (e.g., by a deployment), it resumes from the last checkpoint rather than starting over.
- Progress monitoring: Track how much data has been exported through the API.
- Efficient retries: Failed runs don’t re-export data that was already successfully written.
Troubleshooting failed exports
If your export fails, follow these steps:
- Check the export status: Use the GET /api/v1/bulk-exports/{export_id} endpoint to retrieve the export details and status.
- Review run errors: Each run includes an errors field with detailed error messages keyed by retry attempt (e.g., retry_0, retry_1).
- Verify destination access: Ensure your destination bucket still exists and credentials are valid.
- Check run size: If you see timeout errors, your date partitions may contain too much data. It may be helpful to limit the exported fields.
- Review system limits: Ensure you’re not hitting the concurrency limits described in Concurrency and rate limits above.
To monitor your export job, refer to Monitoring the export job.
For storage-related errors, you can test your destination configuration using the AWS CLI or gsutil before retrying the export.
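For GCS buckets specifically, you can run a similar write test with gsutil; note that gsutil authenticates with your gcloud credentials rather than the HMAC key pair, so this checks the bucket and path rather than the exact keys used by the destination:
# Test write access to the GCS bucket (bucket name is a placeholder).
touch ./test.txt
gsutil cp ./test.txt gs://<bucket-name>/tmp/test.txt
gsutil ls gs://<bucket-name>/tmp/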