Skip to main content

JDBC

The dbt Semantic Layer Java Database Connectivity (JDBC) API enables users to query metrics and dimensions using the JDBC protocol, while also providing standard metadata functionality.

A JDBC driver is a software component enabling a Java application to interact with a data platform. Here's some more information about our JDBC API:

  • The Semantic Layer JDBC API utilizes the open-source JDBC driver with ArrowFlight SQL protocol.
  • You can download the JDBC driver from Maven.
  • The dbt Semantic Layer supports ArrowFlight SQL driver version 12.0.0 and higher.
  • You can embed the driver into your application stack as needed, and you can use dbt Labs' example project for reference.
  • If you’re a partner or user building a homegrown application, you’ll need to install an AWS root CA to the Java Trust documentation (specific to Java and JDBC call).

dbt Labs partners can use the JDBC API to build integrations in their tools with the dbt Semantic Layer

Using the JDBC API

If you are a dbt user or partner with access to dbt Cloud and the dbt Semantic Layer, you can setup and test this API with data from your own instance by configuring the Semantic Layer and obtaining the right JDBC connection parameters described in this document.

You may be able to use our JDBC API with tools that do not have an official integration with the dbt Semantic Layer. If the tool you use allows you to write SQL and either supports a generic JDBC driver option (such as DataGrip) or supports Dremio and uses ArrowFlightSQL driver version 12.0.0 or higher, you can access the Semantic Layer API.

Refer to Get started with the dbt Semantic Layer for more info.

Note that the dbt Semantic Layer API doesn't support ref to call dbt objects. Instead, use the complete qualified table name. If you're using dbt macros at query time to calculate your metrics, you should move those calculations into your Semantic Layer metric definitions as code.

Authentication

dbt Cloud authorizes requests to the dbt Semantic Layer API. You need to provide an environment ID, host, and service account tokens.

Connection parameters

The JDBC connection requires a few different connection parameters.

This is an example of a URL connection string and the individual components:

jdbc:arrow-flight-sql://semantic-layer.cloud.getdbt.com:443?&environmentId=202339&token=SERVICE_TOKEN
JDBC parameterDescriptionExample
jdbc:arrow-flight-sql://The protocol for the JDBC driver.jdbc:arrow-flight-sql://
semantic-layer.cloud.getdbt.comThe access URL for your account's dbt Cloud region. You must always add the semantic-layer prefix before the access URL.For dbt Cloud deployment hosted in North America, use semantic-layer.cloud.getdbt.com
environmentIdThe unique identifier for the dbt production environment, you can retrieve this from the dbt Cloud URL
when you navigate to Environments under Deploy.
If your URL ends with .../environments/222222, your environmentId is 222222

SERVICE_TOKENdbt Cloud service token with “Semantic Layer Only” and "Metadata Only" permissions. Create a new service token on the Account Settings page.token=SERVICE_TOKEN

*Note If you're testing locally on a tool like DataGrip, you may also have to provide the following variable at the end or beginning of the JDBC URL &disableCertificateVerification=true.

Querying the API for metric metadata

The Semantic Layer JDBC API has built-in metadata calls which can provide a user with information about their metrics and dimensions.

Refer to the following tabs for metadata commands and examples:

Use this query to fetch all defined metrics in your dbt project:

select * from {{ 
semantic_layer.metrics()
}}

Use this query to fetch available metrics given dimensions. This command is essentially the opposite of getting dimensions given a list of metrics.

Note, group_by is a required argument that lists one or multiple dimensions in it.

select * from {{
semantic_layer.metrics_for_dimensions(group_by=['customer__customer_type'])

}}

Querying the API for metric values

To query metric values, here are the following parameters that are available. Your query must have either a metric or a group_by parameter to be valid.

ParameterDescriptionExample
metricsThe metric name as defined in your dbt metric configurationmetrics=['revenue']
group_byDimension names or entities to group by. We require a reference to the entity of the dimension (other than for the primary time dimension), which is pre-appended to the front of the dimension name with a double underscore.group_by=['user__country', 'metric_time']
grainA parameter specific to any time dimension and changes the grain of the data from the default for the metric.group_by=[Dimension('metric_time')
grain('week\|day\|month\|quarter\|year')]
whereA where clause that allows you to filter on dimensions and entities using parameters. This takes a filter list OR string. Inputs come with Dimension, and Entity objects. Granularity is required if the Dimension is a time dimension"{{ where=Dimension('customer__country') }} = 'US')"
limitLimit the data returnedlimit=10
orderOrder the data returned by a particular fieldorder_by=['order_gross_profit'], use - for descending, or full object notation if the object is operated on: order_by=[Metric('order_gross_profit').descending(True)]
compileIf true, returns generated SQL for the data platform but does not executecompile=True

Note on time dimensions and metric_time

You will notice that in the list of dimensions for all metrics, there is a dimension called metric_time. Metric_time is a reserved keyword for the measure-specific aggregation time dimensions. For any time-series metric, the metric_time keyword should always be available for use in queries. This is a common dimension across all metrics in a semantic graph.

You can look at a single metric or hundreds of metrics, and if you group by metric_time, it will always give you the correct time series.

Additionally, when performing granularity calculations that are global (not specific to a particular time dimension), we recommend you always operate on metric_time and you will get the correct answer.

Note that metric_time should be available in addition to any other time dimensions that are available for the metric(s). In the case where you are looking at one metric (or multiple metrics from the same data source), the values in the series for the primary time dimension and metric_time are equivalent.

Examples

Refer to the following examples to help you get started with the JDBC API.

Fetch metadata for metrics

You can filter/add any SQL outside of the templating syntax. For example, you can use the following query to fetch the name and dimensions for a metric:

select name, dimensions from {{ 
semantic_layer.metrics()
}}
WHERE name='food_order_amount'

Query common dimensions

You can select common dimensions for multiple metrics. Use the following query to fetch the name and dimensions for multiple metrics:

select * from {{ 
semantic_layer.dimensions(metrics=['food_order_amount', 'order_gross_profit'])
}}

Query grouped by time

The following example query uses the shorthand method to fetch revenue and new customers grouped by time:

select * from {{
semantic_layer.query(metrics=['food_order_amount','order_gross_profit'],
group_by=['metric_time'])
}}

Query with a time grain

Use the following example query to fetch multiple metrics with a change in time dimension granularities:

select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('month')])
}}

Group by categorical dimension

Use the following query to group by a categorical dimension:

select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('month'), 'customer__customer_type'])
}}

Query only a dimension

In this case, you'll get the full list of dimension values for the chosen dimension.

select * from {{
semantic_layer.query(group_by=['customer__customer_type'])
}}

Query with where filters

Where filters in API allow for a filter list or string. We recommend using the filter list for production applications as this format will realize all benefits from the Predicate pushdown where possible.

Where Filters have a few objects that you can use:

  • Dimension() - Used for any categorical or time dimensions. If used for a time dimension, granularity is required - Dimension('metric_time').grain('week') or Dimension('customer__country')

  • Entity() - Used for entities like primary and foreign keys - Entity('order_id')

Note: If you prefer a more explicit path to create the where clause, you can optionally use the TimeDimension feature. This helps separate out categorical dimensions from time-related ones. The TimeDimesion input takes the time dimension name and also requires granularity, like this: TimeDimension('metric_time', 'MONTH').

  • Use the following example to query using a where filter with the string format:
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('month'),'customer__customer_type'],
where="{{ Dimension('metric_time').grain('month') }} >= '2017-03-09' AND {{ Dimension('customer__customer_type' }} in ('new') AND {{ Entity('order_id') }} = 10")
}}
  • (Recommended for better performance) Use the following example to query using a where filter with a filter list format:
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('month'),'customer__customer_type'],
where=["{{ Dimension('metric_time').grain('month') }} >= '2017-03-09'", "{{ Dimension('customer__customer_type' }} in ('new')", "{{ Entity('order_id') }} = 10"]
}}

Query with a limit

Use the following example to query using a limit or order_by clauses:

select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time')],
limit=10)
}}

Query with Order By Examples

Order By can take a basic string that's a Dimension, Metric, or Entity and this will default to ascending order

select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time')],
limit=10,
order_by=['order_gross_profit']
}}

For descending order, you can add a - sign in front of the object. However, you can only use this short hand notation if you aren't operating on the object or using the full object notation.

select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time')],
limit=10,
order_by=[-'order_gross_profit']
}}

If you are ordering by an object that's been operated on (e.g., change granularity), or you are using the full object notation, descending order must look like:

select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('week')],
limit=10,
order_by=[Metric('order_gross_profit').descending(True), Dimension('metric_time').grain('week').descending(True) ]
}}

Similarly, this will yield ascending order:

select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('week')],
limit=10,
order_by=[Metric('order_gross_profit'), Dimension('metric_time').grain('week')]
}}

Query with compile keyword

Use the following example to query using a compile keyword:

select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('month'),'customer__customer_type'],
compile=True)
}}

FAQs

Hover to view
  • Why do some dimensions use different syntax, like metric_time versus [Dimension('metric_time')?
    When you select a dimension on its own, such as metric_time you can use the shorthand method which doesn't need the “Dimension” syntax. However, when you perform operations on the dimension, such as adding granularity, the object syntax [Dimension('metric_time') is required.

  • What does the double underscore "__" syntax in dimensions mean?
    The double underscore "__" syntax indicates a mapping from an entity to a dimension, as well as where the dimension is located. For example, user__country means someone is looking at the country dimension from the user table.

  • What is the default output when adding granularity?
    The default output follows the format {time_dimension_name}__{granularity_level}. So for example, if the time dimension name is ds and the granularity level is yearly, the output is ds__year.

0