Guide: Sharing and Marketing Data Products on Databricks

This guide covers what you need to start sharing data on Databricks, whether you sell data as a product or share data with customers as a value-added service. We’ll walk through the basics of the platform and common use cases, then dive into what you need to know to get started sharing on the Marketplace and beyond.

Steven Jacobs

Vice President of Marketing

Databricks is the analytics platform of choice for many data teams. Rooted in open source, the platform first became a favorite among advanced data teams working on ML / AI and is now the foundation for many enterprises' entire data operations. If you share data with customers, there’s a good chance that a meaningful portion of your consumers evaluate and use your products inside the Databricks platform.

Over the past few years, Databricks has launched a set of features that help providers of analytical data make their products much easier for consumers to find and use. Through its data marketplace and sharing protocol, providers can now market and deliver ready-to-query data to prospects and customers natively in their workspace. That means providers can reach new consumers and shorten the time-to-insight for prospects and existing customers.

What is Databricks?

Databricks offers an integrated set of tools for building, deploying and managing data solutions at scale. Billed as a “Data Lakehouse,” Databricks allows teams to manage integration, storage, processing, governance, sharing, analytics and AI from a single platform. Originally built to support data science and engineering efforts, the platform has expanded in recent years into business intelligence as well as data warehousing workloads.

The platform offers three ways to share and market data and analytical products to its users, all of which are powered by its sharing protocol, Delta Sharing.

Providers can:

  1. share products directly with specific users via direct Delta Shares;
  2. share private listings for products with approved consumers using a Private Exchange;
  3. share and market products publicly on the Databricks Marketplace.

What is Delta Sharing?

Delta Sharing is the backbone that powers all data sharing on the platform. It is an open-source standard sponsored by Databricks that allows companies to share data and analytical products with parties outside of their organization. With Delta Sharing, data is shared, not replicated: organizations grant a user access to a given data product, which continues to be managed by the provider.

Delta Sharing stands out because it offers some degree of cross-platform sharing. Unlike proprietary sharing protocols, Delta Sharing is open source, which means other apps and platforms can build it into their services. Today, a handful of applications and platforms, including Power BI, Looker, Collibra and Immuta, offer some support for the protocol.
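To make the consumer side concrete, here is a minimal sketch of the two things an open-source Delta Sharing client typically needs: a small JSON profile file and a table address of the form `<profile>#<share>.<schema>.<table>`. The endpoint, token, share, schema and table names below are placeholders invented for illustration, not values from any real provider.

```python
import json


def build_profile(endpoint: str, bearer_token: str) -> dict:
    """Return the contents of a bearer-token Delta Sharing profile file."""
    return {
        "shareCredentialsVersion": 1,
        "endpoint": endpoint,
        "bearerToken": bearer_token,
    }


def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address clients accept."""
    return f"{profile_path}#{share}.{schema}.{table}"


# Placeholder values, for illustration only.
profile_json = json.dumps(
    build_profile("https://sharing.example.com/delta-sharing/", "<token>"), indent=2
)
url = table_url("config.share", "demo_share", "default", "transactions")

# With the open-source connector installed (pip install delta-sharing),
# a consumer could then load the shared table, for example:
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(url)
```

In practice the provider issues the profile file to the consumer, so a consumer rarely builds one by hand. The sketch only shows how little is needed on the receiving end.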

However, companies must manage the product within Databricks to create a Delta Share usable on the Marketplace. Once a product is managed in a Databricks Workspace, companies can easily create a share around a specific set of tables or an analytical asset, which can be attached to a listing.
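As a rough illustration of what "creating a share around a specific set of tables" can look like, the sketch below assembles the SQL statements a provider might run in a Databricks workspace. The statement forms follow Databricks' Unity Catalog SQL as we understand it, and the helper function, share, table and recipient names are hypothetical, so check the current Databricks documentation before relying on them.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name: str) -> str:
    """Accept only plain (optionally dotted) identifiers so names cannot inject SQL."""
    for part in name.split("."):
        if not _IDENT.match(part):
            raise ValueError(f"unsafe identifier: {name!r}")
    return name


def share_statements(share: str, tables: list[str], recipient: str) -> list[str]:
    """Return the SQL to create a share, add tables to it, and grant it to a recipient."""
    stmts = [f"CREATE SHARE IF NOT EXISTS {_check(share)}"]
    stmts += [f"ALTER SHARE {share} ADD TABLE {_check(t)}" for t in tables]
    stmts.append(f"CREATE RECIPIENT IF NOT EXISTS {_check(recipient)}")
    stmts.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    return stmts


# Hypothetical names, for illustration only.
for statement in share_statements("sales_share", ["main.sales.orders"], "acme_corp"):
    print(statement + ";")
```

The function only generates text. Running the statements requires a workspace with the appropriate sharing privileges.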

For companies that do not already work in Databricks, Data Sharing Platforms like Bobsled can help providers leverage Delta Sharing and the Databricks Marketplace without building and managing infrastructure in the platform. (See more about Data Sharing Platforms below.)

What is the Databricks Marketplace?

Launched in 2022, the Databricks Marketplace allows Databricks users to discover, evaluate and access datasets and analytical assets (e.g. ML models, notebooks, applications and dashboards) from external providers. Data providers manage listings around each product. These listings include basic information about the product and its use cases as well as notebooks with code to get started. Today, most products shared on the marketplace are datasets, but providers can also share ML / AI models.

Importantly, every Marketplace Listing must be connected to an active Delta Share. Providers can specify whether the data is accessible on demand or upon approval, but either way the product must be available via Delta Sharing. This requires the provider to manage their product in a Databricks workspace to create an active share. (See “Getting Started” for the details.)

What is a Private Exchange on Databricks?

Private Exchanges are a feature of the Databricks Marketplace that lets providers share listings only with users they have approved. They offer functionality similar to the Marketplace (a listing with information and access to a data product backed by a Delta Share), but the listing is visible only to those approved users.

[Figure: Three ways to share and market data and analytics products on Databricks]

Who can share data products on Databricks Marketplace?

Short answer: anyone with data assets available via Delta Sharing. The most common type of provider on the platform is the data vendor (think: companies like CoreLogic that market data as their primary revenue source).

What are the benefits of sharing data on Databricks?

Delta Sharing and the Marketplace are powerful tools for consumers and, in turn, big opportunities for providers. The faster a consumer can find, evaluate and implement external data, the more data they can discover and buy. For providers, that means the ability to sell to new personas, accelerate sales cycles by reducing the time-to-insight, and reduce friction in their go-to-market.

  • Reach new buyers: Databricks is a popular platform for data science and AI teams and the Marketplace provides a central way to reach them as they search for new products.
  • Accelerate sales cycles: Sharing data natively using a protocol like Delta Sharing dramatically reduces the time it takes for prospects to go from interest to insight with a data product.
  • Move beyond data: Share workbooks, models and other analytical products that help get users started.

Getting Started 

Step 1: Understand the Basic Requirements

To create a Marketplace Listing, providers need to meet some basic requirements. Some of these requirements are administrative (e.g. becoming a partner), but several are strategic and potentially costly if done incorrectly. The single biggest challenge providers face in sharing on Databricks is infrastructure: if your team does not already use Databricks, getting a data product ready to share can take significant time and money.

[Figure: Basic requirements to share on Databricks Marketplace]

Step 2: Create the business case

First things first: identify how sharing data on Databricks can help grow your business. Databricks Marketplace offers a way to find new customers, and delivering data via direct Delta Sharing can substantially reduce the time and cost for consumers to generate insights. There’s a good chance that your existing customers and prospects use Databricks, but it’s critical to build the business case first.

  • Master key concepts: Do your research so you understand the key concepts and business case for sharing on Databricks. 
  • Map out demand: Ask your account reps and sales teams not only which existing customers and prospects have requested native delivery, but also which host their data on Databricks and could benefit from native delivery.
  • See what competitors are doing: Look through the Databricks Marketplace to see whether your competitors show up. If they do, there’s a good chance they are finding customers in the channel.

Step 3: Ready your infrastructure

The biggest barrier for most companies to share data on Databricks is infrastructure. In order to share data to the marketplace (or elsewhere), the dataset needs to be managed in a Databricks Workspace. If you build and manage your data products in Databricks already, this is not an issue; but for companies that manage products elsewhere, this can be challenging. 

There are two main implementation options for providers whose data lives elsewhere to share data on Databricks. Providers can either create and manage a Databricks Managed Workspace for the sole purpose of sharing on Databricks or use a Data Sharing Platform.

Building In-House

Managing sharing-only infrastructure is a classic example of “undifferentiated heavy lifting,” a term coined by Jeff Bezos to describe tasks that require lots of work with little strategic value. Setting up sharing-only infrastructure requires a meaningful amount of work for your engineering team that does not contribute to the overall competitiveness of your core product. Sharing is increasingly a requirement, not a competitive advantage.

Common challenges in building sharing-only infrastructure

  • Learning curve: Your team members are experts in your source platform. They know how it works, its quirks and how to fix problems. Sharing-only infrastructure requires them to learn an entirely new platform, which means slower development cycles, a higher likelihood of errors, and frustration all around.
  • Technical debt: Managing data products in one infrastructure is hard enough. If your data needs to be replicated and managed elsewhere, that’s another breakpoint in your system. Software breaks, platforms change protocols, the team member who built it leaves.
  • Exponential complexity: Databricks is not the only platform your customers use. The work your team does to support Databricks does not translate to sharing into other platforms like Snowflake, BigQuery and AWS.

What makes Databricks even more difficult is that it’s a “platform-as-a-service” model. Unlike other “software-as-a-service” platforms, Databricks is delivered as a set of integrated technologies which the buyer can implement and manage as they see fit. That’s enormously valuable for sophisticated data teams looking for nuanced control over their infrastructure, but can create significant complexity for teams that want to get up and running as fast as possible.  

A critical consideration with Databricks is security. Delta Sharing offers a range of enterprise-grade security features, but importantly, it’s the users’ responsibility to effectively configure and manage them in their own cloud. Data providers will need to actively engage their security team to ensure that configurations are set properly and network traffic is monitored actively. 

Using a Data Sharing Platform

Data sharing platforms are a new but important part of the modern delivery stack for data-as-a-service companies. Data sharing platforms allow companies to share data products across any major cloud or platform from a single control plane without opening an account in the destination platform. These platforms manage all of your sharing-only infrastructure as a service and allow providers to stay focused on building products in the platform their teams know.

Build vs. buy decisions should weigh two factors: where your data currently resides and the platforms (including Databricks) where your customers work.

Step 4: Create a listing 

Listings are the foundation of the Databricks Marketplace. Listings include basic information about the provider, detailed descriptions about the product and its use cases, and importantly, sample notebooks that prospects can use to start seeing the value of the product in action. 

All listings must reference a live Delta Share, but providers can choose to make the data available instantly or only upon review. When products are available “upon review”, providers can review the request before delivering the data. Importantly, as of public launch, the marketplace does not allow users to transact on the platform.

A few tips for building Marketplace Listings

  • Think about the persona. The individuals searching are not necessarily going to be your traditional customer. Think about how other personas (a data scientist at a retailer or a researcher at a pharmaceutical company, for example) might find your data on the marketplace.
  • Make trial data available instantly: If possible, include a limited, trial version of a data product. The biggest benefit of Databricks Marketplace is the ease with which consumers can explore datasets. Data from other marketplaces suggest that listings with free on-demand data substantially outperform others.
  • Build a data dictionary. Make sure you have a comprehensive data dictionary so customers can explore and analyze the data without any support.
  • Create “quick start” queries. Help prospects understand the value of your dataset by sharing sample queries in the listing.
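The last two tips can be sketched in a few lines of code. The example below renders a plain-text data dictionary and a matching “quick start” preview query from one list of column definitions, so the two stay in sync. The table name, columns and helper functions are hypothetical and exist only for illustration; a real listing would describe your own schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    dtype: str
    description: str


def render_dictionary(table: str, columns: list[Column]) -> str:
    """Render a plain-text data dictionary, one aligned row per column."""
    name_w = max(len(c.name) for c in columns)
    type_w = max(len(c.dtype) for c in columns)
    rows = [f"{c.name:<{name_w}}  {c.dtype:<{type_w}}  {c.description}" for c in columns]
    return "\n".join([f"Table: {table}", *rows])


def quick_start_query(table: str, columns: list[Column], limit: int = 10) -> str:
    """Build a simple preview query over the documented columns."""
    cols = ", ".join(c.name for c in columns)
    return f"SELECT {cols} FROM {table} LIMIT {limit}"


# Hypothetical trial table and columns, for illustration only.
COLUMNS = [
    Column("property_id", "STRING", "Unique identifier for the property"),
    Column("sale_date", "DATE", "Date the sale closed"),
    Column("sale_price", "DOUBLE", "Sale price in US dollars"),
]
print(render_dictionary("trial.listings.sales", COLUMNS))
print(quick_start_query("trial.listings.sales", COLUMNS))
```

Generating both artifacts from one definition is a design choice rather than a Marketplace requirement. It simply reduces the chance that the documentation and the sample query drift apart as the product changes.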

Step 5: Grow on Databricks (and beyond)

Databricks is a powerful go-to-market engine that can help data providers find customers once they join the marketplace. 

  • Partner with Databricks: Databricks is a powerful go-to-market machine, and a partnership with its marketplace team can be helpful.
  • Experiment with product led growth: Making limited, public-ready data instantly available is a powerful component of the Databricks Marketplace. Experiment with creating “open-access trial datasets” for listings to see if it drives a product-led growth motion.
  • Expand your cloud sharing program: Databricks is a critical part of the modern data stack, but there’s no one winner. Customers will work across many platforms, and succeeding on Databricks is a great starting point for a broader cloud sharing program.

How Bobsled Can Help

Why data providers use Bobsled 

Bobsled is the leading Data Sharing Platform that allows users to share ready-to-query data to any cloud data lake or warehouse without managing new infrastructure. That means engineering teams can focus on building world-class data products in the platforms they know and sales teams can promise a modern sharing experience for customers no matter where they work. 

Leading data companies like CoreLogic, Carto and LinkUp already use Bobsled to power their data sharing offerings.

“Making data products available to CoreLogic clients in the platforms where they work is an essential part of the modern data business – and Bobsled is the technology that helps us get there,” said Brian Battaglia, Executive, Property Intelligence Solutions, at CoreLogic. “With Bobsled, we can make our data solutions accessible to CoreLogic clients across platforms, which means faster trials and quicker sales cycles. Bobsled is a critical component to enhancing our cloud alliance strategy.”

How data sharing works with Bobsled

Data providers select the data products they want to share from their platform of record, pick the account in the platform where their customers work – and Bobsled handles the rest. No platforms to learn, pipelines to manage, or infrastructure to secure. Sharing is managed through a single control plane and customers

Analytics-ready data wherever your customer works

  • Secure, fully managed connections: Support delivery to a customer’s data lake or warehouse without building pipelines or learning new platforms.
  • Share ready-to-query data, anywhere: Make your data instantly accessible and ready to analyze natively in the platforms where your consumers work.
  • Data product analytics and dashboard: Track deliveries and usage across platforms to ensure customers get the data they need when they need it.

Getting started with Bobsled

Bobsled offers commitment-free, consumption-based pricing so you only pay for what you use. Customers can usually get started sharing data to customers in Databricks, Snowflake, GCP and AWS in an afternoon.
