2024 Data Infrastructure Market Map: Navigating the Changing Data Infrastructure Landscape
Team8 is pleased to release our Data Infrastructure Market Map for 2024, the first annual update to the inaugural Data Map released in 2023. We are also excited to unveil our new AI Landscape Map, which we introduce in more detail below. In this article, we detail the major changes to the maps, including categories that have consolidated and categories that have emerged to address problems in a new way.
Data Platforms Consolidation
A growing number of companies are building comprehensive data platforms that cover every element of the data stack, including ingestion, storage, management, and analytics. We created a new Data Platforms category to capture this trend, with AWS, Google Cloud, Microsoft Fabric, Databricks, and Snowflake emerging as the leaders. We expect these companies to make acquisitions to expand and complete their offerings. At the other end of the spectrum, Light Data Stacks are an emerging category of lightweight, modular data infrastructure platforms with an emphasis on pricing and ease of use. These products are ideal for smaller and less mature organizations, which previously lacked lightweight tooling and can now access powerful data infrastructure.
Solidifying the Lakehouse Architecture
Data lakehouses represent a shift in how businesses approach data management. The data landscape has historically been divided between data lakes and data warehouses. Data lakes offer flexibility for storing vast amounts of diverse data, while data warehouses provide structure and governance for specific analytics needs. This split creates a siloed environment that hinders organizations from deriving insights from the full spectrum of their data. Data lakehouses combine the advantages of both by providing a single platform for storing, processing, and analyzing structured and unstructured data, with capabilities like schema enforcement, indexing, and metadata management. Lakehouses foster wider access to data across the organization for business analysts and data teams. Cloud storage and open-source technologies such as Apache Iceberg have made lakehouse implementations more affordable and accessible. We expect this architecture to become increasingly pervasive in the coming years.
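To make the idea of schema enforcement concrete, here is a minimal sketch in plain Python. It is not how Apache Iceberg or any specific lakehouse engine implements the feature; the `Table` class, `SchemaError`, and the `ORDERS_SCHEMA` columns are hypothetical names chosen for illustration. The point is only that writes are validated against a declared schema before they land in storage.

```python
from dataclasses import dataclass, field

# Hypothetical schema for an "orders" table: column name -> required Python type.
ORDERS_SCHEMA = {"order_id": int, "customer": str, "amount": float}


class SchemaError(ValueError):
    """Raised when a record does not match the table schema."""


@dataclass
class Table:
    schema: dict
    rows: list = field(default_factory=list)

    def append(self, record: dict) -> None:
        # Reject records with missing or unexpected columns.
        if set(record) != set(self.schema):
            raise SchemaError(
                f"columns {sorted(record)} do not match schema {sorted(self.schema)}"
            )
        # Reject values whose type differs from the declared column type.
        for column, expected in self.schema.items():
            if not isinstance(record[column], expected):
                raise SchemaError(
                    f"{column!r} expected {expected.__name__}, "
                    f"got {type(record[column]).__name__}"
                )
        self.rows.append(record)


orders = Table(ORDERS_SCHEMA)
orders.append({"order_id": 1, "customer": "acme", "amount": 19.99})

# A record with a string where an integer is required is rejected on write.
try:
    orders.append({"order_id": "2", "customer": "globex", "amount": 5.0})
except SchemaError as err:
    rejected = str(err)
```

A plain data lake would accept both records and leave the type mismatch for downstream consumers to discover. Validating on write is what gives a lakehouse warehouse-like guarantees on top of flexible storage.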
Consolidation and Competition in Business Intelligence (BI)
Established BI platforms like Tableau and Microsoft Power BI continue to hold their ground, offering robust features and functionality for data visualization and exploration. However, many startups have entered the space, forcing established vendors to invest heavily in new features to keep pace. Going forward, BI vendors will continue to enhance their platforms with advanced analytics, new interfaces, AI-powered insights, and improved data visualization capabilities. Vertical solutions continue to emerge, challenging horizontal platforms with industry-specific insights and use cases in sectors such as e-commerce, retail, healthcare, logistics, and finance. In the coming years, we expect new and existing BI tools to focus increasingly on non-technical users through new interfaces and intelligence capabilities.
Rethinking the Metrics Layer
While the Metrics Layer was featured as a category in the 2023 Data Ecosystem Map, many of the companies in it have had difficulty building standalone products. The goal of the metrics layer was to streamline metric calculations and ensure consistency across BI tools and downstream use cases. However, adoption has so far been limited by the complexity of business logic and by unclear positioning within the data stack. Some companies have built capabilities for a specific vertical such as SaaS or e-commerce, while others have added BI features or been acquired. The most notable of these acquisitions was dbt Labs’ acquisition of Transform in February 2023. Nonetheless, the underlying problems remain: different teams in the same enterprise report different numbers, and business logic complexity is overwhelming. The next generation of these tools will look to leverage automation and metadata activation to solve these problems, as we have detailed in our report A CDOs Guide – Making Data Work: On the Need for Next-Gen Automated Data Fabric.
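The core idea behind a metrics layer, defining each metric's business logic once and letting every consumer query that single definition, can be sketched in a few lines of Python. This is a toy illustration rather than the API of dbt, Transform, or any other product; `ORDERS`, `METRICS`, and `query_metric` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical order records, standing in for a warehouse table.
ORDERS = [
    {"region": "EU", "amount": 120.0, "refunded": False},
    {"region": "EU", "amount": 80.0, "refunded": True},
    {"region": "US", "amount": 200.0, "refunded": False},
]

# Central metric registry: each metric's business logic is defined exactly once.
METRICS: Dict[str, Callable[[List[dict]], float]] = {
    # Net revenue excludes refunded orders; every consumer inherits this rule.
    "net_revenue": lambda rows: sum(r["amount"] for r in rows if not r["refunded"]),
    "order_count": lambda rows: float(len(rows)),
}


def query_metric(name: str, rows: List[dict], region: Optional[str] = None) -> float:
    """Evaluate a registered metric, optionally filtered to one region."""
    if region is not None:
        rows = [r for r in rows if r["region"] == region]
    return METRICS[name](rows)


# Two teams asking for the same metric in different ways get consistent numbers.
finance_view = query_metric("net_revenue", ORDERS)
sales_view = sum(query_metric("net_revenue", ORDERS, region=r) for r in ("EU", "US"))
```

Because the refund rule lives only in the registry, the company-wide total and the sum of the regional figures cannot drift apart. The hard part in practice, as noted above, is that real business logic is far more complex than a single filter.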
The Rise of the Analytics Assistant
The Analytics Assistant category, a new area in the market map, includes tools that enable less-technical users to interact with data using natural language queries and to execute automated tasks. CDAOs have long aimed to democratize data beyond the paradigm in which data analysts create dashboards for everyone else. Analytics assistants promise to enable non-technical users to explore data and gather insights from it. The parallels with BI tools are unmistakable, and we see this as a space that will challenge BI platforms and push them to compete intensely. These platforms should integrate with key data and business platforms, establish a single source of truth, and support intuitive query interfaces for the long tail of users in organizations. For companies using analytics assistants, managing metadata properly is vital to maintaining consistency across the data stack and to enabling AI to find the most accurate data to answer user queries. Analytics Assistants are a fast-evolving category with the potential to significantly change how organizations use their data.
Reverse ETL Companies Pivot Towards Composable Customer Data Platforms
Reverse ETL solutions initially focused on moving enriched data from data warehouses back into operational systems. This use case was underserved at first, but ETL providers quickly developed capabilities to support the reverse flow of data. Meanwhile, reverse ETL infrastructure proved to be ideal for a new breed of Composable Customer Data Platforms (CDPs) that enable data ingestion, management, and identity resolution on top of the warehouse. With Composable CDPs, enterprises can track the entire customer journey using the customer data already centralized in the warehouse. These tools have converged with existing CDP solutions, and we expect competition and consolidation to increase in the coming years.
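As a rough sketch of the reverse ETL flow described above, the Python below compares modeled warehouse rows against the current state of an operational tool and applies only the differences. The `warehouse` and `crm` dictionaries and the `plan_sync` and `apply_sync` functions are hypothetical stand-ins; a real product would read from a warehouse query and write through the destination's API.

```python
# Hypothetical warehouse rows keyed by customer ID (e.g. output of a modeled table).
warehouse = {
    "c1": {"email": "a@example.com", "lifetime_value": 540.0},
    "c2": {"email": "b@example.com", "lifetime_value": 75.0},
}

# Hypothetical current state of an operational tool such as a CRM.
crm = {
    "c1": {"email": "a@example.com", "lifetime_value": 500.0},
}


def plan_sync(source: dict, destination: dict) -> dict:
    """Compare source and destination and return the records to create or update."""
    creates = {k: v for k, v in source.items() if k not in destination}
    updates = {
        k: v for k, v in source.items() if k in destination and destination[k] != v
    }
    return {"create": creates, "update": updates}


def apply_sync(plan: dict, destination: dict) -> None:
    """Apply planned changes; a real system would call the destination's API here."""
    for changes in (plan["create"], plan["update"]):
        for key, record in changes.items():
            destination[key] = dict(record)


plan = plan_sync(warehouse, crm)
apply_sync(plan, crm)
```

Diffing before writing keeps the sync incremental: a second run against an unchanged warehouse produces an empty plan. A Composable CDP layers identity resolution and audience management on top of this same warehouse-to-tool flow.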
Breaking Out the AI Ecosystem
In last year’s map, we grouped the data and AI ecosystems together under one map. Over the past year, alongside the development of foundation models, we have witnessed extensive growth in infrastructure products that aim to help companies build and deploy AI models. This growth warranted a new landscape, and we’re excited to release our AI Infrastructure Map alongside the Data Ecosystem Map. We look at the AI ecosystem in four layers: Hardware, Data, Model infrastructure, and Deployment. In this map, we chose to focus on the latter three, Data through Deployment, and to exclude the foundation models themselves in order to keep the focus on infrastructure tooling. We see opportunities to build major companies in AI infrastructure, with leaders and some standards already emerging. We’re excited to add this deeper coverage to our map and expect this space to be just as dynamic over the next year.
Final Thoughts
We hope that the 2024 Data Map has given you a new lens on the exciting and evolving data ecosystem. As always, please feel free to email us at [email protected] if you have comments, questions, or would like to talk!