Structuring data teams: Building an effective data organisation
A well-established data organisation is necessary to harness the full potential of data & AI. But what does a successful data organisation look like, and what are the factors of success?
With the rise of AI, companies are starting to realise the importance of data to unlock AI's full potential. But simply collecting data isn't enough. To truly leverage data for competitive advantage, you need a well-structured data team. This post is a deep dive into the complexities of building effective data organisations, covering everything from data roles in start-ups to advanced structuring strategies in enterprises.
TL;DR
If you just want a condensed version of this post, here you go:
Data roles and responsibilities: A data organisation carries a wide variety of responsibilities within a company. The many different data roles needed to set up a successful data organisation make it complex to do well.
Team structure options: A lot of companies start by embedding data roles into existing teams, while others start with a centralised data team. In most cases, however, companies gravitate towards a hybrid approach in which data roles are embedded into product or business teams while dedicated data platform teams offer support in terms of infrastructure and standardisation.
Business alignment is crucial: Close alignment of a data organisation with the business is very important. This largely determines the success of a data organisation. Having a proper data leadership function is important to ensure this alignment happens and is then carried back to the teams for execution.
Team sizing & leadership: My take is to keep teams small and scale thoughtfully. As mentioned above, and especially when scaling, it's important to establish a data leadership function to develop data career paths and a data strategy.
1. The struggles of defining and creating a data organisation
Building a high-performing data team is surprisingly challenging. Here's why:
Ambiguity in definition: What exactly is a “data team”? The term is broad and can encompass various functions, leading to mismatched expectations and organisational friction.
Evolving needs: A company's data needs change dramatically as it grows. A structure that works for a Series A startup will likely be inadequate for a mature enterprise.
Skillset diversity: Data teams require a diverse range of skills, from data engineering and analysis to machine learning (ML) and data governance. Finding and blending these skills effectively is a constant puzzle.
Integration challenges: Data teams often sit at the intersection of technology and business. Integrating them seamlessly with other departments is crucial but can be fraught with challenges.
What responsibilities do data teams have?
Before we dive into structuring, let's define what we mean by “data team” or “data organisation”. A data team or organisation is a group of individuals responsible for acquiring, storing, processing, analysing, and operationalising data to drive better decision-making and business outcomes.
This broad definition encompasses a range of functions, including:
Data engineering:
Building and maintaining the infrastructure and pipelines required to collect, store, and process data.
This is a role that any company that handles sizeable amounts of data will have.
Data analytics:
Analysing data to identify trends, patterns, and insights that inform business strategy and decision-making.
This role is typically among the first to be hired at a company, often for more operational work. As data grows these types of roles become more important, but the technical aspects are increasingly being automated by AI.
Data science:
Developing and deploying advanced analytical models, including machine learning, to solve complex business problems.
This role is often only hired if the company deals with complex problems where advanced analytical methods or ML can help. In most cases it should be hired only after sizeable investments have been made in data engineering, data infrastructure and a data analytics function.
ML engineering:
Building, deploying and monitoring ML models in a production setting.
This role becomes critical when embedding ML into customer-facing products. Often this role is the technical (engineering) counterpart of a data science function.
MLOps:
Developing and maintaining the infrastructure and processes to deploy and monitor ML models.
Usually these roles are hired at a later stage, when standardisation is necessary and an ML platform needs to be built.
AI engineering:
Building and deploying LLM-based applications (similar to ML engineering, but differing in the LLM element).
When dealing with problems that can be tackled by GenAI or AI agents, this role becomes crucial. Implementing AI as a proof of concept (PoC) can be achievable without this specialised role, but you will soon find the need to, for example, version prompts, optimise accuracy, and develop retrieval-augmented generation (RAG) methods.
Data governance:
Ensuring the quality, security, and compliance of data assets.
Typically a role found in larger enterprises and corporates, where standardisation and developing a high level of data literacy are important.
ML / AI research:
Performing research into ML and/or AI topics and coming up with novel methods.
In most cases only found in specialised AI research companies (OpenAI, Anthropic, Mistral, etc.) or big tech (Microsoft, Google, etc.).
When does a company need data teams?
Not every company needs a dedicated data organisation. The need for a dedicated data team typically emerges when:
Data volume and complexity increase: Spreadsheets and ad-hoc analyses become insufficient to handle the growing volume and complexity of data.
Data-driven decision-making becomes a priority: Leadership recognises the potential of data to inform strategic decisions and improve operational efficiency.
Business questions become more sophisticated: The need for deeper insights and predictive analytics grows.
Data quality and consistency become critical: Ensuring reliable and trustworthy data becomes essential for accurate decision-making.
Key roles within a data organisation
The composition of a data team depends on the specific needs of the organisation, but some common roles include:
Data engineer: Builds and maintains data pipelines, data warehouses, and data lakes.
Data analyst: Analyses data to identify trends, patterns, and insights, often using tools like SQL, Python, and data visualisation platforms (e.g. Tableau, Looker, PowerBI).
Data scientist: Develops and deploys advanced analytical models, including machine learning algorithms.
ML engineer: Works closely with Data Scientists to scale, deploy and maintain machine learning models.
Analytics engineer: Transforms raw data into analysable models that are easy for end-users to explore.
Tech lead / Data architect: Designs and oversees the overall technical data landscape of the organisation.
Data governance officer: Develops and enforces data governance policies and procedures.
Head of Data / Chief Data Officer (CDO): Leads the data team and sets the overall data strategy for the organisation.
2. Structuring data teams in various organisations
Now that we have established the basics of what makes up a typical data organisation, we'll dive into how data teams are organised at various company sizes. The size of the company usually plays a large role in how data teams are structured.
Option 1: Embedded data roles (De-centralised)
In the early stages of a company, resources are typically scarce, and the focus is on building and launching a product quickly. A common approach is to embed data roles within various parts of a company, e.g. product engineering teams, marketing teams, operations teams, etc.
Here's why this makes sense:
Close collaboration: Data analysts and engineers work closely with product managers and engineers, enabling faster iteration and more data-informed product development.
Agility: Embedded data roles can quickly respond to changing product needs and provide timely insights.
Cost-effectiveness: Avoids the overhead of creating a separate data team in the early days.
Product focus: Keeps all team members focused on the core company product/offering.
However, this approach has its limitations:
Siloed knowledge: Data insights may not be shared effectively across different product teams.
Inconsistent data practices: Different product teams may adopt different data standards and tools, leading to inconsistencies and inefficiencies.
Limited career growth: Data professionals may feel isolated and lack opportunities for career development. For example, it's rare for someone in a data role to take up an engineering leadership role, as data professionals are usually a minority in these teams.
Risk of neglect: Data-related tasks can be deprioritised in favour of more immediate product development needs.
Option 2: Dedicated data teams (Centralised)
In a centralised model, data professionals (analysts, engineers, scientists) reside within one or more dedicated data teams. These teams act as a service provider for the entire organisation, handling data-related requests and projects from various departments. Companies of all sizes opt for this model, and it has long been a common way for data organisations to work.
Here's what makes this approach attractive:
Standardisation and governance: Centralisation promotes consistent data standards, tools, and processes across the organisation, improving data quality and reliability.
Knowledge sharing and collaboration: Data professionals can learn from each other, share best practices, and leverage collective expertise to solve complex problems.
Career development: A dedicated data team provides a clear career path for data professionals, fostering growth and specialisation.
Economies of scale: Shared infrastructure, tooling, and data platforms can reduce costs and improve efficiency.
Strategic alignment: A central team can help the organisation develop a comprehensive data strategy aligned with overall business goals.
However, centralised teams also have potential drawbacks:
Bottlenecks: A single team can become overwhelmed with requests, leading to delays and frustration for other departments.
Lack of context: Data professionals may lack deep understanding of specific business domains, potentially leading to less relevant or impactful insights.
Slower iteration: The distance between data analysis and product development can slow down the feedback loop and hinder agility.
Perceived as a cost centre: Departments may view the data team as a service provider, rather than a strategic partner, leading to under-utilisation and misalignment.
Option 3: Platform and Product data teams (Hybrid)
A hybrid model attempts to combine the benefits of both embedded and centralised approaches. This usually involves platform teams and product teams.
Here's how it typically works:
Platform teams: Central platform teams focus on building and maintaining the core infrastructure, tools, and services used by the entire organisation. These teams handle tasks like maintaining infrastructure, setting security standards and managing access control. They are sometimes referred to as data platform teams or ML platform teams.
Product teams: Smaller, embedded data teams (analysts, data scientists, ML/AI engineers) are aligned with specific product areas or business units. These teams focus on providing data insights and support directly to their respective teams.
The advantages of this approach include:
Scalability: The platform team enables the rest of the organisation to move faster due to better tooling and automation.
Balance: The platform team delivers leverage, while the product teams deliver insights and product value.
Efficiency: Centralised infrastructure streamlines data management and reduces duplication of effort, while embedded teams enable faster iteration within product areas.
Specialisation: Platform teams can focus on technical expertise, while product teams can develop deep domain knowledge.
Improved communication: Closer collaboration between embedded data teams and product teams leads to better communication and alignment.
However, hybrid models also present challenges:
Overhead: Requires strong communication and coordination between the platform team and the embedded teams to ensure alignment and avoid conflicts.
Potential for silos: If not managed carefully, embedded teams can still develop siloed data practices.
Complexity: Requires a more sophisticated organisational structure and clear roles and responsibilities.
Defining boundaries: It can be challenging to define clear boundaries between the responsibilities of the platform team and the embedded teams (e.g. who owns data quality?).
Ultimately, the best data team structure depends on the specific needs and context of the organisation. Factors to consider include the size of the company, the complexity of its products and services, the maturity of its data infrastructure, and its overall culture. There is no one-size-fits-all approach, but most companies will eventually gravitate towards the last option (hybrid), as it often combines the best of the first two options.
3. Setting up your data organisation for success
Having laid the groundwork on the roles, responsibilities and team structure of a data organisation, let's discuss a few critical factors that determine the success of a data organisation.
Importance of proximity to the business
Regardless of the organisational structure, data teams must be closely aligned with the business. This means:
Understanding business objectives: It's crucial that data teams understand the strategic goals and priorities of the wider organisation.
Collaborating with business stakeholders: Data teams should work closely with business stakeholders to define data needs, identify opportunities, and translate insights into action.
Communicating effectively: Data teams need to be able to communicate their findings clearly and concisely to business audiences.
Iterating based on business needs: The data team's roadmap must be able to adapt to the business and its needs.
Here are a few examples of how to achieve the above:
Data leadership: For the maturity of a data organisation it's important to have a data leadership function that is involved in the setting of business objectives and continuously collaborates with stakeholders to solve problems that are close to the business.
Educating stakeholders: A strong learning culture, where stakeholders can for example perform self-service analytics and where data literacy is spread across the organisation, can help “data” feel closer to the business.
Sizing teams: What size should teams be?
As a company grows, and whichever team structure is picked, this question will pop up: How large should my teams be? Here's my take:
Early stages/smaller organisations: When starting with an embedded model, teams are typically small and focused. Often, a single data analyst or engineer might support one or two product teams. The goal is rapid iteration and tight integration with the product development process.
Scaling organisations: As the company grows, keeping teams consistently small can become challenging. You might see some expansion of data roles in teams. But in general, when teams get larger than 6-8 people (in total), it might be time to look at either increasing the number of teams or setting up dedicated (platform) teams to help scale data across the entire organisation. In this situation it's also important to start thinking about data career paths and a data leadership function.
Larger enterprise companies: The same principles hold for large companies. In many large companies there will likely be a mix of embedded data roles and dedicated data (platform) teams. It's important to establish a framework with expectations for each data role and the roles and responsibilities of the various teams. This can for example include maximum team sizes, a maximum number of reports per manager, etc. A well-established data leadership function can help set up such a framework and ensure that a data strategy and data culture are built.
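To make the sizing guidance a bit more concrete, here is a minimal Python sketch of how such a framework could be checked against an org chart. The upper end of the 6-8 range is used as the team size limit. The limit on reports per manager, the Team structure and the team names are all hypothetical and only there for illustration; pick numbers that fit your own organisation.

```python
from __future__ import annotations

from dataclasses import dataclass

# Upper end of the 6-8 people range discussed above.
MAX_TEAM_SIZE = 8
# Assumption for illustration only: no specific number is prescribed here.
MAX_REPORTS_PER_MANAGER = 8


@dataclass
class Team:
    name: str             # hypothetical team name
    headcount: int        # total people in the team, all roles combined
    manager_reports: int  # direct reports of the team's manager


def review_team(team: Team) -> list[str]:
    """Return a list of flags for a team that exceeds the framework limits."""
    flags = []
    if team.headcount > MAX_TEAM_SIZE:
        flags.append(
            f"{team.name}: {team.headcount} people (limit {MAX_TEAM_SIZE}), "
            "consider splitting the team or adding a platform team"
        )
    if team.manager_reports > MAX_REPORTS_PER_MANAGER:
        flags.append(
            f"{team.name}: manager has {team.manager_reports} reports "
            f"(limit {MAX_REPORTS_PER_MANAGER})"
        )
    return flags


if __name__ == "__main__":
    teams = [Team("checkout", 6, 5), Team("growth", 11, 10)]
    for team in teams:
        for flag in review_team(team):
            print(flag)
```

A check like this is a prompt for a conversation about team structure, not an automatic rule for reorganising.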
4. Conclusion
In conclusion, the journey of defining and establishing a data organisation is complex and has many different facets. While the specific structure and composition will inevitably vary depending on a company's size, data maturity, and strategic objectives, certain fundamental principles remain.
The ambiguity in the term "data team" makes it important to establish a clear and well-defined understanding of its roles, its responsibilities and how it should fit into a company. As a company evolves, its data needs will also shift, demanding a flexible and adaptable organisational structure. The integration of diverse skillsets, from data science to AI engineering and MLOps, poses a continuous challenge that requires careful consideration and planning.
Ultimately, the success of any data organisation hinges on its ability to effectively bridge the gap between technology and business. Whether employing an embedded, centralised, or hybrid model, it is crucial to ensure that data teams are closely aligned with business objectives, collaborate effectively with stakeholders, and communicate insights in a clear and actionable manner. By prioritising proximity to the business, fostering a culture of data literacy, and carefully managing team sizes, organisations can more easily unlock the full potential of their data assets and drive better decision-making. Building a successful data organisation is not just about technology; it's about people, processes, structure and alignment.