Cracking the Python Monorepo

A monorepo is a single repository that contains multiple projects. It is a popular way to organize codebases with many coupled components, and is also used at very big companies like Google, Facebook, and Twitter.

For a long time, I did not understand the benefits of monorepos. I thought they were used because people could not figure out how to split their codebases into smaller parts.

After working at Dagster (on a pretty huge monorepo with more than 140k lines of code and 70+ subprojects), I realized that monorepos can provide quite a pleasant development experience when done right — with the right tooling, practices, and, of course, the right use case. Monorepos solve a very specific problem: local dependencies between projects force them to be updated together, which eliminates certain types of technical debt (e.g. it ensures all current projects are always compatible with each other). I also enjoyed using all this tooling that somebody else had built for me (though I imagine it’s not as much fun to build it).

It’s worth noting that monorepos are not a silver bullet. They have their own set of challenges, mostly the need to build custom tooling and to organize the development workflow. The big tech companies have the resources to build and maintain these tools, but for smaller companies, it can be quite challenging.

Dagster’s monorepo wasn’t perfect either. Some of the drawbacks were:

  • Slow CI/CD pipelines: builds could run for hours!
  • Legacy Python packaging made maintaining dependencies and CI pipelines quite complicated. Making a change to dependencies required editing multiple configuration files at a few locations with great care.

These problems were mostly due to technical debt.

This post focuses on a very specific use case — uv Python monorepos. Until very recently, Python monorepos were quite hard to set up and maintain, with problems like the ones I mentioned above being quite common.

However, nowadays we have a bunch of excellent tooling available with great out-of-the-box monorepo support.

WARNING

uv shouldn’t need any introduction. In 2024, uv took the Python ecosystem by storm, and it’s now the go-to tool for Python development. Using anything else (except perhaps Pixi which builds on top of uv and extends it to handle Conda packages) doesn’t make much sense anymore. If you haven’t heard about uv, that’s a sure sign you’ve been living in a cave—or otherwise off the grid—for the past year.

The dream of the monorepo

In this post, I am going to share an approach to building Python monorepos that solves these issues in a very elegant way. The benefits of this approach are:

  • it works with any uv project (even yours!)
  • it needs little to zero maintenance and boilerplate
  • it provides end-to-end pipeline caching — including steps downstream to building the image (like running linters and tests), which is quite rare
  • it’s easy to run locally and in CI

We will walk through a typical monorepo setup and use uv and Dagger to build lightning-fast and modular build pipelines. Feel free to check out the docs for these tools (uv quickstart, Dagger quickstart). Impatient readers can jump straight to the Dagger module.

Python packaging: 🐍 📦 😱

Please look at the emojis above until you get it. Yes, managing packaging in a Python monorepo can be a nightmare. But it doesn’t have to be!

And it really isn’t, with the right tooling:

  • uv has the concept of workspaces which allows installing individual packages from a monorepo and makes managing dependencies a breeze. It standardizes dependency management and maintenance operations in monorepos, including operations with local dependencies.
  • Dagger — a universal build tool which supports multiple languages (including Python) to define containerized build pipelines. Because Dagger pipelines can be written in Python, they can be easily adapted to work with monorepos of arbitrary complexity and structure. Dagger is essentially a glorified Dockerfile generator available in your favorite programming language. Dagger pipelines are huge graphs of BuildKit (the engine used by Docker when building images) steps, so the entire pipeline can be optimized, parallelized, and cached by BuildKit.
  • modern QA tooling: ruff and pyright have first-class support for monorepos and are able to automatically discover and merge configuration files from multiple subdirectories.

A brief history of Python packaging

Contrary to what you might think, Python packaging is not a nightmare anymore. It used to be, but with the introduction of PEP 517 and PEP 518, and the rise of uv, it’s actually in pretty good shape — I rarely have to pull out my hair when working with Python packaging nowadays.

The PEPs and their adoption were important to standardize the way Python packages are built and distributed. Because the overwhelming majority of Python packages now provide correct distribution metadata (like hashes of the package contents), it’s much easier for advanced and optimized package managers like uv to do their job really well. Some machine learning dependencies — and specifically PyTorch — used to sabotage the Python packaging ecosystem, but even PyTorch now (mostly) provides the hashes with the various wheels they build for all these CUDA versions.

Therefore, I’d like to note that the improvements in packaging are not only due to better tooling like uv, but also due to the community’s effort to standardize and improve the Python packaging ecosystem in general.

Setting up the monorepo

Fair enough! Let’s start by invoking uv (assuming the dear reader has uv installed — and they should) and creating a new workspace:

mkdir uv-dagger-dream
cd uv-dagger-dream
uv init
uv add --group dev ruff pyright
mkdir projects
uv init --package --lib projects/lib-one
uv init --package --lib projects/lib-two
uv lock

Phew! That was a lot of commands. Let’s break them down:

  • uv init initializes the root project. A workspace is a directory that contains one or more packages; it’s a way to group packages together and manage their dependencies while allowing cross-package dependencies.
  • uv add --group dev adds dependencies to the root project. Development groups are only used when working on the project and are not registered as dependencies when the project is published. Downstream projects will not inherit these dependencies.
  • uv init --package --lib projects/lib-one initializes a new package in the projects directory. The --lib flag tells uv that this package is a library. This is important because it will add a [build-system] section to the pyproject.toml file, which is required for uv to know how to build the package.
  • uv lock creates a uv.lock file that contains the resolved dependencies for the workspace. This file is used by uv to determine which versions of the dependencies to install.

After this section, you should see something like this (non-essential files are omitted):

.
├── README.md
├── projects
│   ├── lib-one
│   │   ├── pyproject.toml
│   │   └── src
│   │       └── lib_one
│   │           └── __init__.py
│   └── lib-two
│       ├── pyproject.toml
│       └── src
│           └── lib_two
│               └── __init__.py
├── pyproject.toml
└── uv.lock

Projects recognized by uv as workspace members share the same uv.lock file and environment, can be added as dependencies to each other, and can be managed with uv commands.

TIP

I like to edit the root pyproject.toml and set workspace.members to ['projects/*'] so that all the packages in the projects directory are recognized as workspace members.
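
For reference, the relevant section of the root pyproject.toml would then look something like this:

[tool.uv.workspace]
members = ["projects/*"]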

To demonstrate how one project can be added as a dependency to another, let’s add lib-one as a dependency to lib-two:

uv add --package lib-two lib-one

The --package flag tells uv to execute the command in the context of the lib-two package. The lib-one package is added as a dependency to lib-two’s pyproject.toml and the root uv.lock file is updated automatically.
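
After running this, lib-two’s pyproject.toml should contain something like the following (uv records local packages under [tool.uv.sources] with workspace = true; other fields omitted):

[project]
name = "lib-two"
dependencies = [
    "lib-one",
]

[tool.uv.sources]
lib-one = { workspace = true }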

Do you see it now? There is more than one package in our repository; therefore, it’s a monorepo! 🎉

Cache me if you can

Who knows how to write a Dockerfile? I’m sure you do. Do you know how to write it efficiently? Are you sure? In any case, we are about to unleash the combined power of Dagger (backed by BuildKit), and uv, to build our monorepo in a very (very) efficient way.

We don’t actually need a Dockerfile: we could use Dagger to define the entire build process in Python. However, since that would complicate building the project with plain docker or other tools, we are still going to define most of the build with a traditional Dockerfile. But:

  1. We will define an intermediate stage almost identical to the final stage.
  2. We will then call Dagger to build the project in the intermediate stage. We will then complete the final stage programmatically with Dagger.

This approach allows building a very similar image locally just with docker build . if needed.

For simplicity, all the subprojects will share the same Dockerfile. Behold!

The Dockerfile:

# options: prod,dev
ARG INCLUDE_DEPENDENCIES=dev
ARG PYTHON_VERSION=3.12.8

FROM python:${PYTHON_VERSION}-slim AS base

ENV DEBIAN_FRONTEND=noninteractive
# some apt packages usually needed in Python projects
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    apt-get update && apt-get install -y git curl gcc libpq-dev

# install uv (https://github.com/astral-sh/uv)
# docs for using uv with Docker: https://docs.astral.sh/uv/guides/integration/docker/
COPY --from=ghcr.io/astral-sh/uv:0.5.27 /uv /bin/uv

# UV_PROJECT_ENVIRONMENT configures the environment for the uv project interface
# UV_PYTHON configures the python executable for the uv pip interface
ENV UV_PROJECT_ENVIRONMENT=/usr/local/ \
    UV_PYTHON=/usr/local/bin/python \
    UV_COMPILE_BYTECODE=1 \
    UV_LINK_MODE=copy \
    UV_FROZEN=1

FROM base AS deps-prod

WORKDIR /src

COPY pyproject.toml uv.lock ./

ARG PACKAGE
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --no-install-workspace --all-extras --no-dev --package $PACKAGE

FROM deps-prod AS deps-dev

ARG PACKAGE
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --no-install-workspace --only-group dev --inexact && \
    uv sync --no-install-workspace --all-extras --inexact --package $PACKAGE

# -------------------------------------------------------------
FROM deps-${INCLUDE_DEPENDENCIES} AS final

ARG PACKAGE

# Copy all the rest of the code
COPY . .

# finally install our code
RUN uv sync --all-extras --inexact --package $PACKAGE

NOTE

The resulting image will contain a subset of the monorepo dependencies needed for the specific project due to the --package flag.

That’s a lot of Docker magic! Let’s break it down:

  • The deps-prod stage installs only runtime dependencies. This is useful for building a more lightweight image for deployment.
  • The deps-dev stage installs development dependencies. This is useful for building an image for QA checks or running tests.
  • The final stage installs the package itself. Only at this point is the source code copied into the image. The last uv sync invocation doesn’t install any third-party dependencies, only the dependencies from our monorepo (the uv workspace). Notice the --no-install-workspace flag spammed all over the place? It’s quite important, as it configures uv sync to ignore the missing source code and install only the dependencies.

INFO

The --mount=type=cache,target=/root/.cache/uv flag tells Docker to mount the cache directory to the build container. This way, the cache is persisted between builds and doesn’t inflate the image itself.
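
For reference, building an image for a single package with plain docker build would look roughly like this (the image tags are just examples):

# dev image (the default, since INCLUDE_DEPENDENCIES=dev)
docker build --build-arg PACKAGE=lib-one -t lib-one:dev .

# image with runtime dependencies only
docker build --build-arg PACKAGE=lib-one --build-arg INCLUDE_DEPENDENCIES=prod -t lib-one:prod .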

What a great Dockerfile! It’s so efficient that it’s almost a crime. Or is it not? Can you spot the problem?

Let’s have a hypothetical conversation between you (the dear reader) and a Docker guru:

– You: Hey, I know! Just look at that filthy COPY . . before the final uv sync! It will invalidate the cache every time any file in the monorepo changes!

– Docker guru: You are right! But how can we fix it?

– You: Oh, I’m very smart. Let’s only copy our package’s source code into the image and then uv sync it. Check this out!

# -------------------------------------------------------------
FROM deps-${INCLUDE_DEPENDENCIES} AS final

ARG PACKAGE
COPY projects/$PACKAGE ./projects/$PACKAGE/
# at this point all the third-party dependencies are already installed
# so the step below is very fast anyway
RUN uv sync --all-extras --inexact --package $PACKAGE

– Docker guru: Wow! You are a genius! But what if our package depends on some other package in the monorepo?

– You: Oh, that’s easy! We can just add another COPY instruction for the dependency before running uv sync.

– Docker guru: Oh, and what if we have multiple packages that depend on each other? Are you going to write a new COPY instruction for each of them? Are you going to maintain a separate Dockerfile for each set of dependencies? What if they are scattered around the repo instead of being carefully placed in projects/, making them hard to track? By the way, COPY . . is cursed in another way: it will always invalidate the final image cache and trigger potentially expensive downstream steps in your CI pipeline, like running tests. A pipeline like this is doomed to be slow.

With plain Docker, we are left with two options:

  1. Carefully (probably manually) track all inter-package dependencies and maintain COPY instructions in multiple Dockerfiles.
  2. Slap a COPY . . and accept the fact that the final image will always be rebuilt from scratch.

Pick your poison.


Remember our goal: to avoid unnecessary rebuilds of the final image and granularly include only the source code of the packages that are actually needed. What if we could programmatically define the Dockerfile? What if we could define the build process in Python? What if there is already a place in our project where the local dependencies graph is defined precisely?

Think about it for a moment. I will give you a hint: it’s the uv.lock file. I’m sorry, that wasn’t a hint but a direct answer, but let’s move on.

A thousand daggers

Let’s look into the uv.lock file for a moment. It’s a TOML file that describes the entire dependency tree of our monorepo. At the very top, you will find:

[manifest]
members = [
    "lib-one",
    "lib-two",
    "uv-dagger-dream",
]

[[package]]
name = "lib-one"
version = "0.1.0"
source = { editable = "projects/lib-one" }

[[package]]
name = "lib-two"
version = "0.1.0"
source = { editable = "projects/lib-two" }
dependencies = [
    { name = "lib-one" },
]

🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉

At the beginning of the uv.lock file is the members array. It contains a list of all the workspace members (our local packages), including the root member uv-dagger-dream.

Next comes the package array. Each element in the package array describes a package (local or third-party) in the monorepo. Notice the source key in the package table. It points to the source code of the package. And we can use it to identify the local dependencies of a given package.

The plan:

  1. build the original Dockerfile up to the deps-dev stage. At this point, we have all the third-party dependencies installed, but the source code is not copied and installed yet.
  2. extract the information about the local dependencies from the uv.lock file (with Python)
  3. use it to copy the source code of each local dependency into the image
  4. then we can run uv sync to install the local dependencies (including our project) in editable mode

Docker can’t cover steps 2 and 3. But Dagger can! Let’s write a Dagger function to do this.

NOTE

The words container and image are used interchangeably in this post. Technically, a container is a running instance of an image, but Dagger defines the Container type, so I will use the word container to refer to images most of the time.

We will start by creating a new Dagger module sitting in a separate package in our monorepo. This way we keep it independent and reusable.

mkdir .dagger
dagger init --sdk python --name monorepo-dagger .dagger

This command will create a new Dagger module in the .dagger directory.

A work-around for Dagger not supporting uv workspaces

The Dagger package we just added is not configured to be a workspace member because of a current limitation of Dagger (this may already be fixed by the time you read this). It also relies on an ephemeral local sdk package which is excluded from version control and may not always be available to uv.

Here are the steps for a workaround at the time of writing:

  1. Remove the following from .dagger/pyproject.toml:
[tool.uv.sources]
dagger-io = { path = "sdk", editable = true }

and run uv add tomli (we will use it to read uv.lock) inside the .dagger directory.

This will ensure that the Dagger module is not dependent on the local sdk package and will fetch dagger-io from PyPI instead.

  2. Add the Dagger module to its own dependency group in the root project:
uv add --group dagger .dagger
uv sync --all-groups

This will enable type checking and linting for our Dagger module.

  3. Move dagger.json to the repo root, and add
"source": ".dagger"

to it.

  4. Finally, add
[project.entry-points."dagger.mod"]
main_object = "monorepo_dagger:MonorepoDagger"

to the root pyproject.toml.


Now we can run the dagger call command from the repo root.
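
To double-check that everything is wired up, you can ask Dagger to list the functions it discovers in the module:

dagger functions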

Building the build pipeline

We will go backwards because it’s somewhat easier to understand in my opinion. Feel free to consult the Dagger Python SDK documentation.

First, we will do a bunch of imports and define some useful types:

from typing import (
    Annotated,
    TypeAlias,
)

import dagger
import tomli
from dagger import (
    BuildArg,
    Container,
    DefaultPath,
    File,
    Ignore,
    dag,
    function,
    object_type,
)

IGNORE = Ignore(
    [
        ".env",
        ".git",
        "**/.venv",
        "**__pycache__**",
        ".dagger/sdk",
        "**/.pytest_cache",
        "**/.ruff_cache",
    ]
)

# this represents the repo root
RootDir: TypeAlias = Annotated[
    dagger.Directory,
    DefaultPath("."),
    IGNORE,
]

# this represents the source directory of a specific project in the monorepo
SourceDir: TypeAlias = Annotated[
    dagger.Directory,
    IGNORE,
]

And here is the Dagger entry point — MonorepoDagger, also called a Dagger Module. The method build_project defines a Dagger Function which will be available as

dagger call build-project

from the command line. The build_project function will build the Docker image for a given project and will only contain the dependencies and source code required for that project. This function will call other high-level methods of the class to achieve this.

@object_type
class MonorepoDagger:
    @function
    async def build_project(
        self,
        root_dir: RootDir,
        project: str,
        debug_sleep: float = 0.0,
    ) -> Container:
        """Build a container containing only the source code for a given project and it's dependencies."""
        # we start by creating a container including only third-party dependencies
        # with no source code (except pyproject.toml and uv.lock from the repo root)
        container = self.container_with_third_party_dependencies(
            pyproject_toml=root_dir.file("pyproject.toml"),
            uv_lock=root_dir.file("uv.lock"),
            dockerfile=root_dir.file("Dockerfile"),  # this could be a parameter
            project=project,
        )

        # find the source code locations for the dependencies of a given project
        project_sources_map = await self.get_project_sources_map(
            root_dir.file("uv.lock"), project
        )

        container = self.copy_source_code(container, root_dir, project_sources_map)

        container = container.with_exec(["sleep", str(debug_sleep)])

        # we run `uv sync` to create editable installs of the project
        # and its local dependencies
        container = self.install_local_dependencies(container, project)

        # change the working directory to the project's source directory
        # so that commands in CI are automatically run in the context of this project
        container = container.with_workdir(f"/src/{project_sources_map[project]}")

        return container

The debug_sleep argument will be useful later.


Let’s implement the container_with_third_party_dependencies method first. That’s easy: we just need to use the existing Dockerfile and specify the deps-dev target stage. Note how we don’t need any files except pyproject.toml and uv.lock to build the Docker image for a given project. This is possible thanks to uv workspaces.

    def container_with_third_party_dependencies(
        self,
        pyproject_toml: File,
        uv_lock: File,
        dockerfile: File,
        project: str,
    ) -> Container:
        # create an empty directory to make sure only the pyproject.toml
        # and uv.lock files are copied to the build context (to affect caching)
        build_context = (
            dag.directory()
            .with_file(
                "pyproject.toml",
                pyproject_toml,
            )
            .with_file(
                "uv.lock",
                uv_lock,
            )
            .with_file(
                "/Dockerfile",
                dockerfile,
            )
            .with_new_file("README.md", "Dummy README.md")
        )

        return build_context.docker_build(
            target="deps-dev",
            dockerfile="/Dockerfile",
            build_args=[BuildArg(name="PACKAGE", value=project)],
        )

NOTE

We also create a dummy README.md file because Hatch — the default build system in uv projects — requires it to be present.


The project_sources_map dictionary is the precious information we need to enable granular copying of the source code. Here is the implementation of the get_project_sources_map method which retrieves it:

    async def get_project_sources_map(
        self,
        uv_lock: File,
        project: str,
    ) -> dict[str, str]:
        """Returns a dictionary of the local dependencies' (of a given project) source directories."""
        uv_lock_dict = tomli.loads(await uv_lock.contents())

        members = set(uv_lock_dict["manifest"]["members"])

        local_projects = {project}

        # first, find the dependencies of our project
        for package in uv_lock_dict["package"]:
            if package["name"] == project:
                dependencies = package.get("dependencies", [])
                for dep in dependencies:
                    if isinstance(dep, dict) and dep.get("name") in members:
                        local_projects.add(dep["name"])

        # now, gather all the directories with the dependency sources

        project_sources_map = {}

        for package in uv_lock_dict["package"]:
            if package["name"] in local_projects:
                project_sources_map[package["name"]] = package["source"]["editable"]

        return project_sources_map

This function will parse the uv.lock file and return a dictionary where the keys are the project names and the values are the paths to the source code.
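
One caveat: as written, the function only follows direct local dependencies, which is enough for this walkthrough. If your workspace members form deeper chains (A depends on B, which depends on C), you would want to walk the graph transitively. A minimal sketch (collect_local_projects is a hypothetical helper, not part of the module above):

def collect_local_projects(
    packages: list[dict], members: set[str], project: str
) -> set[str]:
    # walk the local dependency graph transitively, collecting
    # every workspace member reachable from `project`
    by_name = {package["name"]: package for package in packages}
    stack, local_projects = [project], set()
    while stack:
        name = stack.pop()
        if name in local_projects:
            continue
        local_projects.add(name)
        for dep in by_name[name].get("dependencies", []):
            # workspace members are exactly the local packages
            if isinstance(dep, dict) and dep.get("name") in members:
                stack.append(dep["name"])
    return local_projects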

INFO

Most of the Dagger operations are lazy. The operations which trigger materializations are async and therefore must be explicitly awaited; this is why we use await to fetch the uv.lock file contents. It’s a very elegant way to express blocking operations, because once part of the code becomes async, all the code that calls it must become async as well. Smart!

NOTE

For extra cache efficiency this can be replaced by creating empty directories and files and delaying the source code copying to after the last uv sync command, but we will keep it simple for the sake of this blog post. Also, the current approach is already good enough.


Our source code is still not copied into the image. Let’s implement the copy_source_code method which will granularly copy the source code of a given project and its dependencies into the image. This is why we are here!

    def copy_source_code(
        self,
        container: Container,
        root_dir: RootDir,
        project_sources_map: dict[str, str],
    ) -> Container:
        for project, project_source_path in project_sources_map.items():
            container = container.with_directory(
                f"/src/{project_source_path}",
                root_dir.directory(project_source_path),
            )

        return container

Now the only thing left is to install the local dependencies in editable mode:

    def install_local_dependencies(
        self, container: Container, project: str
    ) -> Container:
        # the following uv command installs the project
        # and its dependencies in editable mode
        container = container.with_exec(
            [
                "uv",
                "sync",
                "--inexact",
                "--package",
                project,
            ]
        )

        return container

All together:

The full Dagger module:

from typing import (
    Annotated,
    TypeAlias,
)

import dagger
import tomli
from dagger import (
    BuildArg,
    Container,
    DefaultPath,
    File,
    Ignore,
    dag,
    function,
    object_type,
)

IGNORE = Ignore(
    [
        ".env",
        ".git",
        "**/.venv",
        "**__pycache__**",
        ".dagger/sdk",
        "**/.pytest_cache",
        "**/.ruff_cache",
    ]
)

# this represents the repo root
RootDir: TypeAlias = Annotated[
    dagger.Directory,
    DefaultPath("."),
    IGNORE,
]

# this represents the source directory of a specific project in the monorepo
SourceDir: TypeAlias = Annotated[
    dagger.Directory,
    IGNORE,
]


@object_type
class MonorepoDagger:
    @function
    async def build_project(
        self,
        root_dir: RootDir,
        project: str,
        debug_sleep: float = 0.0,
    ) -> Container:
        """Build a container containing only the source code for a given project and its dependencies."""
        # we start by creating a container including only third-party dependencies
        # with no source code (except pyproject.toml and uv.lock from the repo root)
        container = self.container_with_third_party_dependencies(
            pyproject_toml=root_dir.file("pyproject.toml"),
            uv_lock=root_dir.file("uv.lock"),
            dockerfile=root_dir.file("Dockerfile"),  # this could be a parameter
            project=project,
        )

        # find the source code locations for the dependencies of a given project
        project_sources_map = await self.get_project_sources_map(
            root_dir.file("uv.lock"), project
        )

        container = self.copy_source_code(container, root_dir, project_sources_map)

        container = container.with_exec(["sleep", str(debug_sleep)])

        # we run `uv sync` to create editable installs of the project
        # and its local dependencies
        container = self.install_local_dependencies(container, project)

        # change the working directory to the project's source directory
        # so that commands in CI are automatically run in the context of this project
        container = container.with_workdir(f"/src/{project_sources_map[project]}")

        return container

    def container_with_third_party_dependencies(
        self,
        pyproject_toml: File,
        uv_lock: File,
        dockerfile: File,
        project: str,
    ) -> Container:
        # create an empty directory to make sure only the pyproject.toml
        # and uv.lock files are copied to the build context (to affect caching)
        build_context = (
            dag.directory()
            .with_file(
                "pyproject.toml",
                pyproject_toml,
            )
            .with_file(
                "uv.lock",
                uv_lock,
            )
            .with_file(
                "/Dockerfile",
                dockerfile,
            )
            .with_new_file("README.md", "Dummy README.md")
        )

        return build_context.docker_build(
            target="deps-dev",
            dockerfile="/Dockerfile",
            build_args=[BuildArg(name="PACKAGE", value=project)],
        )

    async def get_project_sources_map(
        self,
        uv_lock: File,
        project: str,
    ) -> dict[str, str]:
        """Returns a dictionary of the local dependencies' (of a given project) source directories."""
        uv_lock_dict = tomli.loads(await uv_lock.contents())

        members = set(uv_lock_dict["manifest"]["members"])

        local_projects = {project}

        # first, find the dependencies of our project
        for package in uv_lock_dict["package"]:
            if package["name"] == project:
                dependencies = package.get("dependencies", [])
                for dep in dependencies:
                    if isinstance(dep, dict) and dep.get("name") in members:
                        local_projects.add(dep["name"])

        # now, gather all the directories with the dependency sources
        project_sources_map = {}

        for package in uv_lock_dict["package"]:
            if package["name"] in local_projects:
                project_sources_map[package["name"]] = package["source"]["editable"]

        return project_sources_map

    def copy_source_code(
        self,
        container: Container,
        root_dir: RootDir,
        project_sources_map: dict[str, str],
    ) -> Container:
        for project, project_source_path in project_sources_map.items():
            container = container.with_directory(
                f"/src/{project_source_path}",
                root_dir.directory(project_source_path),
            )

        return container

    def install_local_dependencies(
        self, container: Container, project: str
    ) -> Container:
        # the following uv command installs the project
        # and its dependencies in editable mode
        container = container.with_exec(
            [
                "uv",
                "sync",
                "--inexact",
                "--package",
                project,
            ]
        )

        return container

The beauty of this approach is that we can now take full advantage of:

  • uv’s project discovery
  • the fact that the uv workspace configuration is standardized and well-defined
  • the fact that it is always kept in sync with the actual project structure when running uv commands

We can build the Docker image for any project in the monorepo with a single command:

dagger call build-project --root-dir . --project lib-one

We can add a project to an arbitrary location in the monorepo, add other projects as dependencies, and build the new project without changing anything:

uv init --package --lib weird-location/nested/lib-three
uv add --package lib-three lib-one lib-two
dagger call build-project --root-dir . --project lib-three

Running dagger call build-project --root-dir . --project lib-three just works despite the weird location of the lib-three project, with zero build pipeline changes!

Proof
dagger call build-project --root-dir . --project lib-three
✔ connect 0.2s
✔ load module 5.4s
✔ parsing command line arguments 2.0s

✔ monorepoDagger: MonorepoDagger! 2.1s
✔ .buildProject(
│ │ debugSleep: 0.000000
│ │ project: "lib-three"
│ │ rootDir: no(digest: "sha256:7112225e5254a6bc947b4ce9318d5ed7e8e5a713df2bb1acefa52bbd739077ce"): Missing
│ ): Container! 8.2s
✔ .defaultArgs: [String!]! 0.0s

✔ Container.mounts: [String!]! 0.0s

✔ Container.entrypoint: [String!]! 0.0s

✔ Container.platform: Platform! 0.0s

✔ Container.user: String! 0.0s

✔ Container.workdir: String! 0.0s

_type: Container
defaultArgs:
    - python3
entrypoint: []
mounts: []
platform: linux/amd64
user: ""
workdir: /src

Now let’s confirm caching works as expected. lib-one doesn’t depend on lib-two, so modifying files in lib-two should not invalidate the cache for lib-one. Because Dagger doesn’t always log the intermediate hash digests, we will use the --debug-sleep flag to check whether the build stage is skipped.

The first build:

dagger call build-project --root-dir . --project lib-one --debug-sleep=5
✔ .buildProject(
│ │ debugSleep: 5.000000
│ │ project: "lib-one"
│ │ rootDir: no(digest: "sha256:e52f8c20e2809532808a5be5d1b0313aa8d18d10766fc902b7a22b0358973109"): Missing
│ ): Container! 9.1s

Now let’s change something in lib-two and rebuild lib-one:

touch projects/lib-two/src/lib_two/new_file.py
dagger call build-project --root-dir . --project lib-one --debug-sleep=5
✔ .buildProject(
│ │ debugSleep: 5.000000
│ │ project: "lib-one"
│ │ rootDir: no(digest: "sha256:d1b1986db760ada8081fc3b9ff584ce0c55c006adda7cb324b5f68774bc976e6"): Missing
│ ): Container! 2.6s

Hooray! The build only took 2.6s now — the cache has not been invalidated and the build stage has been skipped!

INFO

The build is not fully cached because the --root-dir argument points at the entire repo (which did change). But it doesn’t matter because the final image is cached and the build stage is skipped.

Growing the pipeline

Now that we have a Dagger Function which builds a container for a given project, we can easily create downstream steps in our CI pipeline. For example, this is how we can run tests for a project after building the container:

    @function
    async def pytest(self, root_dir: RootDir, project: str) -> str:
        """Run pytest for a given project."""
        container = await self.build_project(root_dir, project)
        return await container.with_exec(["pytest"]).stdout()

Running the tests becomes:

dagger call pytest --root-dir . --project lib-one

Note how we can do it in one function call. Any upstream steps (like building the image) are automatically executed before the tests and are very likely to be cached.

Another one with pyright:

    @function
    async def pyright(self, root_dir: RootDir, project: str) -> str:
        """Run pyright for a given project."""
        container = await self.build_project(root_dir, project)
        return await container.with_exec(["pyright"]).stdout()

Now we can just call these Dagger functions locally or in our CI/CD system (typically one CI/CD step corresponds to one dagger call) — they will work exactly the same! Dagger Cloud can also be used to execute builds remotely (and the entire team can benefit from the shared cache). It’s also worth mentioning Dagger’s integration with Depot, a provider of accelerated builds and caching, which requires zero configuration and can speed up builds even more.
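
For example, a GitHub Actions job could boil down to installing the Dagger CLI and making one dagger call per check. A sketch (versions, job names, and paths are illustrative; adjust them to your setup):

on: push

jobs:
  test-lib-one:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Dagger CLI
        run: curl -fsSL https://dl.dagger.io/dagger/install.sh | BIN_DIR=$HOME/.local/bin sh
      - name: Run tests
        run: dagger call pytest --root-dir . --project lib-one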

Conclusion

When combined, uv and Dagger provide powerful features that dramatically simplify build processes in Python monorepos, while maintaining flexibility and providing enormous performance gains.

The pipeline we built is a good starting point for further customization and optimization. You can add more steps to the pipeline, such as linting, code formatting, and deployment steps, and add configuration options to create a comprehensive build process that meets your specific requirements. It’s very easy because it’s just Python code. You could even generalize this approach to work with multiple uv workspaces (so multiple uv.lock files) in a single monorepo.

I encourage you to explore the documentation for these tools to fully understand their capabilities and how they can be tailored to your specific needs.

References and Acknowledgements


Thank you @nordmtr for the feedback and suggestions!