Caching Tasks
Every JavaScript or TypeScript codebase will need to run package.json
scripts, like build
, test
and lint
. In Turborepo, we call these tasks.
Turborepo can cache the results and logs of your tasks - leading to enormous speedups for slow tasks.
Missing the cache
Each task in your codebase has inputs and outputs.
- A
build
task might have source files as inputs and outputs logs tostderr
andstdout
as well as bundled files. - A
lint
ortest
task might have source files as inputs and outputs logs tostdout
andstderr
.
Let's say you run a build
task with Turborepo using turbo run build
:
-
Turborepo will evaluate the inputs to your task and turn them into a hash (e.g.
78awdk123
). -
Check the local filesystem cache for a matching cache artifact (e.g.
./node_modules/.cache/turbo/78awdk123.tar.zst
). -
If Turborepo doesn't find any matching artifacts for the calculated hash, Turborepo will then execute the task.
-
Once the task is succesfully completed, Turborepo saves all specified outputs (including files and logs) into a new cache artifact, addressed by the hash.
Turborepo takes a lot of information into account when creating the hash: the dependency graph, tasks it depends on, source files, environment variables, and more!
Hitting the cache
Let's say that you run the task again without changing any of its inputs:
-
The hash will be the same because the inputs haven't changed (e.g.
78awdk123
) -
Turborepo will find the cache artifact with a matching hash (e.g.
./node_modules/.cache/turbo/78awdk123.tar.zst
) -
Instead of running the task, Turborepo will replay the output - printing the saved logs to
stdout
and restoring the saved output files to their respective position in the filesystem.
Restoring files and logs from the cache happens near-instantaneously. This can reduce your build times from minutes or hours down to seconds or milliseconds. Although specific results will vary depending on the shape and granularity of your codebase's dependency graph, most teams find that they can reduce their overall monthly build time by around 40-85% with Turborepo's caching.
Turn off caching
In some environments you don't want to write the cache output. To disable cache writes, append --no-cache
to any command. For example, this will run dev
(and all tasks that it dependsOn
) in all workspaces, but it won't cache the output:
turbo run dev --no-cache
Note that --no-cache
disables cache writes but does not disable cache reads. If you want to disable cache reads, use the --force
flag.
You can also configure specific tasks to skip writing to the cache by setting the pipeline.<task>.cache
configuration to false
:
{
"$schema": "https://turbo.build/schema.json",
"pipeline": {
"dev": {
"cache": false,
"persistent": true
}
}
}
Force overwrite cache
Conversely, if you want to disable reading the cache and force turbo
to re-execute a previously cached task, add the --force
flag:
# Run `build` npm script in all workspaces ignoring the cache.
turbo run build --force
Note that --force
disables cache reads but does not disable cache writes. If you want to disable cache writes, use the --no-cache
flag.
Logs
Not only does turbo
cache the output of your tasks, it also records the terminal output (i.e. combined stdout
and stderr
) to (<package>/.turbo/run-<command>.log
). When turbo
encounters a cached task, it will replay the output as if it happened again, but instantly, with the package name slightly dimmed.
Hashing
By now, you're probably wondering how turbo
decides what constitutes a cache hit vs. miss for a given task. Good question!
First, turbo
constructs a hash of the current global state of the codebase. This includes things like:
- Hash of the contents of any files that satisfy the glob patterns in
globalDependencies
- The values of environment variables listed in
globalEnv
- Select information from
turbo.json
,package.json
, and any lockfile - and more!
Then it adds more factors relevant to a given workspace's task:
- Hash of the contents of all version-controlled files in the workspace folder (or the files matching the
inputs
globs, if configured) - The configured
outputs
specified in thepipeline
- The set of resolved versions of all installed
dependencies
,devDependencies
, andoptionalDependencies
- The workspace task's name
- The sorted list of environment variable key-value pairs specified in
pipeline.<task>.env
list. - and more!
Once turbo
encounters a given workspace's task in its execution, it checks the cache (both locally and remotely) for a matching hash. If it's a match, it skips executing that task, moves or downloads the cached output into place, and replays the previously recorded logs instantly. If there isn't anything in the cache (either locally or remotely) that matches the calculated hash, turbo
will execute the task locally and then cache the specified outputs
.
The hash of a given task is available to the task at execution time as an environment variable TURBO_HASH
. This value can be useful in stamping outputs or tagging Dockerfile etc.