RFdiffusion (patched version)

Overview

RFdiffusion is an open-source method for protein structure generation, with or without conditional information (a motif, a target, etc.). It can tackle a whole range of protein design challenges:

  • Motif Scaffolding
  • Unconditional protein generation
  • Symmetric unconditional generation (cyclic, dihedral and tetrahedral symmetries currently implemented, more coming!)
  • Symmetric motif scaffolding
  • Binder design
  • Design diversification ("partial diffusion", sampling around a design)

Source

Watson, J.L., Juergens, D., Bennett, N.R. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023). https://doi.org/10.1038/s41586-023-06415-8


Scope of This Documentation

This documentation does not aim to explain RFdiffusion itself, nor to provide detailed guidance on how to use its various generative modeling modes.

Instead, it focuses on the specificities of using RFdiffusion on the RPBS HPC cluster, including:

  • the patched version maintained by RPBS and its user-visible differences
  • how to load the environment via module load
  • a generic SLURM job script example to run predictions
  • practical examples that illustrate key HPC-specific features:
    • how model weights are handled
    • how output directories and prefixes are set
    • how schedule directories are automatically managed or overridden

For complete usage instructions, prediction modes, and advanced options, please refer to the official RFdiffusion documentation.


RFdiffusion Patched Version: What Changes For You?

The RFdiffusion version deployed on RPBS HPC is a patched version of the official v1.1.0 release.

For users, this means:

  • you don’t need to download or specify the model weights.
    • the models are automatically loaded from /shared/banks/ckpt_models/rfdiffusion/1.1.0/.
  • you don’t need to set the schedule cache path.
    • the .schedules directory is automatically created inside your output folder (${outdir}/.schedules).
  • bug fixes are already applied
    • these corrections improve stability and compatibility and require no action on your part (see the changelog).

Getting Started on RPBS HPC

This section explains how to launch RFdiffusion jobs on the RPBS HPC infrastructure using the patched version.

Loading the Environment

Before using RFdiffusion, load the preconfigured module environment:

module load rfdiffusion/1.1.0-patched-rpbs

This sets up:

  • a containerized RFdiffusion runtime environment with GPU support
  • ready-to-use CLI entry points like run_inference.py

You can now display the options of the run_inference.py main entry point:

srun run_inference.py -h
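If a command later fails with "command not found", the module was probably not loaded in that shell or job. The function below is a hypothetical convenience helper (check_rfdiffusion_env is not provided by the module) that checks whether the entry point is reachable through PATH, using only the standard `command -v` builtin.

```shell
# Hypothetical helper (not shipped with the module): confirm that the
# run_inference.py entry point is reachable through PATH.
check_rfdiffusion_env() {
  local entry
  if entry=$(command -v run_inference.py); then
    echo "RFdiffusion entry point: ${entry}"
    return 0
  fi
  echo "run_inference.py not found; run 'module load rfdiffusion/1.1.0-patched-rpbs' first" >&2
  return 1
}

# Example, e.g. at the top of a job script:
#   check_rfdiffusion_env || exit 1
```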

SLURM Job Script Example

To follow the example below, first create an input directory and download the example structure:

mkdir -p inputs
wget -O inputs/5TPN.pdb https://files.rcsb.org/download/5TPN.pdb
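Note that `wget -O` creates the target file even when the download fails, which can leave an empty inputs/5TPN.pdb behind. A quick sanity check such as the hypothetical check_pdb_input function below (plain shell and grep, not part of RFdiffusion) catches this before a job is queued. It assumes standard PDB formatting, where coordinate lines start with the "ATOM  " record name.

```shell
# Hypothetical helper: fail early if a PDB input is missing, empty,
# or contains no ATOM records (e.g. after a failed download).
check_pdb_input() {
  local pdb="$1"
  if [ ! -s "$pdb" ]; then
    echo "error: ${pdb} is missing or empty" >&2
    return 1
  fi
  if ! grep -q '^ATOM  ' "$pdb"; then
    echo "error: ${pdb} contains no ATOM records" >&2
    return 1
  fi
  echo "ok: $(grep -c '^ATOM  ' "$pdb") ATOM records in ${pdb}"
}

# Example:
#   check_pdb_input inputs/5TPN.pdb || exit 1
```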

Below is a minimal SLURM script for running a motif scaffolding task. It will:

  • use 5TPN.pdb as the input structure (it must exist in the inputs/ directory)
  • anchor the design on a fixed motif: chain A, residues 163–181
  • build a new segment of 10 to 20 residues (length sampled within that range) on each side of this motif, N- and C-terminal
  • generate two designed structures
  • write the outputs to the results/ folder, with filenames starting with mydesign

#!/bin/bash
#SBATCH --job-name=rfdiff                 # Job name
#SBATCH --account=ACCOUNT-PROJECT-NAME    # Accounting project (replace with your own)
#SBATCH --nodelist=gpu-node18             # Optional: run on a specific GPU node
#SBATCH --gres=gpu:1                      # Request 1 GPU
#SBATCH --cpus-per-task=4                 # Request 4 CPU cores
#SBATCH --mem=4G                          # Request 4 GB of RAM
#SBATCH --time=01:00:00                   # Set a 1-hour time limit
#SBATCH --output=logs/rfdiff_%j.out       # Standard output log (%j = job ID)
#SBATCH --error=logs/rfdiff_%j.err        # Standard error log (%j = job ID)
# Note: logs/ must exist before submission, SLURM does not create it.

# Load the RFdiffusion environment
module load rfdiffusion/1.1.0-patched-rpbs

# Run RFdiffusion: motif scaffolding task, two designs
run_inference.py \
  inference.input_pdb=inputs/5TPN.pdb \
  inference.outdir=results \
  inference.output_prefix=mydesign \
  inference.num_designs=2 \
  'contigmap.contigs=[10-20/A163-181/10-20]'
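The script is then submitted with sbatch. The wrapper below is a sketch assuming the script was saved as rfdiff_job.sh (a hypothetical filename); it creates logs/ first, since the --output/--error paths point there, and relies only on standard SLURM commands (`sbatch --parsable`, `squeue -j`).

```shell
# Hypothetical wrapper: prepare the log directory, then submit the job script.
submit_rfdiffusion_job() {
  local script="${1:-rfdiff_job.sh}"   # assumed filename of the SLURM script above
  mkdir -p logs                        # SLURM will not create logs/ for --output/--error
  # --parsable makes sbatch print only the job ID (optionally followed by ";cluster")
  sbatch --parsable "$script"
}

# Example:
#   jobid=$(submit_rfdiffusion_job rfdiff_job.sh)
#   squeue -j "$jobid"
```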

After running the SLURM job shown above, your results/ directory should contain the following:

Path / Filename                      Description
logs_2025-06-02_08-50-47/            Log output directory from the RFdiffusion run
└── run_inference.log                Log file capturing stdout/stderr
mydesign_0.pdb                       Final designed structure (model 0)
mydesign_0.trb                       Internal metadata, pickle trace bundle (model 0)
mydesign_1.pdb                       Final designed structure (model 1)
mydesign_1.trb                       Internal metadata, pickle trace bundle (model 1)
traj/                                Denoising trajectories, in reverse order
├── mydesign_0_pX0_traj.pdb          Model predictions at each timestep (model 0)
├── mydesign_0_Xt-1_traj.pdb         Inputs to the model at each timestep (model 0)
├── mydesign_1_pX0_traj.pdb          Model predictions at each timestep (model 1)
└── mydesign_1_Xt-1_traj.pdb         Inputs to the model at each timestep (model 1)
.schedules/                          Automatically created cache folder for diffusion schedules
└── T_50_omega_1000_min_...pkl       Pickle file with precomputed schedule parameters
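Each design should come with a .pdb/.trb pair sharing the same prefix and index. The hypothetical list_designs function below (plain shell, not an RFdiffusion tool) is a post-run check that lists the final designs in an output directory and flags any PDB whose .trb metadata file is missing. Trajectory files are not matched because they live in the traj/ subfolder.

```shell
# Hypothetical post-run check: list designs and verify each has its .trb file.
# Usage: list_designs <outdir> <prefix>
list_designs() {
  local outdir="$1" prefix="$2" pdb trb missing=0
  for pdb in "$outdir/${prefix}"_*.pdb; do
    # An unmatched glob stays literal, so no file means no designs
    [ -e "$pdb" ] || { echo "no designs found in ${outdir}" >&2; return 1; }
    trb="${pdb%.pdb}.trb"
    if [ -e "$trb" ]; then
      basename "$pdb"
    else
      echo "missing metadata for $(basename "$pdb")" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   list_designs results mydesign
```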


See the official documentation for more details.


Behavior Specific to the Patched Version

1. Model Weights – No Download Required

By default, RFdiffusion uses the pretrained weights stored in the shared banks:

/shared/banks/ckpt_models/rfdiffusion/1.1.0/

These models were downloaded with the official download_models.sh script.

You therefore no longer need to specify the path to the models folder. To override the default location, set the MODELS_PATH environment variable:

export MODELS_PATH=/your/custom/models/path
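Before pointing MODELS_PATH at a custom directory, it can be worth checking that the directory actually contains checkpoints. The hypothetical resolve_models_dir function below mirrors the lookup order described above (MODELS_PATH if set, otherwise the shared bank); it is a sketch, not the patched code itself, and it assumes checkpoints use the .pt extension, as the files fetched by download_models.sh do.

```shell
# Hypothetical sketch of the documented lookup order for model weights:
# MODELS_PATH if set and non-empty, otherwise the RPBS shared bank.
resolve_models_dir() {
  local default_dir="/shared/banks/ckpt_models/rfdiffusion/1.1.0"
  local dir="${MODELS_PATH:-$default_dir}"
  # Require at least one .pt checkpoint in the chosen directory
  if ! ls "$dir"/*.pt >/dev/null 2>&1; then
    echo "error: no .pt checkpoints found in ${dir}" >&2
    return 1
  fi
  echo "$dir"
}

# Example:
#   export MODELS_PATH=/your/custom/models/path
#   resolve_models_dir || exit 1
```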

2. Output Prefix and Directory – Independent Control

You can set the output directory and the filename prefix independently:

inference.output_prefix=mydesign
inference.outdir=results/
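Commands written for the upstream release put the directory inside inference.output_prefix (for example inference.output_prefix=results/mydesign). When porting such commands, a small hypothetical helper can split that single value into the two parameters used here; it relies only on the standard dirname and basename utilities.

```shell
# Hypothetical helper: convert an upstream-style output_prefix
# (directory + filename prefix in one value) into the patched
# version's inference.outdir / inference.output_prefix pair.
split_output_prefix() {
  local upstream_prefix="$1"
  printf 'inference.outdir=%s\n' "$(dirname "$upstream_prefix")"
  printf 'inference.output_prefix=%s\n' "$(basename "$upstream_prefix")"
}

# Example:
#   run_inference.py inference.num_designs=2 $(split_output_prefix results/mydesign) ...
```

Because the helper's output is word-split when used inside $(...), this only works for paths without spaces.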

3. Schedule Cache Directory – Automatic Handling

You don’t need to set the .schedules path manually: the patched version automatically creates it inside your output folder, at ${outdir}/.schedules/.

If needed, you can override this location with:

inference.schedule_directory_path=/my/custom/cache/
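With the default behavior, runs that use different output folders each store their own schedule cache. Since the cache file name encodes diffusion settings such as T and omega, runs with identical settings can presumably share one cache directory. The hypothetical schedule_override helper below prints the override only when a shared directory is given, so omitting the argument keeps the default; the directory name in the example is an assumption.

```shell
# Hypothetical helper: print the override for a shared schedule cache.
# With no argument it prints nothing, so the patched default
# (${outdir}/.schedules/) stays in effect.
schedule_override() {
  local cache_dir="${1:-}"
  if [ -n "$cache_dir" ]; then
    mkdir -p "$cache_dir"
    printf 'inference.schedule_directory_path=%s\n' "$cache_dir"
  fi
}

# Example (paths without spaces, since the output is word-split):
#   run_inference.py inference.outdir=results inference.output_prefix=mydesign \
#     $(schedule_override "$HOME/rfdiffusion_schedules")
```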