Multi-state models from PanDDA

Galen Correy
08 August 2022
tags: #how_to

Background

The pan-dataset density analysis (PanDDA) tool, developed by Nick Pearce and colleagues at the XChem facility of the Diamond Light Source, is a powerful method for identifying low occupancy states in X-ray crystallography data [1,2]. Why do we care about low occupancy states? For one thing, the field of fragment-based drug discovery relies on tools that can identify weakly bound ligands [3,4]. When fragments are soaked into protein crystals, the occupancy of the fragment (i.e. the proportion of protein molecules in the crystal with a fragment bound) is often low (e.g. 10-20%). PanDDA helps to identify low occupancy fragments by subtracting the ground-state electron density (i.e. the electron density when no ligand is present) from the electron density of the soaked dataset, which is a mixture of the ligand-free and ligand-bound states. What remains is an event map that approximates the electron density of the ligand-bound state [1].

In addition to transforming crystallographic fragment screening, PanDDA can also help to identify and model larger ligands that bind with higher affinity than fragments but still have relatively low occupancy. This discrepancy can arise because ligand occupancy in soaking experiments does not necessarily correlate with binding affinity as measured by solution-based methods. One reason is low ligand solubility: it may be difficult to reach 1:1 stoichiometry in a soaking experiment. Another is that a binding site may be partially obstructed, or otherwise stabilized in a conformation that decreases the ligand occupancy. The presence of low occupancy states is a fundamental challenge when using crystallographic soaking experiments to determine ligand-bound structures, and the ability to identify and resolve these states is what makes PanDDA such a powerful method.
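The subtraction described above can be illustrated with a toy calculation. In the PanDDA paper [1], the ground-state density is weighted by a background density correction (BDC) factor before it is subtracted from the dataset density. The sketch below is only a one-dimensional cartoon of that idea: the function name event_map, the four-point "maps" and the 20% occupancy are all made up for illustration, and real PanDDA works on aligned 3D maps and estimates the BDC itself.

```python
def event_map(dataset, ground, bdc):
    """Subtract a BDC-weighted ground-state density from a dataset density.

    dataset, ground: equal-length lists of density values (toy 1-D "maps").
    bdc: background density correction, i.e. the assumed ground-state fraction.
    """
    if not 0.0 <= bdc < 1.0:
        raise ValueError("bdc must be in the range [0, 1)")
    if len(dataset) != len(ground):
        raise ValueError("maps must have the same number of points")
    return [d - bdc * g for d, g in zip(dataset, ground)]


# Hypothetical densities: the ligand adds a peak at the second grid point.
ground = [1.0, 1.0, 1.0, 1.0]
bound = [1.0, 3.0, 1.0, 1.0]
occupancy = 0.2  # assumed fraction of molecules with ligand bound

# The soaked crystal is a mixture of the two states.
dataset = [(1 - occupancy) * g + occupancy * b for g, b in zip(ground, bound)]

# With the correct BDC, the result is proportional to the bound-state density.
event = event_map(dataset, ground, bdc=1 - occupancy)
print([round(v, 3) for v in event])
```

In the mixed dataset the ligand peak rises only slightly above its surroundings, whereas after subtraction the values are proportional to the bound-state density, so the peak is easier to interpret.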

PanDDA is a powerful tool for identifying low occupancy states, but it presents crystallographers with a new challenge: actually modeling the states it identifies! The best option is to model both states using alternate location (altloc) identifiers in the coordinate file to distinguish the ligand-bound and ligand-free states [1,5] (this results in what we call a multi-state model). However, these multi-state models can be difficult to interpret and visualize, especially for the vast majority of users who are only interested in the ligand-bound state. A related issue is that we want to ensure that users can easily examine the PanDDA event maps that were used to model a ligand. For our recent preprint describing the design and structure-based optimization of ligands targeting the Nsp3 macrodomain, we modeled all the structures using a multi-state approach [6]. We’ve taken the following steps to disseminate the structures and maps as rapidly and helpfully as possible.

Multi-state coordinate files and structure factor intensities have been deposited in the PDB (with RELEASE NOW selected)
Structure factor intensities in MTZ format, Dimple output, PanDDA event/Z-maps, refined structures and ligand-bound states are available to download from Zenodo
Diffraction images are available to download from https://proteindiffraction.org (search by PDB code)

How to extract the ligand-bound state in our multi-state models

Option 1

Download coordinates from PDB (e.g. fetch 5SQP in PyMOL)
Remove the altloc A coordinates - these correspond to the ligand-free state (remove alt A in PyMOL)
The coordinates can then be visualized or saved as a coordinate file (save 5SQP_ligand-bound.pdb in PyMOL)
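For readers who want to see what removing altloc A does to the coordinate file itself, here is a minimal Python sketch that filters PDB-format text directly instead of going through PyMOL. The function name drop_altloc and the two example atom records are invented for illustration. The sketch relies only on the fixed-column PDB format, in which column 17 of an atom record holds the altloc identifier.

```python
ATOM_RECORDS = ("ATOM", "HETATM", "ANISOU")


def drop_altloc(pdb_text, altloc="A"):
    """Return PDB text without the atom records that carry the given altloc."""
    kept = []
    for line in pdb_text.splitlines():
        # Column 17 (string index 16) is the altloc identifier.
        is_atom = line.startswith(ATOM_RECORDS) and len(line) > 16
        if is_atom and line[16] == altloc:
            continue  # skip atoms of the unwanted (ligand-free) state
        kept.append(line)
    return "\n".join(kept) + "\n"


# Two hypothetical conformers of the same atom, labeled A and B.
example = "\n".join([
    "ATOM      1  CA ASER A  10      11.000  12.000  13.000  0.80 20.00           C",
    "ATOM      2  CA BSER A  10      11.500  12.500  13.500  0.20 20.00           C",
    "END",
])

# Only the altloc B record and the END line are printed.
print(drop_altloc(example), end="")
```

The remaining atoms keep their original altloc labels and occupancies, just as they do after remove alt A in PyMOL.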

Option 2

Use this PyMOL script to fetch the coordinates using the PDB code and extract the ligand-bound state
This script removes the altloc records for residues that have only a single conformation modeled in the ligand-bound state, and renames the altloc records for residues with multiple conformations. Alternatively, the ligand-bound states can be downloaded directly from Zenodo.
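The altloc clean-up described for Option 2 can be sketched in plain Python. This is not the linked PyMOL script, only an illustration of the same idea under stated assumptions: the ligand-free (altloc A) records have already been removed, residues are identified by chain, residue number and insertion code (columns 22-27 of the PDB format), and the remaining conformers are relabeled alphabetically starting from A. That renaming rule and the function name tidy_altlocs are assumptions. Occupancies are left untouched.

```python
from collections import defaultdict
from string import ascii_uppercase

ATOM_RECORDS = ("ATOM", "HETATM", "ANISOU")


def _has_altloc(line):
    """True for atom records with a non-blank altloc (column 17)."""
    return line.startswith(ATOM_RECORDS) and len(line) > 16 and line[16] != " "


def tidy_altlocs(pdb_text):
    """Blank the altloc of single-conformer residues, relabel the others."""
    lines = pdb_text.splitlines()

    # First pass: collect the altloc labels present in each residue.
    # line[21:27] spans chain ID, residue number and insertion code.
    labels = defaultdict(set)
    for line in lines:
        if _has_altloc(line):
            labels[line[21:27]].add(line[16])

    # Second pass: rewrite the altloc column.
    out = []
    for line in lines:
        if _has_altloc(line):
            seen = sorted(labels[line[21:27]])
            if len(seen) == 1:
                new = " "  # only one conformer left, so no label is needed
            else:
                new = ascii_uppercase[seen.index(line[16])]  # e.g. B, C -> A, B
            line = line[:16] + new + line[17:]
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical ligand-bound state: residue 10 has one conformer, residue 11 two.
bound_state = "\n".join([
    "ATOM      1  CA BSER A  10      11.500  12.500  13.500  0.20 20.00           C",
    "ATOM      2  CA BHIS A  11      14.000  12.000  13.000  0.10 20.00           C",
    "ATOM      3  CA CHIS A  11      14.500  12.500  13.500  0.10 20.00           C",
])
print(tidy_altlocs(bound_state), end="")
```

In the printed result, the SER 10 record has a blank altloc and the two HIS 11 conformers are labeled A and B.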

How to inspect PanDDA event maps

Option 1

Use this script to extract the PanDDA event map from the deposited structure factor CIFs (discussed here)
The resulting map coefficients in MTZ format can be converted to CCP4 format using phenix.mtz2map.

Option 2

Download the PanDDA event map in .ccp4 format from Zenodo. (Note: use COOT version 0.8.9.2 to visualize maps.)

Where to next?

Our goal is to use macromolecular structural information to make ligand discovery more efficient. We think that identifying and modeling low occupancy states is critical to this endeavor. Developing automated ways to model the low occupancy states identified by PanDDA is a long-term goal. This will speed up ligand modeling and reduce the error/bias that is often associated with manual approaches.