I’ve learned quite a bit through my experiences with the PDB and EMPIAR deposition processes of ligand-bound ribosome CryoEM data. The below are some of my guidelines and tips for how to make going through these two deposition processes less confusing and/or frustrating.
This blog post assumes you have a finalized map and model ready for deposition.
There’s a lot of information below. Here’s the short version, which will serve as a refresher after having read this post in full:
Go to the PDB’s wwPDB Deposition Service and start a new deposition. Select your country and then answer the questions below. In this example, we’re depositing a single particle electron microscopy structure; we’re depositing coordinates (that’s the model), and the associated map has not previously been deposited. You’ll get an email with login details.
Note that you will need to open a new deposition account for each map/model pair you plan to deposit.
Along with the map, you need the calibrated pixel size from the data collection (which, if you haven’t binned your data, is equal to the voxel spacing) and a recommended contour level. For the recommended contour level, open the map and model in Coot, zoom in on the ligand, and adjust the map’s contour. Record the sigma of the map that produces the best clarity for observing the ligand’s density.
The model you upload can be in either PDB format (.pdb) or mmCIF format (.cif); for information on model refinement using PHENIX with the OPLS3e force field, see my Benchling protocol here. If you upload a .pdb, it must meet PDB naming requirements, and there must be a “TER” line after each polymer chain. The most relevant PDB naming restrictions for ribosome models are: 1) chain names can only be one letter long and 2) atom names cannot have letters in them. For small ribosome models, these likely won’t be an issue; however, larger ribosome models with many chains and atoms will run into these issues. The easiest way around these is to convert the model to mmCIF format, which supports these naming schemes. Open the model in PyMOL, save it as a .cif, and upload this model file to the PDB deposition service.
Don’t overthink the thumbnail. I used the same screenshot of my ribosome map, shown as a surface, in Chimera as my thumbnail for all my ligand-bound ribosome models.
When you submit these files, PDB validation will run on the model you uploaded. You’ll get an email when the validation report is ready to look at. More on this file later. Know that you can also run this validation on any model with the wwPDB Validation Service.
This section is straightforward. Note that, after filling in all this information once, you can choose to copy it to a new deposition instance.
In the “Macromolecules” section, you’ll need to submit names and sequences for each polymer chain in your model following these directions:
Input the sequence of this molecule using standard one-letter codes. Please include the complete sequence including tags, linkers, unobserved regions and mutations. Non-standard residues should be input using the three-letter code in parenthesis, e.g. (MSE).
It’s okay if the sequence you input has extra residues (this will be the case if the model you’re depositing has truncated chains or residues missing); however, the sequence you input cannot be lacking residues or ligands in the model you’re depositing. That means you need to include the three-letter code(s) for the ligand(s) in your model.
I stress that this sequence should be the complete sequence of what is found in nature. To get this:
phenix.print_sequence model.pdb > seq.txt
to extract the exact sequence of your model to a text file. Most of the time, the sequence in your model is incomplete, so do NOT use this sequence in the deposition. (If you do, you’ll probably get asked about it in the review, but better to be right the first time.) This output file will be in FASTA format. Annoyingly, all non-standard RNA residues will be represented with “?”. Currently, manually editing these to be in three-letter code with parenthesis, such as “(6MZ)”, is the only way I know to correct these. (Although you can do the next step without editing these “?”, you’ll need to edit them eventually. When I do this, I’ll have the model open in PyMOL with the Display Sequence on.)The data collection sections, e.g. “EM sample” and “EM experiment” tabs, are straightforward. Many of these details will be values already in your Table 1.
In the “Ligands” section, you’ll specify which ligand is the study’s subject of interest (non-standard RNA residues will also appear among this list) and include a few additional details. I recommend submitting your ligand’s SMILES string among these.
For the ribosome, your model is likely C1, having no symmetry (might be different if yours is a crystal structure). If this is the case, then yes, your assembly applies to all chains and yes, the assembly can be generated without applying matrices.
The “Related entries” section is straightforward.
There’s a lot of information packed into this report. Here are some especially important things to look out for:
Your job isn’t done when you submit your entry. You will be unable to edit the deposition instance until your PDB deposition contact reviews your submission and reports back to you. Once you submit, you’ll receive a PDB ID and an EMDB ID, if your map was obtained through electron microscopy. Don’t forget to add these accession names to your Table 1.
When you receive this report, read over it, address anything flagged as potentially being wrong, and ensure that the rest of the information listed is correct. Open the extended Validation report and ensure the stereochemistry of your ligand is correct; if you gave your SMILES string, it’s unlikely this would be wrong, but still it’s better to check. At this point, you can still make any necessary changes to your model and reupload it, going through the process of validation again. If everything looks good and you’ve confirmed it, then there’s nothing else to do. Remember that you can always ask to have your deposition instance unlocked so you can make changes in the future. Uploading a new map or model will put you through the validation cycle again.
EMPIAR, or the Electron Microscopy Public Image Archive, is a public resource for raw electron microscopy movies, images, image stacks, particles, class averages, and more.
Go to EMPIAR’s deposition home page to register your user account or to log in. Unlike the PDB, where you need to make a new deposition login for each structure you want to deposit, everything in EMPIAR is connected to one login. Once logged in, you can click the link aptly called “Create a new deposition” to get started. Note that EMPIAR has an extremely helpful pictoral deposition manual available to you. I found it exceedingly useful and clear.
Once you’ve created a deposition instance, you have 3 main jobs:
This is the first page you’re brought to when you make a new deposition. If need be, you can navigate back to it by clicking the “Deposition overview” link on the left under “Deposition-related tasks”. This section is straightforward. Note that, after filling in all this information once, you can choose to copy it to a new deposition instance.
Also note that nearly every section is required to be filled out. If you don’t have the information or if the section isn’t relevant, click “N/A”. This will allow you to Save & Validate successfully and move on. You can only upload your data once you successfully validate.
After having completed Job 1, you’ll be able to upload your data. There are 3 ways to do this: Globus, Aspera using the command line, and Aspera using the web interface. I completed all my data uploads using Globus. To use Globus, you’ll have to make your own account and download Globus Connect Personal. You’ll need this software to establish an endpoint on the computer which houses your raw data, from which the transfer will be made to an EMPIAR endpoint.
Following the steps on EMPIAR under the Globus upload section is quite straightforward. I’ll reiterate their instructions to emphasize that you must be logged into your Globus account before trying to access their endpoint. Then, you can follow the links on EMPIAR to access their endpoint, log into the unique EMPIAR endpoint using the username and password they provide you, and transfer your data. Remember that you must have your home endpoint activated in order for you to be able to transfer anything.
Many types of raw EM data can be deposited to EMPIAR, including movies, images, and image stacks. If you’re depositing movies, it’s required that you include the appropriate gain reference for each set of movies. Include a dark reference and defects file if available, but these are optional.
Under “Deposition-related tasks”, you’ll now see an “Associate image sets with the data” section. Here, you’ll fill out some specifications, including the kind of data you’re uploading (e.g. multiframe micrographs), the format (e.g. TIFF), and the image and pixel sizes, among other details. If you don’t have these values on hand, most can be found in the header of your file. One potentially curious spec is the voxel type. If you don’t know what your voxel type is, you can use header
from IMOD, identify
from ImageMagick, or the header flag -H
from EMAN2 on one of your files to discern it. If your voxel type (also called “data type”) comes up as “unknown” by IMOD, try ImageMagick. For an unknown reason, the latter worked for me and not the former.
Your job isn’t done when you submit your entry. Once you submit, you’ll receive a public EMPIAR ID. Don’t forget to add this accession ID to your Table 1. Your entry will be reviewed and, once complete, you’ll have to log into your account to approve each deposition for release individually. Congratulations.
When creating a deposition, you have the option to choose “Create a new deposition from XML” to upload an XML file containing all the specifications and details of one (or presumably multiple) depositions. You’ll need to follow their XML schema, found here in .xsd and .sch formats. If that link goes stale, you can also find the schema under the “What is EMPIAR data model?” section of their FAQ.