News from the Fraser Lab


Fraser Lab DEIJ Journal Club - Land Acknowledgments

Roberto Efraín (Robbie) Díaz
08 April 2022
tags: #deij_jc

Background
A group of scientists within the Fraser lab have begun a journal club centered around issues of diversity, equity, inclusion, and justice within academia, specifically in the biological sciences.

Our goal is to provide an environment for continued learning, critical discussion, and brainstorming action items that individuals and labs can implement. Our discussions and proposed interventions reflect our own opinions based on our personal identities and lived experiences, and may differ from the identities and experiences of others. We will recap our discussions and proposed action items through a series of blog posts, and we encourage readers to engage directly with DEIJ practitioners and their scholarship to improve their own environments.

Article: The limits of settlers’ territorial acknowledgments. Asher L, Curnow J & Davis A (2018) DOI: 10.1080/03626784.2018.1468211

Summary: There has been an increase in the performance of land acknowledgments by non-Indigenous people in non-Indigenous, primarily academic, settler spaces. This article examines what purpose these land acknowledgments serve, whom they are for, and whether land acknowledgments performed by settlers can be improved to better reflect the original intentions of the Indigenous people who created this practice.

Key Points:

  • Settlers perform land acknowledgments as a means of combating the erasure of Indigenous people.
    • The pedagogical intention has been to combat erasure and force settlers to grapple with our positionality.
  • In becoming standardized and mainstream, the practice is reduced to a “mundane ‘box-ticking’ exercise, easily ignored and void of learning opportunities.”
  • Settler moves to innocence are those strategies or positionings that attempt to relieve the settler of feelings of guilt or responsibility without giving up land or power or privilege, without having to change much at all.
  • The practice needs to be improved to avoid becoming rote and normalized.

Open Questions:

  • What pedagogical work do territorial acknowledgments accomplish in settler spaces?
  • What do people learn from a territorial acknowledgment and does it serve any decolonial purpose?
  • Are territorial acknowledgments productive in disrupting avoidance mechanisms and pushing settlers towards decolonial solidarity?

Proposed Action Items:

  • Begin including an intentional and well-researched land acknowledgment in your presentations.
  • Include specific examples of how settlers can contribute to decolonial efforts.

Inspecting PanDDA event maps deposited in the Protein Data Bank

Galen Correy
26 August 2021

Background

The PanDDA algorithm is a super useful tool for detecting low-occupancy ligands in electron density maps obtained by X-ray diffraction. Low-occupancy ligands are frequently encountered in fragment screening campaigns, and PanDDA can greatly increase the hit rate of a fragment screen and therefore the number of starting points available for fragment-based ligand discovery. We’ve used PanDDA for fragment screens against the PTP1B phosphatase and the NSP3 macrodomain from SARS-CoV-2. After ligands are modeled, the data are deposited in the PDB. The deposited data include the atomic coordinates, the structure factor intensities, the map coefficients after final refinement with the ligand, and the PanDDA event map coefficients. The structure factor intensities and the map coefficients are stored as separate data blocks in a single CIF.

The problems

There are two problems with looking at this data after downloading it from the PDB. The first problem is that, because the ligands are bound at low occupancy, maps calculated from the structure factor intensities, or from the map coefficients after final refinement with the partial-occupancy ligand, often do not contain convincing electron density evidence for the bound ligand. That evidence is best found in the PanDDA event map.

The second problem is that CIFs with multiple data blocks can be tricky to convert into MTZ files for visualization in COOT. In my experience, running phenix.cif_as_mtz correctly converts the map coefficients from the refined data; however, the PanDDA event map coefficients may not be converted. The structure factors encoding the PanDDA event map come from the real-space analysis and are in space group P1, not the space group of the corresponding PDB file.

The solution

We split the CIF containing the three data blocks into separate CIFs. In practice, it’s enough to extract the PanDDA event map block into one CIF and move the original and refined data into another. Then run phenix.cif_as_mtz on each CIF, with the correct symmetry flags specified, to convert them into MTZ files.

The extract_pandda.sh script does this for you. Download the coordinates and structure factor file from the Protein Data Bank (xxxx.pdb and xxxx-sf.cif files, where xxxx is the four-letter PDB code) and move them to a working directory. On the command line, run ./extract_pandda.sh xxxx-sf.cif SG (where SG is the space group of the crystal). The script will split the CIF into two separate CIFs, one containing the refined and original data (xxxx-sf_data.cif) and one containing the PanDDA event map (xxxx-sf_pandda.cif). The script then runs phenix.cif_as_mtz and converts the CIFs to MTZs for visualization in COOT.

Caveats

The script needs the Phenix version dev-4338 to run, available here.

Data blocks need to be named as follows in the CIF (where xxxx is the PDB code):

  1. Data from final refinement with ligand: data_rxxxxsf
  2. PanDDA event map: data_rxxxxAsf
  3. Original data: data_rxxxxBsf
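The splitting step can be sketched in Python. This is not the extract_pandda.sh script itself. It is a minimal illustration that assumes the block names listed above, that every block starts with a line beginning data_, and that no multi-line text field inside a block has a line starting with data_. The function names split_cif_blocks, split_pandda, and write_split are hypothetical. The output file names follow the description of the script's outputs.

```python
from pathlib import Path


def split_cif_blocks(text: str) -> dict[str, str]:
    """Split CIF text into {block header: block text}, keeping file order.

    Assumes every data block starts with a line beginning 'data_'.
    Anything before the first block (e.g. comments) is dropped.
    """
    blocks: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines(keepends=True):
        if line.startswith("data_"):
            current = line.strip()
            blocks[current] = [line]
        elif current is not None:
            blocks[current].append(line)
    return {name: "".join(lines) for name, lines in blocks.items()}


def split_pandda(text: str, pdb_id: str) -> tuple[str, str]:
    """Return (refined + original data, PanDDA event map) as CIF text.

    The PanDDA block is assumed to be named data_r<pdb_id>Asf; the match
    is case-insensitive. All other blocks stay together in file order.
    """
    blocks = split_cif_blocks(text)
    target = f"data_r{pdb_id}Asf".lower()
    matches = [name for name in blocks if name.lower() == target]
    if not matches:
        raise KeyError(f"no PanDDA event map block {target!r} found")
    pandda = blocks.pop(matches[0])
    return "".join(blocks.values()), pandda


def write_split(sf_cif: Path, pdb_id: str) -> tuple[Path, Path]:
    """Write xxxx-sf_data.cif and xxxx-sf_pandda.cif next to xxxx-sf.cif."""
    data, pandda = split_pandda(sf_cif.read_text(), pdb_id)
    data_path = sf_cif.with_name(f"{sf_cif.stem}_data.cif")
    pandda_path = sf_cif.with_name(f"{sf_cif.stem}_pandda.cif")
    data_path.write_text(data)
    pandda_path.write_text(pandda)
    return data_path, pandda_path
```

Each resulting file would then be converted separately with phenix.cif_as_mtz. The PanDDA file keeps its P1 symmetry.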

How to structure 1:1 meetings

Gabriella Estevam
29 March 2021

One of the best things we can give each other as colleagues is our time and attention. Therefore, to receive pertinent advice and make the most out of 1:1 and subgroup meetings, it is imperative to effectively communicate our immediate goals and expectations.

I would consider myself a goal-oriented individual, so concretely breaking down overarching goals into smaller to-do items helps me not only outline what needs to be done to get to the finish line, but also schedule and prioritize my efforts. As a side effect, it also helps me communicate better, because I have already reflected on and planned out what I hope to accomplish in a given timeline.

I have found the best way to prepare for 1:1 or subgroup meetings is by using the usual and brilliantly simple slideshow, as having a visual aid grounds the conversation.

When sitting down to prepare any 1:1 or subgroup meeting, I first ask myself a few questions:

  1. What is my biggest current goal?
  2. What have I done to progress towards accomplishing that goal?
  3. What do I have left to do?
  4. What comes next?
  5. Is there anything else I’d like to talk about?

In thinking about these questions, I try to keep my answers fairly discrete. As a graduate student, my obvious biggest goal is to graduate, but my biggest goal of the month might be to make some mutants for an assay, so I stay focused on that and begin outlining my thoughts on the first slide itself. Since we meet about twice a month for subgroup meetings, I keep those slides focused on monthly goals. For 1:1 meetings, however, the outlined goals can be broader if I need perspective on the project as a whole, and if we meet 1:1 weekly, I focus the discussion on plans for the week.

Once I have my talking points outlined, I make the rest of the slides. At this stage it becomes like preparing an extremely condensed group meeting, where the focus is on showing the experimental workflow and the data I have collected, or simply listing in bullet points the things I have accomplished or have yet to do. When preparing, I spend the most time on the questions “what comes next?” and “is there anything else I’d like to talk about?”, mainly because the answers can be less tangible than, say, cloning a mutant. These questions call for the most reflection on the present and the future. For instance, what will making this protein mutant allow me to do next, and why is that important for my project? Clearly explaining how and why your goals are important for your project, career, etc. is the crux of a productive meeting.

If the slides have been thoughtfully prepared, it should then be pretty easy to discuss your plans, show your productivity, and manage your time in the meeting. Everything is at your fingertips, and if you forget or run out of time to discuss a topic, it’s there in writing to discuss later on Slack, in passing, or whatever you find most appropriate.

Remember, 1:1 and subgroup meetings are your time! It’s your time to get feedback, advice, vent about why things aren’t working, express happiness about things that are working, ask for support, brainstorm, and map out next steps! Be thoughtful and use your time wisely.

Here’s an example of one of my own subgroup meeting slides.


How to make depositing a ligand-bound ribosome model and raw CryoEM movies as painless as possible

Jenna Pellegrino
20 October 2020

I’ve learned quite a bit from depositing ligand-bound ribosome CryoEM data with the PDB and EMPIAR. Below are some of my guidelines and tips for making these two deposition processes less confusing and/or frustrating.

Part 1: PDB deposition

This blog post assumes you have a finalized map and model ready for deposition.

There’s a lot of information below. Here’s the short version, which can serve as a refresher once you have read the full post:

  1. Choose the right coordinates file format (.pdb or .cif) for the size of your model
  2. Run BLAST on each chain, pick the best match, and update sequence with non-standard residues
  3. Have your data collection and processing information from your Table 1
  4. Have your ligand’s SMILES string
  5. Resolve outliers and clashes denoted in the Validation report before submitting
  6. Review analysis report, make corrections, reupload a file or approve for deposition

Opening a deposition instance

Go to the PDB’s wwPDB Deposition Service and start a new deposition. Select your country and then answer the questions below. In this example, we’re depositing a single particle electron microscopy structure; we’re depositing coordinates (that’s the model), and the associated map has not previously been deposited. You’ll get an email with login details.

Note that you will need to open a new deposition account for each map/model pair you plan to deposit.

Uploading map, model, and png thumbnail

Along with the map, you need the calibrated pixel size from the data collection (which, if you haven’t binned your data, is equal to the voxel spacing) and a recommended contour level. For the recommended contour level, open the map and model in Coot, zoom in on the ligand, and adjust the map’s contour. Record the sigma level that gives the clearest view of the ligand’s density.

The model you upload can be in either PDB format (.pdb) or mmCIF format (.cif); for information on model refinement using PHENIX with the OPLS3e force field, see my Benchling protocol here. If you upload a .pdb, it must meet PDB naming requirements, and there must be a “TER” line after each polymer chain. The most relevant PDB format restrictions for ribosome models are: 1) chain names can only be one character long and 2) atom serial numbers cannot contain letters. For small ribosome models, these likely won’t be an issue; however, larger ribosome models with many chains and atoms will run into them. The easiest way around both is to convert the model to mmCIF format, which supports these naming schemes. Open the model in PyMOL, save it as a .cif, and upload this model file to the PDB deposition service.

  1. Be careful if you were working with a .cif ribosome model containing chains with two-letter names (such as chains AA, AB, AC) and saved it as a .pdb. These chains will all be renamed to chain A. You’ll then run into such issues as “chain A has multiple atoms with the same name”.
  2. This is relevant to you if there are more than 99,999 atoms in your model, because atom serial numbers will then start to include letters, such as “A0000”. If this is the case for your model, you’ll get an “ERROR: ‘A0000’ is not a number.” flag when you try to upload your model to the PDB deposition service.
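Both limits can be checked before you pick a format. The sketch below uses hypothetical helpers that are not part of PHENIX, PyMOL, or the wwPDB tools. The first function flags chain IDs longer than one character. The second shows how a serial number above 99,999 turns into a string like “A0000” under hybrid-36 encoding, a scheme that some crystallographic software (including cctbx, which PHENIX builds on) uses to fit large numbers into the five-character PDB serial field. Only the upper-case range of hybrid-36 is implemented here.

```python
DIGITS_UPPER = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
SERIAL_WIDTH = 5                   # width of the atom serial field in .pdb
DECIMAL_LIMIT = 10 ** SERIAL_WIDTH  # 100000: first serial that needs letters


def hybrid36_serial(n: int) -> str:
    """Encode an atom serial number as hybrid-36 (upper-case range only).

    Plain decimal up to 99999; after that, base-36 starting at 'A0000'.
    """
    if n < 0:
        raise ValueError("serial numbers must be non-negative")
    if n < DECIMAL_LIMIT:
        return str(n)
    # Offset so that 100000 maps to 'A' followed by four zeros.
    value = n - DECIMAL_LIMIT + 10 * 36 ** (SERIAL_WIDTH - 1)
    if value >= 36 ** SERIAL_WIDTH:
        raise ValueError("beyond the upper-case hybrid-36 range")
    chars = []
    for _ in range(SERIAL_WIDTH):
        value, digit = divmod(value, 36)
        chars.append(DIGITS_UPPER[digit])
    return "".join(reversed(chars))


def pdb_format_problems(chain_ids, n_atoms: int) -> list[str]:
    """List reasons a model would not fit cleanly into .pdb format."""
    problems = []
    long_ids = sorted({c for c in chain_ids if len(c) > 1})
    if long_ids:
        problems.append("chain IDs longer than one character: " + ", ".join(long_ids))
    if n_atoms >= DECIMAL_LIMIT:
        problems.append(
            f"{n_atoms} atoms: serial {DECIMAL_LIMIT} is written as "
            f"{hybrid36_serial(DECIMAL_LIMIT)}"
        )
    return problems
```

An empty list means the .pdb route should be safe on these two counts. Anything else is a reason to save the model as .cif instead.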

Don’t overthink the thumbnail. I used the same screenshot of my ribosome map, shown as a surface, in Chimera as my thumbnail for all my ligand-bound ribosome models.

When you submit these files, PDB validation will run on the model you uploaded. You’ll get an email when the validation report is ready to look at. More on this file later. Know that you can also run this validation on any model with the wwPDB Validation Service.

Completing the “Admin” section

This section is straightforward. Note that, after filling in all this information once, you can choose to copy it to a new deposition instance.

Getting complete nucleotide and amino acid sequences for your model

In the “Macromolecules” section, you’ll need to submit names and sequences for each polymer chain in your model following these directions:

Input the sequence of this molecule using standard one-letter codes. Please include the complete sequence including tags, linkers, unobserved regions and mutations. Non-standard residues should be input using the three-letter code in parenthesis, e.g. (MSE).

It’s okay if the sequence you input has extra residues (this will be the case if the model you’re depositing has truncated chains or residues missing); however, the sequence you input cannot be lacking residues or ligands in the model you’re depositing. That means you need to include the three-letter code(s) for the ligand(s) in your model.

I stress that this sequence should be the complete sequence of what is found in nature. To get this:

  1. Run phenix.print_sequence model.pdb > seq.txt to extract the exact sequence of your model to a text file. Most of the time, the sequence in your model is incomplete, so do NOT use this sequence in the deposition. (If you do, you’ll probably get asked about it in the review, but better to be right the first time.) This output file will be in FASTA format. Annoyingly, all non-standard RNA residues will be represented with “?”. Currently, the only way I know to correct these is to manually edit them into three-letter codes in parentheses, such as “(6MZ)”. (Although you can do the next step without editing these “?”, you’ll need to edit them eventually. When I do this, I have the model open in PyMOL with Display Sequence on.)
  2. Run a BLAST sequence alignment search. Although you can do BLAST searches with multiple FASTA inputs at once, I think it’s less confusing to do each polymer chain sequence one at a time. Select which BLAST search is appropriate for your sequence, either BLASTn for nucleotide sequences or BLASTp for ribosomal protein sequences. When the search is done, select a sequence with the greatest percent match that also accurately describes your sample. Click the GenBank link next to the sequence range for the alignment. This will bring you to an NCBI page. Click the FASTA link to get the sequence of the aligned region in FASTA format. Copy this FASTA and keep it, along with the weblink, in your records.
  3. Edit this FASTA sequence: for RNA chains, change all “T” to “U” (not for protein chains, where T is threonine); update any non-standard RNA residues to their three-letter codes enclosed in parentheses; and, if this chain has your ligand(s) bound, add the ligand name(s) as three-letter codes enclosed in parentheses.
  4. Input this sequence into the PDB deposition service. You can find suggestions for the chain’s name from the BLAST search. Examples of what I’ve used: 23S ribosomal RNA, 50S ribosomal protein L2. After pasting the sequence in, you’ll need to click the button to align it.
  5. Below this section, you’ll specify how the molecule/sample was obtained (e.g. purified from natural source) and what it is (e.g. Escherichia coli, Taxonomy ID 562).
  6. Repeat steps 2-5 for all other polymer chains.
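Step 3 is mechanical enough to script. The helper below is a hypothetical sketch and is not a wwPDB tool. It reads the FASTA text saved in step 2 and converts T to U only when told the chain is RNA. It then swaps in non-standard residues at 1-based positions that you supply, counted from the start of that FASTA sequence (for example, positions you identified in PyMOL). Finally, it appends ligand codes. Putting the ligand codes at the end of the sequence is an assumption of this sketch.

```python
from typing import Mapping, Optional, Sequence


def deposition_sequence(fasta_text: str, is_rna: bool,
                        modified: Optional[Mapping[int, str]] = None,
                        ligands: Sequence[str] = ()) -> str:
    """Build a wwPDB-style sequence string from BLAST FASTA text.

    modified maps 1-based positions in the FASTA sequence to the
    three-letter codes of non-standard residues (e.g. "6MZ").
    """
    # Join all sequence lines, skipping the FASTA header line(s).
    seq = "".join(
        line.strip() for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    ).upper()
    if is_rna:
        seq = seq.replace("T", "U")  # never for proteins, where T is Thr
    residues = list(seq)  # one entry per residue so positions stay fixed
    for pos, code in sorted((modified or {}).items()):
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside 1..{len(residues)}")
        residues[pos - 1] = f"({code})"
    residues.extend(f"({code})" for code in ligands)
    return "".join(residues)
```

For example, deposition_sequence(fasta, True, {3: "6MZ"}, ["LIG"]) would turn the hypothetical RNA GGTACTT into GG(6MZ)ACUU(LIG), with LIG standing in for your ligand’s code.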

Adding details of your collection

The data collection sections, e.g. “EM sample” and “EM experiment” tabs, are straightforward. Many of these details will be values already in your Table 1.

Addressing your ligands

In the “Ligands” section, you’ll specify which ligand is the study’s subject of interest (non-standard RNA residues will also appear among this list) and include a few additional details. I recommend submitting your ligand’s SMILES string among these.

For the ribosome, your model is likely C1, having no symmetry (might be different if yours is a crystal structure). If this is the case, then yes, your assembly applies to all chains and yes, the assembly can be generated without applying matrices.

The “Related entries” section is straightforward.

Reviewing the Validation report

There’s a lot of information packed into this report. Here are some especially important things to look out for:

  • Percentile scores: these bars should be in the blue
  • Outliers: bond length, bond angle, chirality, planarity, and Ramachandran
  • Cis peptides
  • Clashes: because the ribosome is huge, you’re bound to have a lot of these. I recommend running Validate → Probe Clashes on your model in Coot and focusing on the pink clashes near the ligand and those that are especially bad.
  • Ligand: you’ll need to wait for your submission to be reviewed before you can see all the details for this

Submitting your entry for deposition

Your job isn’t done when you submit your entry. You will be unable to edit the deposition instance until your PDB deposition contact reviews your submission and reports back to you. Once you submit, you’ll receive a PDB ID and, if your map was obtained through electron microscopy, an EMDB ID. Don’t forget to add these accession codes to your Table 1.

Reviewing the extended Validation report and closing out the deposition

When you receive this report, read over it, address anything flagged as potentially being wrong, and ensure that the rest of the information listed is correct. Open the extended Validation report and ensure the stereochemistry of your ligand is correct; if you gave your SMILES string, it’s unlikely this would be wrong, but still it’s better to check. At this point, you can still make any necessary changes to your model and reupload it, going through the process of validation again. If everything looks good and you’ve confirmed it, then there’s nothing else to do. Remember that you can always ask to have your deposition instance unlocked so you can make changes in the future. Uploading a new map or model will put you through the validation cycle again.

Part 2: EMPIAR deposition

EMPIAR, or the Electron Microscopy Public Image Archive, is a public resource for raw electron microscopy movies, images, image stacks, particles, class averages, and more.

Opening an EMPIAR deposition instance

Go to EMPIAR’s deposition home page to register your user account or to log in. Unlike the PDB, where you need to make a new deposition login for each structure you want to deposit, everything in EMPIAR is connected to one login. Once logged in, you can click the link aptly called “Create a new deposition” to get started. Note that EMPIAR has an extremely helpful pictorial deposition manual available to you. I found it exceedingly useful and clear.

Once you’ve created a deposition instance, you have 3 main jobs:

  1. Fill out all the citation, entry title, and authorship details (and upload a png or gif thumbnail)
  2. Upload your data
  3. Fill out the image set format specifications

Job 1 - Complete the Deposition overview

This is the first page you’re brought to when you make a new deposition. If need be, you can navigate back to it by clicking the “Deposition overview” link on the left under “Deposition-related tasks”. This section is straightforward. Note that, after filling in all this information once, you can choose to copy it to a new deposition instance.

Also note that nearly every section is required to be filled out. If you don’t have the information or if the section isn’t relevant, click “N/A”. This will allow you to Save & Validate successfully and move on. You can only upload your data once you successfully validate.

Job 2 - Upload your data

After having completed Job 1, you’ll be able to upload your data. There are 3 ways to do this: Globus, Aspera using the command line, and Aspera using the web interface. I completed all my data uploads using Globus. To use Globus, you’ll have to make your own account and download Globus Connect Personal. You’ll need this software to establish an endpoint on the computer which houses your raw data, from which the transfer will be made to an EMPIAR endpoint.

Following the steps on EMPIAR under the Globus upload section is quite straightforward. I’ll reiterate their instructions to emphasize that you must be logged into your Globus account before trying to access their endpoint. Then, you can follow the links on EMPIAR to access their endpoint, log into the unique EMPIAR endpoint using the username and password they provide you, and transfer your data. Remember that you must have your home endpoint activated in order for you to be able to transfer anything.

Many types of raw EM data can be deposited to EMPIAR, including movies, images, and image stacks. If you’re depositing movies, it’s required that you include the appropriate gain reference for each set of movies. Include a dark reference and defects file if available, but these are optional.

Job 3 - Fill out the image set format specifications

Under “Deposition-related tasks”, you’ll now see an “Associate image sets with the data” section. Here, you’ll fill out some specifications, including the kind of data you’re uploading (e.g. multiframe micrographs), the format (e.g. TIFF), and the image and pixel sizes, among other details. If you don’t have these values on hand, most can be found in the header of your file. One potentially curious spec is the voxel type. If you don’t know what your voxel type is, you can use header from IMOD, identify from ImageMagick, or the header flag -H from EMAN2 on one of your files to discern it. If your voxel type (also called “data type”) comes up as “unknown” by IMOD, try ImageMagick. For an unknown reason, the latter worked for me and not the former.
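If your movies are in MRC format rather than TIFF, the voxel type is stored in the file header itself. It is the MODE value, the fourth 4-byte integer of the 1024-byte header, so you can read it without any EM software. The sketch below assumes a little-endian MRC/MRC2014 file, which is the usual case, but check the file’s machine stamp if in doubt. It only covers the common modes. It does not handle TIFF, for which the ImageMagick route above remains the simpler option. The function names are hypothetical.

```python
import struct

# MRC MODE values (MRC2014 standard, plus IMOD's packed 4-bit extension).
MRC_MODES = {
    0: "8-bit signed integer",
    1: "16-bit signed integer",
    2: "32-bit float",
    3: "complex 16-bit integers",
    4: "complex 32-bit floats",
    6: "16-bit unsigned integer",
    12: "16-bit float",
    101: "4-bit unsigned integer (packed, IMOD)",
}


def mrc_header_info(header: bytes) -> tuple[int, int, int, str]:
    """Return (nx, ny, nz, voxel type) from the start of an MRC header.

    Reads NX, NY, NZ, MODE as little-endian 32-bit integers.
    """
    if len(header) < 16:
        raise ValueError("need at least the first 16 bytes of the header")
    nx, ny, nz, mode = struct.unpack("<4i", header[:16])
    return nx, ny, nz, MRC_MODES.get(mode, f"unknown mode {mode}")


def read_mrc_header_info(path: str) -> tuple[int, int, int, str]:
    """Open an MRC file and report its dimensions and voxel type."""
    with open(path, "rb") as handle:
        return mrc_header_info(handle.read(1024))
```

The same numbers (image size in pixels and number of frames) are what the image set form asks for, so one call per movie set covers several fields.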

Submitting your entry for deposition

Your job isn’t done when you submit your entry. Once you submit, you’ll receive a public EMPIAR ID. Don’t forget to add this accession ID to your Table 1. Your entry will be reviewed and, once complete, you’ll have to log into your account to approve each deposition for release individually. Congratulations.

Another option for creating a deposition instance

When creating a deposition, you have the option to choose “Create a new deposition from XML” to upload an XML file containing all the specifications and details of one (or presumably multiple) depositions. You’ll need to follow their XML schema, found here in .xsd and .sch formats. If that link goes stale, you can also find the schema under the “What is EMPIAR data model?” section of their FAQ.


How we include protein structures in our posts

Daniel Hogan
08 July 2020

To display 3D structural models, we use UglyMol, an open-source web-based macromolecular viewer focused on electron density. It’s embedded within a separate HTML page that is included in posts using an iframe. The document that the iframe inserts is located at /static/posts/uglymol/uglymol.html, which can be viewed here. The structure to show is selected using parameters in the URL fragment (the part after the #). For example, the structure for entry 3K0N on the PDB can be inserted into any page on our site by simply including <iframe src="/static/posts/uglymol/uglymol.html#id=3k0n"></iframe>, which would cause it to look for files called 3k0n.pdb and 3k0n.mtz within the /static/posts/uglymol/ directory. Note that the files are stored locally and not loaded from the PDB.

Further parameters can also be added, such as #id=3k0n&xyz=10,5,15&eye=90,-30,60&zoom=50, yielding the following figure:

For a concrete example, the document creating this post can be viewed here. It, like everything else on this site, is released under an MIT license.
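For posts that embed several structures, you can generate the iframe markup instead of typing it. The helper below is a hypothetical sketch and is not part of UglyMol. It reproduces the URL pattern shown above: id, then optional xyz, eye, and zoom after the #. It also returns the local file names the page will look for. It does not check that those files actually exist in /static/posts/uglymol/.

```python
from typing import Optional, Sequence

UGLYMOL_DIR = "/static/posts/uglymol/"
UGLYMOL_PAGE = UGLYMOL_DIR + "uglymol.html"


def uglymol_src(pdb_id: str, xyz: Optional[Sequence[float]] = None,
                eye: Optional[Sequence[float]] = None,
                zoom: Optional[float] = None) -> str:
    """Build the iframe src; parameters go in the fragment after '#'."""
    params = [f"id={pdb_id.lower()}"]
    if xyz is not None:
        params.append("xyz=" + ",".join(str(v) for v in xyz))
    if eye is not None:
        params.append("eye=" + ",".join(str(v) for v in eye))
    if zoom is not None:
        params.append(f"zoom={zoom}")
    return UGLYMOL_PAGE + "#" + "&".join(params)


def uglymol_iframe(pdb_id: str, **view) -> str:
    """Wrap uglymol_src in the iframe tag used in posts."""
    return f'<iframe src="{uglymol_src(pdb_id, **view)}"></iframe>'


def local_files(pdb_id: str) -> tuple[str, str]:
    """Model and map files the page looks for, e.g. 3k0n.pdb and 3k0n.mtz."""
    stem = UGLYMOL_DIR + pdb_id.lower()
    return stem + ".pdb", stem + ".mtz"
```

Lower-casing the ID matches the 3k0n file names above. Whether UglyMol itself is case-sensitive about the id is not something this sketch relies on.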