Eric Greene
04 November 2022
tags:
#deij_jc
Background
A group of scientists within the Fraser, Coyote-Maestas, and Pinney labs have begun a journal club centered around issues of diversity, equity, inclusion, and justice within academia, specifically in the biological sciences.
Our goal is to provide an environment for continued learning, critical discussion, and brainstorming action items that individuals and labs can implement. Our discussions and proposed interventions reflect our own opinions based on our personal identities and lived experiences, and may differ from the identities and experiences of others. We will recap our discussions and proposed action items through a series of blog posts, and encourage readers to directly engage with DEIJ practitioners and their scholarship to improve your environment.
November 4th, 2022 – Blinding peer review
Discussion Leader: Eric Greene
Articles:
Summary Article: “Funding: Blinding peer review”
Primary Article: “An experimental test of the effects of redacting grant applicant identifiers on peer review outcomes”
Bonus Article: “Strategies for inclusive grantmaking”
Summary and Key Points:
STEM research funding is a highly competitive space that has a persistent lack of diversity and representation, especially at the faculty level. I chose this case study as it discusses one of the largest current racial disparities in STEM, highlights a source of white privilege that directly impacts lab funding, and provides experimental evidence towards one mitigation strategy.
The NIH is a substantial funding source for biomedical research in the US, and NIH funding is foundational to the existence of many laboratories that are driving biomedical scientific discovery. However, there is a large and persistent funding gap between White and Black investigators: Black PIs are funded at 55-60% of the rate of White PIs.
In response to this disparity, the NIH conducted a study on the effects of blinding applicants’ identity and institution on the review of R01 proposals. The goal of this large experiment was to gain an understanding about the role of peer review in facilitating racial bias in grant awards and to understand the extent to which blinding applicant identity could blunt racial bias. The experiment uncovered the following:
- Scores for applications from Black PIs were unaffected by blinding, but scores for applications from White PIs were significantly lower when the White PIs’ identity was blinded, such that the racial gap was cut in half. This finding could be due to the “Halo effect”, where personal/institutional prestige dramatically upweights advantaged/privileged individuals, and can be seen as another mechanism fueling a ‘winners keep winning’ phenomenon. Indeed, the “Halo effect” has been indicated to be a potent factor in manuscript peer review.
- The principal critique of invoking the “Halo effect” to rationalize the findings of this study is that proposal writers did not write their proposals with identifying information redacted; redaction was done administratively on previously reviewed R01 applications, leaving uncertainty regarding the impact of administrative redaction on ‘grantsmanship’. However, we discussed the likelihood that applicants who benefit from individual/institutional prestige would write favorably toward this status in their applications, in effect working to entrench any positive “Halo effect” benefit.
Blinding applicant identification on grant proposals is not a silver bullet that solves racial disparity in NIH funding. Blinding is itself imperfect: ~22% of reviewers were able to positively identify the blinded applicant. However, it is one tool with a demonstrated effect in blunting reviewer bias. While blinding was somewhat effective here, double blinding and/or tiered blinding of application materials could be used instead and may hold greater potential.
A key part of our discussion was about the review criteria for NIH funding that explicitly required a numerical evaluation of the individual and institution. Evaluation of a person contributes to an obligate entanglement of one’s past scientific accomplishments with their future potential during the grant review process. Not only can this equivalence be false (people often succeed after initial setbacks), but it can also be harmful by encouraging applicants to tie their self-worth to their productivity. Funding requires accounting for the equipment available to carry out the research, which is important for accountability on the part of the investigator, but this does not necessarily require a numerical score. This detailed level of evaluation would prompt reviewers to score prestigious/well-resourced institutions higher even if the same research could be carried out elsewhere. We discussed as an alternative whether equipment/facilities categories could be scored as ‘sufficient’ or ‘insufficient’ and not influence the overall impact score of the application.
Open Questions:
- How does one justly judge an application as fundable?
- The ‘Halo effect’ in consequential academic evaluation processes has amassed supportive evidence beyond grant funding. How do we best de-leverage this effect towards a level playing field?
- Blinding applicant identity can help, even if it is not perfect. How do we improve blinding processes through an equity lens?
- Another explanation for lower Black PI funding rates stems from the subject matter of study, such as studying health care topics of interest for communities of color, which, though important, may not necessarily be of high funding value to the reviewer or reviewing institute. How can these health care topics be adequately elevated and funded?
- To what extent do non-NIH funding mechanisms also incur racial disparity? What have other organizations tried to mitigate? Have these strategies worked?
Proposed Action Items:
While trainees may have limited influence to change the course of NIH peer review, there are nonetheless actions that one can take:
- Call your Representative/Senators to implore them to raise the NIH budget. NIH-sponsored research is of high value to the general public, and with more funds the 10-30% funding rate will increase and be less demoralizing to independent investigators and trainees.
- Should you find yourself in a position of power as a peer reviewer, practice empathy during the review process and familiarize yourself with the biases that can crop up in the process.
- Vote. The NIH is a government entity and is not immune to political authority figures.
- Encourage unsuccessful applicants to pursue resubmission. Rejection is hard but community can help.
- Encourage non-federal funding mechanisms to blind applications for reviewers or, if they have the budget, to run a study in which each application is evaluated both blinded and open, then compare the scores and who gets funded.
Galen Correy
08 August 2022
tags:
#how_to
Background
The pan-dataset density analysis (PanDDA) tool developed by Nick Pearce and colleagues at the XChem facility of the Diamond Light Source is a super powerful method for identifying low occupancy states in X-ray crystallography data [1,2]. Why do we care about low occupancy states? For one thing, the field of fragment-based drug discovery relies on tools to identify weakly bound ligands [3,4]. When fragments are soaked into protein crystals, the occupancy of the fragment (i.e. the proportion of protein molecules with a fragment bound) can often be relatively low (e.g. 10-20%). PanDDA helps to identify low occupancy fragments by subtracting the ground-state electron density (i.e. the electron density when no ligand is present) from the changed-state electron density (i.e. the electron density when the ligand is present) [1]. In addition to transforming crystallographic fragment screening, PanDDA can also help to identify and model larger ligands that may bind with relatively high affinity compared to fragments, but still have relatively low occupancy. This discrepancy can arise because ligand occupancy in soaking experiments does not necessarily correlate with binding affinity as measured by solution-based methods. One reason for this is low ligand solubility; it may be difficult to reach 1:1 stoichiometry in a soaking experiment. Another reason is that a binding site may be partially obstructed, or otherwise stabilized in a conformation that decreases the ligand occupancy. The presence of low occupancy states is a fundamental challenge of using crystallographic soaking experiments for determining ligand structures: identifying and resolving these states is the reason that PanDDA is such a powerful method.
PanDDA is a powerful tool for identifying low occupancy states, but it presents crystallographers with a new challenge: actually modeling the states it identifies! The best option is to model both states using alternate location (altloc) identifiers in the coordinate file to distinguish ligand-bound and ligand-free states [1,5] (this results in what we call a multi-state model). However, these multi-state models can be difficult to interpret/visualize, especially for the vast majority of users who are only interested in the ligand-bound state. A related issue is that we want to ensure that users can easily examine the PanDDA event maps that were used to model a ligand. For our recent preprint describing the design and structure-based optimization of ligands targeting the Nsp3 macrodomain, we modeled all the structures using a multi-state approach [6]. We’ve taken the following steps to disseminate the structures and maps as rapidly and helpfully as possible.
- Multi-state coordinate files and structure factor intensities have been deposited in the PDB (with RELEASE NOW selected)
- Structure factor intensities in MTZ format, Dimple output, PanDDA event/Z-maps, refined structures and ligand-bound states are available to download from Zenodo
- Diffraction images are available to download from https://proteindiffraction.org (search by PDB code)
How to extract the ligand-bound state in our multi-state models
Option 1
- Download coordinates from the PDB (e.g. fetch 5SQP in PyMOL)
- Remove the altloc A coordinates, which correspond to the ligand-free state (remove alt A in PyMOL)
- The coordinates can then be visualized or saved as a coordinate file (save 5SQP_ligand-bound.pdb in PyMOL)
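For readers working outside PyMOL, the same filtering can be sketched in plain Python. This is a minimal illustration, assuming standard fixed-column PDB records in which the alternate location indicator sits in column 17. The helper names (atom_line, drop_altloc) and the example atoms are hypothetical and are not part of the deposited scripts.

```python
def atom_line(serial, name, altloc, resname, chain, resseq, occ):
    """Build a fixed-column PDB ATOM record (coordinates zeroed for brevity)."""
    return "ATOM  %5d %-4s%1s%3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, name, altloc, resname, chain, resseq, 0.0, 0.0, 0.0, occ, 20.0)


def drop_altloc(lines, altloc="A"):
    """Drop ATOM/HETATM/ANISOU records whose altloc (column 17) matches."""
    kept = []
    for line in lines:
        is_coordinate = line.startswith(("ATOM", "HETATM", "ANISOU"))
        if is_coordinate and len(line) > 16 and line[16] == altloc:
            continue
        kept.append(line)
    return kept


# Hypothetical multi-state residue: altloc A = ligand-free, altloc B = ligand-bound.
model = [
    atom_line(1, " N", " ", "ASP", "A", 22, 1.00),
    atom_line(2, " CA", "A", "ASP", "A", 22, 0.70),
    atom_line(3, " CA", "B", "ASP", "A", 22, 0.30),
]
ligand_bound = drop_altloc(model)
```

Atoms without an altloc (shared by both states) are kept, and the remaining atoms still carry their original altloc labels and partial occupancies.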
Option 2
- Use this PyMOL script to fetch the coordinates using the PDB code and extract the ligand-bound state
- This script removes the altloc records for residues that only have a single conformation modeled in the ligand-bound state and renames the altloc records for residues with multiple conformations (Alternatively: the ligand-bound states can be downloaded directly from Zenodo)
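The altloc clean-up described above can be sketched in plain Python as follows. This is not the PyMOL script itself: the function names are hypothetical, records are assumed to be well-formed fixed-column PDB lines, and relabeling the remaining conformations alphabetically from A is an assumption about what "renames" means.

```python
from collections import defaultdict
import string

COORD_RECORDS = ("ATOM", "HETATM", "ANISOU")


def atom_line(serial, name, altloc, resname, chain, resseq, occ):
    """Build a fixed-column PDB ATOM record (coordinates zeroed for brevity)."""
    return "ATOM  %5d %-4s%1s%3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, name, altloc, resname, chain, resseq, 0.0, 0.0, 0.0, occ, 20.0)


def residue_key(line):
    # Chain ID (column 22), residue number (columns 23-26), insertion code (column 27).
    return line[21], line[22:26], line[26]


def tidy_altlocs(lines):
    """Blank the altloc of single-conformation residues; relabel the rest from 'A'."""
    labels = defaultdict(set)
    for line in lines:
        if line.startswith(COORD_RECORDS) and line[16] != " ":
            labels[residue_key(line)].add(line[16])

    tidied = []
    for line in lines:
        if line.startswith(COORD_RECORDS) and line[16] != " ":
            present = sorted(labels[residue_key(line)])
            if len(present) == 1:
                new_label = " "  # only one conformation left: no altloc needed
            else:
                new_label = string.ascii_uppercase[present.index(line[16])]
            line = line[:16] + new_label + line[17:]
        tidied.append(line)
    return tidied


# Hypothetical ligand-bound state after the ligand-free altloc has been removed.
bound_state = [
    atom_line(1, " CA", "B", "ASP", "A", 22, 0.30),   # single conformation
    atom_line(2, " CA", "B", "PHE", "A", 156, 0.15),  # two conformations
    atom_line(3, " CA", "C", "PHE", "A", 156, 0.15),
]
tidied = tidy_altlocs(bound_state)
```

This sketch leaves occupancies untouched, so they would still reflect the multi-state refinement.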
How to inspect PanDDA event maps
Option 1
- Use this script to extract the PanDDA event map from the deposited structure factor CIFs (discussed here)
- The resulting map coefficients in MTZ format can be converted to CCP4 format using phenix.mtz2map.
Option 2
- Download the PanDDA event map in .ccp4 format from Zenodo. (Note: use COOT version 0.8.9.2 to visualize maps.)
Where to next?
Our goal is to use macromolecular structural information to make ligand discovery more efficient. We think that identifying and modeling low occupancy states is critical to this endeavor. Developing automated ways to model the low occupancy states identified by PanDDA is a long-term goal. This will speed up ligand modeling and reduce the error/bias that is often associated with manual approaches.
References
[1] Pearce, N. M., Krojer, T., Bradley, A. R., Collins, P., Nowak, R. P., Talon, R., Marsden, B. D., Kelm, S., Shi, J., Deane, C. M. & von Delft, F. A multi-crystal method for extracting obscured crystallographic states from conventionally uninterpretable electron density. Nat. Commun. 8, 15123 (2017).
[2] Schuller, M., Correy, G. J., Gahbauer, S., Fearon, D., Wu, T., Díaz, R. E., Young, I. D., Carvalho Martins, L., Smith, D. H., Schulze-Gahmen, U., Owens, T. W., Deshpande, I., Merz, G. E., Thwin, A. C., Biel, J. T., Peters, J. K., Moritz, M., Herrera, N., Kratochvil, H. T., QCRG Structural Biology Consortium, Aimon, A., Bennett, J. M., Brandao Neto, J., Cohen, A. E., Dias, A., Douangamath, A., Dunnett, L., Fedorov, O., Ferla, M. P., Fuchs, M. R., Gorrie-Stone, T. J., Holton, J. M., Johnson, M. G., Krojer, T., Meigs, G., Powell, A. J., Rack, J. G. M., Rangel, V. L., Russi, S., Skyner, R. E., Smith, C. A., Soares, A. S., Wierman, J. L., Zhu, K., O’Brien, P., Jura, N., Ashworth, A., Irwin, J. J., Thompson, M. C., Gestwicki, J. E., von Delft, F., Shoichet, B. K., Fraser, J. S. & Ahel, I. Fragment binding to the Nsp3 macrodomain of SARS-CoV-2 identified through crystallographic screening and computational docking. Sci Adv 7, (2021).
[3] Erlanson, D. A., McDowell, R. S. & O’Brien, T. Fragment-based drug discovery. J. Med. Chem. 47, 3463–3482 (2004).
[4] Murray, C. W. & Rees, D. C. The rise of fragment-based drug discovery. Nat. Chem. 1, 187–192 (2009).
[5] Pearce, N. M., Krojer, T. & von Delft, F. Proper modelling of ligand binding requires an ensemble of bound and unbound states. Acta Crystallogr D Struct Biol 73, 256–266 (2017).
[6] Gahbauer, S., Correy, G. J., Schuller, M., Ferla, M. P., Doruk, Y. U., Rachman, M., Wu, T., Diolaiti, M., Wang, S., Jeffrey Neitz, R., Fearon, D., Radchenko, D., Moroz, Y., Irwin, J. J., Renslo, A. R., Taylor, J. C., Gestwicki, J. E., von Delft, F., Ashworth, A., Ahel, I., Shoichet, B. K. & Fraser, J. S. Structure-based inhibitor optimization for the Nsp3 Macrodomain of SARS-CoV-2. bioRxiv 2022.06.27.497816 (2022). doi:10.1101/2022.06.27.497816
Christian Macdonald
10 June 2022
tags:
#deij_jc
Background
A group of scientists within the Fraser lab have begun a journal club centered around issues of diversity, equity, inclusion, and justice within academia, specifically in the biological sciences.
Our goal is to provide an environment for continued learning, critical discussion, and brainstorming action items that individuals and labs can implement. Our discussions and proposed interventions reflect our own opinions based on our personal identities and lived experiences, and may differ from the identities and experiences of others. We will recap our discussions and proposed action items through a series of blog posts, and encourage readers to directly engage with DEIJ practitioners and their scholarship to improve your environment.
June 10th, 2022 – The STEM Pipeline
Discussion Leader: Chris Macdonald
Articles:
- Problematizing the STEM Pipeline Metaphor: Is the STEM Pipeline Metaphor Serving Our Students and the STEM Workforce? Cannady MA, Greenwald E, and Harris KN. DOI: 10.1002/sce.21108
- Reimagining the Pipeline: Advancing STEM Diversity, Persistence, and Success. Allen-Ramdial SAA, and Campbell AG. DOI: 10.1093/biosci/biu076
- Improving Underrepresented Minority Student Persistence in STEM. Estrada et al. DOI: 10.1187/cbe.16-01-0038
Bonus Article:
Planting Equity: Using What We Know to Cultivate Growth as a Plant Biology Community. Montgomery BL. DOI: 10.1105/tpc.20.00589
Summary
STEM graduates require extensive education and progressively more specialized and advanced training. This has some implications for DEI work. An important one is that each educational level has compounding effects on the following ones. The common metaphor of a “STEM pipeline” has been used to capture this idea, where learners who move away from a STEM career trajectory are the leaks. In a DEI context, this means differential leakiness would be important to consider.
Metaphors can be useful by simplifying complex systems and helping us reason about them. That assumes they accurately capture the important dynamics of the system, however. If they don’t, they can hinder our thinking. Some have claimed that the pipeline metaphor is such a case, challenging both its accuracy and the helpfulness of the interventions it suggests.
I picked these three papers because they critically evaluate the value and accuracy of the metaphor and suggest policies to achieve the outcomes we want (a diverse and equitable environment) but that might not come directly from thinking about leaks.
- [Cannady et al.] uses longitudinal data on students in the US to see if the metaphor is accurate, and claims it is not.
- [Allen-Ramdial et al.] builds off the inaccuracy of the metaphor and suggests policies that the “pipeline” might not suggest.
- [Estrada et al.] is a product of the Joint Working Group on Improving Underrepresented Minorities (URMs) Persistence in Science, Technology, Engineering, and Mathematics (STEM), which was convened by NIGMS and HHMI. It is an example of how a large working group can adapt the criticisms of the previous two papers and propose policies to achieve an equitable environment.
As I was picking the papers for our discussion, I also thought about alternative metaphors we might use and whether they would help us think differently. I discovered the article by [Beronda L. Montgomery], which offered a wonderful example of a very different way of thinking about education that would lead us to do different things as a result.
Key Points:
- The metaphor may not be accurate: similar numbers of underrepresented minority students and non-underrepresented minority students enter STEM majors, and similar proportions remain through undergraduate education.
- The metaphor leads us to think that trajectories are strictly one way (you can’t unleak), while in fact there is much more fluidity in practice.
- The metaphor focuses our attention on individual failures (the leaks) rather than institutional ones (the pipes).
- There is an important distinction between an institution’s culture, which is essentially the beliefs, policies, and values that guide behavior, and its climate, which is the result of the actual implementation of them. An institution may have an unwelcoming or harmful climate while still having a healthy culture, but the pipeline metaphor focuses our attention on policy rather than implementation.
Open Questions:
- Is “STEM” a useful category, or is it too broad?
- What sorts of trajectories do “typical” successful scientists follow? What is the definition of “success” in STEM?
- What differentiates “leaky” institutions from others?
- How can we take the useful features of the pipeline metaphor and avoid the harmful ones?
- How does the overall educational landscape influence DEI efforts at the post-secondary levels and beyond?
Proposed Action Items:
We broadly agree with the policies suggested by [Allen-Ramdial et al.] and [Estrada et al.], although they are larger-scale interventions. In particular:
- Engage across institutions. Faculty at minority-serving institutions play essential but often ignored roles in diversifying STEM, and DEI initiatives at research-intensive institutions sometimes only engage with other research-intensive institutions. Programs that connect faculty across institutional boundaries can contribute to diversifying trainee access to career opportunities.
- Focus on aligning culture and climate. Ask how students and trainees feel, and listen to them. A failure of good intentions may be a result of both culture and climate.
- Take faculty involvement in DEI seriously. Effective and long-term DEI efforts are much more useful than broad but shallow activities. Institutions can encourage deep engagement by evaluating faculty DEI work on par with teaching and research.
- At an individual level, we found rethinking our metaphors can be a useful exercise. Ask yourself: what sort of environments would I like to create? Are the concepts I deploy sufficient to get there? Are they accurate? Are there alternatives?
Stephanie Wankowicz
10 May 2022
tags:
#how_to
Over the past two years I have done a bunch of structural bioinformatic work, resulting in the paper Ligand binding remodels protein side chain conformational heterogeneity. And I made A LOT of mistakes.
Below are many of the lessons, guidelines, and pitfalls for a structural bioinformatic analysis. While many of the principles below are specifically tailored to a paired analysis (such as apo versus holo or peptide bound versus small molecule bound), these guidelines can help with any structural bioinformatics project.
For specific suggestions, I have the code I created linked at the bottom of each section. This code is built on bash, python, Phenix/cctbx, and qFit. The code should be easily adaptable to other projects/inquiries. If there are any questions, feel free to contact me.
Define your selection criteria early.
Before you start downloading structures, you need to decide what structures you would like to highlight. Some of these items can be subsetted using the PDB advanced selection criteria, including:
- Method of structure determination (X-ray, CryoEM, NMR, Neutron, etc.)
- Resolution
- Cryo or Room Temperature
- Size of the protein
- Type of ligands
- Single or multidomain proteins
You may also want to cross check these structures with external databases (ChEMBL, UniProt, etc.). You can do much of this work on the PDB website in their advanced search section.
Once you get a list of structures with your initial criteria, you can parse the header of the PDB file, or get other statistics from the MTZ file with a program like phenix.mtz_dump.
This is the stage where you will start creating pairs of structures. Some criteria you will want to think of at this stage include:
- Unit cell dimensions and angles
- Space group
- Sequence (get this from the PDB and not from another database to know which residues were actually resolved in the structure)
- Ligand types/crystallographic additives (how much overlap do you want between the paired structures?)
- Experimental methods such as crystallographic conditions (this will be trickier, but it may be important and worth going through headers manually)
At this stage, I suggest keeping duplicate pairs (i.e. if you have multiple apo or wildtype proteins for each holo or mutant protein). Many structures will be thrown out downstream and it can be helpful to have ‘back ups’.
Here is a pipeline you can use to select the PDBs to move forward in your analysis.
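As a rough illustration of the pairing step (separate from the linked pipeline), the sketch below pairs apo and holo entries that share a space group and sequence and whose unit cells agree within a tolerance. The Entry fields, the identifiers, and the 1 Å / 1° tolerances are assumptions chosen for the example, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    pdb_id: str
    space_group: str
    cell: tuple        # (a, b, c, alpha, beta, gamma)
    sequence: str      # resolved sequence taken from the structure itself
    has_ligand: bool


def cells_match(cell1, cell2, length_tol=1.0, angle_tol=1.0):
    """True if cell lengths (Å) and angles (degrees) agree within tolerance."""
    lengths_ok = all(abs(x - y) <= length_tol for x, y in zip(cell1[:3], cell2[:3]))
    angles_ok = all(abs(x - y) <= angle_tol for x, y in zip(cell1[3:], cell2[3:]))
    return lengths_ok and angles_ok


def make_pairs(entries):
    """Return every (apo, holo) pair that passes the criteria, duplicates included."""
    apos = [e for e in entries if not e.has_ligand]
    holos = [e for e in entries if e.has_ligand]
    pairs = []
    for holo in holos:
        for apo in apos:
            if (apo.space_group == holo.space_group
                    and apo.sequence == holo.sequence
                    and cells_match(apo.cell, holo.cell)):
                pairs.append((apo.pdb_id, holo.pdb_id))
    return pairs


# Hypothetical entries; the identifiers are placeholders, not real PDB codes.
entries = [
    Entry("APO1", "P 21 21 21", (50.0, 60.0, 70.0, 90.0, 90.0, 90.0), "MKV", False),
    Entry("APO2", "P 21 21 21", (50.4, 60.2, 69.8, 90.0, 90.0, 90.0), "MKV", False),
    Entry("APO3", "C 1 2 1", (80.0, 40.0, 55.0, 90.0, 105.0, 90.0), "MKV", False),
    Entry("HOL1", "P 21 21 21", (50.2, 60.1, 70.3, 90.0, 90.0, 90.0), "MKV", True),
]
pairs = make_pairs(entries)
```

Keeping both APO1 and APO2 paired with HOL1 gives you the back-up pairs mentioned above.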
Re-refine structures.
The PDB has a lot of structures refined with many different software packages and versions. To ensure that you are comparing apples to apples, pick one refinement software version and re-refine all of your structures.
The software that I used was phenix.refine.
Unless you know exactly how you want to refine your structures, spend some time with ~15 structures and play around with refinement strategies. Some things to think about:
- Do you want different resolution cutoffs to have different refinement strategies?
- Are you going to refine anisotropically or with hydrogens?
- How are different refinement strategies impacting the R-free or R-gap of structures?
Once you have a refinement script you are happy with, test it with ~50 PDBs, find errors, and adjust from there. As the PDB files you are feeding your refinement strategy may be labeled in many different ways, you are likely going to have to build in flags as well as if statements to refine the structures.
If 80% of your structures are re-refined, move on. Send bugs to the respective software groups, and accept your losses (trust me, they are not worth it!).
Here is an example re-refinement pipeline that works with Phenix version 1.19.
Here is a pipeline that will re-refine your structures, run qFit, and then refine your qFit structure.
Quality control of structures.
The first, and easiest part of quality control of structures has to do with refinement metrics.
- Are you decreasing the R-free or R-gap? This can be extracted through refinement log files or running additional analysis.
- Are there any clashes in the structure?
- Are there Ramachandran outliers?
With pairs, you want to assess how well they align together in 3D. Aligning them is a beast in and of itself. Because structures with the same sequence can have different chain IDs or residue numbers, you will need to match those up, as all downstream analyses rely on this.
There are many different methods to align structures, but I landed on alpha-carbon alignment in PyMOL. This did not work well for 100% of the paired structures I had, but it worked for the majority of them.
I then required all chains to start with residue 1. Then, as I was working with paired structures, I based all holo structures off of apo structures: I renamed the geometrically closest chain in the holo to match each chain in the apo.
Some additional criteria you will want to think about in this stage include:
- How well do the backbones of structures line up? This can be assessed by alpha carbon RMSD between the structures. While some analyses may want to keep large changes, others may want to throw them out.
- How much do the ligands overlap between the structures?
Here is a pipeline that will extract and compare R-values, align your pairs, and spit out alpha RMSD and ligand overlap between the pairs.
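Two of the numbers above are simple enough to sketch directly. The snippet below is an illustration rather than the linked pipeline. It assumes the two structures are already superposed and that alpha carbon coordinates are stored in dictionaries keyed by (chain, residue number), so only residues present in both structures are compared. It also takes the R-gap to be R-free minus R-work. All names and values are hypothetical.

```python
import math


def r_gap(r_work, r_free):
    """Difference between R-free and R-work."""
    return r_free - r_work


def ca_rmsd(ca_coords_1, ca_coords_2):
    """Alpha carbon RMSD over residues present in both (already aligned) structures."""
    shared = sorted(set(ca_coords_1) & set(ca_coords_2))
    if not shared:
        raise ValueError("no matching (chain, residue number) keys")
    total = 0.0
    for key in shared:
        total += sum((a - b) ** 2 for a, b in zip(ca_coords_1[key], ca_coords_2[key]))
    return math.sqrt(total / len(shared))


# Hypothetical coordinates: the holo is shifted by 1 Å along z relative to the apo.
apo_ca = {("A", 1): (0.0, 0.0, 0.0), ("A", 2): (3.8, 0.0, 0.0)}
holo_ca = {("A", 1): (0.0, 0.0, 1.0), ("A", 2): (3.8, 0.0, 1.0), ("A", 3): (7.6, 0.0, 1.0)}

rmsd = ca_rmsd(apo_ca, holo_ca)
gap = r_gap(r_work=0.18, r_free=0.22)
```

Residue ("A", 3) is ignored because it has no partner in the apo structure, which is exactly why consistent chain and residue labels matter here.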
Analysis of structures
Now we get to the fun stuff!
Before you can run any analysis, you need to think about how you want to extract information from the structures. Are you going to do it based on chain and residue numbering or based on location? I chose the former as it is easier downstream. However, this required me to reassign the chain and residue numbers for many structures (see above).
The other thing to think about when comparing structures is if there are duplicates. In my case, I had multiple holo assigned to multiple apo. Therefore in the analysis, it was important to keep track of not just the PDB, but also the PDB’s matched pair.
Finally, you also need to consider how you are going to look at certain sections of the PDB. For example, I wanted to examine binding site residues. But my criterion (any residue heavy atom within 5 Å of any ligand heavy atom) sometimes gave me one or two different residues in the holo or apo depending on how much those residues moved. I decided to look at the union of those two lists, but you could also look at the intersection of those two lists.
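The union versus intersection choice can be sketched as below. This is an illustration under stated assumptions: heavy atoms are given as ((chain, residue number), (x, y, z)) tuples, the apo and holo are already aligned, and the holo ligand coordinates are reused to define the site in the apo. The function name and coordinates are hypothetical.

```python
import math


def binding_site(protein_atoms, ligand_atoms, cutoff=5.0):
    """Residues with any heavy atom within `cutoff` Å of any ligand heavy atom."""
    site = set()
    for residue_id, xyz in protein_atoms:
        if any(math.dist(xyz, ligand_xyz) <= cutoff for ligand_xyz in ligand_atoms):
            site.add(residue_id)
    return site


ligand = [(0.0, 0.0, 0.0)]

# Hypothetical heavy atoms: residue 2 sits closer to the ligand site in the apo.
holo_atoms = [(("A", 1), (4.0, 0.0, 0.0)), (("A", 2), (6.0, 0.0, 0.0))]
apo_atoms = [(("A", 1), (4.5, 0.0, 0.0)), (("A", 2), (4.8, 0.0, 0.0))]

holo_site = binding_site(holo_atoms, ligand)
apo_site = binding_site(apo_atoms, ligand)

union_site = holo_site | apo_site         # residue near the ligand in either structure
intersection_site = holo_site & apo_site  # residue near the ligand in both structures
```

The union keeps residues that move into or out of the site, which is often the interesting signal in a paired analysis, while the intersection gives a more conservative list.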
Here are a bunch of analyses that I ran on my pairs or individual models.
Quality control of the analysis
For almost every single analysis I did, I would plot the result and have a few outrageous outliers. This was always a clue of something I coded wrong in the analysis, or something incorrect about the labeling of the PDBs.
When looking at the result of your analysis, always look at the minimum and maximum values on both an individual basis (i.e. if you are looking at some sort of residue metric) and a structure basis. Take at least the top and bottom five metric values and go through checking for the following:
- Is residue 1, chain A of structure 1 within your RMSD cutoff of residue 1, chain A of structure 2?
- If you manually calculate the metric you are measuring, does it match what your code says?
- Look at the structure in PyMOL or Chimera. Does the numerical value of the metric line up with what you are visualizing?
Repeat this process until you can visually/biologically explain at least the top and bottom five metric values.
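Pulling out the extremes for manual inspection takes only a few lines. The sketch below assumes a per-residue metric stored in a dictionary keyed by (PDB ID, matched partner ID, chain, residue number), so that each value can be traced back to its pair. The identifiers and values are made up.

```python
def extremes(metric_by_residue, n=5):
    """Return the n lowest and n highest (key, value) entries for manual checking."""
    ranked = sorted(metric_by_residue.items(), key=lambda item: item[1])
    lowest = ranked[:n]
    highest = ranked[-n:][::-1]  # largest value first
    return lowest, highest


# Hypothetical metric values for one apo/holo pair; residue 13 is an outlier.
metric = {("HOL1", "APO1", "A", resnum): float(resnum) for resnum in range(1, 13)}
metric[("HOL1", "APO1", "A", 13)] = 250.0

lowest, highest = extremes(metric)
```

Each returned key tells you which structure, partner, chain, and residue to open in a viewer.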
Stephanie Wankowicz
06 May 2022
tags:
#deij_jc
Background
A group of scientists within the Fraser lab have begun a journal club centered around issues of diversity, equity, inclusion, and justice within academia, specifically in the biological sciences.
Our goal is to provide an environment for continued learning, critical discussion, and brainstorming action items that individuals and labs can implement. Our discussions and proposed interventions reflect our own opinions based on our personal identities and lived experiences, and may differ from the identities and experiences of others. We will recap our discussions and proposed action items through a series of blog posts, and encourage readers to directly engage with DEIJ practitioners and their scholarship to improve your environment.
Article: Addressing racism through ownership. Dutt, K. DOI: 10.1038/s41561-021-00688-2
Article: Black Scientists Are Not the Door to Diversity. Hayes, CA. DOI: 10.1021/acschemneuro.1c00375
Article: The Burden of Service for Faculty of Color to Achieve Diversity and Inclusion: The Minority Tax. Trejo J. DOI: 10.1091/mbc.E20-08-0567
Summary: Marginalized people are expected to dismantle oppressive systems that actively disenfranchise them, an expectation known as the Minority Tax. How does this tax impact people at different stages of their career, and how do we combat this expectation?
Key Points:
- Successful DEIJ work takes teamwork.
- There is a lack of respect for DEIJ work.
  - This is most obvious in the deliberate exclusion of DEI work from traditional metrics of professional progress, such as promotion and funding.
  - Universities are often recognized and praised for the contributions of individuals to DEIJ work.
- There is a lack of support for DEIJ work.
  - This work is rarely done by an expert. Instead, those with lived experiences are often tasked with developing and implementing DEI efforts. This model results in mixed outcomes while allowing universities to claim they support DEIJ initiatives.
Open Questions
- How do we get more people involved in DEIJ work?
- What do you do with people who are not interested in contributing to DEIJ?
- How do we track and reward DEIJ effort among academic personnel?
- How do we evaluate the impact of diversity efforts in academia?
- Is it appropriate for basic scientists to create and lead DEIJ efforts?
Proposed Action Items:
- Estimate how much time you are spending on DEIJ work compared to others. Take note of who shows up to meetings, comes up with ideas, and executes those ideas.
- Increase the importance of service work, specifically DEIJ work, in tenure and promotion decisions.
- Provide material resources, such as hiring full-time staff or providing money for consultants, to implement DEIJ projects or initiatives.