News from the Fraser Lab


Developing a foundation in the scientific literature

Gabriella Estevam
26 May 2025
tags: #teaching

As the saying goes, “chance favors the prepared mind.” The process of scientific discovery begins with a deep understanding of the knowns, such that we can address the unknowns. Each project I’ve worked on as a biochemist has required its own literature foundation, and as a scientist who likes to study a variety of proteins and work on multiple projects in parallel, I’ve developed a system for rapidly building a foundation in the scientific literature.

Entering a new scientific field is as exciting as it is challenging, and in my career it happens often. While the unifying theme is always structural biology and enzymology, the exact systems I’ve studied have been distinct, each requiring a careful understanding of its specific scientific history – and, really, of what works and what doesn’t. The faster I can identify gaps in knowledge and develop a hypothesis, the faster I can get to the best part: testing it.

So, what are the right papers? Where are the papers? Who are the scientists in the field? What are the key discoveries?

Here is my method:

Define the field and focus

When it’s clear why I’m reading, it makes it that much easier to identify what to read. A project is constructed from three things: a core question and hypothesis, a set of methods, and a broader field. Defining the contents of those categories allows for targeted and goal-oriented reading.

To use my PhD as an example, the core question of my project was: how can we comprehensively map MET kinase resistance mutations?

To address this question, there are several things I need to know, which for me spiral into an extended list of questions like: what kind of protein is MET? What is its role? In what receptor tyrosine kinase (RTK) family does it belong? What is the current status of pathologic mutation annotation in MET-associated diseases? How does resistance develop? What methods have been used to study MET, and what were the caveats? What model systems have been used to study MET? Is there a structure? How was that structure solved and what is the resolution? What are the motifs, domains, PTMs, protein-protein interactions? What is the state of the art for comprehensively identifying sensitizing and resistance mutations? How have these questions been addressed in other proteins? You get the point…

By outlining learning objectives in this way, I can mentally organize and group questions based on theme. From there, it is a matter of tackling each topic like a to-do list and generating an intellectually fulfilling reading strategy. For the questions above, this is how I might group and define them:

  • MET kinase (core focus of project)
    • Biochemistry
    • Structure
    • Model systems
    • Disease mutations
  • Deep Mutational Scanning (central method)
    • DNA library construction
    • Selection-based pooled screening
    • NGS
    • Coding & data analysis
    • Molecules studied to date
  • RTKs, protein kinases, signaling (broader field)
    • Protein kinase phylogeny
    • Phosphorylation relay
    • RTK and protein kinase subfamilies
    • Structure-function similarities
    • Activation mechanisms
    • Disease implications

Expect overlap when organizing. For instance, while reading for methods, there might be a paper that performed a deep mutational scan (DMS) on a different kinase, and found potential mechanisms of resistance through exhaustive mutagenesis and selection – two birds, one scone! This is an opportunity to learn more about my broader field in the context of the exact method I want to apply. I can focus on the results, caveats, data interpretation, and begin to develop a realistic picture of how things might work for my project.

When I’m in the early stages of conceptualizing a project and independently generating a hypothesis – the position I’m mostly in now – I use this same strategy in the context of a high-level problem, though that might need its own post in the future. Nevertheless, blueprinting the required literature is the first step.

Scan and collect titles

Before deeply reading papers, I first collect them. If there is even one paper already acting as a starting point – a suggestion from a colleague or otherwise – I will begin collecting titles from its references as a strong pre-filtered list. However, my favorite way to collect reading material is simply searching keywords or questions in Google Scholar. For instance, to broadly understand crystallography, I’ll type “crystallography” into the search engine. If I have an abstract set of concepts in mind and want to understand what exists, whatever it is, I’ll type that. Collecting titles is becoming easier and more reliable with AI tools, and my approach there is the same, just with more prompting, .bib importing, and PDF attaching.
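The .bib-importing step lends itself to a little automation. A minimal sketch of pulling titles out of a BibTeX export (e.g., from Google Scholar) using only the standard library – the sample entries and the `extract_titles` helper are illustrative, and a real pipeline would use a proper BibTeX parser to handle nested braces:

```python
import re

def extract_titles(bibtex: str) -> list[str]:
    """Pull the title field out of each BibTeX entry.

    A regex-based sketch: matches `title = {...}` or `title = "..."`.
    The \\b keeps it from also matching `booktitle` fields.
    """
    pattern = re.compile(r'\btitle\s*=\s*[{"]([^}"]+)[}"]', re.IGNORECASE)
    return [m.group(1).strip() for m in pattern.finditer(bibtex)]

# Illustrative entries, as they might appear in a Scholar .bib export.
sample = """
@article{example_met,
  title = {Mapping MET kinase resistance mutations},
  year  = {2024},
}
@article{example_dms,
  title = "Deep mutational scanning: a new style of protein science",
  year  = {2014},
}
"""

print(extract_titles(sample))
```

Running this over a growing .bib file gives a quick scannable title list to filter before committing anything to the paper manager.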

The goal of this stage is coverage of the literature, which means that early, foundational papers are some of the most important. I therefore sort my searches by relevance, not date. By sampling papers across time, I can map out discovery trends and potentially predict the next wave of research.

From there, I scan titles, the first two sentences of each snippet, authors, and dates – again, to sample my reading across time and avoid recency bias. If a title looks relevant, I’ll open the paper and give the whole thing a visual scan: looking at figure content, skimming the abstract and discussion, and taking note of the authors, but not spending more than a couple of minutes per paper.

If the paper content looks relevant or interesting, I save the PDF through a paper manager (Zotero, Mendeley, Paperpile, etc.) using their browser extension. My philosophy here is to use the manager as a “paper bank.” Each project I work on is given a dedicated folder, and as I make my way through the literature, I keep the papers I want to reference and read again, but remove the ones I don’t.

When I’m building a literature foundation, collecting and filtering is a daily process. Reading volume varies as I trade off between deeply absorbing a handful of papers and covering three dozen at lesser depth. Ultimately, interest is the criterion I value most.

Attenuate attention

The most immediate outcome of reading primary research is the development of hypotheses, approaches, experimental implementation, and iteration. However, one of the most important long-term outcomes is a curated reference list for your next paper.

Reading papers word-by-word is unnecessary and can distract from developing the literature breadth needed to build a reference list. The key is sampling and reading enough to filter out the most relevant papers.

Interact with papers

An effective way to understand literature is to engage with it. Whether conceptualizing, leading, or joining a project, there will be unfamiliar topics. When reading for comprehension, stop when more information is needed.

Stop reading to decode acronyms, look up terminology, and quickly find summaries of methods. If a larger concept is unclear or piques interest, open and skim the cited works. Dig until ready to jump back into the paper. This can lead to a tangent of unexpected primary research reading, but that’s often when I branch out and discover the most across the scientific literature.

Interact with the literature based on how you practice science. Since I work with proteins, when I read a structure paper, one of the first things I do is open the cited PDB files and reference the model as I’m reading. I’ll highlight specific residues, ligands, and toggle between different visual representations. I’ll find all the structural models of the same protein through its UniProt reference ID, generate ensembles, and build a broader understanding of authors and methods used for a given protein. If there is complex data interpretation or visualization, I’ll visit the published repository and scan the code to understand the analysis process.
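The UniProt-to-structures lookup can also be scripted. A minimal sketch, assuming PDBe’s “best structures” mapping endpoint and its response field names (P08581 is the UniProt accession for human MET; the hard-coded sample response and its PDB IDs are illustrative – check the current PDBe API documentation before relying on this):

```python
import json

# Assumption: PDBe's best-structures endpoint maps a UniProt accession
# to the PDB entries covering it.
PDBE_BEST = "https://www.ebi.ac.uk/pdbe/api/mappings/best_structures/{}"

def best_structures_url(uniprot_id: str) -> str:
    """Build the lookup URL for one UniProt accession."""
    return PDBE_BEST.format(uniprot_id)

def pdb_ids(response_text: str, uniprot_id: str) -> list[str]:
    """Extract PDB IDs from a best-structures JSON response."""
    data = json.loads(response_text)
    return [entry["pdb_id"] for entry in data.get(uniprot_id, [])]

# Hard-coded sample response (field names assumed); a live lookup
# would fetch best_structures_url("P08581") with urllib.request.
sample = '{"P08581": [{"pdb_id": "1r0p", "resolution": 1.8},' \
         ' {"pdb_id": "2g15", "resolution": 2.2}]}'

print(best_structures_url("P08581"))
print(pdb_ids(sample, "P08581"))
```

A list like this is a quick starting point for pulling every model of a protein into one session and building an ensemble view while reading.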

At this stage, it is also important to know who is in the field – learn who the authors are. What are their affiliations? What else have they written? Visit their websites, ORCID profiles, Google Scholar pages, etc. There is a good chance the authors will be at the next conference you’re attending.

Keep reading

After building a literature foundation, maintain it. Revisit core papers if a refresh or reframe is due. Expand or contract reference lists. Stay current with developments. To this day I use the Fraser Lab method of following the scientific literature, which I adopted during my time in the lab and highly recommend.

Then repeat the process in another scientific space! This keeps things exciting, fresh, and creative when ideas are founded from multiple scientific domains.

If you’ve made it this far, thanks for reading, and find this co-posted at Gabriella’s site!


The Tortured Proteins Department, Episode 3

James Fraser
16 May 2025
tags: #podcast

The third episode of The Tortured Proteins Department is out now!

We chatted about grant cancellations, exciting regional meetings and reunions, two fun new preprints, community norms around code release, and the importance of giving kudos.

The preprints discussed in this episode:


The Tortured Proteins Department, Episode 2

James Fraser
15 April 2025
tags: #podcast

The second episode of The Tortured Proteins Department is out now!

We chatted about the continued chaos of science infrastructure and dove into some cool science from recent meetings Jaime and Stephanie attended. We also introduced two new segments: preprints and how to do science.

The preprints discussed in this episode:


The Tortured Proteins Department, Episode 1

James Fraser
15 March 2025
tags: #podcast

Former lab member Stephanie Wankowicz and I have started a podcast called The Tortured Proteins Department. The first episode is out now!

We discuss the declining support of science in the US and how it may impact the future of graduate science education.


AlphaFold3 Validation and the Role of Journals

James Fraser
22 May 2024
tags: #publishing

Like many others, I was disappointed by the lack of code – or even executables – accompanying the publication of AlphaFold3 in Nature. This made it impossible to test the paper’s most exciting claim: impressive performance in predicting the structures of proteins bound to novel ligands. I was even more upset to learn that my colleague Roland Dunbrack was “ghosted” after he submitted his initial review.

We organized an open letter to Nature questioning why a journal would fail to enforce its own written policies. By failing to do so, Nature implies that it enforces those policies inequitably and to the detriment of the overall scientific community.

In response to this letter, DeepMind announced they would release the code within 6 months. This was a reversal from their previous position (quoted in a Nature News article):

“We have to strike a balance between making sure that this is accessible and has the impact in the scientific community as well as not compromising Isomorphic’s ability to pursue commercial drug discovery,” says Pushmeet Kohli, DeepMind’s head of AI science and a study co-author.

Now Nature has replied in an unsigned editorial, saying it wants to be “in conversation” with our community around ensuring openness of the research ecosystem. This editorial focuses a lot on code disclosure. Journals want to play an important role in the research ecosystem going forward and have established that they will perform some valuable services:

  • coordinating peer review
  • performing ethics checks
  • ensuring data and code are properly deposited

The editorial makes it seem as if this matter is only about a small exception regarding the code. It writes that the release of the code after a 6-month delay – spurred by the open letter and community outcry, not by Nature – is:

“… an important step, and Nature will update the published paper once the code is released.”

Yet we still cannot validate the most fundamental claims about protein–ligand predictions. I find Nature’s description of the server disingenuous:

“The basics of how the community can use the new version of AlphaFold remain the same: anyone with a Google account can use the tool for free, for non-commercial applications.”

and

“In addition to the non-availability of the full code, there are other restrictions on the use of the tool — for example, in drug development. There are also daily limits on the numbers of predictions that individual researchers can perform.”

The server is restricted to 20 natural metabolites and ions. We still cannot even reproduce the figures of the paper.

Obviously, many companies want the Nature “stamp” of approval – and this editorial shows, nakedly, that the “stamp” is a toxic part of our current research ecosystem, one that bends easily to corporate interests and applies inequitable standards. The canard that the private sector won’t publish unless journals let companies play by different rules is particularly problematic. Nature broke its peer-review process here (see Roland Dunbrack’s experience), and with a little community pressure, the authors changed course and promised an eventual code/executable release.

What’s the solution going forward? We can raise the bar! Academics should push the envelope in data and code disclosure alongside preprints with open review. Companies can also lead by example (see Arcadia Science, Pat Walters, and others) by doing a first-class job of disclosing data outside journals.

I’m optimistic about the scientific ideas presented in the AF3 paper. It’s an exciting time for AI and the biosciences. Let’s make the future get here faster by building on each other’s work!