As the saying goes, “chance favors the prepared mind.” The process of scientific discovery begins with a deep understanding of the knowns, such that we can address the unknowns. Each project I’ve worked on as a biochemist has required its own literature foundation, and as a scientist who likes to study a variety of proteins and work on multiple projects in parallel, I’ve developed a system for rapidly building a foundation in the scientific literature.
Entering a new scientific field is as exciting as it is challenging, and in my career it happens often. While the unifying theme is always structural biology and enzymology, the exact systems I’ve studied have been distinct, and each has required a careful understanding of its specific scientific history and, really, of what works and what doesn’t. The faster I can identify gaps in knowledge and develop a hypothesis, the faster I can get to the best part: testing it.
So, what are the right papers? Where are the papers? Who are the scientists in the field? What are the key discoveries?
Here is my method:
Define the field and focus
Knowing why I’m reading makes it much easier to identify what to read. A project is constructed from three things: a core question and hypothesis, a set of methods, and a broader field. Defining the contents of those categories allows for targeted and goal-oriented reading.
Using my PhD as an example, the core question of my project was: how can we comprehensively map MET kinase resistance mutations?
To address this question, there are several things I need to know, which for me spiral into an extended list of questions like: what kind of protein is MET? What is its role? To which receptor tyrosine kinase (RTK) family does it belong? What is the current status of pathologic mutation annotation in MET-associated diseases? How does resistance develop? What methods have been used to study MET, and what were the caveats? What model systems have been used to study MET? Is there a structure? How was that structure solved and what is the resolution? What are the motifs, domains, PTMs, and protein-protein interactions? What is the state of the art for comprehensively identifying sensitizing and resistance mutations? How have these questions been addressed in other proteins? You get the point…
By outlining learning objectives in this way, I can mentally organize and group questions based on theme. From there, it is a matter of tackling each topic like a to-do list and generating an intellectually fulfilling reading strategy. For the questions above, this is how I might group and define them:
- MET kinase (core focus of project)
  - Biochemistry
  - Structure
  - Model systems
  - Disease mutations
- Deep Mutational Scanning (central method)
  - DNA library construction
  - Selection-based pooled screening
  - NGS
  - Coding & data analysis
  - Molecules studied to date
- RTKs, protein kinases, signaling (broader field)
  - Protein kinase phylogeny
  - Phosphorylation relay
  - RTK and protein kinase subfamilies
  - Structure-function similarities
  - Activation mechanisms
  - Disease implications
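For anyone who likes to keep this kind of blueprint next to their code, here is a minimal sketch of the outline above as a to-do list. The names `blueprint`, `to_checklist`, `mark_done`, and `remaining` are my own illustrative choices, not part of any tool, and the structure is only one of many ways to track reading.

```python
# Hypothetical reading blueprint: theme -> topics, mirroring the outline above.
blueprint = {
    "MET kinase (core focus of project)": [
        "Biochemistry", "Structure", "Model systems", "Disease mutations",
    ],
    "Deep Mutational Scanning (central method)": [
        "DNA library construction", "Selection-based pooled screening",
        "NGS", "Coding & data analysis", "Molecules studied to date",
    ],
    "RTKs, protein kinases, signaling (broader field)": [
        "Protein kinase phylogeny", "Phosphorylation relay",
        "RTK and protein kinase subfamilies",
        "Structure-function similarities", "Activation mechanisms",
        "Disease implications",
    ],
}


def to_checklist(blueprint):
    """Flatten the blueprint into one to-do entry per topic."""
    return [
        {"theme": theme, "topic": topic, "done": False}
        for theme, topics in blueprint.items()
        for topic in topics
    ]


def mark_done(checklist, topic):
    """Mark every entry whose topic matches exactly as read."""
    for entry in checklist:
        if entry["topic"] == topic:
            entry["done"] = True


def remaining(checklist):
    """Return the entries that still need reading."""
    return [entry for entry in checklist if not entry["done"]]
```

Calling `to_checklist(blueprint)` gives one entry per topic, and `remaining(...)` shows what is left as topics are marked done.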
Expect overlap when organizing. For instance, while reading for methods, there might be a paper that performed a deep mutational scan (DMS) on a different kinase, and found potential mechanisms of resistance through exhaustive mutagenesis and selection – two birds, one scone! This is an opportunity to learn more about my broader field in the context of the exact method I want to apply. I can focus on the results, caveats, data interpretation, and begin to develop a realistic picture of how things might work for my project.
When I’m in the early stages of conceptualizing a project and independently generating a hypothesis, which is mostly where I am now, I apply this strategy to a high-level problem instead, though that might need its own post in the future. Either way, blueprinting the required literature is the first step.
Scan and collect titles
Before deeply reading papers, I first collect papers. If I already have even one paper as a starting point, whether from a suggestion or otherwise, I begin by collecting titles from its references, which serve as a strong pre-filtered list. However, my favorite way to collect reading material is by simply searching keywords or questions in Google Scholar. For instance, to broadly understand crystallography, I’ll type “crystallography” into the search engine. If there is an abstract set of concepts in mind and I want to understand what exists, whatever it is, I’ll type that. Collecting titles is becoming easier and more reliable with AI tools, and my approach here is the same, but with more prompting, .bib importing, and PDF attaching.
The goal of this stage is coverage of the literature, which means that early, foundational papers are some of the most important. Therefore, I ensure my searches are based on relevance and not date. By sampling papers over time, discovery trends can be mapped out and potentially used to predict the next wave of research.
From there, I scan titles, the two-sentence snippets, authors, and dates – again, to sample my reading across time and avoid recency bias. If the title looks relevant, I’ll open it and give the whole paper a visual scan. At this point, I’m looking at figure content, skimming the abstract, skimming the discussion, and taking note of authors, but not spending more than a couple of minutes per paper on this process.
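The idea of sampling across time can be sketched in a few lines. This is only an illustration under my own assumptions: each paper is a dictionary with hypothetical `title`, `year`, and `rank` fields, where `rank` is the paper's position in a relevance-sorted search (lower means more relevant), and papers are grouped into fixed-width year bins.

```python
from collections import defaultdict


def sample_across_time(papers, per_bin=2, bin_years=10):
    """Pick the most relevant papers from each time bin.

    Assumes each paper is a dict with 'year' and 'rank' keys
    (hypothetical fields; 'rank' is the relevance position, lower is better).
    """
    bins = defaultdict(list)
    for paper in papers:
        # Group by the first year of the bin, e.g. 1990 for 1990-1999.
        bin_start = (paper["year"] // bin_years) * bin_years
        bins[bin_start].append(paper)

    picked = []
    for bin_start in sorted(bins):
        by_relevance = sorted(bins[bin_start], key=lambda p: p["rank"])
        picked.extend(by_relevance[:per_bin])
    return picked
```

Taking a fixed number per bin keeps early, foundational papers in the reading list even when recent papers dominate the search results.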
If the paper content looks relevant or interesting, I save the PDF through a paper manager (Zotero, Mendeley, Paperpile, etc.) using their browser extension. My philosophy here is to use the manager as a “paper bank.” Each project I work on is given a dedicated folder, and as I make my way through the literature, I keep the papers I want to reference and read again, but remove the ones I don’t.
When I’m building a literature foundation, it is a daily process of collecting and filtering. Reading volume varies as I trade off between deeply absorbing a handful of papers and reading three dozen at lesser depth. Ultimately, what I value most when choosing what to read is interest.
Attenuate attention
The most immediate outcome of reading primary research is the development of hypotheses, approaches, experimental implementation, and iteration. However, one of the most important long-term outcomes is a curated reference list for your next paper.
Reading papers word-by-word is unnecessary and can distract from developing the literature breadth needed to build a reference list. The key is sampling and reading enough to filter out the most relevant papers.
Interact with papers
An effective way to understand literature is to engage with it. Whether conceptualizing, leading, or joining a project, there will be unfamiliar topics. When reading for comprehension, stop when more information is needed.
Stop reading to understand acronyms, look up terminologies, and quickly find summaries of methods. If there is a larger concept that is unclear or piques interest, open and skim the cited works. Dig until ready to jump back into the paper. This can lead to a tangent of unexpected primary research reading, but that’s often when I branch out and discover the most across the scientific literature.
Interact with the literature based on how you practice science. Since I work with proteins, when I read a structure paper, one of the first things I do is open the cited PDB files and reference the model as I’m reading. I’ll highlight specific residues and ligands, and toggle between different visual representations. I’ll find all the structural models of the same protein through its UniProt reference ID, generate ensembles, and build a broader understanding of authors and methods used for a given protein. If there is complex data interpretation or visualization, I’ll visit the published repository and scan the code to understand the analysis process.
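As one example of interacting with a model outside a molecular viewer, here is a minimal sketch that lists protein residues near a ligand in PDB-format text. It is not how I or any particular tool does it; it simply reads the fixed-column ATOM and HETATM records of the legacy PDB format, and the function names and the 4.0 Å default cutoff are my own assumptions. It ignores alternate locations, insertion codes, and multiple models.

```python
import math


def parse_atoms(pdb_text):
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append({
            "record": record,
            "res_name": line[17:20].strip(),  # columns 18-20
            "chain": line[21],                # column 22
            "res_seq": int(line[22:26]),      # columns 23-26
            "xyz": (
                float(line[30:38]),           # x, columns 31-38
                float(line[38:46]),           # y, columns 39-46
                float(line[46:54]),           # z, columns 47-54
            ),
        })
    return atoms


def residues_near_ligand(pdb_text, ligand_name, cutoff=4.0):
    """Return sorted (chain, residue number, residue name) tuples for
    residues with any atom within `cutoff` angstroms of the named ligand."""
    atoms = parse_atoms(pdb_text)
    ligand = [a for a in atoms
              if a["record"] == "HETATM" and a["res_name"] == ligand_name]
    near = set()
    for atom in atoms:
        if atom["record"] != "ATOM":
            continue
        if any(math.dist(atom["xyz"], lig["xyz"]) <= cutoff for lig in ligand):
            near.add((atom["chain"], atom["res_seq"], atom["res_name"]))
    return sorted(near)
```

Given the text of a downloaded PDB file and the three-letter ligand code used in the paper, `residues_near_ligand` returns the residues worth highlighting while reading the binding-site discussion.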
At this stage, it is also important to know who is in the field – learn who the authors are. What are their affiliations? What else have they written? Visit their websites, ORCID, Google Scholar, etc. There is a good chance the authors will be at the next conference you attend.
Keep reading
After building a literature foundation, maintain it. Revisit core papers if a refresh or reframe is due. Expand or contract reference lists. Stay current with developments. To this day I use the Fraser Lab method of following the scientific literature, which I adopted during my time in the lab and highly recommend.
Then repeat the process in another scientific space! Drawing ideas from multiple scientific domains keeps things exciting, fresh, and creative.
If you’ve made it this far, thanks for reading, and find this co-posted at Gabriella’s site!