News from the Fraser Lab

AlphaFold3 Validation and the Role of Journals

James Fraser
22 May 2024
tags: #publishing

Like many others, I was disappointed with the lack of code, or even executables, accompanying the publication of AlphaFold3 in Nature. This made it impossible to test the most exciting claim of the paper: impressive performance predicting the structures of proteins bound to novel ligands. I was even more upset to learn that my colleague Roland Dunbrack was “ghosted” after he submitted his initial review.

We organized an open letter to Nature, questioning why a journal would fail to enforce its written policies. By not enforcing them here, Nature implies that it enforces those policies inequitably and to the detriment of the overall scientific community.

In response to this letter, DeepMind announced they would release the code in 6 months. This was a reversal from their previous quote (from a Nature News article):

“We have to strike a balance between making sure that this is accessible and has the impact in the scientific community as well as not compromising Isomorphic’s ability to pursue commercial drug discovery,” says Pushmeet Kohli, DeepMind’s head of AI science and a study co-author.

Now Nature has replied in an unsigned editorial, saying it wants to be “in conversation” with our community around ensuring openness of the research ecosystem. This editorial focuses a lot on code disclosure. Journals want to play an important role in the research ecosystem going forward and have established that they will perform some valuable services:

coordinating peer review
performing ethics checks
ensuring data and code are properly deposited

The editorial makes it seem like this matter is only about a small exception regarding the code. It writes that the release of the code after a 6 month delay (spurred by the open letter and community outcry, not by Nature) is:

“… an important step, and Nature will update the published paper once the code is released.”

Yet we still cannot validate the most fundamental claims about protein-ligand predictions. I find Nature’s description of the server disingenuous:

“The basics of how the community can use the new version of AlphaFold remain the same: anyone with a Google account can use the tool for free, for non-commercial applications.”

and

“In addition to the non-availability of the full code, there are other restrictions on the use of the tool — for example, in drug development. There are also daily limits on the numbers of predictions that individual researchers can perform.”

The server is restricted to 20 natural metabolites and ions. We still cannot even reproduce the figures of the paper.

Obviously, many companies want the Nature “stamp” of approval. This editorial shows, nakedly, that this “stamp” is a toxic part of our current research ecosystem, one that bends easily to corporate interests and applies inequitable standards. The canard that the private sector won’t publish if journals don’t let companies play by different rules is particularly problematic. Nature broke their peer review process here (see Roland Dunbrack’s experiences), and with a little bit of community pressure, the authors changed course and promised an eventual code/executable release.

What’s the solution going forward? We can raise the bar! Academics should push the envelope in data and code disclosure alongside preprints with open review. Companies can also lead by example (see Arcadia Science, Pat Walters, and others) by doing a first-class job of disclosing data outside journals.

I’m optimistic about the scientific ideas presented in the AF3 paper. It’s an exciting time for AI and biosciences. Let’s make the future get here faster by building on each other’s work!

IT suggestions for new faculty

Daniel Hogan
22 April 2024
tags: #it

This is an opinionated guide for how to set up IT infrastructure for a new lab. It assumes that you have at least some computing background, though it should be possible to follow along without one if you do a bit of research whenever you encounter something you don’t understand.

Web domain

Create an AWS account with your personal (i.e., not .edu) email address. Tie your personal credit card to the account so that it’s clear that you’re the one paying and that it’s not owned by the university. (Domain registration for .com and .org is under $15 per year)
After logging in, navigate to “Route 53” (the name of the AWS domain registration interface)
Register a domain name. I recommend sticking to either a .com or a .org domain. (.edu bars registrations for anything other than an accredited school)
- Aim for something short, memorable, and lacking weird characters (preferably only the 26 letters)
- Avoid having your university name in case you transfer to a different one sometime in the future
- Choose carefully, since it’s a giant pain to change domains once you start using them

I strongly recommend going with AWS over other providers like GoDaddy or Namecheap since they’re a multi-billion dollar business that won’t be going anywhere for decades. Moving domains between providers is possible, but annoying. AWS also has a reasonable API in case you want to do more advanced things in the future, like programmatically updating entries.
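As a rough illustration of what “programmatically updating entries” could look like, here is a minimal Python sketch that builds a Route 53 change batch and, optionally, sends it with boto3. The record name wiki.example.org, the IP address (taken from a range reserved for documentation), and the helper names are all hypothetical; the boto3 call needs AWS credentials and your own hosted zone ID.

```python
from typing import Any, Dict, List


def build_upsert_batch(
    name: str, record_type: str, values: List[str], ttl: int = 300
) -> Dict[str, Any]:
    """Build a ChangeBatch payload that creates or overwrites one DNS record."""
    return {
        "Comment": f"Upsert {record_type} record for {name}",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": record_type,
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": v} for v in values],
                },
            }
        ],
    }


def apply_batch(hosted_zone_id: str, batch: Dict[str, Any]) -> None:
    """Send the batch to Route 53 (requires boto3 and configured credentials)."""
    import boto3  # imported lazily so the payload builder works without boto3

    client = boto3.client("route53")
    client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=batch
    )


if __name__ == "__main__":
    # Hypothetical record; replace with your own name and address.
    batch = build_upsert_batch("wiki.example.org.", "A", ["192.0.2.10"])
    print(batch)
    # apply_batch("YOUR_HOSTED_ZONE_ID", batch)  # uncomment once credentials are set up
```

Keeping the payload builder separate from the network call makes it easy to inspect a change before anything is sent to AWS.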

Website

Create a personal Github account if you don’t already have one. Github user names and organization names occupy the same namespace, so you’ll need two different names (I’m calling them “example” and “examplelab” in the below examples)
Create a free organization for your lab
Create a repository for the examplelab organization (not for your user account) called “examplelab.github.io”
Follow these instructions to create DNS records in Route 53 so that example.org loads the Github page automatically

I recently set up a new website for the Manglik lab here, which is generated by this Github repository. There is a single source of truth for lab members, _data/authors.yml, which is used both to generate the members page and to set blog post authorship. To avoid repetition, the publication list is generated from blog posts whose front matter contains publication: true. This makes it possible to have both standard blog posts and ones that announce publications, with the latter automatically creating an entry on the publications list. See here for an example. I recommend starting with the Manglik lab website as a template and editing the contents as appropriate, since it’s much cleaner than the repository that generates the Fraser lab website; you’ll only need to edit the following:

_data/authors.yml to include your lab members
_pages/about.md to include your contact info
_pages/members.md to edit the “Joining” section
_pages/publications.md to edit the Pubmed link to your own name
the contents of research_/ to set your research interests
the contents of assets/images/ (but not assets/css/)
CNAME to match the URL of your website
the paths in README.md
the top few entries in _config.yml to set the site name, PI info, and site description
the contents of _posts/, which generates both the blog posts and publications list as described above
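The site generator handles the publication: true filtering on its own, so nothing below is needed to use the template. Purely to illustrate the rule, here is a small Python sketch that reads the front matter of a few posts and picks out the ones that would land on the publications list. The file names and post contents are made up, and the parser only understands flat key: value lines, not full YAML.

```python
from typing import Dict, List, Tuple


def parse_front_matter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a post into (front matter, body); handles flat `key: value` lines only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:  # opening delimiter without a closing one
        return {}, text
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def publication_posts(posts: Dict[str, str]) -> List[str]:
    """Return the names of posts whose front matter sets publication: true."""
    selected = []
    for name, text in posts.items():
        meta, _ = parse_front_matter(text)
        if meta.get("publication", "").lower() == "true":
            selected.append(name)
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical posts standing in for files under _posts/.
    posts = {
        "2024-01-10-new-paper.md": "---\ntitle: New paper\npublication: true\n---\nBody",
        "2024-02-01-lab-retreat.md": "---\ntitle: Lab retreat\n---\nBody",
    }
    print(publication_posts(posts))
```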

Theoretically, Github limits these pages to less than 1 GB (which you hit surprisingly quickly once you start adding article PDFs or high-res images), but I don’t think they enforce it. Ideally, you’ll want to host anything over a couple MB separately, but that’s kind of a pain. Generally, Github and large files don’t play friendly since Git maintains an append-only history which begins to add up when you’re adding and removing files.
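One way to keep an eye on this is to scan the repository for large files before committing. The following Python sketch lists files above a size threshold; the 2 MB default is only my reading of “a couple MB”, and the function name is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def find_large_files(
    root: Path, limit_bytes: int = 2 * 1024 * 1024
) -> List[Tuple[int, Path]]:
    """Return (size, path) pairs for files above limit_bytes, largest first."""
    hits: List[Tuple[int, Path]] = []
    for path in root.rglob("*"):
        # Skip Git's own object store; it is not part of the published site.
        if ".git" in path.relative_to(root).parts:
            continue
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                hits.append((size, path))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


if __name__ == "__main__":
    for size, path in find_large_files(Path(".")):
        print(f"{size / 1_048_576:.1f} MB  {path}")
```

Note that this only checks the current working tree. Files that were added and later removed still take up space in the Git history.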

Email

Consider registering for an email service on your domain so that you’re not tied to your university’s email infrastructure. It will be hard to find one that costs less than $5 per user per month, which will add up quickly. Most email services don’t support archiving accounts, so you’ll be paying that amount forever unless you’re okay with deleting everything.

I personally like Fastmail, which has a nice, snappy user interface, excellent support, and a free 30 day trial. Its family plan supports up to 6 users for a flat $11 per month if paid yearly (discounted if you subscribe for longer). They also pro-rate unused subscriptions, so when you exceed 6 users you can transition to a business plan that scales to an unlimited number of users at $5 per user per month without wasting money.
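To make the jump at the seventh user concrete, here is a tiny Python sketch of the monthly cost using the prices quoted above ($11 flat for up to 6 users, then $5 per user). The function name is hypothetical, and the defaults should be checked against current pricing.

```python
def monthly_email_cost(
    users: int,
    family_flat: float = 11.0,
    family_cap: int = 6,
    per_user: float = 5.0,
) -> float:
    """Monthly cost: flat family plan up to family_cap users, per-user pricing beyond."""
    if users < 0:
        raise ValueError("users must be non-negative")
    if users == 0:
        return 0.0
    if users <= family_cap:
        return family_flat
    return users * per_user


if __name__ == "__main__":
    for n in (1, 6, 7, 12):
        print(n, monthly_email_cost(n))
```

Under these assumptions, going from 6 to 7 users moves the bill from $11 to $35 per month, so it is worth planning for that step.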

Topicbox

Topicbox is an email-based service that has inboxes designed to be shared. It’s $15 per month for up to 50 users and for any number of virtual addresses, and there’s a three month free demo. I recommend this since you can create one virtual inbox per vendor or group of people.

For example:

labmanager@example.org, so that you have an address that doesn’t change when your lab managers change
dms@example.org for all the people working on DMS in the lab
ni@example.org for all your LabVIEW licenses, so that you don’t have to email whoever originally registered for the account after they’ve left the lab
thermo@example.org for all your Thermo-Fisher warranty info

There’s a web interface that shows all the emails received for each virtual address. Additionally, you can set it up so that users can subscribe to any subset of the various virtual inboxes and automatically receive a copy of any email received by those addresses.

You’ll need to follow the directions here to use your own domain instead of a topicbox.com domain. You can’t easily host both the user emails mentioned above and the shared virtual addresses on the same domain due to limitations on how email routing works, so I suggest hosting it on a subdomain (e.g., box.example.org) while your primary email is hosted on the main domain.

Lab wiki

Lab wikis are great for storing general lab info, like an onboarding guide. I really like Wiki.js, since you can set it up to sync with git; this allows you to update the wiki similarly to how you update your website, in addition to using the built-in editor. The wiki files are all plain text, which means that the wiki is reasonably browsable through Github in case Wiki.js ever stops being developed, and easy to port to a different wiki engine (e.g., DokuWiki or MediaWiki, which powers Wikipedia) if you ever want to. You’ll have to host it yourself, which can be done either on DigitalOcean using an image pre-configured by the Wiki.js developers following this official guide or on AWS by following this community guide.

Right now, I’m working on setting up a pipeline that will enable us to take a recording from a meeting, convert it to text using a speech-to-text model, label each sentence by speaker, summarize it in a couple of paragraphs using the open-source Mixtral 8x7B, then upload it to the fully-searchable wiki by creating a git commit that’s pushed to Github and synced to the wiki, all without human intervention. If you’re a member of the Fraser lab, you can check this out for an example.
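The last step of such a pipeline (turning a summary and a labeled transcript into a wiki page and committing it) could look roughly like the Python sketch below. This is not the actual pipeline: the page layout, file path, and function names are hypothetical, and the transcription, speaker labeling, and summarization steps are assumed to have already produced the inputs.

```python
import subprocess
from datetime import date
from pathlib import Path
from typing import List, Tuple


def render_meeting_page(
    title: str,
    meeting_date: date,
    summary: str,
    transcript: List[Tuple[str, str]],
) -> str:
    """Render a Markdown wiki page from a summary and (speaker, sentence) pairs."""
    lines = [
        f"# {title}",
        "",
        f"Date: {meeting_date.isoformat()}",
        "",
        "## Summary",
        "",
        summary.strip(),
        "",
        "## Transcript",
        "",
    ]
    lines += [f"- **{speaker}:** {sentence}" for speaker, sentence in transcript]
    return "\n".join(lines) + "\n"


def publish(repo_dir: Path, rel_path: str, content: str, push: bool = True) -> Path:
    """Write the page into a local clone of the wiki repo, commit it, and push."""
    page = repo_dir / rel_path
    page.parent.mkdir(parents=True, exist_ok=True)
    page.write_text(content, encoding="utf-8")
    subprocess.run(["git", "add", rel_path], cwd=repo_dir, check=True)
    subprocess.run(
        ["git", "commit", "-m", f"Add {rel_path}"], cwd=repo_dir, check=True
    )
    if push:
        subprocess.run(["git", "push"], cwd=repo_dir, check=True)
    return page


if __name__ == "__main__":
    # Made-up inputs standing in for the output of the earlier pipeline steps.
    page = render_meeting_page(
        "Group meeting",
        date(2024, 4, 22),
        "Short summary of the meeting.",
        [("Speaker 1", "Welcome everyone."), ("Speaker 2", "Thanks.")],
    )
    print(page)
    # publish(Path("/path/to/wiki-clone"), "meetings/2024-04-22.md", page)
```

Because Wiki.js syncs with git, pushing the commit is all that is needed for the page to show up in the wiki.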

Instant messaging

Uhh… Welcome to the land of only bad options, approximately ordered from least bad to terrible:

Element:
- Pros: open source, based on standards, can be hosted on a custom domain, has quality mobile and desktop versions, and smooth collaboration with users on other servers (if you can find any outside of Wikipedia and open-source projects)
- Cons: you’ll have to host it yourself (which means that you can’t send or receive messages if something breaks) or pay someone ~$5 per user per month to host it for you
Signal:
- Pros: popular, open source, and free (and likely to stay free since it’s supported by a non-profit foundation)
- Cons: designed mostly for phones (though there is a desktop app that can be linked to your phone’s account) and lacks separation between personal use and business use
Discord:
- Pros: free (at least for now), and has quality mobile and desktop versions
- Cons: designed more for gaming and voice chat than business, and likely not to stay free forever
Slack:
- Pros: popular with a refined UI
- Cons: nearly $10 per user per month if you want access to messages older than 90 days and cross-workspace collab is cumbersome
Teams: absolutely not

Fraser Lab DEIJ Journal Club - Gender Disparities in Academic Retention

Stephanie Wankowicz
31 January 2024
tags: #deij_jc

Background
Our journal clubs aim to provide an environment for continued learning and critical discussion. Based on the discussion, we also brainstorm action items that individuals and labs can implement. Our discussions and proposed interventions reflect our opinions based on our identities and lived experiences. Consequently, they may differ from the discussions held by those with other identities and/or experiences. This journal club took place among the entire Fraser lab. Due to the size of the lab, we split into three groups. Each group had unique but overlapping conversations.

Discussion Leaders: Mohamad Dandan, Daphne Chen, Tushar Raskar

Articles: Gender and retention patterns among U.S. faculty

Summary and Key Points: In academia, there is a notable gender imbalance. Despite significant strides in acquiring doctoral degrees, women remain underrepresented in tenure-track faculty positions [1,2]. Further, and even more surprising, this gap tends to increase as the tenure-track stage increases (assistant professor to full professor) [3]. This issue is more acute in prestigious institutions. This paper presents data showing that the commonly held belief that this disparity stems mainly from work-life balance is a misconception. Rather, the paper reveals that workplace climate and culture are significant factors.

To identify the underlying reasons why women tend to leave tenure-track positions, the authors split the reasons for leaving into ‘pushes’ and ‘pulls’. Pushes include workplace climate (including gendered harassment), work-life balance, or work-related reasons (funding issues). Pulls are recruitment for attractive external positions. While push reasons are more common overall, they are especially common for women. However, the challenge in addressing these disparities lies in the subjective nature of what constitutes push and pull factors in an academic career. Personal life experiences heavily influence perceptions of these factors, making it challenging to devise universal solutions. For example, people have different expectations of work-life balance, and these expectations are likely to change over time. In addition, there is an often subtle difference in how conversations, friendships, and collaborations exist between two parties of the same or different gender. Mitigating these differences is difficult because the reasons for them are multifaceted. A conscious effort is needed, particularly from men, to be mindful of these dynamics. Such awareness and a willingness to step back can contribute significantly to narrowing the gender gap and creating a more balanced and inclusive academic environment. Below are some open questions we have after reading this article.

Open Questions:

How does the definition of work-life balance vary among individuals in academia, and what strategies can institutions implement to respect these varying needs?
In what ways does parenthood influence academics’ decisions to leave the field, and what alternatives do they often consider?
How does having a dual-academic career impact the attrition rates, especially regarding gender differences?
Why is there a larger attrition gap among women who are full professors, contrary to expectations?
How does the prestige of an academic institution affect the attrition rates of faculty members?
Would improving gender parity within departments alter the current distribution of academic attrition rates?
What specific challenges and dynamics do parents in academia face, and how do these challenges differ from non-parent academics?
How can academic institutions better support dual-career academic couples to mitigate gendered attrition?

Proposed Action Items:

Explore the possibility of peer mentoring at higher levels of academia, especially for professors in tenure-track positions.
During career development conversations, mentors could discuss the careers that interest mentees in terms of their pushes and pulls. Reframing potential careers in terms of pushes and pulls may make for a more nuanced conversation compared to simple pros/cons, since pushes and pulls ask the mentee to prioritize which values and activities are more important for their particular career goals.

Citations:

1. Wapman, K. H., Zhang, S., Clauset, A. & Larremore, D. B. Quantifying hierarchy and dynamics in US faculty hiring and retention. Nature 610, 120–127 (2022).
2. National Center for Science and Engineering Statistics. Women, minorities, and persons with disabilities in science and engineering 2021 (2022).
3. Kaminski, D. & Geisler, C. Survival analysis of faculty retention in science and engineering by gender. Science 335, 864–866 (2012).

Chalk talk guidelines

James Fraser
24 January 2024

We recently drafted this guide for the chalk talks for faculty candidates to our search in BTS. We thought it might be useful to others, so we are posting it here.

The goal of the chalk talk is to get a sense of the potential directions of your lab. A good general rule is to map out the major themes and questions your lab will address in the first 5-7 years. The audience is faculty-only.
We will start with about 5 minutes uninterrupted for you to introduce the area and provide any brief background highlights. If you would like these 5 minutes to be from a projected set of slides, let us know and we will make arrangements. Alternatively, the entire content can be on the whiteboard. During the intro, we want to know:
- What is your vision?
- Why is it new, exciting, and important?
- Why are you the right person to execute on it?
Our preferred format for the rest of the chalk talk is on whiteboard. We will provide markers and erasers. The rest of the talk generally, but not always, consists of mapping out 2-3 projects.
We will make sure you have ~15 minutes in the room to prepare and, if you choose, to pre-write anything on the board. Of course, you are welcome to refer to your notes during the chalk talk.
During the chalk talk, most questions will follow from the scientific content of your presentation. In addition, some faculty may ask you how these broad directions map to a grant strategy or to potential projects for a student/postdoc - but it is not essential to structure the directions around those potential questions.
Our goal is for the chalk talk to be a constructive brainstorming conversation that is full of new ideas!

How to moderate a session at a meeting

James Fraser
07 November 2023

To effectively moderate sessions at a meeting or a graduate program retreat, it’s crucial to manage time efficiently to ensure that the event runs smoothly and on schedule. This involves clear communication with presenters about the session rules, managing Q&A sessions judiciously, and being prepared to enforce time limits with a firm but fair approach. The goal is to create an environment where each speaker has their allotted time respected, the audience remains engaged, and the overall program adheres to its intended timeline. Here’s a guide I wrote for the recent QBC retreat for student moderators.

A few days to a few hours before the session, send the presenters an email to instruct them about the “rules” for the session.
Prior to gathering the speakers, figure out how questions will work. Will people just shout from their seats, will there be a microphone runner passing the microphone to people in their seats, or will there be a few microphone stands for people to queue up at?
Gather your speakers in the break before the session starts and ensure their laptops plug into the AV system. Tell each of them the following rules:
- When you will give them a warning wave that they have X minutes remaining (usually 1 for a 5-10 min talk, 2 for a 10-20 min talk, 5 for a 40 min talk)
- That you will stand up when they are at time (I often also threaten with a beach ball or some object at full time)
- That their laptop will be unplugged if they exceed the time of the talk + questions
- To plug in the laptop of the next speaker while the preceding speaker is answering questions
- Note that PowerPoint sometimes has issues when the presentation is already in full screen when you plug into the projector. It is better to start out of presentation mode and enter it AFTER plugging into the projector.
There isn’t a need to do a long introduction for a session with multiple speakers (I reserve long intros for keynotes or single seminars). Simply state - our next speaker is X. Most speakers will start with their title anyways - so no need for you to read it. Keep it moving!
At the end of the talk, you will stand up and moderate the Q&A portion. Speakers only get questions if they finish their talk with enough time remaining. If they went over time, NO QUESTIONS! If they encroach into the question time, then limit questions to 1 or 2.
- If there are no questions after an awkward beat… YOU MUST ASK A QUESTION. It can be as simple as:
  1. I didn’t understand X, can you explain it again?
  2. What would you do next?
  3. What is the type of data that can’t currently be collected, but that you dream would answer this question?
- Choose audience members to ask questions:
  1. favour learners (postdocs/students), especially for the first question.
  2. keep in mind diversity of who gets to ask questions
  3. cut off questions at the full time with the line “It is wonderful to see such enthusiasm. Speaker X will be around later to answer questions. Our next speaker is Y.”

Here is an example email I sent for the Protein Society this summer, where I moderated a session:

Looking forward to meeting at the upcoming Protein Society meeting. As the session moderator for “RNA-Protein Machines: Ancient Synergies”, I am passing along some of the instructions here:

1. Session Preparation: Please make sure to be present in the session room at least 15 minutes before the scheduled start time. This will allow us to coordinate and ensure that there are no A/V hiccups.
2. Time Management: To maintain the session’s schedule, it is essential that each speaker starts and ends their presentation on time. I am an “activist moderator” and will cut you off if you go over time (maybe with some kind of beach ball or water gun)!
3. The following time limits have been set for the respective presentation types:

Senior Talks: 25 minutes for the presentation + 5 minutes for discussion
Young Investigator Talks: 12 minutes for the presentation + 3 minutes for discussion
Flash Talks: 2 minutes each for introducing your research/poster (with no Q&A session)

Let me know if you have any questions and I look forward to a great session!