"Just do the thing"
When you’re starting out in a technical field, you often just “do the thing.”
The “thing” is usually whatever’s hot at the moment among your peers.
Sure, doing the “thing” can be valuable for getting started fast, but in many cases, it won’t actually solve the problem.
Solving the problem
Often, people have a good story in their head of how their work will lead to the terminal goal of improving the world. Unfortunately, this story is often misguided.
Let’s go through how someone might think about global problems today:
Step 1: Figuring out there is a problem
Go ahead and take a quick moment to think about what is causing problems in today’s world.
…
Got a few? Great. Let’s consider three examples and stick with them:
- Global warming: Climate disasters and mass migration will cause thousands of deaths and fuel geopolitical instability, so we need governments to reduce carbon emissions immediately.
- Uninterpretable general intelligence: Language models are uninterpretable, and it’s important for us to understand what happens inside them so we can modify and stop AI agents as they manipulate financial markets or compromise a government’s cybersecurity.
- Human minds vs AI: Human minds won’t be able to compete with the ever-growing intelligence of AI, so we have to create brain-computer interfaces that boost human processing power to match.
Wait a second.
Did we just propose solutions to all three problems as we thought of them?
Yes, we did.
This is the curse of the “thing.”
This curse implicitly ties each problem to a specific “thing” that will supposedly solve it. If you go into these fields, you will find that each of the above solutions is relatively popular in its respective community.
Step 2: Realizing the limits of these solutions
Yet, if we consider the viability of each approach:
- We’re far past the point of no return on global warming, and atmospheric CO2 concentrations are already too high for us to solve it simply by reducing emissions1.
- Do we really gain insight from basic interpretability research, or are we working at the wrong layer of abstraction, giving ourselves a false sense of security? And is just understanding agents really the most effective way to stop them within the time frame we have in mind?
- Even with BCIs, humans won’t outcompete AI. A 10x faster interface (BCI) attached to a 1x brain still loses to a 10x intelligence with a 100x faster interface (APIs); the rough arithmetic is sketched below.
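To make that last point concrete, here’s a back-of-envelope sketch in Python. The multipliers are made-up placeholders standing in for the argument above, not measurements of anything:

```python
# Toy model: "effective throughput" = raw intelligence x interface bandwidth.
# All numbers are illustrative assumptions, not empirical estimates.

human_intelligence = 1      # baseline human cognition
bci_bandwidth_gain = 10     # optimistic speedup from a brain-computer interface

ai_intelligence = 10        # a hypothetically "10x smarter" AI
api_bandwidth_gain = 100    # machine-speed I/O over APIs

human_with_bci = human_intelligence * bci_bandwidth_gain   # = 10
ai_with_apis = ai_intelligence * api_bandwidth_gain        # = 1000

print(f"Human + BCI: {human_with_bci}x vs AI + APIs: {ai_with_apis}x")
# Even granting the BCI its full 10x, the AI side still comes out ~100x ahead.
```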
Step 3: Accepting complexity and finding the right solutions
Surprisingly, global problems turn out to be complex! Who would have thought.
The real solutions require us to think back from our terminal goal to why the problems are truly problems:
- To maintain a growing standard of living for humans and animals while solving global warming, we need to invent new energy technologies, build more dams, implement urban flood protection, and eradicate human and animal diseases2.
- Pythia was all about giving researchers model checkpoints to explore what happens to the weights as we train a foundation model (a sketch of that kind of analysis follows this list). If you look at the papers that cited Pythia, literally no one did this. What the heck? [EDIT: Thanks to Daniel Paleka for uncovering a paper that came out with the lead author of Pythia as senior author] And if we don’t even know the difference between pre-trained and fine-tuned models, why is so much interpretability work happening on fine-tuned or toy models? Why isn’t this divide among the absolute first questions of interpretability, instead of showing up in zero of 200 popular research ideas3?4
- Given the arithmetic from Step 2, human minds won’t be able to compete even with brain-computer interfaces. We can then safely discard this solution and instead pursue A) uploading human minds, B) creating aligned AI police agents, or C) designing international controls for AI training5.
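On the Pythia point above, here is a minimal sketch of the kind of training-dynamics question those checkpoints were meant to enable. It assumes the EleutherAI Pythia checkpoints on the Hugging Face Hub, which publish intermediate training steps as revisions; the model size, the steps, and the weight matrix picked here are just illustrative choices.

```python
# Sketch: measure how one weight matrix drifts across Pythia training
# checkpoints. Pythia publishes intermediate checkpoints as Hub revisions
# (e.g. revision="step1000"); the steps and layer chosen here are arbitrary.
import torch
from transformers import AutoModelForCausalLM

MODEL = "EleutherAI/pythia-70m"
STEPS = ["step1000", "step16000", "step143000"]  # early, middle, final

reference = None
for step in STEPS:
    model = AutoModelForCausalLM.from_pretrained(MODEL, revision=step)
    # Use one attention output projection as a stand-in for "the weights".
    weights = model.gpt_neox.layers[0].attention.dense.weight.detach()
    if reference is None:
        reference = weights
        print(f"{step}: reference checkpoint")
        continue
    drift = torch.norm(weights - reference) / torch.norm(reference)
    print(f"{step}: relative drift from {STEPS[0]} = {drift:.3f}")
```

Nothing fancy, but “what changes in the weights, and when, during pre-training” is exactly the question the checkpoints exist to answer.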
Do the thing.
Your goal, if you want to make real change, is to get to Step 3. Along the way, you might find people stuck at earlier stages; don’t let what cursed them get to you:
- Evangelists, advocating for the “agreed-upon” solutions. They’re stuck at Step 1.
- Doomers, who understand the problem’s difficulty and give up on actually solving the problem. They’re stuck at Step 2.
The message here is simple: don’t get hijacked by obvious solutions; find the true challenges and make a real impact.
Now, go and “just do the thing”!
1. When I say point of no return, I mean we have passed the point where CO2 reductions will get us back to pre-industrial levels of CO2. And as a side note, it’s honestly pretty neocolonialist to say that developing nations should reduce their emissions when it’s exactly increased emissions that made developed nations developed.
2. Notice that the fourth item isn’t even about global warming! “Why will this then solve global warming?” you might ask. Well, if people weren’t sick all the time, societies would develop. Earning enough for your family to eat then wouldn’t be the first thing on your mind as you wake up. Now you can happily spend brain cycles thinking about solutions to the problem. This is one of the reasons why I’m a big proponent of progress (and even national growth targets!).
3. Absolutely no shade to Neel (the author), I think he does fantastic work <3 And to his credit, you can see that many questions in the table are actually about solving a problem (e.g. the “Getting rid of superposition” category).
4. What I’m criticizing isn’t the theory of impact for interpretability research. I love that field and it was one of my early entries into AI safety. But the actual questions asked in the field are frankly unambitious, and I’d want junior researchers to ask “what is the hardest problem in AI safety?” and go for that. Then you can always downgrade your interpretability project from there.
5. There is a valid case to be made for brain-computer interfaces, which is that the other methods simply won’t work before human-level AGI is here, and hence it’s just our best shot. You can also argue that human mind uploading would use the same technological development, though I would disagree.