Another quick one while I continue my recuperatory week of doing as little as possible.
I recently got to see a fascinating sequence of microtalks—each maybe 5-10 minutes long—from a team of AI researchers that was much larger than I initially realized.
Each researcher raced through their set of increasingly technical slides before relaying the mic to the next speaker, and by the time Q&A began, I felt a bit overwhelmed.
Don’t worry, here’s my oversimplified summary.
So first, a bit of background: Organ donation is a moral mess. We never have enough, they can’t travel very far, and we want to send them where they’ll have the most impact—whatever that means.
On top of all that, there are about 1 zillion ways that organs get distributed unfairly due to every sort of financialized advantage/racialized disadvantage you can imagine, and many more you probably can’t.
So this team was trying to use AI to help human decision-makers allocate organs more fairly. Make sense?
That’s an amazing goal, and I’m sure the folks forced to make these snap decisions would appreciate any useful guidance they could get.
But how do you actually train an AI model to be fair?
Or even, fairer than us?
That’s where I’m still a bit fuzzy.
Basically, they fed their model a ton of popular fairness metrics, had it try weighting these in different ways, and then trained it on human feedback.
How does that work? Well, imagine being shown a ranking of transplant recipients spat out by the model, listed from highest to lowest priority, and then being asked something like:
How fair is this allocation of organs from 1 (very unfair) to 7 (very fair)?
Then keep tinkering until humans think the AI is very fair. Yay, we did it!
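If I had to guess at what that loop looks like in code, it would be something like the sketch below. To be clear, this is my reconstruction rather than their implementation: the metric names, the candidate fields, and the hill-climbing update are all things I made up for illustration.

```python
# A toy sketch of the procedure as I understood it. The fairness metrics,
# candidate fields, and the crude hill-climbing update are my inventions,
# not the team's actual method.
import random

FAIRNESS_METRICS = {
    # Each metric scores one candidate; higher means higher priority.
    "medical_urgency":  lambda c: c["urgency"],
    "expected_benefit": lambda c: c["predicted_benefit"],
    "time_waiting":     lambda c: c["years_on_waitlist"],
}

def rank_candidates(candidates, weights):
    """Collapse every metric into One Single Ranking."""
    def score(c):
        return sum(w * FAIRNESS_METRICS[m](c) for m, w in weights.items())
    return sorted(candidates, key=score, reverse=True)

def human_fairness_rating(ranking):
    """'How fair is this allocation, from 1 (very unfair) to 7 (very fair)?'
    Placeholder: in the real study, human raters answer this question."""
    return random.randint(1, 7)

def tune_weights(candidates, weights, rounds=200, step=0.05):
    """Keep tinkering until humans think the AI is very fair."""
    best = human_fairness_rating(rank_candidates(candidates, weights))
    for _ in range(rounds):
        trial = {m: max(0.0, w + random.uniform(-step, step))
                 for m, w in weights.items()}
        rating = human_fairness_rating(rank_candidates(candidates, trial))
        if rating > best:  # raters preferred this weighting, so keep it
            weights, best = trial, rating
    return weights
```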
As you can imagine, this procedure is perfect, and this article is now over.
Anyway, at the end of all these microtalks they were like, thanks so much! Let us know if you have any questions! I’d given up writing down questions halfway through, once I realized the micro-Q&As weren’t coming, so now I had to synthesize all my concerns into one question.
So I basically went, this has been a great series of talks. But because the Q&A is at the end, rather than after each individual talk, we can’t dig in to give any of you very granular feedback. By putting all the Q&A time at the end, you accidentally guaranteed that we could only engage with you at a summary level!
Similarly, if our AI model boils down everything it’s learned into One Single Ranking, we can’t actually interact with it in a very granular way. By combining everything to produce One Single Ranking, you accidentally guaranteed that physicians could only engage with it at a summary level.
And that means that instead of augmenting physicians’ ability to decide what to do—say, by providing a helpful dashboard with multiple pieces of information that doctors could consult and try to incorporate into their own reasoning—you’ve boiled everything down to one number that threatens to automate away their clinical reasoning altogether!
By cutting physicians out of this weighting process and simply feeding them the model’s end result, you’ve all but guaranteed value capture.
And if your goal was to augment human decision-making…that’s bad, right?
Obviously, that’s a really tough question to answer—I’m getting weirdly good at throwing grenades into Q&As—but I’m still thinking through this basic issue.
If we genuinely want to augment human decision-making, rather than automating it away, we need to give humans useful toolboxes instead of the AI’s single ‘right’ answer.
But what could that look like?
And how much should we allow physicians to, say, customize what useful information they want to appear in their toolbox?
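I don’t have a real answer yet, but here’s a naive sketch of the shape I’m imagining: surface the individual signals and let each physician pick which ones appear, rather than collapsing everything into one score. Every name in it is hypothetical.

```python
# Hypothetical "toolbox" view: show physicians the individual signals they
# asked for, instead of one collapsed score. All names are made up.
METRICS = {
    "medical_urgency":  lambda c: c["urgency"],
    "expected_benefit": lambda c: c["predicted_benefit"],
    "time_waiting":     lambda c: c["years_on_waitlist"],
}

def build_dashboard(candidate, chosen_metrics):
    """Return only the signals this physician chose to see, leaving the
    final call to them rather than to a single number."""
    return {name: METRICS[name](candidate)
            for name in chosen_metrics if name in METRICS}

# e.g. one physician's toolbox might be:
# build_dashboard(patient, ["medical_urgency", "time_waiting"])
```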
Thoughts welcome below, I’ll stop early so you have a chance to respond.
Interesting, why not just have a simple computer program find all the closest matching people waiting for an organ (close enough for the people or the organ to meet up at an appropriate hospital)?
These people would be listed anonymously (race, sex, etc. blinded), along with pertinent information such as their condition (some way to measure likelihood of success), age (again, potentially helps towards likelihood of success), how long they have been waiting for an organ, etc.
Clearly some work would need to be done on what is shown in the report and how the people are ordered.
Then the doctors could make a decision based on facts, instead of handing it off to a machine to decide. Easily done, much…