Dr. ChatGPT Will See You Now

And even if it is right, an AI agent can’t complement the information it provides with the knowledge physicians gain through experience, says fertility doctor Jaime Knopman. When patients at her clinic in midtown Manhattan bring her information from AI chatbots, it isn’t necessarily incorrect, but what the LLM suggests may not be the best approach for a patient’s specific case.
For instance, when considering IVF, couples receive viability grades for their embryos. But asking ChatGPT to provide recommendations on next steps based on those scores alone doesn’t take into consideration other important factors, Knopman says. “It’s not just about the grade: There’s other things that go into it”—such as when the embryo was biopsied, the state of the patient’s uterine lining, and whether they have had success in the past with fertility. In addition to her years of training and medical education, Knopman says she has “taken care of thousands and thousands of women.” This, she says, gives her real-world insights on what next steps to pursue that an LLM lacks.
Other patients will come in certain of how they want an embryo transfer done, based on a response they received from AI, Knopman says. However, while the method suggested to them may be common, other courses of action may be more appropriate for the specific patient’s circumstances, she says. “There’s the science, which we study, and we learn how to do, but then there’s the art of why one treatment modality or protocol is better for a patient than another,” she says.
Some of the companies behind these AI chatbots have been building tools to address concerns about the medical information dispensed. OpenAI, the parent company of ChatGPT, announced on May 12 it was launching HealthBench, a system designed to measure AI’s capabilities in responding to health questions. OpenAI says the program was built with the help of more than 260 physicians in 60 countries, and includes 5,000 simulated health conversations between users and AI models, with a scoring guide designed by doctors to evaluate the responses. The company says that it found that with earlier versions of its AI models, doctors could improve upon the responses generated by the chatbot, but claims the latest models, available as of April 2025, such as GPT-4.1, were as good as or better than the human doctors.
“Our findings show that large language models have improved significantly over time and already outperform experts in writing responses to examples tested in our benchmark,” OpenAI says on its website. “Yet even the most advanced systems still have substantial room for improvement, particularly in seeking necessary context for underspecified queries and worst-case reliability.”
Other companies are building health-specific tools designed for medical professionals to use. Microsoft says it has created a new AI system—called MAI Diagnostic Orchestrator (MAI-DxO)—that in testing diagnosed patients four times as accurately as human doctors. The system works by querying several leading large language models—including OpenAI’s GPT, Google’s Gemini, Anthropic’s Claude, Meta’s Llama, and xAI’s Grok—in a way that loosely mimics multiple human experts working together.
New doctors will need to learn both how to use these AI tools and how to counsel patients who use them, says Bernard S. Chang, dean of medical education at Harvard Medical School. That’s why his university was one of the first to offer students classes on how to use the technology in their practices. “It’s one of the most exciting things that’s happening right now in medical education,” Chang says.
The situation reminds Chang of when people started turning to the internet for medical information 20 years ago. Patients would come to him and say, “I hope you’re not one of those doctors that uses Google.” But as the search engine became ubiquitous, he wanted to reply to these patients: “You wouldn’t want to go to a doctor who didn’t.” He sees the same thing now happening with AI. “What kind of doctor is practicing at the forefront of medicine and doesn’t use this powerful tool?”