The system seemed to respond appropriately. But the answer did not take into account the height of the door, which could also prevent the passage of a tank or a car.
OpenAI CEO Sam Altman said the new chatbot could reason “a little bit.” But its reasoning ability breaks down in many situations. The older version of ChatGPT handled the question a bit better because it recognized that both height and width matter.
It can pass standardized tests.
OpenAI said the new system could score in the top 10 percent of test takers on the Uniform Bar Examination, which qualifies lawyers in 41 states and territories. It can also score 1,300 (out of 1,600) on the SAT and a five (out of five) on the Advanced Placement high school exams in biology, calculus, macroeconomics, psychology, statistics, and history, according to the company's tests.
Earlier versions of the technology failed the Uniform Bar Examination and did not score as high on most Advanced Placement tests.
One recent afternoon, to demonstrate its test-taking skills, Brockman gave the new chatbot a multi-paragraph bar exam question about a man who runs a diesel truck repair business.
The answer was correct but full of legalese. Brockman then asked the chatbot to explain the answer in plain language for a layperson. It did that, too.
It is not good at discussing the future.
Although the new chatbot seemed to reason about things that had already happened, it was less adept when asked to form hypotheses about the future. It seemed to draw on what others had said rather than come up with new conjectures.
When Dr. Etzioni asked the new chatbot, “What are the important problems to solve in NLP research over the next decade?” (referring to the kind of “natural language processing” research that drives the development of systems like ChatGPT), it was unable to formulate entirely new ideas.
And it is still hallucinating.
The new chatbot still makes things up. The problem, called “hallucination,” haunts all major chatbots. Because the systems do not understand what is true and what is not, they can generate text that is completely false.
When asked for the addresses of websites describing the latest cancer research, it sometimes generated internet addresses that did not exist.