Meta got caught gaming AI benchmarks with Llama 4

Over the weekend, Meta released two new Llama 4 models: a smaller model named Scout, and Maverick, a mid-size model that the company claims can beat GPT-4o and Gemini 2.0 Flash.

Maverick quickly secured the number-two spot on LMArena, the AI benchmark site where humans compare outputs from different systems and vote on the best one. In Meta’s press release, the company highlighted Maverick’s Elo score of 1417, which placed it above OpenAI’s 4o and just under Gemini 2.5 Pro. (A higher Elo score means the model wins more often in the arena when going head-to-head with competitors.)
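To make the parenthetical concrete, here is a minimal sketch of how a rating gap maps to an expected win rate under the classic Elo formula. This is an illustration only: it assumes the textbook logistic formula with a 400-point scale, which may differ from how LMArena actually computes its leaderboard, and the rival rating of 1400 is a hypothetical number, not a reported score.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the textbook Elo formula.

    Assumption: standard base-10 logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


maverick = 1417.0  # Elo score Meta highlighted for Maverick
rival = 1400.0     # hypothetical opponent rating, for illustration only

p = elo_expected_score(maverick, rival)
print(f"Expected head-to-head win rate: {p:.1%}")
```

Under these assumptions, a gap of a few dozen points corresponds to winning only slightly more than half of head-to-head votes, which is why small differences near the top of a leaderboard can hinge on how a model is tuned.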

The achievement seemed to position Meta’s Llama 4 as a serious challenger to the state-of-the-art models from OpenAI, Anthropic, and Google. Then AI researchers digging through Meta’s documentation discovered something unusual.

In fine print, Meta acknowledges that the version of Maverick tested on LMArena isn’t the same as what’s available to the public. According to Meta’s own materials, it deployed an “experimental chat version” of Maverick to LMArena that was specifically “optimized for conversationality,” TechCrunch first reported.

“Meta’s interpretation of our policy did not match what we expect from model providers,” LMArena posted on X two days after the model’s release. “Meta should have made it clearer that ‘Llama-4-Maverick-03-26 [...]”

A spokesperson for Meta, Ashley Gabriel, said in an emailed statement that “we experiment with all types of custom variants.”

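The version the company experimented with also performs well on LMArena, Gabriel said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”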

While what Meta did with Maverick isn’t explicitly against LMArena’s rules, the site has shared concerns about gaming the system and has taken steps to “prevent overfitting and benchmark leakage.” When companies can submit specially tuned versions of their models for testing while releasing different versions to the public, benchmark rankings like LMArena become less meaningful as indicators of real-world performance.

“It’s the most widely respected general benchmark because all of the other ones suck,” independent AI researcher Simon Willison tells The Verge. “When Llama 4 came out, the fact that it came second in the arena, just after Gemini 2.5 Pro [...]”

Shortly after Meta released Maverick and Scout, the AI community started talking about a rumor that Meta had also trained its Llama 4 models to perform better on benchmarks while hiding their real limitations. The VP of generative AI at Meta, Ahmad Al-Dahle, addressed the accusations in a post on X: “We’ve also heard claims that we trained on test sets -- that’s simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilize implementations.”

Some also noticed that Llama 4 was released at an odd time. Saturday doesn’t tend to be when big AI news drops. After someone asked why Llama 4 was released over the weekend, Meta CEO Mark Zuckerberg replied: “That’s when it was ready.”

“It’s a very confusing release generally,” says Willison, who closely follows and documents AI models. “The model score that we got there is completely worthless to me. I can’t even use the model that they got a high score on.”

Meta’s path to releasing Llama 4 wasn’t exactly smooth. According to a recent report from The Information, the company repeatedly pushed back the launch because the model failed to meet internal expectations. Those expectations are especially high after DeepSeek, an open-source AI startup from China, released an open-weight model that generated a ton of buzz.

Ultimately, using an optimized model in LMArena puts developers in a difficult position. When selecting models like Llama 4 for their applications, they naturally look to benchmarks for guidance. But as is the case for Maverick, those benchmarks can reflect capabilities that aren’t actually available in the models the public can access.

As AI development accelerates, this episode shows how benchmarks are becoming battlegrounds. It also shows how eager Meta is to be seen as an AI leader, even if that means gaming the system.

Update, April 7: The story was updated to add Meta’s statement.
