CryptoMediaClub
Monday, March 16, 2026

Benchmarking ChatGPT’s capabilities against alternatives including Anthropic’s Claude 2, Google’s Bard, and Meta’s Llama2

24.07.2023

As previously reported, new research reveals inconsistencies in ChatGPT models over time. A Stanford and UC Berkeley study analyzed March and June versions of GPT-3.5 and GPT-4 on diverse tasks. The results show significant drifts in performance, even over just a few months.

[Figure: GPT-4 vs. GPT-3.5 performance. Source: Stanford University & UC Berkeley]

For example, GPT-4’s accuracy at identifying prime numbers plunged from 97.6% to 2.4% between March and June, due to problems following step-by-step reasoning. GPT-4 also grew more reluctant to answer sensitive questions directly, with response rates dropping from 21% to 5%, and it provided less rationale when it refused.

Both GPT-3.5 and GPT-4 generated buggier code in June compared to March. The percentage of directly executable Python snippets dropped substantially because of extra non-code text.
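To make "directly executable" concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the extra non-code text is a pair of markdown fence lines wrapped around the snippet. The function names are hypothetical, and a parse check is only a weak proxy for the study's own evaluation.

```python
import ast

FENCE = "`" * 3  # markdown code-fence marker, built this way to keep the example readable


def is_directly_executable(response: str) -> bool:
    """Return True if the raw response parses as Python with no cleanup at all."""
    try:
        ast.parse(response)
        return True
    except SyntaxError:
        return False


def strip_fences(response: str) -> str:
    """Drop every line that starts with a markdown fence and keep the rest."""
    kept = [line for line in response.splitlines()
            if not line.strip().startswith(FENCE)]
    return "\n".join(kept)


# Hypothetical model output: valid code wrapped in fence lines.
raw = FENCE + "python\nprint(sum(range(5)))\n" + FENCE

print(is_directly_executable(raw))                # False
print(is_directly_executable(strip_fences(raw)))  # True
```

The same code counts as a failure or a success depending only on the wrapper, which is why a formatting change can move this metric sharply without the underlying code getting worse.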

While visual reasoning improved slightly overall, generations for the same puzzles changed unpredictably between dates. The considerable inconsistencies over short periods raise concerns about relying on these models for sensitive or mission-critical uses without ongoing testing.

The researchers concluded that the findings highlight the need for continuous monitoring of ChatGPT models as their behavior evolves across metrics like accuracy, safety, and robustness.

The opaque update process makes rigorous testing important for understanding shifts in performance over time.
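One way to picture such monitoring is a small regression check that scores two model snapshots on the same labelled prompts. The sketch below uses hypothetical stub functions in place of real model calls, and the prompt set, function names, and tolerance are all illustrative assumptions rather than anything taken from the study.

```python
# Labelled prompts (illustrative): 17077 and 97 are prime, 17069 and 91 are not.
CASES = {
    "Is 17077 a prime number?": "yes",
    "Is 17069 a prime number?": "no",   # 13 * 1313
    "Is 97 a prime number?": "yes",
    "Is 91 a prime number?": "no",      # 7 * 13
}


def accuracy(ask, cases):
    """Fraction of prompts whose answer starts with the expected yes/no."""
    hits = sum(ask(prompt).strip().lower().startswith(want)
               for prompt, want in cases.items())
    return hits / len(cases)


def stub_earlier(prompt):
    """Hypothetical earlier snapshot: always answers correctly."""
    return CASES[prompt].capitalize() + "."


def stub_later(prompt):
    """Hypothetical later snapshot: always answers no."""
    return "No, it is not prime."


def drifted(before, after, tolerance=0.1):
    """Flag a drop in accuracy larger than the chosen tolerance."""
    return before - after > tolerance


before = accuracy(stub_earlier, CASES)
after = accuracy(stub_later, CASES)
print(before, after, drifted(before, after))  # 1.0 0.5 True
```

In practice the stubs would be replaced by calls to dated model versions, and the check would run on a schedule so that a silent update shows up as a flagged drop.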

Is ChatGPT worse than competitors now?

CryptoSlate conducted a small internal experiment with ChatGPT Plus (GPT-4), the OpenAI API (GPT-4), Anthropic (Claude 2), and Google (Bard), using the basic prompt from part of the research:

‘Is 17077 a prime number?’

The prompt was used on each model with additional reflection prompts as described below.

ChatGPT & OpenAI API

When given the prompt, ChatGPT and the OpenAI API responded ‘no’ and hallucinated on the math. The image below details the conversation, with the model unable to identify 17077 as a prime number even after several reflections.

[Image: GPT-4 response via the OpenAI API]

To be clear, 13 x 1313 is 17,069.

The GPT-4 model behind the OpenAI API did not reach this conclusion until it was specifically asked to calculate 13 x 1313, at which point it found that the product is not 17077 as it had stated.
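The arithmetic can be checked mechanically. The short Python sketch below verifies the claimed product and shows the remainder when 17077 is divided by 13. The helper name is hypothetical.

```python
def check_product_claim(n: int, a: int, b: int) -> bool:
    """Return True only if a * b really equals n."""
    return a * b == n


print(13 * 1313)                             # 17069
print(check_product_claim(17077, 13, 1313))  # False
print(divmod(17077, 13))                     # (1313, 8)
```

The remainder of 8 confirms that 13 does not divide 17077, so the factorization the model offered cannot be right.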

Anthropic’s Claude 2

However, Anthropic’s Claude 2 demonstrated its problem-solving process by performing calculations before providing the correct response.

[Image: Anthropic Claude 2 response on 17077]

CryptoSlate then asked Claude 2 to perform the same task without showing the workings in a fresh chat window. Claude 2 gave a solid answer, refusing to commit while offering additional insight into the solution.

“Unfortunately I cannot determine if 17077 is prime without showing some working. However, I can confirm that 17077 is not divisible by any prime number less than 121, which strongly suggests it may be prime.”
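Claude 2's hedge can be examined directly. Trial division only needs to test primes up to the square root of the number, so the Python sketch below lists which primes at or above 121 still fall under that bound for 17077. The sieve helper is a standard technique written for this illustration, not something shown in the test.

```python
import math


def primes_up_to(limit: int) -> list[int]:
    """Sieve of Eratosthenes: all primes <= limit (assumes limit >= 1)."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


n = 17077
bound = math.isqrt(n)  # largest integer whose square is <= n
unchecked = [p for p in primes_up_to(bound) if p >= 121]

print(bound)      # 130
print(unchecked)  # [127]
print(n % 127)    # 59
```

Only 127 lies between the quoted limit and the bound of 130, and it leaves a remainder of 59, so the statement in the quote was one check short of a full proof.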

Google Bard

Google Bard tackled the question with a similar strategy to Claude 2. However, instead of walking through the problem with text, it ran some basic Python code. Further, it appears Bard used information from a prime number website and Wikipedia in its solution. Interestingly, the page cited from the prime number site, primenumbers.info, included only information about other prime numbers, not 17077.

[Image: Google Bard]

Meta’s Llama 2

Interestingly, Meta’s recently released 70-billion-parameter open-source model, Llama2, performed similarly to GPT-4 in CryptoSlate’s limited testing.

[Image: Meta Llama2]

Yet, when asked to reflect and show its working, Llama2 could determine that 17077 is a prime number, unlike the GPT-4 versions currently available.

However, the caveat is that Llama used an incomplete method to check for primality: it did not test every prime number up to the square root of 17077.

Therefore, technically Llama failed successfully.
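For comparison, a complete check tests every candidate divisor up to the square root. The following is a minimal Python sketch of plain trial division, offered as one standard way to do it rather than the method any of the models used.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division over 2 and every odd number up to the integer square root of n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


print(is_prime(17077))  # True
print(is_prime(17069))  # False, since 17069 = 13 * 1313
```

Stopping at the square root is enough because any factor larger than it would have to pair with one smaller than it, which the loop would already have found.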

GPT4-0613 version June 13, 2023

CryptoSlate also tested the math puzzle against the GPT4-0613 model (June version) and received the same result. The model suggested 17077 is not a prime number in its first response. Further, when asked to show its working, it eventually gave up: it concluded that 17077 must be divisible by the next plausible candidate and stated that it was, therefore, not a prime number.

Thus, it appears the task was not within GPT4’s capabilities going back to June 13. Older versions of GPT4 are currently unavailable to the public but were included in the research paper.

Code Interpreter

Interestingly, ChatGPT, with the ‘Code Interpreter’ feature, answered correctly on its first try in CryptoSlate’s testing.

[Image: OpenAI GPT4 Code Interpreter]

OpenAI Response & model impact

Responding to claims that OpenAI’s models are degrading, OpenAI’s VP of Product, Peter Welinder, denied them, The Economic Times reported, asserting that each new version is smarter than the previous one. He proposed that heavier usage could create the perception of decreased effectiveness, as more issues are noticed over time.

Interestingly, another study from Stanford researchers published in JAMA Internal Medicine found that the latest version of ChatGPT significantly outperformed medical students on challenging clinical reasoning exam questions.

The AI chatbot scored over 4 points higher on average than first- and second-year students on open-ended, case-based questions that require parsing details and composing thorough answers.

Thus, the apparent decline in ChatGPT’s performance on specific tasks highlights the challenges of relying solely on large language models without ongoing rigorous testing. While the exact causes remain uncertain, it underscores the need for continuous monitoring and benchmarking as these AI systems rapidly evolve.

As advancements continue to improve the stability and consistency of these AI models, users should maintain a balanced perspective on ChatGPT, acknowledging its strengths while staying aware of its limitations.

The post Benchmarking ChatGPT’s capabilities against alternatives including Anthropic’s Claude 2, Google’s Bard, and Meta’s Llama2 appeared first on CryptoSlate.


Editor: cryptomediaclub.com@gmail.com
Advertising: digestmediaholding@gmail.com

Disclaimer: Information found on CryptoMediaClub is those of writers quoted. It does not represent the opinions of CryptoMediaClub on whether to sell, buy or hold any investments. You are advised to conduct your own research before making any investment decisions. Use provided information at your own risk.
CryptoMediaClub covers fintech, blockchain and Bitcoin bringing you the latest crypto news and analyses on the future of money.

© 2023 Crypto News. All Rights Reserved
