
Evaluation of the Performance of Artificial Intelligence Tools ChatGPT 3.5, Copilot and Gemini in Beekeeping



Agricultural University of Athens

School of Applied Economics and Social Sciences

Department of Agricultural Economics and Rural Development


A thesis submitted to the

AGRICULTURAL UNIVERSITY OF ATHENS

In partial fulfillment of the requirements for the Integrated Master’s Degree of

 AGRICULTURAL ECONOMICS AND RURAL DEVELOPMENT

By

IRINI KONTALI

Student ID: 416020

 

 

 

Committee Members

Costopoulou Constantina, Professor (Supervisor)

Karetsos Sotirios, Assistant Professor

Malliapis Michael, Teaching Staff

 

 

Athens, September 2024

 

ABSTRACT

The rapid development of artificial intelligence (AI), and in particular of conversational agents ("chatbots") based on large language models (LLMs), has piqued the research community's interest in how these tools, at their current level, can be used as research aids. Such applications excel at understanding and generating linguistic content, and they could help transform university education and the way research is conducted in every field, including agricultural science. However, the performance of these AI models must be evaluated on a variety of topics in order to highlight their capabilities and to identify their errors and possible limitations. This study therefore evaluates the performance of ChatGPT (GPT-3.5), Copilot, and Gemini. Specifically, twenty (20) exam topics were selected from the university beekeeping course taught by the corresponding laboratory of the Department of Plant Production Science of the Agricultural University of Athens. The AI tools were asked to answer the topics in two languages, Greek and English. The validity of the answers was judged by the research staff of the beekeeping laboratory. For the answers in Greek, Copilot achieved a score of 16/20 (80%), followed by Gemini (10/20, 50%) and GPT-3.5 (7/20, 35%). For the answers in English, all three applications achieved the same score (14/20, 70%), although each answered a different set of questions correctly.
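The percentages reported in the abstract follow directly from the number of correct answers out of the twenty topics. The short sketch below reproduces that arithmetic and the resulting ranking per language. The dictionary layout and the helper names `percentage` and `ranking` are illustrative assumptions, not part of the thesis; only the counts are taken from the abstract.

```python
# Counts of correct answers as reported in the abstract (out of 20 topics).
# The data layout and function names are hypothetical, chosen for illustration.
TOTAL_TOPICS = 20

correct_answers = {
    "Greek": {"Copilot": 16, "Gemini": 10, "GPT-3.5": 7},
    "English": {"Copilot": 14, "Gemini": 14, "GPT-3.5": 14},
}


def percentage(correct: int, total: int = TOTAL_TOPICS) -> float:
    """Return the share of correctly answered topics as a percentage."""
    return 100.0 * correct / total


def ranking(scores: dict) -> list:
    """Return tool names ordered from highest to lowest score (ties keep input order)."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    for language, scores in correct_answers.items():
        for tool in ranking(scores):
            n = scores[tool]
            print(f"{language}: {tool} {n}/{TOTAL_TOPICS} = {percentage(n):.0f}%")
```

Note that an equal score in English does not imply identical behaviour: per the abstract, the three tools answered different questions correctly, which a per-question record (not available here) would be needed to show.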

 

Keywords: Artificial intelligence, Large language models, Evaluation, Test exams.

 


TALLHEDA
EU


Project Framework

TALLHEDA has received funding from the European Union's Horizon Europe research and innovation programme under Grant Agreement No. 101136578.

Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency (REA). Neither the European Union nor the granting authority can be held responsible for them.
