Evaluation of the Performance of Artificial Intelligence Tools ChatGPT 3.5, Copilot and Gemini in Beekeeping

Agricultural University of Athens

School of Applied Economics and Social Sciences

Department of Agricultural Economics and Rural Development

A thesis submitted to the

AGRICULTURAL UNIVERSITY OF ATHENS

In partial fulfillment of the requirements for the Integrated Master’s Degree of

AGRICULTURAL ECONOMICS AND RURAL DEVELOPMENT

IRINI KONTALI

Student ID: 416020

Committee Members

Costopoulou Constantina, Professor (Supervisor)

Karetsos Sotirios, Assistant Professor

Malliapis Michael, Teaching Staff

Athens, September 2024

ABSTRACT

The rapid development of artificial intelligence (AI), particularly of chatbots based on large language models (LLMs), has piqued the interest of the research community in how these tools, in their current state, can be used as aids for research. Such applications excel at understanding and generating linguistic content and could transform university education and the way research is conducted in every field, including agricultural science. However, it is essential to evaluate the performance of such AI models on a variety of topics to highlight their capabilities and to identify their errors and possible limitations. This study therefore aims to evaluate the performance of ChatGPT (GPT-3.5), Copilot, and Gemini. Specifically, twenty (20) exam questions were selected from the university beekeeping course taught by the corresponding laboratory of the Department of Plant Production Science of the Agricultural University of Athens. The AI tools were tasked with answering the questions in two languages, Greek and English. The validity of the answers was judged by the research staff of the beekeeping laboratory. For the answers in Greek, Copilot achieved a score of 16/20 (80%), followed by Gemini (10/20, 50%) and GPT-3.5 (7/20, 35%). For the answers in English, all three applications achieved the same score (14/20, 70%), although they answered different questions correctly.

Keywords: Artificial intelligence, Large language models, Evaluation, Test exams.
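
The percentages reported in the abstract follow directly from the number of correct answers out of twenty questions. As a minimal sketch of that arithmetic (the names `correct_answers`, `percentage`, and `ranking` are illustrative assumptions, not part of the thesis), the scores can be tabulated as follows:

```python
# Illustrative sketch only: the counts come from the abstract,
# while the data layout and function names are assumptions.
TOTAL_QUESTIONS = 20

# Number of answers judged correct, per language and per tool.
correct_answers = {
    "Greek": {"Copilot": 16, "Gemini": 10, "GPT-3.5": 7},
    "English": {"Copilot": 14, "Gemini": 14, "GPT-3.5": 14},
}


def percentage(correct, total=TOTAL_QUESTIONS):
    """Return the share of correct answers as a percentage."""
    return 100 * correct / total


def ranking(scores):
    """Return tool names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


for language, scores in correct_answers.items():
    for tool in ranking(scores):
        n = scores[tool]
        print(f"{language}: {tool} {n}/{TOTAL_QUESTIONS} ({percentage(n):.0f}%)")
```

Because the three tools tie in English, the ranking for that language simply preserves the order in which the tools are listed.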
