My LLM Evaluation Prompt Repository
Purpose
This repository gathers a curated collection of prompts for controlled experiments comparing and evaluating the efficacy of different large language models (LLMs), including fine-tuned models, on specific use cases. The prompts are designed to facilitate targeted testing and to help determine which models are most effective at particular tasks.
Notes
Some of these prompts were captured using voice-to-text software (usually on my Android phone). Sometimes they're marked as such, but in other instances they're interspersed with other prompts.
Important note: no effort whatsoever has been made to rewrite them for clarity (for example, fixing mistranscribed words or adding punctuation). Since dealing with typos and spotty speech-to-text performance is an essential attribute of multimodal LLMs operating in the real, imperfect world, I thought it was better to keep these prompts in their unedited form for more realistic evaluation when comparing the efficacy of various models.
Structure
Eval-Prompts
- Prompts I've earmarked for evaluation purposes.
Old-Prompts
- Real prompts that I've previously run for generations (in some cases edited to remove PII). Many of these are repetitive and not especially useful, but they're reflective of the type of casual prompting I commonly use, so I treat them as "pipeline" candidates for future evaluations. See the layout sketch below.
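A sketch of the layout described above (the two directory names come from this repository; the per-file organization within each directory may vary):

```
.
├── Eval-Prompts/   # prompts earmarked for evaluation
└── Old-Prompts/    # previously run prompts; "pipeline" candidates
```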
Use Cases
The prompts included in this repository reflect some of the specific use cases I've identified in my own work with LLMs. These use cases are particular to my context and needs, so the prompts may not be applicable or useful outside it. However, I'm open-sourcing them here in case they prove helpful to others looking to run similar experiments or tests.
Experiment Results
To keep the data separate, the actual results of experiments conducted with these prompts are not included in this repository. This repository is focused solely on the prompts themselves, serving as a foundation for evaluation and comparison.
How to Use
Feel free to clone or download this repository and use the prompts for your own LLM evaluations. Each prompt is stored in Markdown format, with additional metadata that may help you contextualize or adapt it for your own experiments.
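If you want to script an evaluation run, loading the prompts is straightforward. Below is a minimal Python sketch, assuming the prompts live as `.md` files under the `Eval-Prompts` directory; the exact file layout and metadata format here are assumptions, so adjust as needed:

```python
# Minimal sketch: collect prompt files for an evaluation run.
# Assumes prompts are stored as .md files under Eval-Prompts/;
# change the root or the glob pattern if your checkout differs.
from pathlib import Path


def load_prompts(root: str = "Eval-Prompts") -> dict[str, str]:
    """Map each prompt's filename (without extension) to its raw text."""
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in sorted(Path(root).rglob("*.md"))
    }


if __name__ == "__main__":
    prompts = load_prompts()
    print(f"Loaded {len(prompts)} prompts")
    for name, text in prompts.items():
        print(f"- {name}: {len(text)} chars")
```

From there, each prompt can be sent to whichever models you're comparing, keeping the results outside this repository per the note above.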
Licensing
This repository is licensed under the CC-BY-4.0 (Attribution 4.0 International) license.
Summary of the License
The Creative Commons Attribution 4.0 International (CC BY 4.0) license allows others to:
- Share: copy and redistribute the material in any medium or format.
- Adapt: remix, transform, and build upon the material for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
License Terms
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
For the full legal code, please visit the Creative Commons website: https://creativecommons.org/licenses/by/4.0/legalcode