Datasets

A Dataset is a collection of inputs and expected outputs of an LLM App. Datasets are a good way to run bulk evaluations and to collaborate with the Subject Matter Experts (SMEs) who typically annotate your data entries with high-quality human labels and scores.

What are Datasets?

Dataset Entries

Each entry of a Dataset is composed of:

Element	Definition
Query	The input query to your LLM App.
Output	The output answer of your LLM App.
Reference	The expected output of your LLM App. It is typically a high-quality ground-truth value that an evaluator uses to assess the quality of the Output. Ideally, it is annotated by a human.
Comments	General information, typically annotated by a human, to provide more clarity about the Reference or Score they manually inserted.
Score	A score, typically annotated by a human, to indicate the quality of the Reference value. For example, if the Reference annotation is good but lengthy, a human annotator might penalize its Score. We encourage using a value in the range [0, 1] as a Score.
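
The entry structure above can be sketched as a small Python data class. This is only an illustration for local use: the class name DatasetEntry and its field names are hypothetical and are not part of any official SDK. The strict range check is also an assumption of this sketch, since a Score in [0, 1] is encouraged rather than required.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetEntry:
    """Hypothetical local model of one dataset entry (not an official SDK class)."""

    query: Optional[str] = None      # Input query to the LLM App
    output: Optional[str] = None     # Answer produced by the LLM App
    reference: Optional[str] = None  # Expected (ground-truth) answer
    comments: Optional[str] = None   # Free-form annotator notes
    score: Optional[float] = None    # Annotator score, encouraged to be in [0, 1]

    def __post_init__(self) -> None:
        # Assumption of this sketch: reject scores outside the encouraged range.
        if self.score is not None and not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score must be in [0, 1], got {self.score}")


# Example entry with every field filled in.
entry = DatasetEntry(
    query="What is the capital of France?",
    output="Paris is the capital of France.",
    reference="Paris",
    comments="Reference is correct and concise.",
    score=1.0,
)
```

All fields are optional here because an entry can be created with only some of the fields filled in.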

Create a Dataset

Tip

Please consult our API Reference or full Swagger API documentation to create a dataset via APIs.

To create a new dataset, click the Create a Dataset button on the datasets page. You will be prompted to enter a name for the dataset.

Upload CSV

To upload a CSV file to a dataset, select the dataset from the datasets page, click the Browse... button, select the file to upload from your system, and finally click Upload.
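
As a rough sketch of preparing such a file, the snippet below builds CSV text with Python's standard csv module. The column names are an assumption that mirrors the entry fields (Query, Output, Reference, Comments, Score); the header the uploader actually expects is not specified here, so check it before uploading. The helper entries_to_csv is hypothetical.

```python
import csv
import io

# Assumed column names, mirroring the entry fields; verify against the uploader.
COLUMNS = ["Query", "Output", "Reference", "Comments", "Score"]


def entries_to_csv(rows):
    """Serialize a list of dicts to CSV text; missing fields become empty cells."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {
        "Query": "What is the capital of France?",
        "Output": "Paris is the capital of France.",
        "Reference": "Paris",
        "Comments": "Correct and concise.",
        "Score": 1.0,
    },
    # Entries may leave some fields empty, for example before annotation.
    {"Query": "Who wrote Hamlet?", "Reference": "William Shakespeare"},
]

csv_text = entries_to_csv(rows)
```

To produce a file for upload, the resulting text can be written with open(path, "w", newline="") and then selected with the Browse... button.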

Add a Dataset Entry

Tip

Please consult our API Reference or full Swagger API documentation to add a dataset entry via APIs.

To add a new entry to a dataset, select the dataset from the datasets page, click the Add Entry button, and fill in some or all of the available fields (Query, Reference, Output, Comments, Score).