Tags

Tags provide a convenient way to track the details of your evaluation runs. A tag can represent various elements, such as your local experiment name, a pull request number, the dataset used for evaluation, your environment name (e.g., Development, Test, User Acceptance Testing (UAT), Staging, and Production), and/or your location (e.g., Washington, Zurich, and Singapore).

Tags can later be used to filter evaluation runs from the search bar. The search bar is available after clicking Eval Runs on the projects page, or directly on the trends page.

What are Tags?

Add Tags to Evaluation Run

Use the tags keyword argument to add tags to your evaluation runs when using the Lynxius Python library.

Example-1

In this example, the model used (GPT-4), the name of the LLM App under examination (chat_pizza), and the database used for evaluation (Pizza-DB:v2) are provided as tags.

label = "PR #325"
tags = ["GPT-4", "chat_pizza", "Pizza-DB:v2"]
answer_correctness = AnswerCorrectness(label=label, tags=tags)

Example-2

Warning

Tags for data entries within a specific dataset (e.g., Gregory-House and neurology) are not yet supported. This feature is coming soon.

In this example, the model used (GPT-3.5-turbo), the LLM App's task under examination (summarization), the database used for evaluation (Plainsboro-Hospital-DB:v5), the doctor who annotated the data entries (Gregory-House), and the relevant medical discipline for those entries (neurology) are provided as tags.

label = "PR #325"
tags = ["GPT-3.5-turbo", "summarization", "Plainsboro-Hospital-DB:v5", "Gregory-House", "neurology"]
answer_correctness = AnswerCorrectness(label=label, tags=tags)
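Tag lists like the ones in the two examples can also be assembled programmatically, for instance in a CI script. The following is a minimal sketch, not part of the Lynxius library: the helper name build_tags and its parameters are hypothetical, and the name:version form for the dataset tag simply mirrors the tags shown above (e.g., Pizza-DB:v2).

```python
def build_tags(model: str, app: str, dataset: str, dataset_version: str, *extra: str) -> list[str]:
    """Assemble a tag list, joining dataset name and version as name:version.

    Hypothetical helper for illustration only; it is not a Lynxius API.
    """
    tags = [model, app, f"{dataset}:{dataset_version}", *extra]
    # Drop empty strings and duplicates while keeping the original order.
    return list(dict.fromkeys(tag for tag in tags if tag))


tags = build_tags("GPT-4", "chat_pizza", "Pizza-DB", "v2")
print(tags)  # ['GPT-4', 'chat_pizza', 'Pizza-DB:v2']
```

The resulting list can then be passed through the tags keyword argument in the same way as the hand-written lists in the examples.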

Filter Based on Tags

Use a list of tags to filter evaluation runs from the search bar. The search bar is available after clicking Eval Runs on the projects page, or directly on the trends page.

The input list of tags is combined with an OR condition to filter the entries. For example, searching for GPT-4 chat_pizza Pizza-DB:v2 returns all entries tagged with at least one of the three input tags.
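The OR condition can be illustrated with a short, self-contained sketch. It does not call the Lynxius platform: the runs dictionary and the functions parse_query and matches_any are hypothetical, and it assumes that the search string is split on whitespace and that tags are compared exactly (case-sensitive).

```python
def parse_query(query: str) -> set[str]:
    """Split a whitespace-separated search string into a set of tags (assumed format)."""
    return set(query.split())


def matches_any(run_tags: list[str], query_tags: set[str]) -> bool:
    """Return True if the run carries at least one of the queried tags (OR condition)."""
    return not query_tags.isdisjoint(run_tags)


# Hypothetical evaluation runs, keyed by label, each with its tags.
runs = {
    "PR #325": ["GPT-4", "chat_pizza", "Pizza-DB:v2"],
    "PR #326": ["GPT-3.5-turbo", "summarization", "Plainsboro-Hospital-DB:v5"],
    "PR #327": ["GPT-4", "summarization"],
}

query_tags = parse_query("GPT-4 chat_pizza Pizza-DB:v2")
selected = [label for label, tags in runs.items() if matches_any(tags, query_tags)]
print(selected)  # ['PR #325', 'PR #327']
```

In this sketch, PR #327 is selected even though it shares only the GPT-4 tag with the query, while PR #326 is excluded because it shares none.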

`latest` Dataset Version

Warning

Automatic versioning of Datasets and the special tag latest are not yet supported. This feature is coming soon.

If you are testing from your Continuous Integration (CI) pipeline and always want to evaluate against the latest version of the dataset that you uploaded to the Lynxius platform, make sure to use the latest tag when launching evaluation runs. You can later also filter using the latest tag (e.g., Pizza-DB:latest).