Tags
Tags provide a convenient way to track the details of your evaluation runs. A tag can represent various elements, such as your local experiment name, a pull request number, the dataset used for evaluation, your environment name (e.g., Development, Test, User Acceptance Testing (UAT), Staging, or Production), and/or your location (e.g., Washington, Zurich, or Singapore).
Tags can later be used to filter evaluation runs with the search bar, either after opening the projects page or directly on the trends page.
What are Tags?
Add Tags to Evaluation Run
Use the tags keyword argument to add tags to your evaluation runs when using the Lynxius Python library.
Example-1
In this example, the model used (GPT-4), the name of the LLM App under examination (chat_pizza), and the database used for evaluation (Pizza-DB:v2) were provided as tags.
label = "PR #325"
tags = ["GPT-4", "chat_pizza", "Pizza-DB:v2"]
answer_correctness = AnswerCorrectness(label=label, tags=tags)
Example-2
Warning
Tags for data entries within a specific dataset (e.g., Gregory-House and neurology) are not yet supported. This feature is coming soon.
In this example, the model used (GPT-3.5-turbo), the LLM App's task under examination (summarization), the database used for evaluation (Plainsboro-Hospital-DB:v5), the doctor who annotated the data entries (Gregory-House), and the relevant medical discipline for those entries (neurology) were provided as tags.
label = "PR #325"
tags = ["GPT-3.5-turbo", "summarization", "Plainsboro-Hospital-DB:v5", "Gregory-House", "neurology"]
answer_correctness = AnswerCorrectness(label=label, tags=tags)
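When the same evaluation runs from several environments, it can help to assemble the tag list in one place. The following is a minimal sketch, not part of the Lynxius library: the helper build_tags and the environment variable DEPLOY_ENV are hypothetical names chosen for illustration. The resulting list is a plain list of strings that could be passed as the tags keyword argument shown above.

```python
import os


def build_tags(model, app_name, dataset, env=None):
    """Assemble a tag list, skipping empty values and duplicates.

    Hypothetical helper for illustration; not part of the Lynxius library.
    """
    tags = []
    for value in (model, app_name, dataset, env):
        # Keep the first occurrence of each non-empty tag, preserving order.
        if value and value not in tags:
            tags.append(value)
    return tags


# DEPLOY_ENV is an assumed variable name; use whatever your CI exposes.
tags = build_tags(
    model="GPT-4",
    app_name="chat_pizza",
    dataset="Pizza-DB:v2",
    env=os.environ.get("DEPLOY_ENV"),
)
```

If DEPLOY_ENV is unset, the environment tag is simply omitted and only the three explicit tags remain.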
Filter Based on Tags
Use a list of tags to filter evaluation runs using the convenient search-bar. This can be done after clicking on the projects page or directly on the trends page.
The input list of tags is parsed using an OR condition to filter the entries. For example, GPT-4 chat_pizza Pizza-DB:v2 returns all entries tagged with at least one of the three input tags.
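The OR condition can be illustrated with a small local sketch. This is only a model of the matching rule described above, not how the Lynxius platform implements its search bar: the function filter_runs_by_tags, the dictionary layout of a run, and the labels PR #326 and PR #327 are hypothetical. It also assumes that tags in the query are separated by whitespace and that an empty query matches every run.

```python
def filter_runs_by_tags(runs, query):
    """Return the runs that carry at least one tag from the query.

    Hypothetical helper; the query is assumed to be whitespace-separated.
    """
    wanted = set(query.split())
    if not wanted:
        # Assumption: an empty query applies no filtering.
        return list(runs)
    # OR condition: a run matches if it shares any tag with the query.
    return [run for run in runs if wanted & set(run["tags"])]


# Illustrative data only; these runs are made up for the example.
runs = [
    {"label": "PR #325", "tags": ["GPT-4", "chat_pizza", "Pizza-DB:v2"]},
    {"label": "PR #326", "tags": ["GPT-3.5-turbo", "summarization"]},
    {"label": "PR #327", "tags": ["GPT-4", "summarization"]},
]

matches = filter_runs_by_tags(runs, "GPT-4 chat_pizza Pizza-DB:v2")
print([run["label"] for run in matches])  # ['PR #325', 'PR #327']
```

PR #327 matches even though it carries only one of the three queried tags, which is the practical difference between an OR filter and an AND filter.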
latest Dataset Version
Warning
Automatic versioning of Datasets and the special tag latest are not yet supported. This feature is coming soon.
If you are testing from your Continuous Integration (CI) pipeline and always want to evaluate against the latest version of the dataset that you uploaded to the Lynxius platform, make sure to use the latest tag when launching evaluation runs. You can later also filter using the latest tag (e.g., Pizza-DB:latest).