Trends

Trends give you an at-a-glance overview of the analytics for your organization's projects. They let you quickly check that your LLM apps are performing well across environments (e.g., Development, Test, User Acceptance Testing (UAT), Staging, and Production), across locations (e.g., Washington, Zurich, and Singapore), or across the different experiments you run on your laptop.

Trends show the mean values of the selected evaluation runs, where each evaluation run represents the aggregated performance of a specific bulk evaluation.

Each plot shows the scores of one evaluator, and each line within a plot corresponds to a combination of a selected project in your organization and a list of tags.
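To make this layout concrete, here is a minimal sketch in plain Python of how evaluation runs could be grouped into one plot per evaluator and one line per (project, tags) combination. The `EvalRun` record, its fields, and the `build_trends` helper are hypothetical and do not reflect the Lynxius data model; the sketch also assumes that a run's aggregated score is the mean of its bulk evaluation's per-sample scores.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EvalRun:
    """Hypothetical evaluation-run record (not the Lynxius data model)."""
    project: str
    tags: frozenset
    evaluator: str
    created_at: datetime
    scores: tuple  # per-sample scores from one bulk evaluation

    @property
    def mean_score(self) -> float:
        # Assumption: the run aggregates its bulk evaluation with a mean.
        return sum(self.scores) / len(self.scores)


def build_trends(runs):
    """Return {evaluator: {(project, tags): [(timestamp, mean_score), ...]}}."""
    plots = defaultdict(lambda: defaultdict(list))
    for run in runs:
        line_key = (run.project, run.tags)  # one line per project + tag list
        plots[run.evaluator][line_key].append((run.created_at, run.mean_score))
    # Sort each line chronologically so it reads as a time-based trend.
    return {
        evaluator: {key: sorted(points) for key, points in lines.items()}
        for evaluator, lines in plots.items()
    }
```

Each `(timestamp, mean_score)` pair plays the role of one clickable point; keying lines by both project and tags is what keeps, for example, a Dev project and a Prod project on separate lines of the same evaluator's plot.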

Analyzing trends over a single, specific dataset is crucial for uncovering meaningful changes in your system's behavior over time.

What are Trends?

Inspect Trends

All the trends for your projects are shown on the trends page. Use the Projects multi-select dropdown to choose which projects to display, and the search bar to filter by tags.

Each point on a plot represents a specific evaluation run and displays the aggregated score of its bulk evaluation. Click a point to open the corresponding evaluation run.

Compare Trends

You can use the Projects multi-select dropdown to choose which projects to display and the search bar to filter by tags. Here are a few ideas for trend comparisons you can make:

Compare Local Development with Main Branch

You can compare the performance of your latest local code changes with the main branch baseline by using tags. Name your experiment locally (e.g., experiment-25) before launching your evaluation. You can then enter the query experiment-25 main Pizza-DB in the tags search bar to compare the two. Make sure to follow our Best Practices for Filtering Time-Based Trends.
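The comparison this query aims for can be pictured as two series, one per branch tag, both restricted to runs on the same dataset. The sketch below is a hypothetical illustration of that intent in plain Python: it does not reproduce how the Lynxius search bar actually matches tags, and the `Run` record and `compare_branches` helper are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical evaluation-run record: an id, its tags, and its score."""
    run_id: str
    tags: frozenset
    score: float


def compare_branches(runs, branch_tags, dataset_tag):
    """Group runs into one series per branch tag (e.g. experiment-25, main),
    keeping only runs evaluated on dataset_tag so the series stay comparable."""
    series = {tag: [] for tag in branch_tags}
    for run in runs:
        if dataset_tag not in run.tags:
            continue  # different dataset: never mix it into the comparison
        for tag in branch_tags:
            if tag in run.tags:
                series[tag].append(run)
    return series
```

For example, `compare_branches(runs, ["experiment-25", "main"], "Pizza-DB")` yields one list for the local experiment and one for the main baseline, with any run on another dataset left out of both.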

Compare Across Environments

Tip

To distinguish between different environments, we encourage you to use separate projects for each environment rather than using tags.

To compare the performance of your LLM apps across Dev, Test, and Prod environments, simply select the three corresponding projects in the Projects multi-select dropdown. For time-based trends, make sure to follow our Best Practices for Filtering Time-Based Trends.

Best Practices for Filtering Time-Based Trends

Warning

Automatic dataset versioning and dataset filtering in the UI are not yet supported. These features are coming soon.

To uncover meaningful time-based trends, it is crucial to filter projects and tags carefully and to avoid combining scores from evaluation runs computed over different datasets.

We strongly encourage you to always provide the relevant dataset name and version (e.g., Pizza-DB:v2 or Pizza-DB:latest) before launching your evaluation. This keeps time-based trends accurate and prevents different datasets from being mixed.
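As a minimal sketch of that safeguard, the plain-Python helpers below parse dataset identifiers in the name:version form used in the examples above and refuse a trend line whose runs span more than one dataset or version. The helper names are hypothetical and are not part of any Lynxius SDK; the check simply assumes each run carries exactly one such dataset identifier.

```python
def parse_dataset_tag(tag: str) -> tuple:
    """Split a 'name:version' dataset tag such as 'Pizza-DB:v2'."""
    name, sep, version = tag.rpartition(":")
    if not sep or not name or not version:
        raise ValueError(f"expected 'name:version', got {tag!r}")
    return name, version


def check_single_dataset(run_dataset_tags: list) -> str:
    """Return the shared dataset tag, or raise if the runs would mix
    different datasets or dataset versions in one trend line."""
    distinct = {parse_dataset_tag(tag) for tag in run_dataset_tags}
    if len(distinct) != 1:
        raise ValueError(f"expected exactly one dataset, found {sorted(distinct)}")
    name, version = next(iter(distinct))
    return f"{name}:{version}"
```

Running such a check before plotting turns a silently misleading trend (v1 scores followed by v2 scores) into an explicit error.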

Here are some best practices for using tags to filter trends effectively:

We recommend uploading your dataset to Lynxius through the datasets page to benefit from our automatic dataset versioning. The Lynxius platform generates dataset versions automatically, so you don't need to manage dataset version updates manually. If you decide to version your datasets yourself, make sure to maintain a rigorous tagging strategy on the Lynxius platform.
To see time-based trends for the latest dataset version (e.g., Pizza-DB:latest), use the Datasets multi-select dropdown to select the Pizza-DB:latest dataset on the trends page. Using latest is the right strategy when testing from your Continuous Integration (CI) pipeline, where you evaluate against the same dataset while it is continuously updated. Be sure to read our documentation on the latest Dataset Version.
To see time-based trends for a specific dataset version (e.g., Pizza-DB:v2), use the Datasets multi-select dropdown to select the Pizza-DB:v2 dataset on the trends page.
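If you version datasets yourself, one way to keep your tags rigorous is to resolve latest to the concrete version it points to when an evaluation is launched, so the run can be tagged with both. The sketch below assumes versions are labeled v1, v2, v3, and so on, as in the examples above; the helper names are hypothetical, and this is not how the Lynxius platform resolves latest.

```python
import re

# Assumed version label format: 'v' followed by an integer (v1, v2, ...).
_VERSION = re.compile(r"v(\d+)")


def latest_version(versions: list) -> str:
    """Return the highest 'v<N>' label, comparing N numerically (v10 > v9)."""
    numbered = []
    for version in versions:
        match = _VERSION.fullmatch(version)
        if match is None:
            raise ValueError(f"unsupported version label: {version!r}")
        numbered.append((int(match.group(1)), version))
    if not numbered:
        raise ValueError("no versions given")
    return max(numbered)[1]


def resolve_dataset_tag(tag: str, versions: list) -> str:
    """Turn 'Pizza-DB:latest' into a concrete tag such as 'Pizza-DB:v3';
    pinned tags like 'Pizza-DB:v2' are returned unchanged."""
    name, _, version = tag.rpartition(":")
    if version == "latest":
        return f"{name}:{latest_version(versions)}"
    return tag
```

Comparing version numbers as integers rather than strings matters once you pass v9: a plain string comparison would rank v10 below v9.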