
Lynxius Setup

Begin your LLM evaluation with Lynxius in just one minute!

Sign up for free here to create an account. Don't forget to validate your email!

Install Lynxius library

Install the Lynxius Python library locally or on your server.

pip install lynxius
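
To confirm the install succeeded, you can import the package and print its installed version. This quick sanity check uses only the standard library:

from importlib.metadata import version

import lynxius  # the import should succeed after `pip install lynxius`

print(version("lynxius"))  # prints the installed distribution version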

Create Your First Project and API Key

Create your first project and generate an API key for it to start your LLM evaluations. Remember to store the secret key somewhere safe: you won't be able to view its value again.

Use the newly generated secret key to configure your project API key locally or on your server.

# macOS / Linux
export LYNXIUS_API_KEY='your-api-key-here'

:: Windows (cmd)
set LYNXIUS_API_KEY=your-api-key-here
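
Before going further, it can help to confirm the key is actually visible to Python. This minimal sketch assumes only that the client reads LYNXIUS_API_KEY from the environment, as the export above implies:

import os

# Fail fast if the key was not exported in this shell session.
api_key = os.environ.get("LYNXIUS_API_KEY")
if not api_key:
    raise RuntimeError("LYNXIUS_API_KEY is not set in this environment")
print(f"Key loaded (ends with ...{api_key[-4:]})")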

Select Evaluation Setup

Remote vs Local Evaluations

The Lynxius remote evaluation setup is a fully managed service where the API keys for the models used in testing are provided and managed by the Lynxius team. Evaluation tasks are executed in the background, freeing you from delays and compute costs.

The Lynxius local evaluation setup is convenient if you prefer to use your OpenAI free credits and are comfortable managing the API keys for the models used in testing.

If you plan to run evaluations remotely, you can skip to Run Evals Remotely. If you prefer to run evaluations locally, make sure your OpenAI API key is set in your environment.

# macOS / Linux (only needed to run evals locally)
export OPENAI_API_KEY='your-api-key-here'

:: Windows (cmd, only needed to run evals locally)
set OPENAI_API_KEY=your-api-key-here
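
If you want a single guard that covers both setups, a helper like check_env below makes the requirement explicit: remote runs need only the Lynxius key, while local runs need the OpenAI key as well. Note that check_env is a hypothetical convenience function, not part of the Lynxius library:

import os

def check_env(run_local: bool) -> None:
    """Raise if a required API key is missing for the chosen setup."""
    required = ["LYNXIUS_API_KEY"]
    if run_local:
        required.append("OPENAI_API_KEY")  # only needed to run evals locally
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {missing}")

check_env(run_local=False)  # remote setup: only the Lynxius key is required
check_env(run_local=True)   # local setup: the OpenAI key is needed too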

Start Evaluating

Pick an evaluator from our library and run your evaluations! The example below runs the evaluation remotely (the default client setup):

from lynxius.client import LynxiusClient
from lynxius.evals.answer_correctness import AnswerCorrectness

# add tags for frontend filtering
label = "PR #111"
tags = ["GPT-4", "chat_pizza", "PROD", "Pizza-DB:v2"]
answer_correctness = AnswerCorrectness(label=label, tags=tags)

client = LynxiusClient()

answer_correctness.add_trace(
    query="What is pizza quattro stagioni? Keep it short.",
    # reference from Wikipedia (https://github.com/lynxius/lynxius-docs/blob/main/docs/public/images/quattro_stagioni_wikipedia_reference.png)
    reference=(
        "Pizza quattro stagioni ('four seasons pizza') is a variety of pizza "
        "in Italian cuisine that is prepared in four sections with diverse "
        "ingredients, with each section representing one season of the year. "
        "Artichokes represent spring, tomatoes or basil represent summer, "
        "mushrooms represent autumn and the ham, prosciutto or olives represent "
        "winter."
    ),
    # output from OpenAI GPT-4 (https://github.com/lynxius/lynxius-docs/blob/main/docs/public/images/quattro_stagioni_gpt4_output.png)
    output=(
        "Pizza Quattro Stagioni is an Italian pizza that represents the four "
        "seasons through its toppings, divided into four sections. Each section "
        "features ingredients typical of a particular season, like artichokes "
        "for spring, peppers for summer, mushrooms for autumn, and olives or "
        "prosciutto for winter."
    )
)

client.evaluate(answer_correctness)
To run the same evaluation locally instead, the only change is how the client is constructed; the rest of the code is identical:

client = LynxiusClient(run_local=True)  # run evals locally
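
In practice you will usually score more than one example per run. Assuming add_trace can be called once per example, as its name suggests, you can loop over a dataset before calling evaluate. The triples below are hypothetical placeholders for your own test set:

# Hypothetical (query, reference, output) triples; substitute your own data.
dataset = [
    ("What is pizza margherita?", "Pizza margherita is a Neapolitan pizza ...", "Margherita is a classic pizza ..."),
    ("What is a calzone?", "A calzone is a folded pizza ...", "A calzone is an oven-baked turnover ..."),
]

for query, reference, output in dataset:
    answer_correctness.add_trace(query=query, reference=reference, output=output)

client.evaluate(answer_correctness)  # all traces are scored in one eval run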

Click the Eval Run link in your project to explore the output of your evaluation. The screenshot below shows the result in the UI, and the values that follow explain how the score was reached.

[Screenshot: answer_correctness eval run]

Score     Value     Interpretation
Correct   0.66667   The output is only partially correct when compared to the reference.
Evaluator Output
{
    "TP": [
        "Pizza Quattro Stagioni is an Italian pizza that represents the four seasons through its toppings, divided into four sections.",
        "Each section features ingredients typical of a particular season, like artichokes for spring, mushrooms for autumn, and olives or prosciutto for winter."
    ],
    "FP": [
        "peppers for summer"
    ],
    "FN": [
        "tomatoes or basil represent summer"
    ]
}
Most of the statements in the output match the reference; the one mismatch is that the output credits peppers for summer where the reference names tomatoes or basil, which costs the score one false positive and one false negative.
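
The 0.66667 value is also consistent with an F1-style aggregation over the classified statements. Treating each list entry above as one statement is an assumption about the evaluator's internals, but the numbers line up exactly:

# Assumed F1-style scoring over the classified statements.
tp, fp, fn = 2, 1, 1  # counts from the evaluator output above
f1 = tp / (tp + 0.5 * (fp + fn))
print(round(f1, 5))  # 0.66667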