Evaluation

How to use `turkish-lm-tuner` to evaluate your model

turkish-lm-tuner provides task-specific metrics and an Evaluator class that make it easier to evaluate fine-tuned language models.

Import task-specific metrics

The Evaluator class provides the metrics appropriate to a given task. It takes two arguments: task and metrics. The supported tasks are:

classification
summarization
paraphrasing
title_generation
nli
semantic_similarity
ner
pos_tagging

The supported metrics are:

accuracy
precision
precision_weighted
recall
recall_weighted
f1
f1_macro
f1_micro
f1_weighted
pearsonr
bleu
meteor
rouge
ter
squad
seqeval

For example, to import metrics for the classification task:

from turkish_lm_tuner import Evaluator

# Named `evaluator` to avoid shadowing Python's built-in eval()
evaluator = Evaluator(task='classification')

Compute metrics

Metrics are then computed by calling the compute_metrics method of the Evaluator class. The method takes two arguments: preds, the predicted labels, and labels, the ground-truth labels.

For example, to compute metrics for the classification task:

# First argument: preds; second argument: labels
evaluator.compute_metrics([0, 0, 1, 1], [1, 0, 1, 1])
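To make the numbers behind such a call concrete, the following is a minimal pure-Python sketch that computes accuracy, precision, recall, and F1 by hand for the same preds and labels. It does not use turkish-lm-tuner; the function name binary_classification_metrics and the choice of 1 as the positive class are assumptions made for illustration, and the library's own output format and averaging strategy may differ.

```python
def binary_classification_metrics(preds, labels, positive=1):
    """Hand-computed binary metrics (hypothetical helper, not part of turkish-lm-tuner)."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")

    pairs = list(zip(preds, labels))
    # Count true positives, false positives, and false negatives for the positive class
    tp = sum(1 for p, y in pairs if p == positive and y == positive)
    fp = sum(1 for p, y in pairs if p == positive and y != positive)
    fn = sum(1 for p, y in pairs if p != positive and y == positive)
    correct = sum(1 for p, y in pairs if p == y)

    # Guard every division so empty or degenerate inputs yield 0.0 instead of an error
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Same inputs as the example call: preds first, then labels
scores = binary_classification_metrics([0, 0, 1, 1], [1, 0, 1, 1])
print(scores)
```

Here three of the four predictions match the labels, so accuracy is 0.75. Both predicted positives are correct, so precision is 1.0, while one of the three true positives is missed, so recall is 2/3 and F1 is 0.8.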
