Evaluation
How to use turkish-lm-tuner to evaluate your model
turkish-lm-tuner provides task-specific metrics and an Evaluator class that make it easier to evaluate fine-tuned language models.
Import task-specific metrics
The Evaluator class provides the metrics for a given task. It takes two arguments: task and metrics.
The supported tasks are:
classification
summarization
paraphrasing
title_generation
nli
semantic_similarity
ner
pos_tagging
The supported metrics are:
accuracy
precision
precision_weighted
recall
recall_weighted
f1
f1_macro
f1_micro
f1_weighted
pearsonr
bleu
meteor
rouge
ter
squad
seqeval
For example, to import the metrics for the classification task:
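from turkish_lm_tuner import Evaluator
# Create an evaluator for the classification task.
# Note: the variable name `eval` shadows Python's built-in eval() in this scope.
eval = Evaluator(task='classification')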
Compute metrics
Metrics are then computed by calling the compute_metrics method of the Evaluator class. The compute_metrics method takes two arguments: preds and labels. preds holds the predicted labels and labels holds the ground-truth labels.
For example, to compute the metrics for the classification task:
# Arguments are positional: predictions first, then ground-truth labels.
eval.compute_metrics([0, 0, 1, 1], [1, 0, 1, 1])
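To make the numbers easier to sanity-check, the following is a minimal sketch that computes accuracy, precision, recall, and F1 by hand for the same preds and labels. It is not turkish-lm-tuner's implementation: the helper binary_scores is hypothetical, it assumes class 1 is the positive class, and the library's own output format and averaging choices may differ.

```python
# Hand-rolled reference calculation; NOT turkish-lm-tuner's implementation.
# `binary_scores` is a hypothetical helper, and treating class 1 as the
# positive class is an assumption made for this illustration.
preds = [0, 0, 1, 1]   # predicted labels
labels = [1, 0, 1, 1]  # ground-truth labels


def binary_scores(preds, labels, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(preds, labels))
    tp = sum(p == positive and y == positive for p, y in pairs)  # true positives
    fp = sum(p == positive and y != positive for p, y in pairs)  # false positives
    fn = sum(p != positive and y == positive for p, y in pairs)  # false negatives
    accuracy = sum(p == y for p, y in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


scores = binary_scores(preds, labels)
print(scores)
```

Here three of the four predictions match, so accuracy is 0.75. Both predicted positives are correct (precision 1.0), but one of the three true positives is missed (recall 2/3), which gives an F1 of 0.8.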