Evaluation
How to use turkish-lm-tuner to evaluate your model
turkish-lm-tuner provides task-specific metrics and an Evaluator class for easier evaluation of fine-tuned language models.
Import task-specific metrics
The Evaluator class provides task-specific metrics for each supported task. It takes two arguments: task and metrics.
The supported tasks are:
- classification
- summarization
- paraphrasing
- title_generation
- nli
- semantic_similarity
- ner
- pos_tagging
The supported metrics are:
- accuracy
- precision
- precision_weighted
- recall
- recall_weighted
- f1
- f1_macro
- f1_micro
- f1_weighted
- pearsonr
- bleu
- meteor
- rouge
- ter
- squad
- seqeval
For example, to import metrics for the classification task:
from turkish_lm_tuner import Evaluator
eval = Evaluator(task='classification')
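The metrics argument can also be passed explicitly to control which metrics are computed. The following is a minimal sketch; it assumes that metrics accepts a list of metric names from the list above, which is not confirmed by this page, so check the library documentation for the exact format.

from turkish_lm_tuner import Evaluator

# Sketch: evaluate a classification model with only accuracy and macro F1.
# Assumes `metrics` takes a list of metric names from the supported-metrics list.
eval = Evaluator(task='classification', metrics=['accuracy', 'f1_macro'])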
Compute metrics
Metrics are then computed by calling the compute_metrics method of the Evaluator class. compute_metrics takes two arguments, preds and labels, where preds is the list of predicted labels and labels is the list of ground-truth labels.
For example, to compute metrics for the classification task:
eval.compute_metrics([0, 0, 1, 1], [1, 0, 1, 1])
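In this example, three of the four predictions match the labels, so the reported accuracy should be 0.75. The sketch below shows the call in context; the assumption here is that compute_metrics returns a dictionary mapping metric names to scores, which is typical for Hugging Face-style evaluators but not confirmed by this page.

from turkish_lm_tuner import Evaluator

eval = Evaluator(task='classification')

preds = [0, 0, 1, 1]   # predicted labels
labels = [1, 0, 1, 1]  # ground-truth labels

results = eval.compute_metrics(preds, labels)
# Assumed to return something like {'accuracy': 0.75, ...};
# the exact keys depend on the metrics configured for the task.
print(results)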