# Getting Started
`turkish-lm-tuner` simplifies finetuning and evaluating transformer language models on various NLP tasks, with a special focus on Turkish-language datasets. It is built on top of the HuggingFace `transformers` library and supports both encoder and encoder-decoder models.
## Key Features
- Support for Multiple Tasks: Wrappers for tasks such as summarization, text classification, and more.
- Easy Dataset Import and Processing: Utilities for importing and preprocessing datasets tailored to specific NLP tasks.
- Simple Model Finetuning: Streamlined finetuning of models with customizable parameters.
- Comprehensive Evaluation Metrics: A wide range of task-specific metrics that make evaluation straightforward. The classes behind these features are sketched below.
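These features are exposed through a small set of entry points. A minimal sketch of the imports used throughout this guide (all three classes appear in the examples below):

```python
# Main entry points used in this guide
from turkish_lm_tuner import (
    DatasetProcessor,                  # dataset import and preprocessing
    TrainerForConditionalGeneration,   # finetuning encoder-decoder models
    Evaluator,                         # task-specific evaluation metrics
)
```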
## Installation

`turkish-lm-tuner` can be installed from PyPI:

```bash
pip install turkish-lm-tuner
```
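To track the latest development version instead, the package can be installed directly from its GitHub repository. The URL below is an assumption based on the `boun-tabi-LMG` namespace used for the TURNA model:

```bash
# Assumed repository URL; install from source for unreleased changes
pip install git+https://github.com/boun-tabi-LMG/turkish-lm-tuner.git
```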
## Finetuning
### Example: Finetuning TURNA on the TR News Dataset

#### Importing and Processing the Dataset
```python
from turkish_lm_tuner import DatasetProcessor

# Define dataset and model parameters
dataset_name = "tr_news"
task = "summarization"
task_format = "conditional_generation"
task_mode = ''  # optional task mode prefix; left empty here
model_name = "boun-tabi-LMG/TURNA"
max_input_length = 764
max_target_length = 128

# Initialize the processor and preprocess the train and validation splits
dataset_processor = DatasetProcessor(
    dataset_name, task, task_format, task_mode,
    model_name, max_input_length, max_target_length
)
train_dataset = dataset_processor.load_and_preprocess_data('train')
eval_dataset = dataset_processor.load_and_preprocess_data('validation')
```
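Before training, it can help to inspect one processed example. The field names below are an assumption based on standard HuggingFace-style tokenized datasets, not something the library guarantees:

```python
# Sanity check: field names assumed to follow HuggingFace conventions
sample = train_dataset[0]
print(sample.keys())             # expected to include input_ids and labels
print(len(sample["input_ids"]))  # should not exceed max_input_length (764)
```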
#### Setting up Training Parameters and Finetuning
```python
from turkish_lm_tuner import TrainerForConditionalGeneration

# Define training and optimizer parameters
training_params = {
    'num_train_epochs': 10,
    'per_device_train_batch_size': 4,
    'per_device_eval_batch_size': 4,
    'output_dir': './',
    'evaluation_strategy': 'epoch',
    'save_strategy': 'epoch',
    'predict_with_generate': True
}
optimizer_params = {
    'optimizer_type': 'adafactor',
    'scheduler': False
}

model_save_path = "turna_summarization_tr_news"

# Finetune the model
model_trainer = TrainerForConditionalGeneration(
    model_name, task, training_params, optimizer_params,
    model_save_path, max_input_length, max_target_length,
    dataset_processor.dataset.postprocess_data
)
trainer, model = model_trainer.train_and_evaluate(train_dataset, eval_dataset, None)

# Save the finetuned model and its tokenizer
model.save_pretrained(model_save_path)
dataset_processor.tokenizer.save_pretrained(model_save_path)
```
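Once saved, the checkpoint can be loaded for inference with plain `transformers`. This is a minimal sketch, assuming the saved model is a standard encoder-decoder (T5-style) checkpoint; the input text is a placeholder:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("turna_summarization_tr_news")
model = AutoModelForSeq2SeqLM.from_pretrained("turna_summarization_tr_news")

text = "..."  # a Turkish news article to summarize
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=764)
summary_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```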
## Evaluation

### Example: Using the Evaluator for the Summarization Task

#### Importing Task-Specific Metrics
```python
from turkish_lm_tuner import Evaluator

# Initialize the evaluator for the summarization task
# (named `evaluator` to avoid shadowing the Python builtin `eval`)
evaluator = Evaluator(task='summarization')
```
#### Computing Metrics
```python
# Example predictions and reference summaries (placeholder strings)
preds = ["Model-generated summary 1", "Model-generated summary 2"]
labels = ["Reference summary 1", "Reference summary 2"]

# Compute task-specific metrics
results = evaluator.compute_metrics(preds, labels)
```
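`compute_metrics` returns the computed scores, which for summarization are typically ROUGE-style metrics. A sketch for inspecting them, assuming the result is a dictionary mapping metric names to values:

```python
# Print each metric (exact keys depend on the task and library version)
for name, score in results.items():
    print(f"{name}: {score}")
```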
## Contributing
We welcome contributions to `turkish-lm-tuner`! Whether it's improving documentation, adding new features, or reporting issues, your input is valuable.