# Getting Started
`turkish-lm-tuner` simplifies finetuning and evaluating transformer language models on various NLP tasks, with a special focus on Turkish-language datasets. It is built on top of the HuggingFace `transformers` library and supports both encoder and encoder-decoder models.
## Key Features
- Support for Multiple Tasks: Wrappers for tasks such as summarization, text classification, and more.
- Easy Dataset Import and Processing: Utilities for importing and preprocessing datasets tailored to specific NLP tasks.
- Simple Model Finetuning: Streamlined finetuning of models with customizable parameters.
- Comprehensive Evaluation Metrics: A wide range of task-specific metrics that make evaluation straightforward. The classes behind these features are sketched below.
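These features are exposed through a small set of entry points. A minimal sketch of the imports used throughout this guide (all three classes appear in the examples below):

```python
# Main entry points used in this guide
from turkish_lm_tuner import (
    DatasetProcessor,                  # dataset import and preprocessing
    TrainerForConditionalGeneration,   # finetuning encoder-decoder models
    Evaluator,                         # task-specific evaluation metrics
)
```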
## Installation

`turkish-lm-tuner` can be installed from PyPI:

```bash
pip install turkish-lm-tuner
```
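To track the latest development version instead, the package can be installed directly from its GitHub repository. The URL below is an assumption based on the `boun-tabi-LMG` namespace used for the TURNA model:

```bash
# Assumed repository URL; install from source for unreleased changes
pip install git+https://github.com/boun-tabi-LMG/turkish-lm-tuner.git
```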
## Finetuning
### Example: Finetuning TURNA on the TR News Dataset

#### Importing and Processing the Dataset
```python
from turkish_lm_tuner import DatasetProcessor

# Define dataset and model parameters
dataset_name = "tr_news"
task = "summarization"
task_format = "conditional_generation"
task_mode = ''  # optional task mode prefix; left empty here
model_name = "boun-tabi-LMG/TURNA"
max_input_length = 764
max_target_length = 128

# Initialize the processor and preprocess the train and validation splits
dataset_processor = DatasetProcessor(
    dataset_name, task, task_format, task_mode,
    model_name, max_input_length, max_target_length
)
train_dataset = dataset_processor.load_and_preprocess_data('train')
eval_dataset = dataset_processor.load_and_preprocess_data('validation')
```
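Before training, it can help to inspect one processed example. The field names below are an assumption based on standard HuggingFace-style tokenized datasets, not something the library guarantees:

```python
# Sanity check: field names assumed to follow HuggingFace conventions
sample = train_dataset[0]
print(sample.keys())             # expected to include input_ids and labels
print(len(sample["input_ids"]))  # should not exceed max_input_length (764)
```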
#### Setting up Training Parameters and Finetuning
```python
from turkish_lm_tuner import TrainerForConditionalGeneration

# Define training and optimizer parameters
training_params = {
    'num_train_epochs': 10,
    'per_device_train_batch_size': 4,
    'per_device_eval_batch_size': 4,
    'output_dir': './',
    'evaluation_strategy': 'epoch',
    'save_strategy': 'epoch',
    'predict_with_generate': True
}
optimizer_params = {
    'optimizer_type': 'adafactor',
    'scheduler': False
}

model_save_path = "turna_summarization_tr_news"

# Finetune the model
model_trainer = TrainerForConditionalGeneration(
    model_name, task, training_params, optimizer_params,
    model_save_path, max_input_length, max_target_length,
    dataset_processor.dataset.postprocess_data
)
trainer, model = model_trainer.train_and_evaluate(train_dataset, eval_dataset, None)

# Save the finetuned model and its tokenizer
model.save_pretrained(model_save_path)
dataset_processor.tokenizer.save_pretrained(model_save_path)
```
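Once saved, the checkpoint can be loaded for inference with plain `transformers`. This is a minimal sketch, assuming the saved model is a standard encoder-decoder (T5-style) checkpoint; the input text is a placeholder:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("turna_summarization_tr_news")
model = AutoModelForSeq2SeqLM.from_pretrained("turna_summarization_tr_news")

text = "..."  # a Turkish news article to summarize
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=764)
summary_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```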
## Evaluation

### Example: Using the Evaluator for the Summarization Task

#### Importing Task-Specific Metrics
```python
from turkish_lm_tuner import Evaluator

# Initialize the evaluator for the summarization task
# (named `evaluator` to avoid shadowing the Python builtin `eval`)
evaluator = Evaluator(task='summarization')
```
#### Computing Metrics
```python
# Example predictions and reference summaries (placeholder strings)
preds = ["Model-generated summary 1", "Model-generated summary 2"]
labels = ["Reference summary 1", "Reference summary 2"]

# Compute task-specific metrics
results = evaluator.compute_metrics(preds, labels)
```
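`compute_metrics` returns the computed scores, which for summarization are typically ROUGE-style metrics. A sketch for inspecting them, assuming the result is a dictionary mapping metric names to values:

```python
# Print each metric (exact keys depend on the task and library version)
for name, score in results.items():
    print(f"{name}: {score}")
```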
## Contributing
We welcome contributions to `turkish-lm-tuner`! Whether it's improving documentation, adding new features, or reporting issues, your input is valuable.