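Steps to reproduce
Online training fails with an error. The task type selected was image classification.
Hardware and software versions
K230, firmware 2.9
Error log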
gen_dataset
python3 -u gen_dataset.py -t classification -i /workspace/datasets/classification/12854 -o /workspace/datasets/classification/12854_d
cd /workspace/code/k230_training_code/
cd /workspace/code/k230_training_code/
Activating the Python 2.9.0 environment.
train task
shell generate successfully.
python3 -u run_task.py -c /workspace/datasets/classification/12854/params.json -g 4
Activating the Python 2.9.0 environment. Subfolders copied successfully.
Activating the Python 2.9.0 environment. Creating task...
Initializing training module...
Parsing config from /workspace/datasets/classification/12854/params.json...
Setting split ratio, split ratio is [training: validation: testing]=[0.8:0.1:0.1]
There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
Starting training...
Training module initialization completed!
epoch:1/700
You have set `num_classes=1`, but predictions are integers. If you want to convert (multi-dimensional) multi-class data with 2 classes to binary/multi-label, set `multiclass=False`.
Traceback (most recent call last):
  File "/workspace/code/k230_training_code/run_task.py", line 84, in <module>
    start_training(config_path, gpu_id)
  File "/workspace/code/k230_training_code/run_task.py", line 64, in start_training
    task.start_pipeline()
  File "/workspace/code/k230_training_code/algorithm/task.py", line 148, in start_pipeline
    raise e
  File "/workspace/code/k230_training_code/algorithm/task.py", line 119, in start_pipeline
    self.trainer.train()
  File "/workspace/code/k230_training_code/algorithm/cls_code/classification_engine/trainer.py", line 264, in train
    raise e
  File "/workspace/code/k230_training_code/algorithm/cls_code/classification_engine/trainer.py", line 124, in train
    result_acc = acc(outputs, targets)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/code/k230_training_code/algorithm/cls_code/classification/metric.py", line 42, in forward
    return self.metric(preds, target.to(torch.int))
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torchmetrics/metric.py", line 245, in forward
    self._forward_cache = self._forward_reduce_state_update(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torchmetrics/metric.py", line 309, in _forward_reduce_state_update
    self.update(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torchmetrics/metric.py", line 395, in wrapped_func
    update(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torchmetrics/classification/accuracy.py", line 569, in update
    mode = _mode(preds, target, self.threshold, self.top_k, self.num_classes, self.multiclass, self.ignore_index)
  File "/usr/local/lib/python3.9/dist-packages/torchmetrics/functional/classification/accuracy.py", line 424, in _mode
    mode = _check_classification_inputs(
  File "/usr/local/lib/python3.9/dist-packages/torchmetrics/utilities/checks.py", line 292, in _check_classification_inputs
    _check_num_classes_mc(preds, target, num_classes, multiclass, implied_classes)
  File "/usr/local/lib/python3.9/dist-packages/torchmetrics/utilities/checks.py", line 156, in _check_num_classes_mc
    raise ValueError(
ValueError: You have set `num_classes=1`, but predictions are integers. If you want to convert (multi-dimensional) multi-class data with 2 classes to binary/multi-label, set `multiclass=False`.
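
The ValueError reports num_classes=1, which may mean the uploaded dataset contains only a single category. A minimal sketch for checking this locally is shown here. It assumes the dataset is laid out with one subfolder per class, which the log does not confirm, and the helper names list_class_folders and check_class_count are hypothetical, not part of k230_training_code.

```python
import os


def list_class_folders(dataset_root):
    """Return the sorted names of the immediate subfolders of dataset_root.

    Assumption: each subfolder corresponds to one class label.
    """
    return sorted(
        name
        for name in os.listdir(dataset_root)
        if os.path.isdir(os.path.join(dataset_root, name))
    )


def check_class_count(dataset_root, minimum=2):
    """Return (ok, classes); ok is False when fewer than `minimum` class folders exist."""
    classes = list_class_folders(dataset_root)
    return len(classes) >= minimum, classes
```

Calling check_class_count on a local copy of the dataset folder returns False together with the folder names found when there are fewer than two classes. If that is the case here, adding at least one more category before retraining may avoid the error.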