Image Classification Using Google Vision Transformer (ViT)
Overview: The model is a Vision Transformer (ViT) pre-trained on ImageNet-21k (14 million images, 21,843 classes) at a resolution of 224x224 and then fine-tuned on ImageNet 2012 (1 million images, 1,000 classes), also at 224x224.
Source: https://huggingface.co/google/vit-base-patch16-224
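A prediction like the one shown on this page could be reproduced with a minimal sketch along these lines. It assumes the Hugging Face `transformers` library (with PyTorch and Pillow) is installed and that the checkpoint is loaded through the library's `image-classification` pipeline; the page itself does not say how it runs the model. The function names `classify_image` and `format_predictions` are hypothetical, chosen for illustration.

```python
import sys

MODEL_ID = "google/vit-base-patch16-224"  # checkpoint named in the overview


def classify_image(image_path, top_k=5):
    """Return the top_k predictions for one image as a list of dicts."""
    # Imported lazily so the formatting helper below works without transformers.
    from transformers import pipeline

    # Assumption: the generic image-classification pipeline is used.
    classifier = pipeline("image-classification", model=MODEL_ID)
    # Each result is a dict with a "label" string and a "score" float.
    return classifier(image_path, top_k=top_k)


def format_predictions(predictions):
    """Render predictions as 'label: 0.123' lines, three decimals each."""
    return [f"{p['label']}: {p['score']:.3f}" for p in predictions]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python classify.py path/to/image.jpg
    for line in format_predictions(classify_image(sys.argv[1])):
        print(line)
```

The first call downloads the model weights, so it needs network access. The pipeline accepts a local path, a URL, or a PIL image.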
Input Image
Prediction Output
- great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias: 0.986
- tiger shark, Galeocerdo cuvieri: 0.013
- hammerhead, hammerhead shark: 0.001
- electric ray, crampfish, numbfish, torpedo: 0.000
- (label missing): 0.000
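The scores above sum to roughly 1, which is consistent with softmax probabilities over the model's classes, although the page does not state how they are computed. Under that assumption, the following sketch shows how raw class logits turn into a ranked top-k list. The labels and logit values in the usage example are made up for illustration and are not model outputs.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(labels, logits, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical logits for four classes, for illustration only.
    labels = ["great white shark", "tiger shark", "hammerhead", "electric ray"]
    logits = [9.0, 4.7, 2.1, -3.0]
    for label, prob in top_k(labels, logits, k=3):
        print(f"{label}: {prob:.3f}")
```

A displayed score of 0.000 does not mean the class has zero probability. Softmax outputs are always positive, so any value below 0.0005 simply rounds to 0.000 at three decimals.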