Who is the best at X?


Are we there yet?

Did you ever want to quickly find out which paper provides the best results on standard dataset X? Wait no more: just click below and discover the current state of the art.

About: where does this data come from?

Frustrated by seeing too many papers omit the best-performing methods, and inspired by Hao Wooi Lim’s blog, here you have a crowd-sourced list of known results on some of the “major” visual classification, detection, and pose estimation datasets.

You are most welcome to add new (or old) results.

Every entry on this page has been checked (once) by me. I might have made mistakes.
Feel free to point out any inaccuracy in the listed data.

Many thanks to the dozens of contributors who have helped build this collection.

Datasets

Classification

MNIST 50 results collected

Units: error %

Classify handwritten digits. Some additional results are available on the original dataset page.


CIFAR-10 49 results collected

Units: accuracy %

Classify 32x32 colour images.


CIFAR-100 31 results collected

Units: accuracy %

Classify 32x32 colour images.


STL-10 18 results collected

Units: accuracy %

Similar to CIFAR-10 but with 96x96 images. Original dataset website.


SVHN 17 results collected

Units: error %

The Street View House Numbers (SVHN) Dataset.

SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor to MNIST (e.g., the images are of small cropped digits), but incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real world problem (recognizing digits and numbers in natural scene images). SVHN is obtained from house numbers in Google Street View images.


ILSVRC2012 task 1

Units: Error (5 guesses)

1000-category classification challenge, with over a million training images and tens of thousands of validation and testing images.

See this interesting comparative analysis.
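
For reference, the “5 guesses” error counts an image as correct when the true label appears among the model’s five highest-ranked guesses. A minimal sketch of that computation (the function and argument names below are made up for illustration):

def top5_error(top5_guesses, true_labels):
    # top5_guesses: one list of the 5 highest-ranked labels per image;
    # true_labels: the corresponding ground-truth labels.
    misses = sum(1 for guesses, truth in zip(top5_guesses, true_labels)
                 if truth not in guesses)
    return 100.0 * misses / len(true_labels)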


Detection

Pascal VOC 2007 comp3 17 results collected

Units: mAP percent

Pascal VOC 2007 is commonly used because the test set has been released. comp3 is the object detection competition, using only the provided Pascal VOC training data.
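
As a reminder of what the mAP numbers mean: for each class, detections are ranked by confidence, a precision/recall curve is built, and the interpolated area under it is the class AP; mAP is the mean over all classes. A rough sketch of the VOC-2007-style 11-point AP for one class, assuming detections have already been matched to ground truth (the names are illustrative, not the official devkit API):

import numpy as np

def voc2007_ap(scores, is_true_positive, num_ground_truth):
    # Sort detections of one class by decreasing confidence.
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp_flags = np.asarray(is_true_positive, dtype=float)[order]
    tp = np.cumsum(tp_flags)
    fp = np.cumsum(1.0 - tp_flags)
    recall = tp / float(num_ground_truth)
    precision = tp / np.maximum(tp + fp, 1e-12)
    # Average the interpolated precision at 11 equally spaced recall levels.
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        above = recall >= r
        ap += (precision[above].max() if above.any() else 0.0) / 11.0
    return 100.0 * ap  # mAP = mean of this value over all classes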


Pascal VOC 2007 comp4 3 results collected

Units: mAP percent

Just like comp3 but “any training data” can be used.


Pascal VOC 2010 comp3 10 results collected

Units: mAP percent

Pascal VOC 2010 version of the challenge. comp3 is the object detection competition.


Pascal VOC 2010 comp4 3 results collected

Units: mAP percent

Just like comp3 but “any training data” can be used.


Pascal VOC 2011 comp3

Units: mAP percent

The last Pascal VOC challenge instance with new detection data (the 2012 edition used identical data).


Caltech Pedestrians USA

Units: average miss-rate %

Project website.
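
The “average miss-rate” used here (and for the other pedestrian datasets below) is, to my understanding, the log-average miss rate of the Caltech toolkit: the miss rate is sampled at nine FPPI points log-spaced between 0.01 and 1, and averaged in log space. A sketch under that assumption (names are illustrative):

import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    # fppi and miss_rate describe the detector's curve, sorted by increasing FPPI.
    fppi = np.asarray(fppi, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)
    samples = []
    for ref in np.logspace(-2.0, 0.0, 9):  # nine reference points in [1e-2, 1e0]
        below = np.where(fppi <= ref)[0]
        # If the curve never reaches this FPPI, fall back to its first point.
        samples.append(miss_rate[below[-1]] if below.size else miss_rate[0])
    samples = np.maximum(np.asarray(samples), 1e-10)  # guard against log(0)
    return 100.0 * np.exp(np.log(samples).mean())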


INRIA Persons

Units: average miss-rate %

Evaluated using the Caltech Pedestrians toolkit. Original dataset website.


ETH Pedestrian

Units: average miss-rate %

Evaluated using the Caltech Pedestrians toolkit. Only left images used. Original dataset website.


TUD-Brussels Pedestrian

Units: average miss-rate %

Evaluated using the Caltech Pedestrians toolkit. Original dataset website.


Daimler Pedestrian

Units: average miss-rate %

Evaluated using the Caltech Pedestrians toolkit. Original dataset website.


KITTI Vision Benchmark

Units: average recall %

A rich dataset to evaluate multiple computer vision tasks, including car, pedestrian, and bicycle detection.


Pose estimation

Leeds Sport Poses 5 results collected

Units: PCP %

2000 pose-annotated pictures from Flickr, covering a selected set of activities and with the person at the center of the picture.
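
PCP is the Percentage of Correctly estimated body Parts: a limb counts as correct when its predicted endpoints are close enough to the ground-truth ones, relative to the limb length. A sketch of the strict variant (both endpoints must match; published numbers sometimes use looser variants, so take this only as an illustration):

import numpy as np

def strict_pcp(predicted_limbs, true_limbs, threshold=0.5):
    # Each limb is a pair of endpoints ((x1, y1), (x2, y2)).
    correct = 0
    for pred, true in zip(predicted_limbs, true_limbs):
        pred = np.asarray(pred, dtype=float)
        true = np.asarray(true, dtype=float)
        limb_length = np.linalg.norm(true[0] - true[1])
        # Both endpoints must fall within threshold * limb length of the truth.
        if (np.linalg.norm(pred[0] - true[0]) <= threshold * limb_length and
                np.linalg.norm(pred[1] - true[1]) <= threshold * limb_length):
            correct += 1
    return 100.0 * correct / len(true_limbs)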


Semantic labeling

MSRC-21 12 results collected

Units: accuracy % (per-class / per-pixel)

One of the oldest classic datasets for semantic labelling. 21 different categories of surfaces are considered. Despite the inaccuracies in the annotations and how unbalanced the classes are, this dataset is still commonly used as a reference point. Note that here we consider the original annotations (where most results are published), not the cleaned-up version.

The results are reported per-class and per-pixel (this is sometimes called “average” and “global” result, respectively).

Original dataset website
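
Both MSRC-21 numbers can be read off a single confusion matrix; a small sketch of how I interpret the two figures (void pixels excluded, names illustrative):

import numpy as np

def msrc_scores(confusion):
    # confusion: 21x21 matrix, rows = ground-truth class, columns = predicted class.
    confusion = np.asarray(confusion, dtype=float)
    per_pixel = 100.0 * np.trace(confusion) / confusion.sum()  # "global"
    class_recall = np.diag(confusion) / np.maximum(confusion.sum(axis=1), 1e-12)
    per_class = 100.0 * class_recall.mean()  # "average"
    return per_class, per_pixel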


Saliency/Segmentation

Salient Object Detection benchmark

Units: AUC (precision/recall area under the curve) and MAE (mean absolute error)

This benchmark aggregates results from 36 methods over five datasets (MSRA10K, ECSSD, THUR15K, JuddDB, and DUT-OMRON).
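
Of the two units, MAE is the simpler one: the mean absolute difference between the saliency map and the binary ground-truth mask, lower being better. A minimal sketch (assuming both inputs are already scaled to [0, 1] and have the same size; names are illustrative):

import numpy as np

def saliency_mae(saliency_map, ground_truth_mask):
    saliency_map = np.asarray(saliency_map, dtype=float)
    ground_truth_mask = np.asarray(ground_truth_mask, dtype=float)
    return np.abs(saliency_map - ground_truth_mask).mean()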