
Track 1: Video object classification and detection from unconstrained mobility platforms

How does the current state of the art perform when detecting and recognizing objects in videos that vary in scale, illumination, noise, weather conditions, and occlusion? Can the application of image enhancement and restoration algorithms as a pre-processing step improve image interpretability for automatic visual recognition to classify scene content? The UG2+ Track 1 aims to advance the analysis of videos collected by small UAVs by applying image restoration and enhancement algorithms to improve recognition performance, using the UG2 (UAV, Glider, Ground) dataset, which has been collected specifically for this purpose.

What should a software system that can interpret images from UAVs actually look like? It must incorporate a set of algorithms, drawn from the areas of computational photography and machine learning, into a processing pipeline that corrects undesired visual degradation in UAV-based image collection and subsequently classifies images across time and space. Image restoration and enhancement algorithms that remove corruptions like blur, noise, and mis-focus, or that manipulate images to gain resolution, change perspective, and compensate for lens distortion, are now commonplace in photo editing tools. Such operations are necessary to improve the quality of images for recognition purposes. But they must be compatible with the recognition process itself, and not adversely affect feature extraction or decision making. Exploratory work is needed to find out which image pre-processing algorithms, in combination with the strongest features and supervised machine learning approaches, are promising candidates for UAV-related computer vision applications.
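To make the idea concrete, the sketch below shows one possible shape for such a pipeline: a restoration/enhancement stage (here, non-local-means denoising plus a mild unsharp mask, chosen purely as placeholders) in front of a fixed, off-the-shelf classifier. It assumes OpenCV and the Keras VGG16 model; none of these specific choices are mandated by the challenge.

```python
# Minimal sketch of an enhancement-then-recognition pipeline (illustrative only;
# the denoiser, sharpening step, and classifier are assumptions, not part of the
# challenge specification).
import cv2
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions

model = VGG16(weights="imagenet")  # fixed, out-of-the-box recognition model

def enhance(frame_bgr):
    """Example pre-processing: non-local-means denoising followed by a mild
    unsharp mask. Any restoration/enhancement chain could be substituted here."""
    denoised = cv2.fastNlMeansDenoisingColored(frame_bgr, None, 10, 10, 7, 21)
    blurred = cv2.GaussianBlur(denoised, (0, 0), sigmaX=3)
    return cv2.addWeighted(denoised, 1.5, blurred, -0.5, 0)

def classify(frame_bgr):
    """Run the fixed recognition model on an enhanced frame."""
    rgb = cv2.cvtColor(enhance(frame_bgr), cv2.COLOR_BGR2RGB)
    x = cv2.resize(rgb, (224, 224)).astype(np.float32)
    preds = model.predict(preprocess_input(x[None, ...]))
    return decode_predictions(preds, top=5)[0]
```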

UG2+ Challenge 1 consists of two sub-challenges:

  1. Object Detection Improvement on Video
    • 1st Place: $15K
    • 2nd Place: $10K
  2. Object Classification Improvement on Video
    • 1st Place: $15K
    • 2nd Place: $10K
Support for this challenge track is provided solely by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA).


Sub-Challenge 1.1: Object Detection Improvement on Video

The goal of this challenge is to detect objects from a number of visual object classes in unconstrained environments (i.e., not pre-segmented objects). It is fundamentally a supervised learning problem, in that a training set of labeled images will be provided. Participants are not expected to develop novel object detection models; they are expected to add a pre-processing step to the detection pipeline (super-resolution, de-noising, deblurring, and any combination of such algorithms are within scope here). A list of the detection algorithms that will be used for scoring will be made available to participants in order to facilitate studies of the interaction between image restoration and enhancement algorithms and the detectors. During the evaluation, the selected object detection algorithms will be run on the sequestered test images. In line with popular detectors, the metrics will be *mAP@0.5 (mean average precision at an IoU threshold of 0.5), mAP@0.75, and mAP@0.9* for the Glider, Ground, and UAV collections.
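As a rough illustration of the scoring metric, the sketch below computes average precision for a single class at a fixed IoU threshold; mAP averages this over classes. The matching rules in the official scoring code may differ in detail, so treat this as a conceptual sketch only.

```python
# Illustrative (unofficial) per-class average precision at a fixed IoU threshold;
# mAP@50 / mAP@75 / mAP@90 average this over all classes with thresholds of
# 0.5, 0.75, and 0.9 respectively.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, gt_boxes, iou_thresh=0.5):
    """detections: list of (confidence, box) for one class; gt_boxes: list of
    ground-truth boxes. A detection is a true positive if it matches a not yet
    matched ground-truth box with IoU >= iou_thresh."""
    if not gt_boxes:
        return 0.0
    detections = sorted(detections, key=lambda d: -d[0])
    matched = [False] * len(gt_boxes)
    tp = np.zeros(len(detections))
    for i, (_, box) in enumerate(detections):
        overlaps = [iou(box, g) for g in gt_boxes]
        j = int(np.argmax(overlaps))
        if overlaps[j] >= iou_thresh and not matched[j]:
            tp[i], matched[j] = 1.0, True
    precision = np.cumsum(tp) / (np.arange(len(detections)) + 1)
    # Non-interpolated AP: mean precision at the rank of each true positive,
    # normalized by the number of ground-truth boxes.
    return float(np.sum(precision * tp) / len(gt_boxes))
```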

  • For the dataset and evaluation, please click here.
  • Details about how to use the dataset and evaluation are provided in this Readme.

Collection          mAP@50   mAP@75   mAP@90
UAV Collection      88.62%   39.54%    1.93%
Glider Collection   91.06%   40.34%    2.95%
Ground Collection   100%     96.65%   54.73%
Using the previous procedure, we also obtain baseline detection results for each object class in all three collections.


Sub-Challenge 1.2: Object Classification Improvement on Video

The goal of this challenge is to improve the classification performance for a given object in a video captured from an unconstrained mobility platform. Participants will be tasked with creating a processing pipeline that corrects visual aberrations present in the video in order to improve the classification results obtained with out-of-the-box classification algorithms. The evaluation protocol allows participants to make use of within-dataset training data, and as much outside training data as they would like, for training and validation purposes. Participants will not be tasked with creating novel classification algorithms. During the evaluation, the selected classification algorithm will be run on the sequestered test images.

The evaluation process will be as follows:

  1. For each object sequence (a group of frames during which a specific object of interest is visible; the object and its visibility are specified in the video’s annotation file, described on the Dataset page), we use the annotation file to locate and crop the object in each frame and resize the cropped region to 224x224, the input size of the VGG16 classification network.
  2. We feed the cropped objects to an out-of-the-box VGG16 network (using the imagenet weights provided by Keras).
  3. We calculate the Label Ranking Average Precision (LRAP) of the classification output (an array of vectors containing a confidence score for each of the 1000 ImageNet classes). LRAP averages over all frames in the sequence to answer the following question: for each ground-truth label, what fraction of higher-ranked labels are true labels? The score is higher when the labels associated with each sample are ranked better; it is always strictly greater than 0, and the best value is 1. A minimal code sketch of these steps appears after this list.
    • We define the true labels as the set of ImageNet classes contained within a UG2 superclass. For example, the UG2 class “Bicycle” has two true labels: n03792782 (mountain bike) and n02835271 (bicycle-built-for-two).
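For concreteness, here is a minimal sketch of steps 1 to 3, assuming Keras (TensorFlow), OpenCV, and scikit-learn’s label_ranking_average_precision_score. Reading the annotation file and extracting the crops is omitted, and variable names such as crops and true_label_indices are illustrative rather than taken from the official evaluation code.

```python
# Sketch of the baseline classification protocol described above
# (crop -> resize to 224x224 -> VGG16 -> LRAP). `crops` is assumed to be a list
# of already-cropped BGR frames for one object sequence, and
# `true_label_indices` the ImageNet class indices in the object's UG2 superclass.
import numpy as np
import cv2
from sklearn.metrics import label_ranking_average_precision_score
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

model = VGG16(weights="imagenet")  # out-of-the-box ImageNet weights

def sequence_lrap(crops, true_label_indices):
    # Resize every cropped object region to the VGG16 input size.
    batch = np.stack([
        cv2.resize(cv2.cvtColor(c, cv2.COLOR_BGR2RGB), (224, 224)).astype(np.float32)
        for c in crops
    ])
    scores = model.predict(preprocess_input(batch))  # (n_frames, 1000) confidences
    # Multi-hot ground truth: every ImageNet synset in the UG2 superclass is "true".
    y_true = np.zeros_like(scores, dtype=int)
    y_true[:, list(true_label_indices)] = 1
    # LRAP averages, over frames and true labels, the fraction of labels ranked
    # at or above each true label that are themselves true (best value: 1.0).
    return label_ranking_average_precision_score(y_true, scores)
```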
Overall the UG2 dataset contains 576 object sequences, divided among all three collections:
                             UAV Collection   Glider Collection   Ground Collection
Number of object sequences   242              206                 128
Number of classes            31               19                  20
Average LRAP                 12.20%           10.73%              46.26%
Using the previous procedure, we also obtain baseline classification results for each object class in all three collections.
