The UG2 Dataset

About the Dataset

UG2 contains three difficult real-world scenarios: uncontrolled videos taken by UAVs and manned gliders, as well as controlled videos taken on the ground. Over 150,000 annotated frames for hundreds of ImageNet classes are available.


The data we release can be used for training and validation. For restoration and enhancement approaches that must be trained, we encourage a cross-dataset protocol in which some annotated training data comes from outside UG2. We also encourage participants to use their own data or data from other sources for training. In addition, un-annotated videos are provided for further validation and parameter tuning.


Data Collections

[Figure: annotated videos per video collection (UAV Collection, Glider Collection, Ground Collection)]

UAV Collection

Creative Commons-tagged UAV videos extracted from YouTube, containing 31 ImageNet superclasses. Video artifacts and problems include shaking, camera blur, annotation occlusion, etc.

Glider Collection

Fixed-wing glider videos, containing 20 ImageNet superclasses. Video artifacts and problems include shaking, camera blur, rain, etc.

Ground Collection

Controlled ground data collection, containing 20 ImageNet superclasses. Controlled conditions:
  • Distances (30, 40, 50, 60, 70, 100, 150, 200 ft.)
  • Motion through an elliptical shaker table (0, 120, 140, 160, 180 rotations per minute)

Data Annotations

The dataset provides object-level annotations for 162,136 images. Bounding boxes establishing object regions were manually annotated using the VATIC Video Annotation Tool, and we provide the VATIC annotation files for every annotated video in the dataset.

Each annotation file follows the annotation structure provided by VATIC. Each line contains one object annotation, defined by 10 columns:

1. Track ID. All rows with the same ID belong to the same path of the same object through different video frames.
2. xmin. The top left x-coordinate of the bounding box.
3. ymin. The top left y-coordinate of the bounding box.
4. xmax. The bottom right x-coordinate of the bounding box.
5. ymax. The bottom right y-coordinate of the bounding box.
6. frame. The frame that this annotation represents.
7. lost. If 1, the annotation is outside of the view screen. In this case we did not extract any cropped region.
8. occluded. If 1, the annotation is occluded. In this case we did not extract any cropped region.
9. generated. If 1, the annotation was automatically interpolated.
10. label. The class for this annotation, enclosed in quotation marks.
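
As an illustration of the ten-column layout described above, the following is a minimal Python sketch of a line parser. It is not part of the dataset tooling: the names `VaticBox`, `parse_vatic_line`, and `croppable` are hypothetical, and the sketch assumes that columns are whitespace-separated, that the numeric columns are integers, and that only the label is quoted.

```python
import shlex
from dataclasses import dataclass


@dataclass
class VaticBox:
    """One annotation row (hypothetical container, not a VATIC type)."""
    track_id: int
    xmin: int
    ymin: int
    xmax: int
    ymax: int
    frame: int
    lost: bool
    occluded: bool
    generated: bool
    label: str


def parse_vatic_line(line: str) -> VaticBox:
    """Parse one annotation line into a VaticBox.

    Assumes whitespace-separated fields with the label in double quotes;
    shlex.split keeps a quoted label containing spaces as a single field.
    """
    fields = shlex.split(line)
    if len(fields) < 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
    # Columns 1-9 are numeric (assumed to be integers here).
    nums = [int(value) for value in fields[:9]]
    flags = [bool(value) for value in nums[6:9]]
    return VaticBox(*nums[:6], *flags, fields[9])


def croppable(boxes):
    """Keep only boxes that are neither lost nor occluded.

    This mirrors the column descriptions: no cropped region was
    extracted for lost or occluded annotations.
    """
    return [box for box in boxes if not (box.lost or box.occluded)]
```

A whole annotation file could then be read by calling `parse_vatic_line` on each non-empty line and passing the result to `croppable` to select the rows for which a cropped object region would exist.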

About the Challenge

Support for this challenge workshop is provided under IARPA contract #2016-16070500002. This workshop is supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA). The views and conclusions contained herein are those of the organizers and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.