Autonomous driving has attracted tremendous attention in the last few years. Among its many enabling technologies, environmental perception is the most relevant to the vision community. We therefore host a challenge to assess the current status of computer vision algorithms in solving the environmental perception problems of autonomous driving. For this challenge, we have prepared several large-scale datasets with fine annotation. Based on these datasets, we have defined a set of realistic problems, and we encourage new algorithms and pipelines to be invented for autonomous driving rather than merely applied to it.


A total cash prize of 10,000 USD will be awarded to top performers. Each of the four tasks carries a 2,500 USD prize pool:

● 1st place - $1,200

● 2nd place - $800

● 3rd place - $500

Each winner must submit a paper describing their approach after the competition closes.

Datasets

We have collected and annotated two large-scale datasets.

The first is provided by Berkeley DeepDrive (BDD). The BDD database includes 100K unique 720p HD videos and is currently the most diverse driving video dataset. All videos come with GPS/IMU information for driving-behavior study, and each video is tagged with weather, scene type, and time of day. The BDD team also extracts key frames from each video and labels bounding boxes for all road objects, lane markings, drivable areas, and instance segmentation. More information can be found on the BDD database website.

The second dataset, the ApolloScape dataset, is provided by Baidu. ApolloScape contains survey-grade dense 3D points and registered multi-view RGB images at video rate, and every pixel and every 3D point is semantically labelled. In addition, a precise pose is provided for each image.

Task 1: Drivable Area Segmentation
Our first task is drivable area segmentation. The system must find the road area on which the vehicle is currently driving or could potentially drive.
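Segmentation tasks of this kind are typically scored with intersection-over-union (IoU) between the predicted and ground-truth masks. The sketch below illustrates that metric; the challenge's exact evaluation protocol is not specified here, so the function name `drivable_iou` and the toy masks are illustrative assumptions.

```python
import numpy as np

def drivable_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """IoU between a predicted and a ground-truth binary drivable-area mask."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    intersection = np.logical_and(pred, gt).sum()
    return float(intersection / union)

# Toy 4x4 frame: prediction covers the left half, ground truth the top half;
# the overlap is the top-left 2x2 quadrant, so IoU = 4 / 12.
pred = np.zeros((4, 4), dtype=np.uint8)
pred[:, :2] = 1
gt = np.zeros((4, 4), dtype=np.uint8)
gt[:2, :] = 1
print(round(drivable_iou(pred, gt), 3))  # 0.333
```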
Task 2: Road Object Detection
This task is to detect the objects most relevant to driving policy. Specifically, the following classes are to be detected with bounding boxes: vehicles, persons, and traffic signs/signals.
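For detection benchmarks, a predicted bounding box is commonly counted as correct when its IoU with a ground-truth box of the same class exceeds a threshold such as 0.5. The helper below sketches that computation for axis-aligned boxes; `box_iou` is an illustrative name, not part of any challenge toolkit.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) corners."""
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two 10x10 boxes overlapping in a 5x5 region: IoU = 25 / 175.
print(round(box_iou((0, 0, 10, 10), (5, 5, 15, 15)), 3))  # 0.143
```

Under the common 0.5 threshold, such a pair would not count as a match.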
Task 3: Domain Adaptation of Semantic Segmentation
The BDD and ApolloScape datasets combined have the advantage of covering diverse weather, times of day, and geographic locations. In this task, participants are given annotations for one condition and are required to semantically segment test images captured under different conditions. Two types of adaptation are evaluated: one across time-of-day/weather conditions, and the other geographical, where training and testing data come from California (USA) and Beijing (China).
Task 4: Instance-level Video Segmentation
In this task, participants are given a set of video sequences with fine per-pixel labeling; in particular, instances of moving objects such as vehicles and pedestrians are also labeled. The goal is to evaluate the state of the art in video-based scene parsing, a task that has not been evaluated previously due to the lack of fine labeling. Some very challenging environments have been captured: the average number of moving instances per frame can exceed 50, whereas only up to 15 cars/pedestrians are labeled in the KITTI dataset.