1 · Introduction

The dataset consists of 5165 image pairs and corresponding disparity maps, of which 4156 image pairs are used for training and 1009 for testing. The images are extracted from the Apollo dataset. Ground truth has been acquired by accumulating 3D point clouds from LiDAR and fitting 3D CAD models to individually moving cars (obtained from the 3D Car Instance Understanding dataset). The dataset covers varying traffic conditions with heavy occlusion, which makes it very challenging.

2 · Data Download

Training data

Testing data

3 · Data Structure

The structure of the dataset is as follows:

• intrinsic.txt: intrinsic parameters

• fg_mask: foreground mask

• bg_mask: background mask

• Camera5: images captured by camera 5

• Camera6: images captured by camera 6

• disparity: the ground truth disparity

Note that, for better visualization, the stored disparity values are 200 times larger than the true ground-truth disparity. If you upload results, your predicted disparities must also be scaled up by a factor of 200.
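
A minimal sketch of handling this scale factor when reading the ground truth and writing predictions, assuming the disparity maps are stored as single-channel 16-bit PNG files (the file format is an assumption, not stated above):

```python
import numpy as np
from PIL import Image

SCALE = 200.0  # stored value = true disparity * 200, per the note above

def read_disparity(path):
    """Load a stored disparity map and convert it back to true disparity.
    Assumes a single-channel 16-bit PNG; the file format is an assumption."""
    stored = np.asarray(Image.open(path), dtype=np.float32)
    return stored / SCALE

def write_disparity(path, disparity):
    """Scale a predicted disparity map by 200 before saving, as required
    for uploaded results."""
    stored = np.round(disparity * SCALE).astype(np.uint16)
    Image.fromarray(stored).save(path)
```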

4 · Evaluation

The evaluation code is released on GitHub here.

5 · Metric formula

For each image, given the predicted disparity d_i and the ground truth disparity d_i*, the evaluation metric is computed over a pixel mask, where the mask can be either the foreground (fg), the background (bg), or the whole region (the merge of fg and bg). N is the number of test images.
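
The formula itself is not reproduced above; the following LaTeX sketch spells out the KITTI-style D1 error suggested by the D1_all / D1_fg / D1_bg columns in the benchmark below, i.e. the fraction of masked pixels whose disparity error exceeds both 3 px and 5% of the ground truth, averaged over the N images. The 3 px / 5% thresholds are an assumption, not taken from this document.

```latex
% Sketch of a KITTI-style D1 error over a mask; the 3 px / 5% thresholds are assumed.
\mathrm{D1}_{\mathrm{mask}} = \frac{1}{N} \sum_{i=1}^{N}
  \frac{\left|\left\{\, p \in \mathrm{mask}_i \;:\;
        \left|d_i(p) - d_i^*(p)\right| > 3 \;\wedge\;
        \left|d_i(p) - d_i^*(p)\right| > 0.05\, d_i^*(p) \right\}\right|}
       {\left|\mathrm{mask}_i\right|}
```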

6 · Rules of ranking

The result benchmark will be formatted as follows:

Rank    Method    D1_all    D1_fg    D1_bg
xxx     xx        xx        xx       xx
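
For concreteness, here is a sketch of how the three columns could be computed for a single image, assuming the KITTI-style thresholds above and boolean masks read from the fg_mask and bg_mask folders; all disparities here are true (unscaled) values:

```python
import numpy as np

def d1_error(d_pred, d_gt, mask, abs_thresh=3.0, rel_thresh=0.05):
    """Fraction of masked pixels whose disparity error exceeds both abs_thresh
    pixels and rel_thresh * ground truth (thresholds assumed, following the
    KITTI D1 convention; not specified in this document)."""
    valid = mask & (d_gt > 0)                 # ignore pixels without ground truth
    err = np.abs(d_pred[valid] - d_gt[valid])
    bad = (err > abs_thresh) & (err > rel_thresh * d_gt[valid])
    return bad.mean() if valid.any() else 0.0

def evaluate_image(d_pred, d_gt, fg, bg):
    """D1_fg / D1_bg / D1_all for one image; fg and bg are boolean masks."""
    return {
        "D1_fg":  d1_error(d_pred, d_gt, fg),
        "D1_bg":  d1_error(d_pred, d_gt, bg),
        "D1_all": d1_error(d_pred, d_gt, fg | bg),
    }
```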

7 · Format of submission file

{split}/{data_type}/{image_name}

data_type: disparity, i.e. the folder containing the estimated disparity maps
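
A minimal sketch of assembling a submission directory in this layout, reusing the write_disparity helper sketched earlier so that predictions are scaled by 200; the split name and PNG image naming are assumptions:

```python
from pathlib import Path

def save_prediction(root, split, image_name, disparity):
    """Write one estimated disparity map to {split}/disparity/{image_name}.
    'root' and 'split' (e.g. "test") are placeholders; image_name is assumed
    to match the corresponding test image and end in .png."""
    out_dir = Path(root) / split / "disparity"
    out_dir.mkdir(parents=True, exist_ok=True)
    write_disparity(out_dir / image_name, disparity)  # defined in the earlier sketch
```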

The dataset we release contains desensitized street-view imagery and is for academic use only.