Visual Storytelling Dataset (VIST)

What is VIST? (SIND v.2)

We introduce the first dataset for sequential vision-to-language, and explore how this data may be used for the task of visual storytelling. The dataset includes 81,743 unique photos in 20,211 sequences, aligned to descriptive and story language. VIST was previously known as SIND, the Sequential Image Narrative Dataset.


Example Generated Story

1. The dog was ready to go.
2. He had a great time on the hike.
3. And was very happy to be in the field.
4. His mom was so proud of him.
5. It was a beautiful day for him.

Photos by kameraschwein / CC BY-NC-ND 2.0

Story ≠ Descriptive Text

Desc-in-Isolation:
1. A group of people that are sitting next to each other.
2. Adult male wearing sunglasses lying down on black pavement.
3. The sun is setting over the ocean and mountains.

Story-in-Sequence:
1. Having a good time bonding and talking.
2. [M] got exhausted by the heat.
3. Sky illuminated with a brilliance of gold and orange hues.

Photos by mharvey75 / CC BY-NC 2.0, janelle / CC BY-NC-ND 2.0, lance_mountain / CC BY-NC-ND 2.0

1 Story ≠ 5 Captions

Desc-in-Isolation:
1. A black frisbee is sitting on top of a roof.
2. A man playing soccer outside of a white house with a red door.
3. The boy is throwing a soccer ball by the red door.
4. A soccer ball is over a roof by a frisbee in a rain gutter.
5. Two balls and a frisbee are on top of a roof.

Desc-in-Sequence:
1. A roof top with a black frisbee laying on the top of the edge of it.
2. A man is standing in the grass in front of the house kicking a soccer ball.
3. A man is in the front of the house throwing a soccer ball up.
4. A blue and white soccer ball and black Frisbee are on the edge of the roof top.
5. Two soccer balls and a Frisbee are sitting on top of the roof top.

Story-in-Sequence:
1. A discus got stuck up on the roof.
2. Why not try getting it down with a soccer ball?
3. Up the soccer ball goes.
4. It didn't work so we tried a volley ball.
5. Now the discus, soccer ball, and volleyball are all stuck on the roof.

Photos by rbieber / CC BY-NC-ND 2.0
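
As a rough illustration of how the three tiers in the example above line up photo by photo, here is a minimal Python sketch. The class name `AnnotatedSequence`, its field names, and its methods are hypothetical and chosen only for this illustration; they are not the dataset's actual file format or API. The sample text is an excerpt (the first two photos) of the example above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AnnotatedSequence:
    """Hypothetical container: one photo sequence with three aligned text tiers.

    Field names are assumptions made for this sketch, not the dataset's schema.
    """

    desc_in_isolation: List[str]  # caption per photo, written without context
    desc_in_sequence: List[str]   # caption per photo, written seeing the sequence
    story_in_sequence: List[str]  # story sentence per photo

    def __post_init__(self) -> None:
        # Every tier must have exactly one text per photo.
        lengths = {
            len(self.desc_in_isolation),
            len(self.desc_in_sequence),
            len(self.story_in_sequence),
        }
        if len(lengths) != 1:
            raise ValueError("all three tiers must cover the same photos")

    def photo_view(self, index: int) -> Dict[str, str]:
        """Return the three texts aligned to the photo at `index` (0-based)."""
        return {
            "desc_in_isolation": self.desc_in_isolation[index],
            "desc_in_sequence": self.desc_in_sequence[index],
            "story_in_sequence": self.story_in_sequence[index],
        }

    def story(self) -> str:
        """Join the per-photo story sentences into one running story."""
        return " ".join(self.story_in_sequence)


# Excerpt: the first two photos of the example above.
example = AnnotatedSequence(
    desc_in_isolation=[
        "A black frisbee is sitting on top of a roof.",
        "A man playing soccer outside of a white house with a red door.",
    ],
    desc_in_sequence=[
        "A roof top with a black frisbee laying on the top of the edge of it.",
        "A man is standing in the grass in front of the house kicking a soccer ball.",
    ],
    story_in_sequence=[
        "A discus got stuck up on the roof.",
        "Why not try getting it down with a soccer ball?",
    ],
)

print(example.photo_view(0)["story_in_sequence"])
print(example.story())
```

Keeping the tiers as parallel lists makes the contrast visible: the description tiers read as independent captions for each photo, while the story tier only makes sense when its sentences are joined in order.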