Basic Tips for Curating & Captioning a Dataset: A Beginner's Guide

Introduction

We trained four models to illustrate some key differences, so you can begin your journey into curating and captioning datasets.

Two of the models were trained with larger datasets and two with smaller and less diverse datasets.

For each dataset size, we captioned one dataset hastily and the other in much more detail.

As the examples show, models trained on smaller, less diverse datasets with less focused captions produce less nuanced results.

[Images: "woman pilot crying"; "woman smiling, full body"]

Tips for Curating Datasets

- Create a dataset that has as much range as possible, be it different poses, emotions, or settings.

- Use 15-20 images for a strong training run.

- If you have trouble finding diverse images, DO NOT rely on a large dataset. 7-15 images work better when your images have less range.

- Both bad and good details will be learned, so try for the best images you can.
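As a rough illustration of the size guidance above, here is a minimal Python sketch that counts the images in a folder and compares the count with the suggested ranges (15-20 for a diverse dataset, 7-15 for a less diverse one). The function names, the list of file extensions, and the "below / within / above" labels are our own assumptions for this example, not part of any training tool. Note that the tips only warn against large datasets when the images lack range, so "above" for a diverse dataset is not necessarily a problem.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your trainer accepts.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def count_images(folder):
    """Count image files directly inside `folder` (no recursion)."""
    return sum(
        1
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


def size_advice(n_images, diverse):
    """Compare a dataset size with the ranges suggested in the tips above.

    Returns "below", "within", or "above" (hypothetical labels).
    """
    # 15-20 images for a diverse dataset, 7-15 for a less diverse one.
    low, high = (15, 20) if diverse else (7, 15)
    if n_images < low:
        return "below"
    if n_images > high:
        return "above"
    return "within"
```

For example, `size_advice(count_images("my_dataset"), diverse=False)` would flag a 30-image folder of similar shots as "above", which is the case the tips warn about. The folder name here is a placeholder.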

Here you can see both the large and small datasets used for the examples:

[Images: the large and small datasets]

Captions can be more of an afterthought for style models, but for characters and objects, captions are incredibly important. Here is an example caption for the image below:

hmr woman with robotic headgear and armor, neutral expression, slight smile

[Image: the captioned example]

Captioning Tips for Characters

- Describe some (not all) consistent defining features of your character.

- Describe their mood.

- Describe how much of their body is in frame.

- Use a unique token (in this case we used hmr) that does not easily conjure results on vanilla SDXL, and put it before the gender of your character in the captions. To check a token, prompt it on the SDXL model under the Foundational Models tab of the Models page.
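The captioning tips above can be sketched as a small helper that assembles a caption in a fixed order: unique token, then the character's gender, a few defining features, the mood, and optionally the framing. This is only a sketch. The function name `build_caption` and its parameters are hypothetical, and the ordering of mood and framing is our own choice; the guide only fixes the token's position before the gender.

```python
def build_caption(token, subject, features, mood, framing=None):
    """Assemble a character caption with the unique token placed first.

    token    -- unique token, e.g. "hmr"
    subject  -- gender or subject word, e.g. "woman"
    features -- a few (not all) consistent defining features
    mood     -- expression or mood description
    framing  -- optional note on how much of the body is in frame
    """
    head = f"{token} {subject}"
    if features:
        head += " with " + " and ".join(features)
    parts = [head, mood]
    if framing:
        parts.append(framing)
    return ", ".join(parts)


caption = build_caption(
    "hmr",
    "woman",
    ["robotic headgear", "armor"],
    "neutral expression, slight smile",
)
print(caption)
# hmr woman with robotic headgear and armor, neutral expression, slight smile
```

Passing `framing="full body"` would append that phrase at the end of the caption. Building captions from the same template keeps the token position and the level of detail consistent across the whole dataset.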


What if my Dataset isn't big enough?

There is no consistent "lightning in a bottle" fix if you are starting with a dataset that is too small. However, if you are willing to accept a similar character that is not exactly the same as your starting images, you can make a LoRA Composition with your smaller dataset model.

In our opinion, this is better than starting from scratch, and you can leverage the over-consistency of your smaller dataset model to prompt a consistent character.

Conclusion

- More Ranged Images + Good Captions = GREAT

- Limited Images + Good Captions = GOOD

- Ranged Images + Average Captions = OK

- Limited Images + Average Captions = NO GO
