zeroshot-detect

Drop an image. Type any English noun (or several, comma-separated). See bounding boxes — no class list, no fine-tuning.
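As a rough illustration of how the comma-separated label field might be turned into text queries, here is a minimal sketch. The function name `parse_labels` is hypothetical, and the lowercasing and de-duplication are assumptions rather than documented behavior of this demo.

```python
def parse_labels(raw: str) -> list[str]:
    """Split a comma-separated string into clean, de-duplicated labels.

    Assumption: labels are case-insensitive, so they are lowercased here.
    """
    seen = set()
    labels = []
    for part in raw.split(","):
        # Trim the ends and collapse internal runs of whitespace.
        label = " ".join(part.split()).lower()
        # Skip empty entries (e.g. from ",,") and repeats.
        if label and label not in seen:
            seen.add(label)
            labels.append(label)
    return labels


if __name__ == "__main__":
    print(parse_labels("cat, Dog ,, cat,  traffic   light"))
```

Each resulting string would then be passed to the model as one text query, so multi-word labels such as "traffic light" stay intact.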

Model: google/owlv2-base-patch16-ensemble · CPU inference: 5–15 s per image.

Image

Labels (comma-separated)

Confidence threshold

Lower = more candidates (incl. false matches on look-alikes). Higher = fewer, cleaner boxes. The default of 0.25 was calibrated empirically; dense compositions may want 0.3–0.4.

Slider range: 0.05 to 0.5
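To make the effect of the threshold concrete, the sketch below filters a list of candidate detections by score. The `Detection` type, the `filter_detections` helper, and the sample scores are all hypothetical and made up for illustration; they are not taken from this demo's code or from real model output.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    label: str
    score: float  # model confidence in [0, 1]
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), pixels


def filter_detections(detections, threshold=0.25):
    """Keep detections scoring at or above the threshold, best first."""
    kept = [d for d in detections if d.score >= threshold]
    return sorted(kept, key=lambda d: d.score, reverse=True)


if __name__ == "__main__":
    # Made-up candidates, including a weak look-alike match.
    candidates = [
        Detection("cat", 0.62, (10, 20, 200, 220)),
        Detection("cat", 0.31, (300, 40, 420, 180)),
        Detection("cat", 0.12, (50, 400, 90, 450)),
    ]
    for t in (0.05, 0.25, 0.4):
        print(t, [d.score for d in filter_detections(candidates, t)])
```

With these sample values, lowering the threshold to 0.05 admits the weak 0.12 candidate, while raising it to 0.4 leaves only the strongest box.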

Annotated

Detections