Drop an image. Type any English noun (or several, comma-separated). See bounding boxes — no class list, no fine-tuning.
Model: google/owlv2-base-patch16-ensemble · CPU inference: 5–15 s per image.
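To reproduce roughly the same flow outside the demo, here is a minimal sketch using the Hugging Face `transformers` zero-shot object detection pipeline with the model named above. The helper names `parse_queries` and `detect`, and the 0.1 score threshold, are assumptions made for illustration and are not taken from the demo's own code.

```python
from typing import Any, Dict, List

MODEL_ID = "google/owlv2-base-patch16-ensemble"


def parse_queries(text: str) -> List[str]:
    """Split a comma-separated string of nouns into clean text queries."""
    return [part.strip() for part in text.split(",") if part.strip()]


def detect(image_path: str, text: str, threshold: float = 0.1) -> List[Dict[str, Any]]:
    """Run zero-shot detection on CPU (hypothetical helper, not the demo's code).

    Each result is a dict with "label", "score", and a "box" holding
    xmin / ymin / xmax / ymax in pixel coordinates.
    """
    # Imported lazily so parse_queries stays usable without transformers installed.
    from transformers import pipeline

    detector = pipeline(
        "zero-shot-object-detection",
        model=MODEL_ID,
        device=-1,  # -1 selects the CPU
    )
    # threshold=0.1 is an assumed default; raise it to keep fewer, surer boxes.
    return detector(
        image_path,
        candidate_labels=parse_queries(text),
        threshold=threshold,
    )
```

Calling `detect("photo.jpg", "cat, remote control")` (with `photo.jpg` standing in for any local image) would download the model weights on first use and return one entry per detected box. No class list is baked in: the candidate labels are whatever text you pass.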
Detections