Track Vehicles Frame by Frame and Paint a Calm Overlay in PIL

Suggested cover image (vehicle tracking and PIL overlay): tracked vehicles receiving labels, timestamps and speed overlays on a live frame.

Detection code becomes clearer when it splits setup from drawing

A detection function usually wants to do too much. It talks to the model, unwraps tensors, computes speeds, formats text and paints the frame. The healthiest version is not a shorter version. It is a version where those responsibilities are grouped in a way the eye can follow.

The first half of the function should prepare the scene. It asks the detector for tracked objects, converts the returned structures into Python-friendly arrays and opens a drawing surface. Only after that does it make sense to talk about per-object overlays.

That split also helps when the code is reused in an article. A reader can pause after the setup slice, understand the data shape, and then continue into the loop that turns raw results into readable annotations.
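To make that data shape concrete, here is a minimal sketch that uses plain Python lists in place of real detector output. The names `TrackedObject` and `bundle_tracks` are hypothetical and do not come from the article's code; the sketch only assumes that the setup slice yields four parallel sequences (boxes, ids, class ids and centers).

```python
from typing import List, NamedTuple, Sequence, Tuple


class TrackedObject(NamedTuple):
    """One tracked detection, as the per-object loop would consume it."""
    box: Tuple[int, int, int, int]   # x1, y1, x2, y2 corner coordinates
    track_id: int
    class_id: int
    center: Tuple[float, float]      # center x, center y


def bundle_tracks(boxes: Sequence, ids: Sequence, clss: Sequence,
                  centers: Sequence) -> List[TrackedObject]:
    """Zip four parallel detector outputs into one record per object."""
    if not (len(boxes) == len(ids) == len(clss) == len(centers)):
        raise ValueError("detector outputs must all have the same length")
    return [
        TrackedObject(
            box=tuple(int(v) for v in box),
            track_id=int(track_id),
            class_id=int(cls),
            center=(float(center[0]), float(center[1])),
        )
        for box, track_id, cls, center in zip(boxes, ids, clss, centers)
    ]


# Stand-in values; a real run would take these from the tracker results.
tracks = bundle_tracks(
    boxes=[[100, 200, 180, 260], [300, 220, 420, 310]],
    ids=[1, 2],
    clss=[2, 7],
    centers=[[140.0, 230.0, 80.0, 60.0], [360.0, 265.0, 120.0, 90.0]],
)
for track in tracks:
    print(track.track_id, track.box, track.center)
```

Bundling is optional. The article's own loop zips the four sequences directly, which works just as well when the loop body stays short.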

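# Assumes cap (a cv2.VideoCapture) and model (an Ultralytics YOLO model) already exist.
while cap.isOpened():
   success, frame = cap.read()
   if not success:
       break
   # Track only class ids 2, 5 and 7 (car, bus and truck in COCO-trained models).
   results = model.track(frame, persist=True, classes=[2, 5, 7],
                         conf=0.3, iou=0.5, imgsz=1280, verbose=False)
   # boxes.id is None when no object is being tracked in this frame.
   if results[0].boxes.id is not None:
       boxes = results[0].boxes.xyxy.int().cpu().numpy()   # corner coordinates
       ids = results[0].boxes.id.int().cpu().tolist()      # tracker ids
       clss = results[0].boxes.cls.int().cpu().tolist()    # class ids
       centers = results[0].boxes.xywh.cpu().numpy()       # center x, center y, width, height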

A separate helper can isolate the display string so that the setup slice stays focused on model output and image preparation.

Such a helper is a good example of a neighboring component. It belongs to the same idea, but it is not part of the main stored slice.
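A display-string helper of the kind described above might look like the sketch below. The function name `format_overlay_text` is hypothetical; the format itself mirrors the f-string used in the article's per-object loop.

```python
def format_overlay_text(label, track_id, time_str, x_c, y_c, speed):
    """Build the padded one-line label shown above a tracked box."""
    return f" {label} ID:{track_id} | {time_str} | X:{x_c} Y:{y_c} | {speed} km/h "


# Stand-in values; in the loop they would come from the tracker and the clock.
text = format_overlay_text("CAR", 12, "14:03:27", 640, 360, 7.5)
print(repr(text))  # ' CAR ID:12 | 14:03:27 | X:640 Y:360 | 7.5 km/h '
```

Keeping the string in one place means the label layout can change without touching the drawing calls.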

       # First draw the boxes (OpenCV)
       for box in boxes:
           cv2.rectangle(frame, (box[0], box[1]), (box[2], box[3]), (255, 100, 0), 2)
       # Switch to PIL for cleaner text rendering (ONCE PER FRAME)
       img_pil = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
       draw = ImageDraw.Draw(img_pil)

The third marked slice stops just before the per-object loop, which makes it a clean breakpoint for anyone reconstructing the file later.

Stopping there is useful because it preserves the boundary between model output and visual annotation. The first part explains what has been collected. The second part explains what will be done with it.

| Overlay concern | What the code needs | Why the split helps |
|---|---|---|
| Tracked boxes | Coordinates, ids and class ids from the detector. | It gives the loop enough structure before any text is drawn. |
| Drawing surface | A PIL image plus an ImageDraw handle. | It keeps annotation code separate from OpenCV tensor handling. |
| Speed context | A previous y-position per tracked object. | It turns motion into a cheap derived signal without a second model. |
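The speed row of the table can be sketched as a small function. The name `estimate_speed` and the `scale` parameter are hypothetical; the arithmetic follows the article's loop, where the vertical pixel shift since the previous frame is multiplied by 2.5. The result is a rough signal rather than a calibrated speed, since the factor does not account for camera geometry or frame rate.

```python
def estimate_speed(track_history, track_id, y_c, scale=2.5):
    """Return a rough speed from the vertical shift since the last sighting.

    track_history maps track ids to the previous center y-position and is
    updated in place. A first sighting has no previous position, so it
    yields 0.0.
    """
    previous_y = track_history.get(track_id, y_c)
    speed = round(abs(y_c - previous_y) * scale, 1)
    track_history[track_id] = y_c
    return speed


history = {}
print(estimate_speed(history, 7, 100))  # 0.0 (first sighting)
print(estimate_speed(history, 7, 104))  # 10.0 (moved 4 pixels)
```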

Alternative component: keep the text box math separate

       for box, track_id, cls, center in zip(boxes, ids, clss, centers):
           x1, y1 = box[0], box[1]
           x_c, y_c = int(center[0]), int(center[1])
           label = model.names[cls].upper()
           time_str = datetime.datetime.now().strftime("%H:%M:%S")
           # Rough speed estimate: vertical pixel shift since the previous frame,
           # scaled by a fixed factor. A first sighting yields 0.0.
           # track_history is assumed to be a dict created before the loop.
           speed = round(abs(y_c - track_history.get(track_id, y_c)) * 2.5, 1)
           track_history[track_id] = y_c

This block sits next to the recoverable material on purpose, but it is a neighboring solution rather than part of the numbered sequence.

The actual loop is easier to absorb after the geometry has been discussed in plain English. That keeps the main slice from feeling like a wall of micro-decisions.
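The geometry in question is small enough to sketch as a pure function. `LabelGeometry` and `label_geometry` are hypothetical names; the offsets (a 10-pixel band above the text height and a 7-pixel text inset) are the ones the article's drawing calls use.

```python
from typing import NamedTuple, Tuple


class LabelGeometry(NamedTuple):
    background: Tuple[int, int, int, int]  # left, top, right, bottom
    text_origin: Tuple[int, int]           # x, y of the text's top-left corner


def label_geometry(x1, y1, text_w, text_h, pad=10, text_inset=7):
    """Place a label band directly above a box whose top-left corner is (x1, y1)."""
    background = (x1, y1 - text_h - pad, x1 + text_w, y1)
    text_origin = (x1, y1 - text_h - text_inset)
    return LabelGeometry(background, text_origin)


geom = label_geometry(x1=50, y1=100, text_w=200, text_h=12)
print(geom.background)   # (50, 78, 250, 100)
print(geom.text_origin)  # (50, 81)
```

For a box near the upper edge of the frame the band's top coordinate becomes negative, so part of the label falls outside the image. A caller can detect that case by checking `geom.background[1] < 0`.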

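           # Build the text label
           display_text = f" {label} ID:{track_id} | {time_str} | X:{x_c} Y:{y_c} | {speed} km/h "
           # Measure the text: with the origin at (0, 0), the right and bottom
           # edges of the bounding box serve as width and height.
           # font is assumed to be an ImageFont loaded before the loop.
           tw, th = draw.textbbox((0, 0), display_text, font=font)[2:]
           # Draw the background band above the box, then the text on top of it
           draw.rectangle([x1, y1 - th - 10, x1 + tw, y1], fill=(255, 100, 0))
           draw.text((x1, y1 - th - 7), display_text, font=font, fill=(255, 255, 255))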

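The fourth marked slice covers the per-object work: speed estimation, row assembly, overlay text and the final BGR conversion. The code above shows only the speed estimate and the overlay text; row assembly and the BGR conversion are not part of this excerpt.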

This is where the function earns its keep. It does not merely detect objects. It transforms them into a story the operator can read and into rows the database can store.

  • Prepare all detector outputs before the per-object loop starts.
  • Let the overlay logic describe one object at a time instead of juggling all concerns at once.
  • Convert back to OpenCV's color space only after the drawing work is complete.
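The last point can be sketched as a round trip: convert to RGB once, draw every label, then convert back once. This is an illustrative sketch, not the article's function. The name `annotate_frame` is hypothetical, and NumPy channel reversal stands in for `cv2.cvtColor` with `cv2.COLOR_BGR2RGB` and `cv2.COLOR_RGB2BGR` so that the example runs without OpenCV installed.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont


def annotate_frame(frame_bgr, labels):
    """Draw (text, x, y) labels on a BGR frame and return a new BGR frame."""
    # BGR -> RGB once per frame (stand-in for cv2.COLOR_BGR2RGB).
    rgb = np.ascontiguousarray(frame_bgr[:, :, ::-1])
    img_pil = Image.fromarray(rgb)
    draw = ImageDraw.Draw(img_pil)
    font = ImageFont.load_default()
    # All drawing happens on the PIL surface.
    for text, x, y in labels:
        draw.text((x, y), text, font=font, fill=(255, 255, 255))
    # RGB -> BGR once, after the drawing work is complete.
    return np.ascontiguousarray(np.asarray(img_pil)[:, :, ::-1])


frame = np.zeros((40, 160, 3), dtype=np.uint8)
frame[:, :, 0] = 255  # pure blue in BGR channel order
out = annotate_frame(frame, [("CAR ID:1", 4, 4), ("BUS ID:2", 4, 20)])
print(out.shape, out.dtype)
```

Converting once per frame rather than once per object keeps the cost of the color round trip independent of how many vehicles are on screen.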