Adventures in Machine Vision
A Nat Friedman anecdote about an AI watching him drink water sent me digging for a spare Raspberry Pi. Now a camera and a sensor HAT watch my desk, a self-inflicted panopticon that logs who walks by and what the room is doing.
A Nat Friedman anecdote recently caught my eye:
Most of it is fluff. Friedman, Daniel Gross, and the Collisons are positioned to consume AI at a level most of us only read about: all the toys, all the tokens, early access, and people on payroll to wire it together. A lot of what gets demoed from that vantage point is a postcard from a budget I don't have. But the kernel of the idea is almost always cheap. You don't need their resources to point some consumer hardware at yourself and see what falls out.
I should preface this by admitting that blanketing your house with cameras and letting an AI watch you 24/7 sounds dystopian.
"I want you to walk to the kitchen right now and drink a bottle of water, and I'm going to watch to make sure you do it."
Maybe dystopian is the wrong word. This is self-inflicted panopticon horror.
Let's build our own.
The build
I had a spare Raspberry Pi Zero 2 W rattling around in a drawer, so I gave it a job. A 5MP Arducam (the OV5647, a native Pi sensor, so no driver wrangling) rides the camera ribbon, and a repurposed Waveshare Environment Sensor HAT sits on the GPIO header. The camera and the HAT use different connectors, so they coexist on one board on my desk.
The pipeline is small. picamera2 pulls frames, a quantized TFLite SSD-MobileNet-v2 runs COCO object detection, and every confirmed sighting gets appended to a daily JSONL log. A read-only FastAPI service reads that log back out over my tailnet. The detector writes, the API reads, and they run as separate services so I can restart one without disturbing the other.
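The writer/reader split over a daily JSONL file can be sketched roughly like this. It uses only the standard library, and the file naming (`detections-YYYY-MM-DD.jsonl`), the function names, and the `ts` field are assumptions made for illustration, not the actual layout of my log.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_path(log_dir: Path, when: datetime) -> Path:
    # Hypothetical naming scheme: one file per calendar day.
    return log_dir / f"detections-{when:%Y-%m-%d}.jsonl"


def append_detection(log_dir: Path, record: dict, when: datetime) -> Path:
    """Detector side: append one JSON object as a single line."""
    path = log_path(log_dir, when)
    path.parent.mkdir(parents=True, exist_ok=True)
    line = json.dumps({"ts": when.isoformat(), **record})
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return path


def read_detections(log_dir: Path, when: datetime) -> list[dict]:
    """API side: read a day's log back without ever writing to it."""
    path = log_path(log_dir, when)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because the writer only ever appends whole lines and the reader only ever opens the file for reading, either process can be restarted without coordinating with the other.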
The Zero 2 W is the bottleneck: a quad-core Cortex-A53 with 512 MB of RAM, good for around 2 to 6 FPS on full COCO. That's slow for video and plenty for "who or what just walked by." The detector debounces per label, so it records presence over time instead of flooding the log every frame.
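One simple way to debounce per label is to treat a label as a new sighting only after it has been absent for some quiet gap. This is a minimal sketch of that idea, not necessarily the logic the detector runs. The class name, method name, and the 10-second default are all made up for illustration.

```python
class LabelDebouncer:
    """Report a label once per continuous stretch of presence."""

    def __init__(self, gap_s: float = 10.0) -> None:
        # gap_s is an assumed tuning knob: how long a label must be
        # absent before its next appearance counts as a new sighting.
        self.gap_s = gap_s
        self._last_seen: dict[str, float] = {}

    def is_new_sighting(self, label: str, now: float) -> bool:
        last = self._last_seen.get(label)
        # Always refresh the timestamp, so a label that stays in frame
        # keeps extending its own quiet window.
        self._last_seen[label] = now
        return last is None or (now - last) > self.gap_s
```

At 2 to 6 FPS, a person sitting in frame would refresh the timestamp several times a second, so only the first frame of their visit returns `True` and gets logged.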
It stores no video. Frames stay in memory unless I ask for them, and what lands on disk is metadata: one JSON object per detection with the label, confidence, and bounding box. There's a live preview too, off by default and tailnet-only. It's plain MJPEG that renders into an <img>, encoded by the Zero 2 W's hardware encoder rather than the CPU.
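For anyone who hasn't looked under the hood, "plain MJPEG" over HTTP is a `multipart/x-mixed-replace` response in which each part is one complete JPEG. This sketch shows the framing only. The boundary string and function name are arbitrary choices for the example, and the JPEG bytes are assumed to arrive already encoded from somewhere else.

```python
from typing import Iterable, Iterator

BOUNDARY = "frame"  # arbitrary; it only has to match the response header
CONTENT_TYPE = f"multipart/x-mixed-replace; boundary={BOUNDARY}"


def mjpeg_parts(jpeg_frames: Iterable[bytes]) -> Iterator[bytes]:
    """Wrap already-encoded JPEG frames as multipart body chunks."""
    for jpeg in jpeg_frames:
        header = (
            f"--{BOUNDARY}\r\n"
            "Content-Type: image/jpeg\r\n"
            f"Content-Length: {len(jpeg)}\r\n\r\n"
        ).encode("ascii")
        yield header + jpeg + b"\r\n"
```

In a FastAPI service, a generator like this could plausibly be handed to a `StreamingResponse` with `media_type=CONTENT_TYPE`. The browser then swaps each new part into the same <img> element.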
The HAT writes its own stream, polling temperature, humidity, pressure, and light every thirty seconds. The board heats itself, so the temperature reads high. A live reading of the SoC temperature feeds a thermal-divider model that subtracts the self-heating and pulls the number back toward the actual room.
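I won't claim the following is the exact model, but the divider idea can be sketched by analogy with a voltage divider. Assume the HAT's sensor sits at a fixed fraction of the way between room temperature and SoC temperature, then solve for the room. The `coupling` fraction is a hypothetical constant you would calibrate against a reference thermometer, and the function names are mine.

```python
from pathlib import Path

# Standard Linux thermal sysfs node; on a Raspberry Pi this is the SoC.
SOC_TEMP_PATH = Path("/sys/class/thermal/thermal_zone0/temp")


def read_soc_temp_c(path: Path = SOC_TEMP_PATH) -> float:
    """The kernel reports this value in millidegrees Celsius."""
    return int(path.read_text().strip()) / 1000.0


def estimate_room_temp_c(sensor_c: float, soc_c: float, coupling: float) -> float:
    """Invert sensor = room + coupling * (soc - room) to recover room.

    coupling is an assumed calibration constant in [0, 1): 0 means the
    sensor is thermally isolated from the SoC, values near 1 mean it
    mostly reads the SoC.
    """
    if not 0.0 <= coupling < 1.0:
        raise ValueError("coupling must be in [0, 1)")
    return (sensor_c - coupling * soc_c) / (1.0 - coupling)
```

Using the live SoC reading rather than a fixed offset means the correction grows when the detector is working hard and shrinks when the board idles.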
Right now it doesn't act on anything it sees. It watches my desk and writes things down. But I have ideas:
- track my water intake during the workday
- turn the lights on and off depending on whether I'm actually at my desk
- etc.