Alice

External Autofocus System for Cinema Equipment

I was on a shoot last year when my camera’s autofocus decided to stop working. Not completely, mind you. It would lock onto faces perfectly in the viewfinder, then slowly drift out of focus during the actual take. The issue? A linear polarizer on the lens was confusing the phase-detection system.

This is a common problem. Linear polarizers alter the light in ways phase-detection sensors aren’t designed to handle. The camera thinks it has focus, but it doesn’t. You can shoot manual focus, of course, but that’s hard when your subject is moving.

I started wondering: what if I could measure distance directly and skip the camera’s AF entirely? That question eventually became Alice (Autofocus Lens Interface for Cinema Equipment), an Android app that turns a phone into an external autofocus controller.

The complete Alice system: an Android phone running the app, Intel RealSense depth camera for distance measurement, and nRF52840 dongle for wireless motor control.

Why Build This?

Camera autofocus systems are amazing pieces of engineering, but they have blind spots. Here are some situations where they struggle:

  • Manual focus lenses have no electronics. You cannot autofocus a vintage Zeiss lens no matter how good your camera body is. There’s nothing for the camera to talk to.

  • Adapted lenses often lose AF capability. Mount a Canon EF lens on a Sony body with an adapter, and you might get slow, hunting autofocus or none at all.

  • Cinema cameras often skip AF entirely. The RED Komodo, BMPCC 6K, and many others expect you to pull focus manually or hire someone to do it.

  • Polarizers and ND filters can interfere with phase-detection. The light hitting the AF sensor isn’t the same as the light hitting the main sensor.

For any of these cases, the usual answer is “pull focus manually” or “hire a focus puller.” But what if you’re shooting solo? What if the subject is unpredictable?

The Core Idea

Instead of trying to make the camera’s AF work, bypass it completely. Use a depth camera to measure actual distance to your subject. Use that distance to drive an external motor attached to your lens’s focus ring.

This is what Alice does. It doesn’t care what camera you’re using, what lens is attached, or whether they can communicate with each other. It builds an independent autofocus system that sits alongside your existing gear.

Demo

Here’s Alice in action, demonstrating face tracking autofocus with a manual focus cinema lens:

How the System Works

Let me walk through each piece of hardware and what it does.

The Depth Camera

Alice uses Intel RealSense cameras for depth sensing. These are small USB cameras designed for robotics and computer vision. They project an infrared pattern onto the scene and triangulate it with a pair of infrared imagers, which tells them how far away each point is.

The supported models are the D415, D435, D435i, D435f, D455, and D405. I’ve tested primarily with the D435, which offers a good balance between size and accuracy.

The camera connects to your phone via USB and streams depth frames at 60 frames per second. Each frame is a grid of distance values, one for each pixel. A 424×240 frame means you get about 100,000 distance measurements every 16 milliseconds.
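
For concreteness, here’s roughly what pulling one of those values out of a frame looks like. A minimal Kotlin sketch, assuming a row-major 16-bit buffer and the D400-series default depth unit of 1 mm; the names are mine, not Alice’s internals.

fun depthAtMeters(
    frame: ShortArray,          // raw depth frame, width * height entries
    width: Int,
    x: Int,
    y: Int,
    depthScale: Float = 0.001f  // meters per raw unit (assumed default)
): Float {
    val raw = frame[y * width + x].toInt() and 0xFFFF  // 0 means "no data"
    return raw * depthScale                            // distance in meters
}

// Center pixel of a 424x240 frame:
// val meters = depthAtMeters(frame, width = 424, x = 212, y = 120)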

The complete hardware setup mounted on a camera rig. The depth camera points at the scene, the dongle handles wireless communication, and the phone runs the app.

The Wireless Motor Controller

To actually turn the focus ring, Alice uses the Tilta Nucleus Nano II. This is a wireless follow focus system popular in video production. It consists of a small motor that clamps onto your lens and a hand controller (called a “knob”) that controls it wirelessly.

The clever part is that Tilta’s system uses IEEE 802.15.4, the same low-level radio standard that Zigbee is built on. With the right hardware, you can send commands to the motor yourself.

That hardware is an nRF52840 USB dongle running custom firmware. The nRF52840 is a microcontroller from Nordic Semiconductor with built-in support for 802.15.4 radio. I wrote firmware (using Zephyr RTOS) that turns it into a bridge: the phone sends commands over USB, and the dongle transmits them wirelessly to the motor.

Why Not Bluetooth?

The Tilta motor doesn’t use Bluetooth. It uses IEEE 802.15.4, which is a different 2.4 GHz protocol. The nRF52840 supports both, but we need 802.15.4 to talk to the motor.

The Phone

The Android app ties everything together. It receives depth frames from the RealSense, processes them to determine distance, looks up the corresponding focus position from a calibration file, and sends motor commands to the dongle.

Why a phone? Two reasons. First, phones have good processors that can handle the depth processing in real time. Second, phones have screens that can show you what’s happening. You can see the depth visualization, tap to select focus points, and monitor everything without additional hardware.

The app requires Android 8.0 or newer and works best with USB 3.0. If you only have USB 2.0, it still works but with some limitations (more on that later).

The Problem with Raw Depth Data

You might think the hard part is getting the depth data. Actually, that’s easy. The hard part is making it usable.

Raw depth readings are noisy. Here’s what I mean. Suppose you point the camera at a wall 2 meters away and read the center pixel. You might get readings like:

Frame 1: 2.03 m
Frame 2: 1.98 m
Frame 3: 2.01 m
Frame 4: 2.05 m
Frame 5: 1.97 m

That’s an 8 cm spread in half a second, even though nothing moved. If I sent each of these readings directly to the focus motor, it would constantly hunt back and forth. The video would be unusable.

The noise comes from several sources. The infrared projector isn’t perfectly uniform. Environmental light interferes with the IR pattern. Some surfaces (like glass or shiny metal) reflect IR strangely. The sensor itself has electronic noise.

Alice needs to figure out the true depth despite all this noise.

Spatial Filtering

The first step is to look at more than one pixel. Instead of reading just the center point, Alice reads a small region around it. The size of this region adapts based on how stable the scene is.

When the depth readings are steady (low variance), Alice uses a larger region of 24×24 pixels. More pixels means more averaging, which reduces noise.

When the depth readings are jumping around (high variance), Alice shrinks the region to 8×8 pixels. A smaller region responds faster to real changes, which matters if your subject is moving.

But there’s a catch. Simple averaging doesn’t work well when the focus point is near an edge. Imagine you’re focusing on someone’s face, and there’s a wall behind them. If you average all the pixels in a box around their nose, you’ll include some face pixels and some wall pixels. The average will be somewhere in between, which is wrong.

Alice uses a bilateral filter instead. This is a type of weighted average where pixels only contribute if they’re both spatially close and similar in depth. Face pixels contribute to the face average. Wall pixels contribute to the wall average. They don’t mix.

The math looks like this:

\[d_{filtered} = \frac{\sum_{i} w_s(i) \cdot w_d(i) \cdot d_i}{\sum_{i} w_s(i) \cdot w_d(i)}\]

where \(w_s\) is a spatial weight (closer pixels count more) and \(w_d\) is a depth weight (similar depths count more). The depth weight drops off sharply when there’s a big depth difference, which is what keeps edges sharp.
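
Here are both ideas in one Kotlin sketch: the variance-driven region size and the bilateral weighting. The sigma values and thresholds are placeholders I picked for illustration, not Alice’s actual constants.

import kotlin.math.exp

// Half-size of the sampling window: ~24x24 when stable, ~8x8 when noisy.
fun roiRadius(recentVariance: Float, varianceThreshold: Float): Int =
    if (recentVariance < varianceThreshold) 12 else 4

fun bilateralDepth(
    frame: ShortArray, width: Int, height: Int,
    cx: Int, cy: Int, radius: Int,
    sigmaSpatial: Float = 6f,     // placeholder
    sigmaDepth: Float = 0.05f,    // meters; placeholder
    depthScale: Float = 0.001f
): Float {
    val centerRaw = frame[cy * width + cx].toInt() and 0xFFFF
    if (centerRaw == 0) return 0f                    // no data at the focus point
    val center = centerRaw * depthScale
    var num = 0.0
    var den = 0.0
    for (dy in -radius..radius) {
        for (dx in -radius..radius) {
            val x = cx + dx
            val y = cy + dy
            if (x !in 0 until width || y !in 0 until height) continue
            val raw = frame[y * width + x].toInt() and 0xFFFF
            if (raw == 0) continue                   // skip invalid pixels
            val d = raw * depthScale
            // Spatial weight: closer pixels count more.
            val ws = exp(-(dx * dx + dy * dy) / (2f * sigmaSpatial * sigmaSpatial))
            // Depth weight: similar depths count more. This is what keeps
            // face pixels from averaging with wall pixels behind them.
            val wd = exp(-((d - center) * (d - center)) / (2f * sigmaDepth * sigmaDepth))
            num += ws * wd * d
            den += ws * wd
        }
    }
    return if (den > 0) (num / den).toFloat() else 0f
}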

Temporal Filtering

Spatial filtering helps within a single frame, but the noise can also vary frame to frame. For this, Alice uses a Kalman filter.

A Kalman filter is a way to combine predictions with measurements. It keeps track of two things: an estimate of the true depth, and how confident it is in that estimate.

Each frame, the filter does two steps:

  1. Predict: Assume the depth hasn’t changed since last frame. The confidence goes down slightly because we’re less sure over time.

  2. Update: Look at the new measurement. Combine it with our prediction, weighted by how confident we are in each. If the prediction is very confident and the measurement looks noisy, trust the prediction more. If the measurement looks clean, trust it more.

Over time, this produces much smoother depth estimates than raw measurements.

The filter also adapts to the scene. When depth is changing rapidly (subject moving toward or away from camera), it increases the process noise so it can keep up. When depth is stable, it reduces the measurement noise so it can filter more aggressively.

Here are the key equations:

\[\begin{aligned} P_{pred} &= P + Q \cdot \Delta t & \text{(confidence decreases over time)} \\ K &= \frac{P_{pred}}{P_{pred} + R} & \text{(Kalman gain)} \\ \hat{x} &\leftarrow \hat{x} + K \cdot (z - \hat{x}) & \text{(update estimate)} \\ P &\leftarrow (1 - K) \cdot P_{pred} & \text{(update confidence)} \end{aligned}\]

The variable \(Q\) controls how much uncertainty we add each frame (higher = faster response). The variable \(R\) controls how much we trust measurements (lower = trust more). Alice adjusts both based on recent scene behavior.
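
In code, the whole filter fits in a few lines. A minimal Kotlin sketch with fixed \(Q\) and \(R\); the adaptive tuning described above is left out.

class DepthKalman(
    private val q: Float,   // process noise: higher = faster response
    private val r: Float    // measurement noise: lower = trust measurements more
) {
    private var x = 0f          // depth estimate, meters
    private var p = 1f          // estimate variance (low = confident)
    private var started = false

    fun update(z: Float, dtSeconds: Float): Float {
        if (!started) { x = z; started = true; return x }
        val pPred = p + q * dtSeconds   // predict: confidence decays over time
        val k = pPred / (pPred + r)     // Kalman gain
        x += k * (z - x)                // blend measurement into estimate
        p = (1 - k) * pPred             // update confidence
        return x
    }
}

// val filter = DepthKalman(q = 0.05f, r = 0.01f)  // illustrative values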

Converting Depth to Motor Position

Okay, so now we have a clean depth reading. How do we turn that into a focus motor position?

Every lens has a different relationship between focus distance and ring rotation. A 50mm lens might need 90 degrees of rotation to go from 1 meter to infinity. An 85mm lens might need 180 degrees. Some lenses focus by rotating the ring clockwise, others counterclockwise.

The Tilta motor reports position as a number from 0 to 4095. Position 0 is one end of its travel, position 4095 is the other end. But where those correspond to on your lens depends on how you mounted the motor.

Alice learns this relationship through calibration.

The Calibration Process

Calibration is straightforward but requires a few minutes per lens.

  1. Set up your camera and lens normally
  2. Position something at a known distance (start with close focus, maybe 0.5 meters)
  3. Use the app’s manual mode to adjust the motor until the subject is sharp
  4. Tap “Record Point” to save that depth-to-position pair
  5. Repeat at several distances (1m, 2m, 5m, near infinity)

The result is a list of points like:

Distance    Motor Position
0.3 m       342
0.5 m       891
1.0 m       1847
2.0 m       2689
5.0 m       3412
10.0 m      3891

Alice stores these as a JSON file with some metadata about the lens:

{
  "name": "Canon 50mm f/1.4",
  "description": "Calibrated on full frame, ring reversed",
  "mappingPoints": [
    {"depth": 0.3, "motorPosition": 342, "confidence": 0.95},
    {"depth": 0.5, "motorPosition": 891, "confidence": 0.98},
    {"depth": 1.0, "motorPosition": 1847, "confidence": 0.97},
    {"depth": 2.0, "motorPosition": 2689, "confidence": 0.96},
    {"depth": 5.0, "motorPosition": 3412, "confidence": 0.94},
    {"depth": 10.0, "motorPosition": 3891, "confidence": 0.92}
  ],
  "metadata": {
    "focalLength": 50,
    "maxAperture": 1.4,
    "focusDirection": "reversed"
  }
}
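
If you want to read these files from your own tooling, the shape maps directly onto data classes. A sketch using kotlinx.serialization; the app’s actual persistence code may differ.

import kotlinx.serialization.Serializable
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.json.Json

@Serializable
data class MappingPoint(val depth: Float, val motorPosition: Int, val confidence: Float)

@Serializable
data class LensMetadata(val focalLength: Int, val maxAperture: Float, val focusDirection: String)

@Serializable
data class LensCalibration(
    val name: String,
    val description: String,
    val mappingPoints: List<MappingPoint>,
    val metadata: LensMetadata
)

// val cal = Json.decodeFromString<LensCalibration>(jsonText)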

Interpolation

When Alice measures a depth of 1.5 meters, there’s no exact calibration point for that. Instead, it interpolates between the neighboring points (1.0m at position 1847 and 2.0m at position 2689).

Linear interpolation gives us:

\[\text{position} = 1847 + \frac{1.5 - 1.0}{2.0 - 1.0} \cdot (2689 - 1847) = 2268\]

This assumes the relationship is roughly linear between calibration points. For most lenses over short intervals, this is close enough. If you need more accuracy, add more calibration points.
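
The lookup itself is a small piecewise-linear function. A Kotlin sketch (types and names are mine), assuming the points are sorted by depth:

import kotlin.math.roundToInt

data class CalPoint(val depth: Float, val position: Int)

fun interpolatePosition(points: List<CalPoint>, depth: Float): Int {
    if (depth <= points.first().depth) return points.first().position  // clamp near end
    if (depth >= points.last().depth) return points.last().position    // clamp far end
    val lo = points.last { it.depth <= depth }
    val hi = points.first { it.depth >= depth }
    if (hi.depth == lo.depth) return lo.position   // exact calibration point
    val t = (depth - lo.depth) / (hi.depth - lo.depth)
    return (lo.position + t * (hi.position - lo.position)).roundToInt()
}

// interpolatePosition(listOf(CalPoint(1.0f, 1847), CalPoint(2.0f, 2689)), 1.5f) == 2268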

How Many Points Do You Need?

Three points is the minimum for usable autofocus. Five points (minimum, 1m, 2m, 5m, infinity) is usually enough. More points help if your lens has a highly nonlinear focus throw.

Focus Modes

Alice provides four ways to control focus, depending on what you’re shooting.

Manual Mode

Manual mode gives you direct control. A slider on screen moves the motor from 0 to 4095. Preset buttons let you jump to common positions. This mode is useful for setting focus marks before a shot or when you want full control.

The motor controller accepts position commands and smoothly moves to them. Large jumps are broken into smaller steps (50 units at a time, 100 steps per second) so the movement looks smooth rather than jerky.

Single Point (AF-S)

Tap anywhere on the screen. Alice measures depth at that point, calculates the motor position, and moves the motor. Focus locks until you tap again.

This works well for static shots. Tap your subject, let the motor settle, then shoot. If you need to recompose, the focus stays locked at that distance.

The lock happens at the motor level, not by remembering a distance. This means if something walks into frame at the same distance as your subject, it will also be in focus.

Continuous Point (AF-C)

Tap to select a focus point, then Alice continuously tracks depth there. As your subject moves closer or farther, focus adjusts automatically.

Updates happen at about 30 Hz (33ms between updates). This is fast enough to track walking pace but won’t keep up with fast action. There’s a trade-off: faster updates mean more motor movement and potentially more noise in the video.

You can adjust the response speed in settings. Lower response speed means slower, smoother focus changes. Higher response speed means faster tracking but potentially visible focus shifts.

Face Tracking (AF-F)

This mode uses ML Kit (Google’s machine learning library) to detect faces in the frame. When a face is detected, Alice automatically focuses on it.

If multiple faces appear, you can tap to select which one to track. A colored box shows which face is selected.

Face Detection Stability

To prevent focus from jumping around, the face detector requires three consecutive frames before confirming a new face. It also keeps tracking a face for five frames after it disappears, handling brief occlusions.

Face tracking works best when faces are reasonably sized in the frame. The detector requires faces to be at least 25% of the image width, which prevents false positives from small background faces.
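
Here’s what that confirm/grace logic can look like. The frame counts match the text; the class and its fields are my naming, not ML Kit’s API.

class FaceStabilizer(
    private val confirmFrames: Int = 3,   // frames needed to confirm a new face
    private val graceFrames: Int = 5      // frames to keep a face after it disappears
) {
    private var activeId: Int? = null
    private var missedCount = 0
    private var candidateId: Int? = null
    private var candidateCount = 0

    /** Call once per frame with the detected face ID (or null); returns the face to track. */
    fun update(detectedId: Int?): Int? {
        if (detectedId != null && detectedId == activeId) {
            missedCount = 0                      // tracked face still visible
            candidateId = null; candidateCount = 0
            return activeId
        }
        if (activeId != null) missedCount++      // tracked face not seen this frame
        if (detectedId != null) {
            // Build up confidence in a new face over consecutive frames.
            if (detectedId == candidateId) candidateCount++
            else { candidateId = detectedId; candidateCount = 1 }
            if (candidateCount >= confirmFrames) {
                activeId = detectedId
                missedCount = 0
                candidateId = null; candidateCount = 0
                return activeId
            }
        }
        // Ride out brief occlusions, then drop the face.
        if (activeId != null && missedCount <= graceFrames) return activeId
        activeId = null
        return null
    }
}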

The Firmware

The nRF52840 dongle needs firmware to work. I wrote this using Zephyr RTOS, which is an open-source real-time operating system that supports the nRF52840’s radio hardware.

What the Firmware Does

The firmware acts as a bridge between USB and wireless. It receives commands from the phone over USB (as a CDC-ACM serial device) and transmits them over IEEE 802.15.4 radio to the Tilta motor.

Commands are simple text strings:

POS 2048       # Set motor position to 2048
DEST AB CD     # Set destination address to 0xABCD
STATUS         # Report current status

The firmware responds with acknowledgments:

OK:POS=2048
OK:DEST=ABCD
ERROR:Invalid command
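
On the phone side, producing and checking these strings is simple. A hypothetical pair of helpers, assuming newline-terminated commands over the serial link:

fun posCommand(position: Int): ByteArray {
    require(position in 0..4095) { "motor position is 12-bit" }
    return "POS $position\n".toByteArray(Charsets.US_ASCII)
}

fun isAck(response: String): Boolean = response.startsWith("OK:")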

Motor Communication Protocol

The Tilta motor expects 16-byte packets in a specific format. Here’s what each byte does:

Bytes    Name         Purpose
0        Frame        Always 0x0F, marks start of frame
1-2      Control      Frame type flags, always 0x61 0x88
3        Sequence     Incremented each packet for tracking
4-5      PAN ID       Network identifier, 0xE4 0x3D for Tilta
6-7      Destination  Motor address (discovered during setup)
8-9      Source       Dongle address (fixed in firmware)
10-12    Payload      Position value plus control flags
13       Checksum     XOR of bytes 0-12
14-15    Padding      Reserved, set to 0x00

The position is encoded in the payload bytes as a 12-bit value (0-4095). The firmware handles all the encoding and checksum calculation.
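
Here’s that layout expressed as code, in Kotlin for illustration (the real encoder is C in the firmware). The address byte order and the exact payload bit packing are assumptions on my part.

fun buildTiltaPacket(seq: Int, dest: Int, source: Int, position: Int): ByteArray {
    require(position in 0..4095)
    val p = ByteArray(16)                       // bytes 14-15 stay 0x00 (padding)
    p[0] = 0x0F                                 // frame start
    p[1] = 0x61; p[2] = 0x88.toByte()           // control: frame type flags
    p[3] = (seq and 0xFF).toByte()              // sequence number
    p[4] = 0xE4.toByte(); p[5] = 0x3D           // Tilta PAN ID
    p[6] = (dest shr 8 and 0xFF).toByte()       // destination (motor) address
    p[7] = (dest and 0xFF).toByte()
    p[8] = (source shr 8 and 0xFF).toByte()     // source (dongle) address
    p[9] = (source and 0xFF).toByte()
    p[10] = (position shr 8 and 0x0F).toByte()  // 12-bit position, high nibble (assumed)
    p[11] = (position and 0xFF).toByte()        // position, low byte (assumed)
    p[12] = 0x00                                // remaining payload flags (assumed)
    var checksum = 0
    for (i in 0..12) checksum = checksum xor (p[i].toInt() and 0xFF)
    p[13] = checksum.toByte()                   // XOR of bytes 0-12
    return p
}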

Motor Smoothing

When the phone sends a large position change (say, from 1000 to 3000), the firmware doesn’t jump there immediately. Instead, it moves in steps of 50 units at 100 Hz, taking about 400ms to complete the move. This makes focus transitions look smooth instead of abrupt.

The firmware also adds a tiny bit of position noise (±1 unit) every 10 milliseconds. This sounds counterproductive, but it prevents the motor from entering a low-power idle state. The Tilta motor has a “sleep” behavior when it receives identical position commands, and the noise keeps it responsive.
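
A Kotlin rendering of that stepping behavior (the real loop lives in the C firmware):

import kotlinx.coroutines.delay

suspend fun moveSmoothly(
    from: Int, to: Int,
    send: suspend (Int) -> Unit,   // transmits one position packet
    stepSize: Int = 50,            // units per step
    stepMillis: Long = 10          // 100 steps per second
) {
    var pos = from
    while (pos != to) {
        pos += (to - pos).coerceIn(-stepSize, stepSize)
        send(pos)
        delay(stepMillis)
    }
}
// 1000 -> 3000 is 40 steps at 10 ms each: about 400 ms, matching the text.
// A separate idle loop resending pos ± 1 every 10 ms would provide the
// keep-alive jitter described above.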

LED Status Indicators

The dongle has an LED that shows what’s happening:

Color    Meaning
Blue     Waiting for USB connection
Green    Connected and running normally
Red      Error state

The LED changes from blue to green when the phone opens the serial connection. This is a quick way to verify the firmware is working.

USB Bandwidth Management

Here’s a problem I didn’t expect: USB bandwidth.

The Android phone connects to three USB devices: the motor dongle, the RealSense camera, and optionally a UVC camera for monitoring. Each device wants bandwidth, and USB 2.0 only has 480 Mbps to share.

The RealSense at full resolution (424×240 color + depth at 60fps) uses about 150 Mbps. The UVC camera might use another 100 Mbps. The motor dongle uses almost nothing (a few kilobytes per second).

If you connect everything at once, the devices can fight over bandwidth. Sometimes they fail to enumerate. Sometimes they connect but drop frames.

Alice solves this with sequenced connection. Instead of connecting all devices simultaneously, it connects them one at a time with delays between each:

  1. Connect motor dongle (500ms delay)
  2. Connect UVC camera (500ms delay)
  3. Connect RealSense (waits for others to stabilize)

This gives each device time to negotiate its bandwidth needs without conflicts.
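
Sketched with coroutines, the sequencing is just connect-then-wait. The connector lambdas stand in for the app’s real device coordinators:

import kotlinx.coroutines.delay

suspend fun connectSequentially(
    vararg connectors: suspend () -> Unit,
    settleMillis: Long = 500
) {
    for ((i, connect) in connectors.withIndex()) {
        connect()                                          // start USB enumeration
        if (i < connectors.lastIndex) delay(settleMillis)  // let bandwidth settle
    }
}

// connectSequentially(::connectDongle, ::connectUvcCamera, ::connectRealSense)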

USB 2.0 Limitations

If you only have USB 2.0 (common on older phones or cheaper USB hubs), Alice automatically reduces RealSense bandwidth by disabling the color stream. Depth-only mode uses much less bandwidth and still provides everything needed for autofocus.

USB 3.0 has 5 Gbps of bandwidth, which is enough for everything at full quality. If your phone supports it, use a USB 3.0 hub.

Confidence and Reliability

Not every depth reading is trustworthy. Glass, mirrors, shiny metal, and very dark surfaces can produce garbage readings. Alice needs to know when to trust its measurements and when to hold focus.

The Confidence Score

Each depth measurement comes with a confidence score from 0 to 1. This is calculated from two factors:

Validity ratio: What fraction of pixels in the ROI have valid readings? Some pixels might return “no data” if the surface couldn’t be measured.

Stability: How much are the readings varying? High variance suggests unreliable measurements.

The formula combines these:

\[\text{confidence} = 0.6 \times \text{validityRatio} + 0.4 \times \text{stability}\]

where stability is calculated as:

\[\text{stability} = \frac{1}{1 + \sqrt{\text{variance}} / 100}\]

A reading with 95% valid pixels and low variance might have confidence 0.92. A reading with 60% valid pixels and high variance might have confidence 0.45.
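
The two formulas translate directly to code (the /100 scale on the variance follows the formula above; its units depend on the app’s internal depth representation):

import kotlin.math.sqrt

fun confidence(validPixels: Int, totalPixels: Int, variance: Float): Float {
    val validityRatio = validPixels.toFloat() / totalPixels
    val stability = 1f / (1f + sqrt(variance) / 100f)
    return 0.6f * validityRatio + 0.4f * stability
}

// if (confidence(valid, total, variance) < threshold) hold the last good position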

Confidence Threshold

In settings, you can set a minimum confidence threshold. When confidence drops below this threshold, Alice holds the last good focus position instead of chasing bad readings.

A threshold of 0.7 is a good starting point. Lower if you’re getting too many focus rejections. Higher if focus is jumping around too much.

When Confidence Fails

If you’re shooting through glass or at highly reflective surfaces, the depth camera may never achieve high confidence. In these cases, switch to manual mode or face tracking (which can fall back to detected face size for rough distance estimation).

What You Need

Here’s the complete hardware list:

Phone: Android 8.0 or higher. USB 3.0 recommended for best performance. Most modern phones work fine.

Depth Camera: Intel RealSense D415, D435, D435i, D435f, D455, or D405. The D435 is a good all-around choice.

Focus Motor: Tilta Nucleus Nano II motor unit. You’ll need the hand controller (knob) for initial pairing, but after that Alice takes over.

Wireless Bridge: Nordic nRF52840 USB dongle. These cost about $10 and are widely available.

USB Hub: Something to connect all these devices to your phone. A powered hub is more reliable than unpowered.

Firmware: The dongle needs custom firmware. Pre-built hex files and source code are in the project repository.

Total cost for the new hardware (excluding the camera and motor you might already own) is roughly $200-300 for the RealSense and $10 for the dongle.

Limitations

Alice works well in many situations, but it’s not magic. Here’s where it struggles:

Reflective surfaces: Glass, mirrors, and shiny metal confuse the depth camera. You’ll get low confidence readings or wrong distances.

Dark scenes: The RealSense uses infrared, which works in low light, but very dark surfaces absorb IR and don’t return good signals.

Long distances: Depth accuracy drops with distance. At 10+ meters, the error might be 20-50 cm, which matters for telephoto lenses.

Fast motion: The 30 Hz update rate can’t track very fast movement. If your subject is running at the camera, focus may lag behind.

Non-parfocal zooms: If your lens changes focus when zooming, you need separate calibrations for each focal length. This is a lot of work for zoom lenses.

When to Use Alice

Alice works best as a helper when camera AF isn’t available or isn’t reliable. For critical cinema work with moving subjects, a skilled focus puller is still the gold standard. But for solo shooters, documentary work, or situations where you just need “good enough” autofocus, Alice fills a real gap.

The Code

The project is open source. The Android app is written in Kotlin using Jetpack Compose for the UI. The firmware is written in C using the Zephyr RTOS framework.

The app architecture uses coroutines for async I/O, flows for state management, and a coordinator pattern to handle the multiple USB devices. The depth processing pipeline runs on background threads to keep the UI responsive.

If you want to build from source:

Android app: Open in Android Studio, let Gradle sync, build. Requires JDK 11+ and Android SDK API 35.

Firmware: Install the nRF Connect SDK, open in VS Code with the nRF extension, select the nrf52840dongle board, build.

Pre-built releases (APK and firmware hex) are available on GitHub if you just want to use it without building.

Future Ideas

A few things I’d like to explore:

Predictive focus: Track not just current depth but velocity. If the subject is walking toward the camera at 1 m/s, pre-adjust focus to where they’ll be in 100ms.

HDMI overlay: Display focus information on the camera’s HDMI output so you can monitor without looking at the phone.

More motor support: The Tilta protocol was reverse-engineered. Other wireless follow focus systems (DJI, Moza, etc.) could potentially work with different firmware.

Better face tracking: Currently uses center of face bounding box. Could use eye detection for more precise focus, especially at wide apertures.

For now, Alice does what I built it to do: provide reliable autofocus when the camera’s built-in system won’t cooperate. If you’ve ever missed a shot because AF failed at the wrong moment, give it a try.

Repository

Everything is on GitHub: app source, firmware source, build instructions, and pre-built releases.

Feel free to open issues if you run into problems or have suggestions. Pull requests are welcome too.