Glad you are thinking about this. I have thought about these things for 30 or so years, off and on. Way back then, the processing power was not available to do this for low cost.
Here is what I might experiment with. Rather than try to bounce a signal off the effector and measure the time like radar, place a speaker (piezo) on the effector and two microphones (also piezo) widely separated at fixed locations (this is really better suited for Wally X,Y). The electronics know when a tone pulse was generated, and can measure when that tone was received by each mic. Take the travel time to each mic and triangulate the position of the speaker. Actually I would measure the time from when the tone stopped being sent. Much more reliable. The precision will be based on the timer resolution. The update rate will be based on the tone frequency. Reflections could pose a problem if they are strong, so a surface treatment for things (like a Wally backboard) might need to be made. You would want updates to be faster than the fastest "step rate". Do some math.
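As a rough sketch of the triangulation step: assume the two mics sit at (0, 0) and (mic_spacing, 0), the speaker is somewhere in front of them (y >= 0), and the electronics have already measured the travel time to each mic. The function name, the coordinate layout, and the 343 m/s speed of sound are illustration-only assumptions, not part of any existing firmware.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for dry air at about 20 C


def locate_speaker(t1, t2, mic_spacing, c=SPEED_OF_SOUND):
    """Estimate the speaker (x, y) from two times of flight.

    t1, t2      -- seconds from tone emission to arrival at mic 1 and mic 2
    mic_spacing -- distance in metres between mic 1 at (0, 0) and mic 2 at (mic_spacing, 0)
    """
    r1 = c * t1  # distance from speaker to mic 1
    r2 = c * t2  # distance from speaker to mic 2
    # Intersect the two circles centred on the mics.
    x = (r1**2 - r2**2 + mic_spacing**2) / (2.0 * mic_spacing)
    y_squared = r1**2 - x**2
    if y_squared < 0:
        # The circles do not intersect: the timings are inconsistent (noise or a reflection).
        raise ValueError("times of flight do not match any position")
    return x, math.sqrt(y_squared)


# Example: mics 1 m apart, speaker actually at (0.3, 0.4) m.
t1 = math.hypot(0.3, 0.4) / SPEED_OF_SOUND
t2 = math.hypot(0.3 - 1.0, 0.4) / SPEED_OF_SOUND
print(locate_speaker(t1, t2, mic_spacing=1.0))
```

The wider the mic spacing relative to the working area, the better the two circles cross, which is why the mics should be widely separated.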
What clock rate would be needed in the timer for the speed of sound and 5 micron position resolution?
What could you do with a 20 kHz tone?
How much would the Doppler effect shift the received tone when moving at full speed?
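A quick way to put numbers on these questions, as a sketch only. It assumes sound travels at 343 m/s (dry air, about 20 C), and the 0.1 m/s "full speed" is a made-up placeholder, so substitute the real maximum speed of your machine. The function names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for dry air at about 20 C


def timer_clock_hz(resolution_m, c=SPEED_OF_SOUND):
    """Clock rate whose single tick equals the time sound takes to travel resolution_m."""
    return c / resolution_m


def wavelength_m(freq_hz, c=SPEED_OF_SOUND):
    """Wavelength in air of a tone at freq_hz."""
    return c / freq_hz


def doppler_shift_hz(freq_hz, source_speed, c=SPEED_OF_SOUND):
    """Frequency shift seen by a fixed mic when the speaker moves straight toward it."""
    return freq_hz * c / (c - source_speed) - freq_hz


FULL_SPEED = 0.1  # m/s, placeholder assumption, not a measured figure

print("timer clock for 5 micron:", timer_clock_hz(5e-6), "Hz")
print("20 kHz period:", 1.0 / 20e3, "s")
print("20 kHz wavelength:", wavelength_m(20e3), "m")
print("Doppler shift at full speed:", doppler_shift_hz(20e3, FULL_SPEED), "Hz")
```

With these assumptions the timer needs to run at roughly 68.6 MHz to resolve 5 microns, while one cycle of a 20 kHz tone is about 17 mm long, so the fine resolution has to come from timing an edge of the pulse rather than from counting cycles.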