Voice and AI control of my sandtable

Hey folks!

I had a few hours to kill yesterday and did something fun: I put together a little pipeline that takes voice commands and asks ChatGPT to turn them into gcode for my sand table. From GPT-4’s point of view, gcode is just another language, and he’s (I’m OK anthropomorphizing my AIs) really good at languages. It turned out to be much easier than I expected, which is a bit scary.

The pipeline (written in Python) looks like this: prompt for a voice command, use the OpenAI Whisper API to convert it to text, then send that via the Chat Completions API to GPT-4 Turbo with a system prompt that tells him about the sand table and that he needs to generate gcode to follow the command. Then I just send the gcode to the table via telnet.
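
In case it helps anyone picture it, here’s a rough sketch of that loop. This is not the exact repo code; the model name, system prompt, table bounds, and telnet host are all placeholders:

```python
# Sketch of the voice -> text -> gcode -> table pipeline (placeholder values).
import telnetlib  # note: deprecated in newer Pythons, but fine for a hack
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You control a ZenXY sand table via gcode. The drawable area is "
    "0-400mm in X and 0-300mm in Y (made-up bounds). Respond with gcode "
    "only, no commentary, and keep the marble inside the bounds."
)

def transcribe(wav_path: str) -> str:
    """Turn a recorded voice command into text with the Whisper API."""
    with open(wav_path, "rb") as f:
        return client.audio.transcriptions.create(model="whisper-1", file=f).text

def command_to_gcode(command: str) -> str:
    """Ask the chat model to translate the command into gcode."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": command},
        ],
    )
    return response.choices[0].message.content

def send_to_table(gcode: str, host: str = "sandtable.local") -> None:
    """Stream the generated gcode to the table's telnet port, line by line."""
    with telnetlib.Telnet(host, 23) as tn:
        for line in gcode.splitlines():
            if line.strip():
                tn.write((line.strip() + "\n").encode("ascii"))

if __name__ == "__main__":
    text = transcribe("command.wav")
    print("Heard:", text)
    send_to_table(command_to_gcode(text))
```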

I uploaded a video demo here: https://youtu.be/wyBQh8Gm9Xo

You can check out the source code here: GitHub - jscotkin/sandgpt: AI voice control of ZenXY sandtable. The code is really minimal (it was a quick prototype), and I don’t expect it to work out of the box for anyone else, but it’s also quite short and should be easy to play with for anyone familiar with Python.

I’m incredibly impressed with how well it works, including some fairly complex commands: “draw a series of circles along the Y axis”, “draw the Olympic rings”, “draw the letters V1E”, etc. He struggled with more complex shapes. I was able to get “draw a five-pointed star with 8 inch arms” to work interactively by asking him to double-check his calculations, but was not able to get it to work as a one-shot in this code. His instructions include asking him to double-check that the gcode will not push the marble outside the bounds of the table, but he often gets that wrong, particularly with arcs and circles.
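
Since he can’t be trusted on the bounds, an obvious backstop is a dumb local check on the generated gcode before sending it. A minimal sketch, assuming absolute positioning and made-up bounds; it only covers linear moves, since arc extents are exactly the hard part:

```python
# Reject generated gcode whose linear moves leave the (made-up) table bounds.
import re

X_MAX, Y_MAX = 400.0, 300.0  # placeholder table size in mm

def gcode_in_bounds(gcode: str) -> bool:
    x = y = 0.0  # assumes the marble starts at the origin, in G90 absolute mode
    for line in gcode.splitlines():
        if not re.match(r"\s*G0?[01]\b", line, re.IGNORECASE):
            continue  # skip arcs (G2/G3), comments, and setup commands
        for axis, value in re.findall(r"([XY])(-?\d+\.?\d*)", line.upper()):
            if axis == "X":
                x = float(value)
            else:
                y = float(value)
        if not (0.0 <= x <= X_MAX and 0.0 <= y <= Y_MAX):
            return False
    return True
```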

Anyway, check out the video - I think the potential is really very cool. As the AI improves at math and spatial reasoning it will get much better at this kind of task. I certainly wouldn’t use it for CNC yet!

For the next test I’m thinking about calling the vision version of the model, sending it an overhead picture, and having it outline shapes on the table. You could imagine a coffee table where, whenever you put down your glass or a book, it automatically draws the outline of it. AFAICT it should be quite easy: GPT-4 is already quite good at parsing images (you can give it a photo of the contents of your fridge and ask for recipes it can make), and there really isn’t much more to it than that plus talking to it in English.

Thanks and have fun!

Joel


That’s pretty cool.
I created a GPT (CNC Buddy), which I’m specializing to answer questions related to FluidNC and building CNC machines in general.

It’s also good at answering questions like: What does this gcode do?

And: What size struts should I cut if my table is this big?

Well, I guess my job is the first to fall to AI at V1…

I think the Olympic rings were the version after the robot uprising. You have to add “Pre-2028 version of Olympic rings” next time 🙂.

Seriously though, what a fun project. Arc commands always give X/Y as the ending coordinates, so you can tell at a glance those weren’t single arcs drawing one circle each. Interesting process.
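
For anyone following along at home, that’s the tell: a full circle drawn as a single arc ends exactly where it starts. A hypothetical example with made-up coordinates:

```gcode
; Hypothetical example: a full 50mm-radius circle centered at (150, 100).
G0 X100 Y100          ; travel to a point on the perimeter
G2 X100 Y100 I50 J0   ; clockwise arc; end point = start point, I/J = offset
                      ; from the start to the center, so one complete circle
```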

That’s really cool - I’d love to check it out.

I think you’re safe for a while, and Sandify is awesome.

It definitely has a tough time fully understanding how to use arcs and circles. I asked for a row of circles centered on the X axis, for instance, but it has a hard time understanding the center offset, so it draws them with the starting point on the centerline rather than the centers of the circles (i.e., they are all one radius over). For the five-pointed star test it draws five lines, but winds up making an hourglass or just having them come off at random angles.
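
The part it keeps missing is just one radius of offset: to center a circle at (cx, cy) you have to start on the perimeter, not at the center. A sketch with made-up coordinates, not the repo code:

```python
# Emit gcode for one full circle centered at (cx, cy) with radius r.
def circle_gcode(cx: float, cy: float, r: float) -> str:
    start_x = cx - r  # start on the circle's left edge, one radius over
    return "\n".join([
        f"G0 X{start_x:.2f} Y{cy:.2f}",              # travel to the start point
        f"G2 X{start_x:.2f} Y{cy:.2f} I{r:.2f} J0",  # full clockwise circle
    ])

# A row of circles along a horizontal centerline (made-up numbers):
# print("\n".join(circle_gcode(50 + 60 * i, 150, 25) for i in range(4)))
```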

I did a quick test on the vision front, but they even note in the release notes that it isn’t any good at estimating positions, and that turned out to be the case. There are plenty of more dedicated object recognition libraries that would probably be fine for the purpose, but I haven’t looked at them yet, and I might wait for future model improvements since they’re probably coming.

I don’t know a ton about AI, but I have done a lot of computer vision using OpenCV. With a controlled environment, indoors, you can probably code something up, but it wouldn’t be quick. You could spend a lot of time just coming up with the image coordinates for the corners of the table. If you rigidly mounted the camera and table, you could hard-code those corners. There will be issues like that.

Determining which pixels are covered should be pretty simple.

Determining the pattern to do based on which pixels are covered isn’t so bad. But if you want it to look nice, it will take a few iterations.
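
For example, once you do have those four corners (detected, or hard-coded for a rigidly mounted camera), OpenCV’s homography functions handle the pixel-to-table mapping. A sketch with placeholder corner values and a made-up 400x300mm table:

```python
# Map image pixels to table millimetres via a perspective transform.
import cv2
import numpy as np

# Image-space corners of the sand area, ordered TL, TR, BR, BL (placeholder pixels).
img_corners = np.float32([[212, 148], [1705, 141], [1712, 1060], [208, 1066]])

# The same corners in table coordinates (mm). Note the Y flip: image Y grows
# downward, while the table's gcode Y grows upward.
table_corners = np.float32([[0, 300], [400, 300], [400, 0], [0, 0]])

H = cv2.getPerspectiveTransform(img_corners, table_corners)

def pixel_to_table(px: float, py: float) -> tuple[float, float]:
    """Map one image pixel to table (x, y) in mm."""
    pt = cv2.perspectiveTransform(np.float32([[[px, py]]]), H)[0][0]
    return float(pt[0]), float(pt[1])
```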

It was either a very productive day or a very unproductive one, depending on how you think about things. Caught an early (10 am!) showing of Dune 2 (3 1/4 hours with commercials, gack), had a late lunch with my friends, and then thought I’d poke at OpenCV and see how far I could get. It actually worked out pretty well.

The idea is to pull the table out of the background of an image (fortunately my table has a 2-inch black border all around, which gave me something concrete to latch onto). Then I move inside that border and get a bounding box that isolates the sand. Once I have that, I can mask out all of the distractions and highlight just the objects on the table (I picked dark ones), and use that to get bounding boxes around them. Then a bit of math scales the pixel counts from the image to the known size of the table and transforms the coordinate reference frame to be correct for gcode.
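
Roughly, those steps look like this in OpenCV. This is a hedged reconstruction with made-up thresholds, inset, and table size; the real version is in the repo:

```python
# Find the sand area inside the black border, then box the dark objects on it.
import cv2

TABLE_W_MM, TABLE_H_MM = 400.0, 300.0  # placeholder table size

img = cv2.imread("table.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 1. Threshold for the big black border; its largest contour bounds the table.
_, dark = cv2.threshold(gray, 60, 255, cv2.THRESH_BINARY_INV)
contours, _ = cv2.findContours(dark, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
bx, by, bw, bh = cv2.boundingRect(max(contours, key=cv2.contourArea))

# 2. Step inside the border to isolate just the sand.
inset = 40  # pixels, tuned by eye
sand = gray[by + inset:by + bh - inset, bx + inset:bx + bw - inset]

# 3. Dark objects on light sand: threshold again and box each object.
_, objects = cv2.threshold(sand, 90, 255, cv2.THRESH_BINARY_INV)
obj_contours, _ = cv2.findContours(objects, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# 4. Scale pixel boxes to mm and flip Y into the gcode frame.
sx, sy = TABLE_W_MM / sand.shape[1], TABLE_H_MM / sand.shape[0]
for c in obj_contours:
    if cv2.contourArea(c) < 500:
        continue  # ignore specks and sand texture
    x, y, w, h = cv2.boundingRect(c)
    x0, y0 = x * sx, TABLE_H_MM - (y + h) * sy  # image Y grows downward
    print(f"object: ({x0:.0f}, {y0:.0f}) to ({x0 + w * sx:.0f}, {y0 + h * sy:.0f}) mm")
```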

I put up a video of the first live test here:

There’s a bit of smoke and mirrors in that I took a picture on my phone and uploaded it to the computer before I hit go on the script, but that’s not the part I’m worried about automating anyway. All the rest of it is legit.

It looks like I got the math a bit wrong (I had to adjust the X-axis frame of reference, and I think instead of adding a visual buffer on each side of the object I effectively subtracted one), and I’m a little off center on the Y, but those should be pretty easy fixes. I should also sort the objects in order from 0,0 to avoid too much backtracking.
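
That last fix is just a greedy nearest-neighbor ordering. A sketch:

```python
# Order object centers so the marble visits the closest one next, starting at 0,0.
import math

def sort_by_travel(centers: list[tuple[float, float]]) -> list[tuple[float, float]]:
    remaining = list(centers)
    ordered, pos = [], (0.0, 0.0)
    while remaining:
        nearest = min(remaining, key=lambda p: math.dist(pos, p))
        remaining.remove(nearest)
        ordered.append(nearest)
        pos = nearest
    return ordered
```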

Other than that I’m really happy with how it’s turning out. And Dune was really pretty good. So I’m still not sure if the day was productive or not, but I had fun! 🙂

Thank you!!!
Joel


In terms of real-world applications, the crazy idea would be to tie a camera into a CNC router, let you draw out the patterns you want directly on your board, and have it just go ahead and cut them out for you…

You should check this thread out then…


Yep, that’s fantastic. Thanks for the pointer to the thread!

Quick update: I worked on the code a bit so it draws the exact shape of each object above it, and tuned it to be accurate for my table. I’m still taking a picture with my cell phone and copying it to the computer.
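
The shape drawing boils down to simplifying each object’s contour and emitting straight-line moves around it. Roughly, with placeholder scaling (not the exact repo code):

```python
# Turn one OpenCV contour (in image pixels) into gcode line moves (in mm).
import cv2

def contour_to_gcode(contour, sx: float, sy: float, table_h_mm: float) -> str:
    # Simplify the contour so we don't emit thousands of tiny segments.
    eps = 0.01 * cv2.arcLength(contour, True)
    points = cv2.approxPolyDP(contour, eps, True).reshape(-1, 2)
    lines = []
    for i, (px, py) in enumerate(points):
        x, y = px * sx, table_h_mm - py * sy  # flip Y into the gcode frame
        cmd = "G0" if i == 0 else "G1"        # travel to the start, then draw
        lines.append(f"{cmd} X{x:.2f} Y{y:.2f}")
    # Close the loop back to the first point.
    x0, y0 = points[0][0] * sx, table_h_mm - points[0][1] * sy
    lines.append(f"G1 X{x0:.2f} Y{y0:.2f}")
    return "\n".join(lines)
```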

I also pushed the code up to the same GitHub repo as the ChatGPT version:

GitHub - jscotkin/sandgpt: AI voice control of ZenXY sandtable

In other bad ideas: I was looking at my PC game controller (a Steam Controller) that I put on the table mostly because it was black, and I’m thinking it would be really easy to have its joystick control the sand table like an Etch A Sketch…
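
The skeleton of that really would be short. A sketch with pygame, guessed axis numbers, and an assumed GRBL-style $J jog (which FluidNC supports); a real version would need jog-cancel and some flow control:

```python
# Jog the table from a game controller's left stick, etch-a-sketch style.
import telnetlib
import pygame

pygame.init()
pygame.joystick.init()
stick = pygame.joystick.Joystick(0)  # assumes one controller is connected
stick.init()

with telnetlib.Telnet("sandtable.local", 23) as tn:  # placeholder host
    clock = pygame.time.Clock()
    while True:
        pygame.event.pump()
        dx, dy = stick.get_axis(0), -stick.get_axis(1)  # guessed axis mapping
        if abs(dx) > 0.1 or abs(dy) > 0.1:  # dead zone
            # Small relative jog (G91) in mm (G21) toward the stick direction.
            tn.write(f"$J=G91 G21 X{dx * 2:.2f} Y{dy * 2:.2f} F1500\n".encode("ascii"))
        clock.tick(20)  # ~20 jog commands per second
```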

Have fun and thanks again!
Joel


Excellent work! That process sounds very good, and you did it in record time.