Hey folks!
I had a few hours to kill yesterday and did something fun. I put together a little pipeline that takes voice commands and asks ChatGPT to turn them into gcode for my sand table. From GPT-4’s point of view gcode is just another language, and he’s (I’m ok anthropomorphizing my AIs) really good at languages. Turned out to be much easier than I expected, which is a bit scary.
The pipeline (written in Python) looks like this - prompt for a voice command, use the OpenAI Whisper API to convert it into text, then send it via their Chat Completions API to GPT-4-Turbo with a system prompt that tells him about the sand table and that he needs to generate gcode to follow the command. Then I’m just sending the gcode to the table via telnet.
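In case it helps anyone follow along, here's a rough sketch of what that pipeline boils down to. This isn't the code from the repo - the host name, system prompt text, table dimensions, and audio file name are all placeholders:

```python
from openai import OpenAI
import telnetlib

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You control a ZenXY sand table via gcode. "
    "The drawing area is 400mm x 300mm; keep all moves inside those bounds. "  # placeholder size
    "Respond with gcode only, no commentary."
)

# 1. Voice command -> text via the Whisper API
with open("command.wav", "rb") as f:  # assume the command was already recorded to a file
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

# 2. Text -> gcode via the Chat Completions API
resp = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": transcript.text},
    ],
)
gcode = resp.choices[0].message.content  # may need to strip markdown code fences from the reply

# 3. Send the gcode to the table over telnet, one line at a time
with telnetlib.Telnet("sandtable.local", 23) as tn:  # placeholder host
    for line in gcode.splitlines():
        tn.write(line.encode() + b"\r\n")
```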
I uploaded a video demo here: https://youtu.be/wyBQh8Gm9Xo
You can check out the source code here: https://github.com/jscotkin/sandgpt (AI voice control of a ZenXY sand table). The code is really minimal - it was a quick prototype - and I don’t expect it would work out of the box for anyone else, but it’s also quite short and should be easy to play with for anyone familiar with Python.
I’m incredibly impressed with how well it works, including some fairly complex commands - “draw a series of circles along the Y axis”, “draw the olympic rings”, “draw the letters V1E”, etc. He struggled with more complex shapes - I was able to get “draw a five-pointed star with 8-inch arms” to work interactively by asking him to double-check his calculations, but was not able to get it to work as a one-shot in this code. His instructions include asking him to double-check that the gcode will not push the marble outside the bounds of the table, but he often gets that wrong, particularly with arcs and circles.
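For what it’s worth, a simple sanity check on the Python side could catch the obvious out-of-bounds moves before they reach the table. Something like the sketch below - the table dimensions are placeholders, it assumes absolute positioning (G90), and it only checks move endpoints, so an arc that bulges past the edge would still slip through, which is exactly the case he fumbles:

```python
import re

TABLE_X, TABLE_Y = 400.0, 300.0  # placeholder bounds in mm - set to your table's size

def check_bounds(gcode: str) -> list[str]:
    """Flag any G0/G1/G2/G3 move whose endpoint lands outside the table.

    Only endpoints are checked - an arc can still bulge past the edge even
    when both endpoints are legal.
    """
    warnings = []
    x = y = 0.0
    for n, line in enumerate(gcode.splitlines(), 1):
        move = re.match(r"[Gg]0*([0-3])\b", line.strip())
        # update the current position from any X/Y words on the line
        for axis, val in re.findall(r"([XYxy])\s*(-?\d+\.?\d*)", line):
            if axis.upper() == "X":
                x = float(val)
            else:
                y = float(val)
        if move and not (0 <= x <= TABLE_X and 0 <= y <= TABLE_Y):
            warnings.append(f"line {n}: endpoint ({x}, {y}) is off the table")
    return warnings
```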
Anyway, check out the video - I think the potential is really very cool. As the AI improves on math and spatial reasoning it will get much better at this kind of task. I certainly wouldn’t use it for CNC yet!
For the next test I’m thinking about calling the vision version of the API, sending it an overhead picture, and having it outline shapes on the table. You could imagine a coffee table where whenever you put down your glass or a book it automatically draws the outline of it, for instance. AFAICT it should be quite easy - GPT-4 is already quite good at parsing images (you can give it a photo of the contents of your fridge and ask for recipes it can make), and there really isn’t much more to it than that plus talking to it in English.
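If I go that route, the call itself is barely more code than the text version. A rough, untested sketch - the model name, prompt wording, and file name are all placeholders:

```python
import base64
from openai import OpenAI

client = OpenAI()

def outline_gcode(image_path: str) -> str:
    """Send an overhead photo of the table and ask for gcode that traces
    the outlines of whatever is sitting on it."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4-turbo",  # any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "This is an overhead photo of my 400mm x 300mm sand table. "
                         "Generate gcode that traces the outline of each object on it. "
                         "Respond with gcode only."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content
```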
Thanks and have fun!
Joel