Are you collecting those via the ESP32 Web UI, or via FluidTerm?
FluidTerm would exercise the path out through the CP2102; the Web UI would not.
That almost looks to me like lines of output are getting dropped, in addition to possible serial I/O corruption to the TMCs.
There are plenty of UARTs at play here, not just GPIO0.
NOTE: It isn't just communication to the steppers (the GPIO0 stuff) that is being impacted; it's also the output from the Jackpot back to whichever interface fired the $ss into FluidNC.
This is my question too. I assumed you were doing it over WiFi and that it was somewhere in the network stack that whole packets were being dropped, which would explain entire lines going missing. Jono is focused on the UART, probably because of the boot pin discussion, but that is the UART to the drivers.
True, but I'm focused on the UART because of the discussion of these being 'bad ESP32s' and there being some form of hardware-related issue.
The whole dropping-entire-lines thing is a bit odd and definitely points towards some form of software issue: lines getting packetized as full lines and then dropped, lines being assembled but not finished/sent before something flushes or refills a buffer, or something else along those lines.
Whatever it is, it isn't the UART or the lines in a hardware sense; none of that hardware 'sees' things as lines, so it cannot care about them.
It could well still be some form of hardware issue that’s poorly handled in software causing a whole host of other issues, but I’d wager healthy money on it not being a UART issue.
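To illustrate what I mean about software-level line drops (a minimal sketch, not FluidNC's actual code; the queue size and names here are invented): once output is handled as whole lines in software, a full queue drops a whole line cleanly, while the UART below it only ever moves bytes and has no concept of a line to drop.

```cpp
// Illustrative only -- NOT FluidNC's code. A line-oriented software layer
// can lose entire lines while everything downstream still sees perfectly
// clean output.
#include <cstdio>
#include <deque>
#include <string>

constexpr size_t kMaxQueuedLines = 8;  // hypothetical cap on buffered output
std::deque<std::string> line_queue;

void queue_line(const std::string& line) {
    if (line_queue.size() >= kMaxQueuedLines) {
        return;  // producer outran the consumer: the whole line vanishes
    }
    line_queue.push_back(line);
}

int main() {
    for (int i = 0; i < 12; ++i) queue_line("report line " + std::to_string(i));
    while (!line_queue.empty()) {  // drain whatever survived
        std::printf("%s\n", line_queue.front().c_str());
        line_queue.pop_front();
    }
    // Lines 8-11 never appear: dropped whole, with no partial garbage,
    // which is exactly the symptom the Web UI terminal is showing.
}
```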
WebUI. These are all tested with drivers in place and USB unplugged, because of the previous errors where some boards only booted properly with the USB cable in place.
I almost got the logic analyzer figured out last night. I was seeing data, but it seems the boot pin is only used at boot: there is an initial burst of activity, and then nothing (unless I am doing something wrong). So I will plug in the rest of the data pins today. I thought the TX and RX lines were used more; it seems they are not what I thought. So I am guessing that pin is just for flashing?
That means the issue is either another line/pin, or, like you are saying heffe, something else entirely.
WebUI is another layer that could drop whole lines. If the problem of not booting without USB is separate from the $ss problem, then it would be worth seeing what $ss does over USB, to see if it is different. Maybe hardware is the root cause, but it seems like software is contributing too.
The boot sequence. The only time pin 4 is talking, pin 7 gets a blip.
ch 0 = GPIO4
ch 1 = pin 4
ch 2 = pin 16
ch 3 = pin 17
ch 4 = pin 18
ch 5 = pin 19
ch 6 = pin 22
ch 7 = pin 21
99% sure this list is right, the numbers are off on the label so this is me moving them in my head.
I will swap out the pin with no info (GPIO19, MISO) and check pin 23 (MOSI) later. I am guessing these are just SD card stuff, though, and not going to show any info unless I read the card.
Using the WebUI to jog looks like this. Slightly different for each axis.
$ss doesn't show up like that, so I am guessing it is buried in the channel 2 or 6 data. So I can dig deeper into that… and check those pins on the scope? Not having this plugged into the Jackpot means no answers are coming back, but it still shows bad $ss sometimes.
So much info to dig into.
Learning a new tool is fun.
Learning that I know very little about how the Jackpot actually works is weird.
If you’re running $ss from the Web UI, I wouldn’t expect much to appear in these at all.
If I understand the architecture correctly, the WebUI runs on one core of the ESP32 while FluidNC proper runs on the other. The WebUI would be pulling the startup log out of an in-memory buffer somewhere.
I’m curious now if the bad ESP32s that are giving you wildly different results are actually rebooting some of the time when running $ss.
For the bad ones, I’d really like to see how it compares when run from the WebUI vs when run from FluidTerm
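One cheap way to test the rebooting hunch, if you're willing to temporarily flash a bare test program onto one of the bad boards: log the reset reason at startup. esp_reset_reason() is a standard ESP-IDF call (this is a plain Arduino-ESP32 sketch, not anything from FluidNC):

```cpp
#include <Arduino.h>
#include "esp_system.h"  // esp_reset_reason()

void setup() {
    Serial.begin(115200);
    delay(500);  // give the serial port a moment to come up
    switch (esp_reset_reason()) {
        case ESP_RST_POWERON:  Serial.println("reset: normal power-on");   break;
        case ESP_RST_BROWNOUT: Serial.println("reset: BROWNOUT");          break;
        case ESP_RST_PANIC:    Serial.println("reset: firmware crash");    break;
        case ESP_RST_SW:       Serial.println("reset: software restart");  break;
        default:               Serial.println("reset: other");             break;
    }
}

void loop() {}
```

Every time the chip resets you get a line saying why; ESP_RST_BROWNOUT showing up on a bad board would point straight at the power supply rather than the firmware.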
The terminal in the WebUI, if I’m not mistaken, runs off of the WebSocket connection.
I’m not certain, but I think I remember seeing commits about dropping messages in the WebSocket if it thinks the connection is overloaded.
I’m not sure of the details of that implementation, but it could be possible that if it’s trying to send a bunch of messages very quickly, some could get dropped.
I would think FluidTerm would be a much more reliable source
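I don't know FluidNC's actual implementation, but the failure mode I'm describing looks roughly like this (a sketch; the names and the threshold are invented for illustration):

```cpp
// Hypothetical back-pressure handling -- not FluidNC's real code.
#include <cstddef>
#include <string>

constexpr size_t kHighWaterMark = 4096;  // assumed congestion threshold

struct WebSocketClient {
    size_t bytes_in_flight = 0;                     // unacknowledged outbound bytes
    bool send(const std::string&) { return true; }  // stand-in for the real send
};

bool push_report(WebSocketClient& ws, const std::string& line) {
    if (ws.bytes_in_flight > kHighWaterMark) {
        // Connection looks overloaded: drop the message rather than stall
        // the realtime side. From the browser's end, that line of $ss
        // output simply never existed.
        return false;
    }
    return ws.send(line);
}
```

A burst like $ss, which emits many lines back to back, would be exactly the kind of traffic that trips a policy like that.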
FluidTerm, I believe, works fine. When loading the firmware I do a reboot and watch the terminal, and I have never seen an issue there… that I can remember. Now that I have a known horrible one, I can test more… and check some of the other ones more. That is why I assumed it had to do with pin zero; I figured the pull-up might have been messing with the signal. Seems the signal is not actually on that pin, so my theory is out the window.
Well the good news is that you have identified a collection of repeatably bad behaving boards.
That means it's only a matter of time before we debug what thing(s) cause the trouble, and then we can either fix it or at least know what to avoid.
These are two different things. The bad boards have something going on that causes the Web UI to drop whole lines of output being sent from $ss. The thing that is causing that may be related to whatever is going on over on the FluidNC side of things that is causing motion or other functionality to fail.
Okay, I think I understand that. The slightly funky board has $ss issues; the really funky boards have bigger issues in addition, which may or may not be related to the $ss issues.
Something we should try (on a hunch)…
Get one of your worst behaving ESP32 boards. De-solder the AMS1117. Replace it with something that doesn’t have a really slow transient response.
UA78M33CDCY, SGM2212-3.3, or maybe someone can find an alternative in the sweet spot between good performance and low cost.
If you want a smaller intermediate step, maybe take a good look at the caps on either side of that board's regulator. We want to find the equivalents to C8/C9 on your prototype ESP32 dev board.
Replace those with good 22uF 10V ceramic caps.
If you'd rather do some scope troubleshooting, we should set up to trigger the scope on the ESP32 3.3V rail, on a falling edge below some point in the brownout/reset region. I bet that on the bad ESP32s you'll see there are issues. (Still a hunch, but depending on what you see here, you might compare against other ESP32s that don't misbehave, or even do before/after comparisons if you swap regulators.)
Let me take a look with the scope, and see if I see anything. I am not so good with hot air reworking, and I would not be able to get any of those components for a while anyway.
Interesting perspective. I didn't even think a brownout was a possibility.
No significant sags in voltage, maybe 0.01V if I really paid attention. The bad one has the lowest average voltage at 3.27V and the lowest minimum at 3.22V. The datasheet says the ESP is good down to 3.0V, and up to 3.6V, so a few hundredths of a volt should not matter? The genuine one had an average of 3.30V and a low of 3.28V; a good clone averaged 3.31V with a low of 3.24V.
Pin 17 (I2S WS) is the only one where I could see a signal, and while it was not a perfect square wave, it matched the genuine one exactly.
A brownout would reset the WiFi stuff. It should be pretty noticeable.
My money is on a bad antenna or something else related to the 2.4GHz radio. I bet 9/10 packets are being dropped, and the WebSocket/TCP layer is retrying enough to get most things through, but some stuff is still being lost (3 is the typical maximum for retries). On Linux, the dropped counts are in the output of ifconfig. I don't know if there is a similar way to read that from the network layer on ESPs or on Windows.
On a healthy connection, even a couple of dropped packets is abnormal. On a bad connection, it will be a dozen or so per minute, at least.
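There's no ifconfig-style drop counter on the ESP32 that I know of, but you can at least log signal strength from the board's side with a standard ESP-IDF call and compare good vs bad boards at the same distance (a sketch, assuming a bare Arduino-ESP32 test program with placeholder credentials):

```cpp
#include <Arduino.h>
#include <WiFi.h>
#include "esp_wifi.h"  // esp_wifi_sta_get_ap_info()

void setup() {
    Serial.begin(115200);
    WiFi.begin("your-ssid", "your-password");  // placeholders
    while (WiFi.status() != WL_CONNECTED) delay(100);
}

void loop() {
    wifi_ap_record_t ap = {};
    if (esp_wifi_sta_get_ap_info(&ap) == ESP_OK) {
        // Very roughly: above -60 dBm is strong, below -80 dBm is marginal.
        Serial.printf("RSSI: %d dBm\n", ap.rssi);
    }
    delay(1000);
}
```

A board with a bad antenna or RF front end should read noticeably worse here than a good one sitting in the same spot, regardless of what the browser's signal bars claim.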
That would be separate from the wifi signal I get?
I can still use an external-antenna version fine with no antenna attached, showing one bar of signal from pretty far away, with no errors. The bad one shows full signal and I am inches away.