Troubleshooting a microcontroller that has stopped responding

When working with microcontrollers, one will sooner or later face the situation where the MCU is not responding.

Either it's a new PCBA you are trying to get up and running, or a board that previously worked until something happened, and now you are stuck. The in-circuit debugger typically gives you some unhelpful message like "cannot communicate with the target".

This could of course have any number of causes. This post is meant as a troubleshooting checklist for a generic MCU, where the MCU and its programming interface are assumed to be at least somewhat modern, with on-chip flash memory. (Troubleshooting microcontrollers from before the flash era is a whole chapter of its own.)


Suspect #1 - supply voltage

Did you plug in the power? This is surprisingly often the cause. Engineers have a tendency to suspect the most complex error scenario they can come up with, when they have simply forgotten to turn on their bench supply or something silly like that.

Measure the voltages to your on-board regulator, both the voltage in and the voltage out. These obviously need to be clean, stable and of the expected levels. Make sure the regulator is stable and that there are no oscillations.

In case you have multiple voltage sources, for example a main logic supply and a specialized analog supply, make sure these go live at the same point in time. If the microcontroller's Vdd pin is low while some Vref analog supply pin has voltage present, the MCU might go into some latch-up mode or voltage supervisor reset. So if you have things like an LDO or a shunt voltage reference for the analog supply, make sure these are derived from the main voltage regulator's output and not from its input.

High voltages are bad - you killed the MCU. If you at some point managed to feed the MCU or one of its pins with a voltage beyond specification, then it is very likely toasted now. Most often some internal protection diode gets fried, possibly along with the hardware peripheral it belongs to. The MCU might otherwise still work fine, but you'll experience an unexpectedly high current draw and the physical part gets very hot. If this happens - tough luck, replace the MCU.

Suspect #2 - cables, connectors and soldering

Is the debug connector attached backwards? This might actually be the #1 most common reason of all. One of the dumbest ideas I've had, many times over, was to use connectors that don't come with mechanical keying to ensure correct orientation. Just some double row header strips with a pin 1 marking in silk screen at best, which means that you will always manage to put the connector on backwards. Over and over again. At least use so-called "box headers". They don't add any cost but will save you so much time troubleshooting. If board space is tight, then use some custom board-to-wire connector that can't be connected the wrong way.

Ribbon cables are error prone. Unfortunately the vast majority of in-circuit debuggers come with one flavor of ribbon cable or the other. These aren't really meant to be connected over and over like we tend to do while programming a microcontroller. And so they will eventually fail.

In case you have the classic 2.54mm (0.1″) pitch connectors, these are at least somewhat rugged in comparison with most wire-to-board ones. You should get one with mechanical strain relief on the cable, so that tugging at the cable doesn't cause strain at the connector. A regular IDC connector without strain relief will break sooner or later.

Particularly horrible are the standard 1.27mm 2x5 connectors for SWD/ARM microcontrollers. These break quickly and subtly - you may not visually notice that the connector has given up the ghost. They can also fail intermittently so that they sometimes work, sometimes not. The general advice is to not use the 1.27mm 2x5 interface at all, because it is simply not suitable for either debugging or production.

Also keep in mind that programming interfaces were not designed for ridiculously long distances. So if you built some home made 5m cable between your PC and the lab, think again. Wires between the in-circuit debugger and the board should be some 200-300mm at most.

Always suspect bad soldering. At a glance, some fine pitch part like your debug connector or the MCU itself might look ok to the naked eye. If you take a closer look at it using a microscope or at least a magnifying glass, you might notice that a pin just sits on top of a pad with no solder present. This in turn could be caused by some solder paste problem or by a badly designed pad - one connected directly to a big copper pour without thermal relief around it.

Also check soldering of crystals - the good old HC49 SMD crystals have a bad reputation in particular since these come with diverse pad layouts that aren't always ideal and tend to soak up heat, potentially leading to cold joints.

Like with bad connectors, bad soldering might also manifest itself intermittently. A good way to provoke bad soldering problems is to temperature cycle the part. Bad solder joints tend to fail in cold temperatures in particular. Even if you have no professional equipment for this, you could just toss the board into a regular freezer at some -15 to -20°C.

(Be aware of condensation when you remove it, however, as all moisture in the air tends to draw towards the cold parts. So you have a short time window to test before this happens. A powered PCBA covered in water is obviously going to fail very soon.)

BGA and QFN parts are particularly nasty to troubleshoot - if you designed in such parts then ensure that the assembly contractor has X-ray inspection.

If you have found a part with bad soldering, then applying generous amounts of flux manually and then briefly heating it with a solder rework station usually does the trick.

Checking that pin #1 on the IC and connector is where it ought to be is also a good idea, as is checking that the right part got mounted. One interesting problem I ran into was when the assembly contractor somehow managed to swap the positions of the MCU and some RFIC. All LQFP48 look the same, right?

Suspect #3 - the physical programming interface

Probe the signals. Having checked all of the above where applicable, the next step is to fire up the good old oscilloscope. (If you don't have one, now is the time to get one - it is a mandatory tool for any electrical engineer or embedded systems programmer.) Set it up for your MCU voltage levels, 3.3V etc, and a reasonably fast time base of some 10µs/div or such - the exact value is not important.

Open up some PC tool that you are using to program the MCU. It could be a debugger or a production programming one. Then while sending a command from the PC such as download/erase MCU, check the signals of the programming interface. Depending on what interface you got, these are a bit different. But generally you have at least one data line, some manner of clock, a reset line, a voltage reference and ground.

When issuing a program command from the PC, there should be some manner of digital activity on the data, clock and reset lines - we need not know anything about this protocol or even the baudrate, just look for square wave digital signals. Follow these signals with your scope probe from the connector all the way to the MCU pin and make sure they are there all the way.

Similarly, measure the voltage reference pin and ground and make sure both of these are present on the debug connector. Most in-circuit debuggers work by checking the voltage reference on the target and adapting their own signal levels accordingly. A few of them also support drawing supply from the target, in which case you have to provide enough power on the correct pin - not to be confused with the voltage reference pin.

And if you do it the other way around and supply the board from the debugger, then be very careful with where you attach that supply. One classic mistake is to take the clean 5V supply from the debugger/USB etc and connect it directly to your 5V net, forgetting the reason you are supplying the board through the debugger in the first place, namely that you don't want to connect supply in the usual way during programming/debugging. Now of course your on-board voltage regulator is truly going to hate getting 5V on its output pin while it has 0V on the input pin. It will get damaged from that. So ideally connect the supply from the debugger to your regulator's input. Or as a dirty compromise, put a Schottky in reverse across the regulator so that its input sits only some 0.2V lower than the output, and maybe that will put the voltages within specified limits.

Pay extra attention to the reset pin. Most MCUs work with a reset pin that is active low, meaning that in the idle state you should be measuring for example 3.3V there. The program interface will attempt to pull this low before programming and you need to follow that signal with your scope from the connector to the MCU pin like the rest of the signals.

Older MCUs might have an external voltage supervisor circuit attached to the reset pin, in which case you have to go through that one as well and make sure it works as expected. Older MCUs may also require an external pull-up resistor, and even on those with a built-in one, you'll still see external ones just to be sure.

Important: verify that the right decoupling capacitor is attached to the reset pin! The MCU manufacturer should specify the value. Many parts work with 100nF but some require lower values to guarantee steep signal edges. So if you put 100nF there while the recommendation is 100pF, you will get very strange error phenomena that are very hard to track down.
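To get a feel for why the capacitor value matters, here is a rough sketch that computes the RC time constant formed by the reset pull-up and the capacitor. The 10kΩ pull-up used in the example is an assumption for illustration only, and `reset_rc_tau_ns` is a made-up helper name; the real values come from your schematic and the MCU manual.

```c
#include <stdint.h>

/* Time constant (tau = R * C) of the reset pull-up and capacitor, in ns.
   ohm * pF gives picoseconds, so divide by 1000 to get nanoseconds.
   Hypothetical helper, not taken from any vendor library. */
uint64_t reset_rc_tau_ns(uint32_t pullup_ohm, uint32_t cap_pf)
{
    return ((uint64_t)pullup_ohm * cap_pf) / 1000u;
}
```

With the assumed 10kΩ pull-up, `reset_rc_tau_ns(10000, 100000)` (100nF) gives 1000000ns, i.e. 1ms, while `reset_rc_tau_ns(10000, 100)` (100pF) gives 1000ns, i.e. 1µs. That is three orders of magnitude difference in how fast the reset line can rise once it is released.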

Dumb designs often have a tactile switch attached directly to the reset pin without a series resistor in between. If this is the case on your board - congratulations, you now have good reasons to suspect ESD damage. To protect against ESD damage or general nastiness from the outside world, series resistors are a great cheap fix for many such problems.

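The reset pin should be stable when you aren't trying to program the MCU. With an active low reset, that means the supply level (for example 3.3V or 5V) while the MCU is idle or running, and 0V only while reset is being asserted.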

In case you have the kind of MCU where the reset pin works as an I/O, you'll also see if the MCU is rebooting by probing the reset pin. In that case, the reset cause might be one of several, including the watchdog, clock monitor, low voltage reset etc.
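Many MCUs also latch the reset cause in a status register that the firmware can read early during start-up. A minimal sketch of decoding such a register could look like the following. The bit positions and the names `RST_CAUSE_*` and `reset_cause_name` are all hypothetical; the actual register layout is in the MCU manual.

```c
#include <stdint.h>

/* Hypothetical reset cause flags; real names and bit positions are
   MCU-specific and must be taken from the manual. */
#define RST_CAUSE_POWER_ON   (1u << 0)
#define RST_CAUSE_EXT_PIN    (1u << 1)
#define RST_CAUSE_WATCHDOG   (1u << 2)
#define RST_CAUSE_CLOCK_MON  (1u << 3)
#define RST_CAUSE_LOW_VOLT   (1u << 4)

/* Translate a raw reset cause value into a readable name.
   Power-on is checked first since several flags are often set
   together after a cold start. */
const char *reset_cause_name(uint32_t flags)
{
    if (flags & RST_CAUSE_POWER_ON)  return "power-on";
    if (flags & RST_CAUSE_LOW_VOLT)  return "low voltage reset";
    if (flags & RST_CAUSE_WATCHDOG)  return "watchdog";
    if (flags & RST_CAUSE_CLOCK_MON) return "clock monitor";
    if (flags & RST_CAUSE_EXT_PIN)   return "external reset pin";
    return "unknown";
}
```

Logging this string over a UART or storing the raw value somewhere that survives reset makes it a lot easier to tell a watchdog reset from a brown-out.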

Practically all MCUs come with a built-in watchdog, and yes, you should be aware of this and that it needs to be fed regularly through software. This ought to be mentioned in introductory learning materials about microcontrollers, but it still trips up beginners often enough. The watchdog is often disabled by default, or it can be disabled in your debug build and enabled in your release build. Or you did something by accident and now it is enabled. This will cause the MCU to reset if the watchdog isn't fed regularly.

An enabled watchdog will not block you from reprogramming the MCU, though it may or may not get in the way of any debugging efforts.

Depending on the nature of the MCU watchdog, an attached debugger may or may not be able to keep up. The watchdog clock is/ought to be separate from the main clock, so if you freeze the program at a breakpoint, then the watchdog might go ahead and reset the MCU in the meantime, unless told to also stop during debugging. Older MCUs may not even have that option, so once you've let the dog out, you are stuck with it.

Suspect #4 - the clock

Check that the oscillator is running. If you have an external crystal or an MCU pin where the system clock is available, you could measure that it works as intended with your scope. Keep in mind that the probe in this case will add capacitance to the oscillator circuit, which can disturb it. And if you have a budget scope then it might not be able to reach high enough frequencies. The main thing to check is however whether there's some clock alive there at all when you expect it to be. With some oscillator mishap caused by wrong component choices, bad layout or soldering, the MCU won't run.

Also keep in mind that crystals are somewhat sensitive to physical shocks, so dropping a PCB or the crystal itself on the floor might damage it.

Most modern MCUs boot on an internal RC oscillator however, whether you intend the final product to use that or not. But if you have written start-up code that switches to a clock requiring external quartz or an external oscillator, then everything will come to a halt if the oscillator isn't running.
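One way to avoid hanging forever in the start-up code is to poll the oscillator ready flag with a timeout and stay on the internal RC oscillator if the external one never comes up. The sketch below only illustrates that idea: `select_clock`, `osc_ready_fn` and the two stubs are hypothetical, and the stubs stand in for reading the real status flag from a clock control register.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { CLOCK_INTERNAL_RC, CLOCK_EXTERNAL } clock_source_t;

/* Callback returning true once the external oscillator reports ready.
   On real hardware this would read a status bit. */
typedef bool (*osc_ready_fn)(void);

/* Poll the ready flag at most max_polls times. Fall back to the
   internal RC oscillator instead of spinning forever. */
clock_source_t select_clock(osc_ready_fn external_osc_ready, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (external_osc_ready()) {
            return CLOCK_EXTERNAL;
        }
    }
    return CLOCK_INTERNAL_RC;
}

/* Stand-ins for the hardware flag, for trying the logic on a PC. */
bool osc_stub_dead(void) { return false; }
bool osc_stub_ok(void)   { return true; }
```

Calling `select_clock(osc_stub_dead, 1000)` returns `CLOCK_INTERNAL_RC`, so the program keeps running and can report the fault instead of going silent. Whether falling back is acceptable at all depends on the product; some designs would rather halt in a known safe state.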

The internal RC oscillator often requires some manner of external tuning along with the program download however. That's highly MCU-specific and good tools can take care of most of this for you, but that's one thing to look closer at.

When writing the clock setup code, keep in mind that many MCUs have flash that needs wait states, meaning the flash can't keep up when the system clock is raised to the skies using a PLL. If so, there should be a wait state register which you should write to before changing the clock. Otherwise this too could cause a sudden unexpected halt when the new clock is selected. You'll naturally have to consult the specific MCU manual for this.
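As a rough illustration of the arithmetic, the sketch below computes how many wait states are needed for a given system clock, assuming the manual states a maximum flash access frequency per wait state. The helper name `flash_wait_states` and the 24MHz figure in the example are assumptions; real parts often have tables that also depend on supply voltage.

```c
#include <stdint.h>

/* Number of flash wait states needed so that
   sysclk_hz / (wait_states + 1) <= max_hz_per_access.
   Hypothetical helper; returns 0 for zero inputs to avoid
   dividing by zero. */
uint32_t flash_wait_states(uint32_t sysclk_hz, uint32_t max_hz_per_access)
{
    if (sysclk_hz == 0u || max_hz_per_access == 0u) {
        return 0u;
    }
    /* Equivalent to ceil(sysclk / max) - 1 in integer math. */
    return (sysclk_hz - 1u) / max_hz_per_access;
}
```

With an assumed limit of 24MHz per access, a 24MHz system clock needs 0 wait states, 25MHz needs 1 and 72MHz needs 2. The important part is the ordering: write the wait state register first, then switch to the faster clock.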

Suspect #5 - the in-circuit debugger

Ok so if we got all the way here we might have a serious and highly tool-specific problem. Physical in-circuit debuggers do not break very often in my experience, although USB technology was always on the "plug & pray" side of things. Since this little guide is meant to be general, I can't give specific tool advice. Apart from the previously mentioned oscillator tuning, there are a few usual suspects however.

Copycat protection might be live. All modern MCUs have a feature preventing access to the flash by those who shouldn't meddle with it. Unless some backdoor access was designed in, the only way to deal with this is typically to erase the MCU. Which shouldn't be a big deal if you are sitting on the source yourself, after all. Certain tools and MCUs might have a special option to erase/unsecure the MCU.

PC problems. Certain in-circuit debuggers might have their PC software blocked by your firewall, particularly those that have an Ethernet interface on-board.

Suspect #n - unlikely causes

ESD damage is real - kind of rare these days, but when it happens, it is nasty and notoriously hard to troubleshoot. Modern MCUs can take quite a beating thanks to internal protection diodes, but if you have handled the parts or PCBA carelessly, this might still happen. General anti-ESD precautions are the key. Simple things like not poking at the board with your fingers, or always touching a ground or chassis part first when picking up the board, might save you from troubleshooting headaches.

Bad parts. Maybe you bought the parts from some shady dealer, or maybe you happened to get genuine parts but they are either factory rejects or an early silicon mask. Always read the MCU errata to see if there are early masks you had better avoid. If you are sitting on bad parts, well... there's not so much you can do. It is rare, but I've sometimes gotten bad parts even when dealing through trusted vendors, where the problem is on the silicon manufacturer's side. It is not the first thing I would suspect, but it can happen.
