What are the most common causes of CAN bus communication errors?
I can't get my CAN bus application (ISO 11898-2, classic "high speed CAN") working and I'm receiving various CAN error frames. These error frames manifest themselves as sequences of 6 bits with my selected bitrate, either high or low.
Alternatively there is a "bus off" error and nothing works at all - my CAN node is completely silent.
What are the common causes of communication errors and error frames in a CAN network/CAN application and how do I solve them?
1 answer
One rule of thumb is that a CAN network should never have any error frames when it is up and running. Some are expected while all nodes are "waking up", or if you hot plug something to the live bus. If you see them at any other time, there are fundamental hardware or software problems in the network.
Here are some of the most common problems in CAN networks:
-
There must be at least 2 active nodes
A common beginner problem is to buy a single evaluation board, then start to send out CAN frames with nobody listening to them. In order for CAN to function as expected, you need at least 2 nodes on the bus, so that there is always a receiver that acknowledges a successfully transmitted frame by pulling the ACK slot bit at the end of the CAN frame to the dominant level. If nobody on the bus fills in this ACK bit, the transmitter will increase its error counter and try again, until it goes into error passive mode.
During an evaluation/implementation phase of a project, before the complete hardware is available, CAN controllers have the option to be used in a "loopback" mode where they only listen to themselves and not to the live bus. This can also be used for trouble-shooting software.
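To illustrate what the lonely transmitter described above goes through, here is a minimal sketch of the transmit error counter (TEC). The thresholds (+8 per transmit error, error passive at 128, bus off at 256) and the rule that an error passive transmitter does not count further ACK errors are how I read the fault confinement rules of CAN 2.0. The function names are made up for this example and do not come from any CAN driver.

```python
ERROR_PASSIVE_LIMIT = 128  # TEC of 128 or more: node is error passive
BUS_OFF_LIMIT = 256        # TEC of 256 or more: node is bus off


def node_state(tec):
    """Map a transmit error counter value to the CAN fault confinement state."""
    if tec >= BUS_OFF_LIMIT:
        return "bus off"
    if tec >= ERROR_PASSIVE_LIMIT:
        return "error passive"
    return "error active"


def lonely_transmitter(attempts):
    """Simulate a single node retransmitting a frame that nobody acknowledges."""
    tec = 0
    history = []
    for _ in range(attempts):
        if node_state(tec) == "error active":
            tec += 8  # each ACK error costs 8 while the node is error active
        # Once error passive, unanswered ACK errors no longer increase the
        # TEC, so this node keeps retrying but never reaches bus off.
        history.append((tec, node_state(tec)))
    return history


if __name__ == "__main__":
    for attempt, (tec, state) in enumerate(lonely_transmitter(20), start=1):
        print(attempt, tec, state)
```

In this model the node turns error passive on the 16th failed attempt and then stays there. If you actually end up in bus off, the cause is more likely one of the other problems in this list than a missing second node.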
-
There must be terminating resistors at the end of the bus
The ideal CAN "topology" is for the bus to be one long line, with a terminating resistor at each end. The terminating resistors are there to prevent the signal energy from bouncing back into the data line, which in turn can create various noise phenomena.
Each resistor should ideally be 120 ohm, which gives 60 ohm between CANH and CANL since the two resistors are in parallel. Other values can be used and calculated based on specific baudrates and bus lengths. However, the higher the baudrate, the pickier the bus gets about correct termination, whereas at lower baudrates you can get away with missing terminating resistors by chance. So if you don't have termination, you could have a seemingly working system, then change baudrate, lengths, stubs etc and suddenly it stops working, resulting in bus off or a flood of error frames.
The first thing you do when trouble-shooting a CAN bus should be to turn off the supply, then measure ~60 ohms resistance between CANH and CANL.
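The measurement above can be interpreted with a bit of parallel-resistor arithmetic. The following is only a sketch: the helper names and the 15% tolerance are assumptions of mine, and on a real bus the input resistance of the attached transceivers pulls the reading slightly below the ideal values.

```python
def parallel(*resistors):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors)


def count_terminators(measured_ohms, nominal=120.0, tolerance=0.15):
    """Guess how many 120 ohm terminators are on the bus from one reading.

    Returns 1, 2 or 3, or None if the reading matches none of them
    (for example an open circuit or a short between CANH and CANL).
    """
    for count in (1, 2, 3):
        expected = parallel(*[nominal] * count)
        if abs(measured_ohms - expected) <= tolerance * expected:
            return count
    return None


if __name__ == "__main__":
    for reading in (60.0, 118.0, 41.0, 0.5):
        print(reading, "ohm ->", count_terminators(reading))
```

Roughly 120 ohm means one terminator is missing, roughly 60 ohm is the healthy case, and roughly 40 ohm means somebody has added a third terminator somewhere on the bus.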
-
All data communication needs a common ground potential
This means that you should ideally include a signal ground with your CAN lines. The PCB designer may then route this ground so that it picks up as little noise as possible and can be used as reference to the CAN data lines. If you do not include such a dedicated ground, then your main supply ground becomes the reference. This is very unfortunate since you might have all manner of noise there. Particularly in applications with high current motors, valves, contactors etc, where all manner of spikes and surges may exist.
There's also the fundamental electronics aspect: if you connect one end of CAN bus to electronics ultimately supplied from the mains VAC or a laptop battery, and the other end of the bus to a Faraday cage on rubber tires (a.k.a. a car), they will have nowhere near the same potential and nothing will work. CAN transceivers typically tolerate relatively large voltage differences on the CAN lines (typically some +/-40VDC absolute maximum), but it's fairly naive to believe that everything will always stay within this specified voltage diff tolerance forever. Obviously it will do you no good if the ground potential is several thousand volts off.
That being said, it is also possible and common to use galvanic isolation of the CAN bus (in which case a supply for the secondary needs to be provided among the signals).
-
Too long bus or too long stubs
The maximum recommended bus length depends on baudrate. At 1Mbps, 40m is the maximum (CAN In Automation, CANopen DS303-1 chp. 5.1). Lower baudrates mean a longer bus without the need for repeaters. The maximum allowed un-terminated stub cable length can be calculated from the bit timing settings used (propagation segment and sync jump width), but as a rule of thumb it shouldn't be longer than 0.3m (1 foot).
-
Wrong baudrate or bad clock
As with any form of data communication, all nodes on the same bus need to use the same baudrate.
The baudrate also needs to be fairly accurate - the standard (CAN 2.0B chp 9) mandates an oscillator tolerance of 1.58% or better for each node. I'd personally recommend keeping it below 1%. A rule of thumb from the standard is that when you need speeds of 250kbps or faster you should avoid RC oscillators/ceramic resonators and use a quartz crystal. If you are using an MCU with a built-in RC oscillator, check the specified accuracy.
Furthermore, the time quanta (tq) that each bit consists of need to be configured in the CAN controller so that the sample point of each bit is as close to the recommended 87.5% of the bit length as possible (CANopen DS301 chp 7.2). This is ideally done by having 16 tq in total, 14 before the sample point and 2 after it. The programmer also needs to take this into consideration when picking the clock source for the MCU - otherwise it might be impossible to support all standardized baudrates with sufficient accuracy. Getting this right is not trivial and is usually the most complicated part of writing a CAN driver.
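As a rough illustration of the bit timing search described above, the sketch below lists the prescaler/tq combinations that hit a bitrate exactly and ranks them by how close the sample point gets to 87.5%. The function name and the 8 to 25 tq range are assumptions of mine (the range is typical for classic CAN controllers, but check your controller's manual), and a real driver also has to split the tq before the sample point into propagation segment and phase segment 1 and pick a sync jump width.

```python
def bit_timing_candidates(can_clock_hz, bitrate, target_sample_point=0.875):
    """List exact-bitrate timing options, best sample point first.

    can_clock_hz and bitrate must be integers. Each result is a dict with
    the prescaler, tq per bit, and tq before/after the sample point.
    """
    results = []
    for tq_per_bit in range(8, 26):  # assumed controller limits
        divisor = bitrate * tq_per_bit
        if can_clock_hz % divisor != 0:
            continue  # this tq count cannot produce the bitrate exactly
        # tq before the sample point = sync_seg + prop_seg + phase_seg1
        tq_before = min(round(tq_per_bit * target_sample_point), tq_per_bit - 1)
        results.append({
            "prescaler": can_clock_hz // divisor,
            "tq_per_bit": tq_per_bit,
            "tq_before_sample": tq_before,
            "tq_after_sample": tq_per_bit - tq_before,  # phase_seg2
            "sample_point": tq_before / tq_per_bit,
        })
    # Closest to the target first; on a tie prefer more tq (finer resolution).
    results.sort(key=lambda r: (abs(r["sample_point"] - target_sample_point),
                                -r["tq_per_bit"]))
    return results


if __name__ == "__main__":
    for option in bit_timing_candidates(8_000_000, 500_000):
        print(option)
```

With an 8 MHz CAN clock and 500kbps this yields the 16 tq, 14 + 2 split mentioned above with a prescaler of 1. An empty list means the chosen clock cannot produce the bitrate exactly, which is the situation to avoid when picking the clock source.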
-
Bus load
The system needs to be designed so that the bus load is kept as low as possible; ideally it should remain below 50% at all times. Motion control and mission critical systems typically utilize a real-time scheme where each message is sent periodically (typically once every 10ms or 100ms), sometimes within fixed time windows (Time-Triggered CAN etc). Such systems are designed to work at higher bus loads. Larger industrial networks with lots of nodes and no hard real-time deadlines, on the other hand, typically just send messages when there is a change of data, thereby reducing the overall bus load. So what bus load to expect entirely depends on the application.
A system with 100% bus load will mean that some nodes are subject to "starvation", never getting to send their lower priority identifiers. This will eventually lead to errors, but error frames cannot be sent out with 100% bus load.
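For a periodic message set, the bus load can be estimated up front. The sketch below uses the commonly quoted worst-case frame length for classic CAN including bit stuffing and the 3-bit intermission (135 bits for an 8-byte frame with an 11-bit identifier). The function names and the example message set are made up for illustration.

```python
def worst_case_frame_bits(data_bytes, extended_id=False):
    """Worst-case bits on the wire for one classic CAN data frame.

    Includes the 3-bit intermission and the maximum number of stuff bits.
    """
    # (fixed overhead bits, how many of them are subject to bit stuffing)
    overhead, stuffable = (67, 54) if extended_id else (47, 34)
    payload = 8 * data_bytes
    stuff_bits = (stuffable + payload - 1) // 4
    return overhead + payload + stuff_bits


def bus_load(messages, bitrate):
    """Fraction of the bus used by periodic messages.

    messages is an iterable of (data_bytes, period_seconds) pairs,
    all assumed to use 11-bit identifiers.
    """
    bits_per_second = sum(worst_case_frame_bits(n) / period
                          for n, period in messages)
    return bits_per_second / bitrate


if __name__ == "__main__":
    # Hypothetical network: ten 8-byte messages, each sent every 10 ms.
    load = bus_load([(8, 0.010)] * 10, 500_000)
    print(f"{load:.0%}")
```

The hypothetical network in the example ends up at about 27% at 500kbps, comfortably below the 50% guideline. Event-driven traffic is harder to predict, so for such systems it is better to measure the load with a bus analyzer.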
Canonical sources for CAN information:
- The ISO 11898-2 standard, available from ISO. It is also available for free as a fairly accurate early draft - look for "Bosch CAN 2.0B".
- CAN in Automation - a non-profit organization that maintains the CANopen application layer standard, among other things. Their site contains many great articles and study material.
- The CANopen standards. CANopen is the most widely used protocol stack. On top of the CAN standard, it covers everything in the OSI model from the physical to the application layer. CAN connector pin-outs and similar details that can't be found in the ISO CAN standard can be found in CANopen. Available for free if you register.