First, we have to clarify what you really mean by "embedded systems".
There are many systems where something that is logically a PC is embedded. There is a whole industry around "single board computers" (SBCs). For a few hundred dollars, you can get a board that runs Windows or Linux but is intended to be powered by and mounted inside a larger system. On these systems, you program much as you would on a PC. While it is legitimate to call these embedded systems, it doesn't seem that's what you're asking about.
What you seem to be referring to are small resource-limited systems. These are usually microcontrollers that cost from a few tens of cents to a few dollars. They have a fixed amount of memory, built-in peripherals, and most of their pins are dedicated directly to I/O. You usually program the hardware peripherals directly, as opposed to calling OS services to do things for you. These parts don't have a file system or operating system. In some high-end cases there may be a stripped-down RTOS, but even then it's usually there to juggle multiple tasks while maintaining real-time performance, not to provide the layered services of a full OS. If you needed those, you'd probably use a SBC.
The main skill essential for this second type of embedded programming, and one that can be largely ignored these days in other kinds of programming, is truly understanding what the hardware is doing. That means knowing not just what the peripherals do, but also what your high-level language statements will ask of the hardware on your behalf.
For example, the simple C statement:
circ = dia * 3.141592654;
isn't much to think about on a PC. On a small microcontroller, however, it could be a huge mistake. You have to be aware that circ and dia are floating point numbers. Does the micro handle floating point in hardware? Probably not. This one seemingly innocent statement will drag in parts of the floating point library, and suddenly your program memory requirements jump by hundreds of bytes for no apparent reason if you don't understand what's going on under the hood. Those variables will also probably take up 4 bytes each. Maybe the memory requirements are OK, but what about the speed? Trying to do floating point math in a high speed interrupt routine may kill the whole system. And what about the numeric constant? Will the compiler consider it a 4-byte "single" precision value, or does it default to 8-byte "double" precision? If the constant is double precision, what precision will the floating point multiply be carried out in?
The above is just one small example showing that you need to know what's going on at the low level. The best way to learn that is to write a few projects in assembler. You can't avoid the low levels that way. Once you've done a few projects in assembler, you can go back to your high level language. But now you'll have a much better idea what is actually going on. You'll also have the skill to write critical sections in assembler when the need arises. Do you need a double-pole filter for your 500k samples/second A/D readings? Even on a 70 MIPS dsPIC, that's only 140 instructions for each A/D reading. I'd want to write that interrupt routine in assembler, then count instructions to make sure there are still enough left for the foreground code. That foreground code can then be written in a HLL since it's probably not so speed-critical.
If you haven't done enough assembler to feel comfortable with it, you'll be clumsier solving these kinds of problems in other ways, which will probably cost your customer or employer more money.