OpenOCD
|
An optional uncached and unmapped debug segment dseg (EJTAG area) appears in the address range 0xFFFF FFFF FF20 0000 to 0xFFFF FFFF FF3F FFFF. The dseg segment thereby appears in the kseg part of the compatibility segment, and access to kseg is possible with the dseg segment.
The dseg segment is subdivided into dmseg (EJTAG memory) segment and the drseg (EJTAG registers) segment. The dmseg segment is used when the probe services the memory segment. The drseg segment is used when the memory-mapped debug registers are accessed. Table 5-2 shows the subdivision and attributes for the segments.
dseg is divided in :
Because the dseg segment is serviced exclusively by the EJTAG features, there are no physical address per se. Instead the lower 21 bits of the virtual address select the appropriate reference in either EJTAG memory or registers. References are not mapped through the TLB, nor do the accesses appear on the external system memory interface.
Both of this memory segments are Uncached.
On debug exception (break) CPU jumps to the beginning of dmseg. This some kind of memory shared between CPU and EJTAG dongle.
There CPU stops (correct terminology is : stalls, because it stops it's pipeline), and is waiting for some action of dongle.
If the dongle gives it instruction, CPU executes it, augments it's PC to 0xFFFF FFFF FF20 0001 - but it again points to dmseg area, so it stops waiting for next instruction.
This will all become clear later, after reading following prerequisite chapters.
Indicates read or write of a pending processor access:
This value is defined only when a processor access is pending.
Processor will do the action for us : it can for example read internal state (register values), and send us back the information via EJTAG memory (dmseg), or it can take some data from dmseg and write it into the registers or RAM.
Every time when it sees address (i.e. when this address is the part of the opcode it is executing, whether it is instruction or data fetch) that falls into dmseg, processor stalls. That actually means that CPU stops it's pipeline and it is waiting for dongle to take some action.
CPU is now either waiting for dongle to take some data from dmseg (if we requested for CPU do give us internal state, for example), or it will wait for some data from dongle (if it needs following instruction because it did previous, or if the operand address of the currently executed opcode falls somewhere (anywhere) in dmseg (0xff..ff20000 - 0xff..ff2fffff)).
Bit PNnW describes character of CPU access to EJTAG memory (the memory where dongle puts/takes data) - CPU can either READ for it (PNnW == 0) or WRITE to it (PNnW == 1).
By reading PNnW bit OpenOCD will know if it has to send (PNnW == 0) or to take (PNnW == 1) data (from dmseg, via dongle).
Indicates a pending processor access and controls finishing of a pending processor access.
When read:
A write of 0 finishes a processor access if pending; otherwise operation of the processor is UNDEFINED if the bit is written to 0 when no processor access is pending. A write of 1 is ignored.
A successful FASTDATA access will clear this bit.
As noted above, on any access to dmseg, processor will stall. It waits for dongle to do some action - either to take or put some data. OpenOCD can figure out which action has to be taken by reading PrAcc bit.
Once action from dongle has been done, i.e. after the data is taken/put, OpenOCD can signal to CPU to proceed with executing the instruction. This can be the next instruction (if previous was finished before pending), or the same instruction - if for example CPU was waiting on dongle to give it an operand, because it saw in the instruction opcode that operand address is somewhere in dmseg. That provoked the CPU to stall (it tried operand fetch to dmseg and stopped), and PNnW bit is 0 (CPU does read from dmseg), and PrAcc is 1 (CPU is pending on dmseg access).
Shifting in a zero value requests completion of the Fastdata access.
The PrAcc bit in the EJTAG Control register is overwritten with zero when the access succeeds. (The access succeeds if PrAcc is one and the operation address is in the legal dmseg segment Fastdata area.)
When successful, a one is shifted out. Shifting out a zero indicates a Fastdata access failure. Shifting in a one does not complete the Fastdata access and the PrAcc bit is unchanged. Shifting out a one indicates that the access would have been successful if allowed to complete and a zero indicates the access would not have successfully completed.
The width of the Fastdata register is 1 bit.
During a Fastdata access, the Fastdata register is written and read, i.e., a bit is shifted in and a bit is shifted out.
Also during a Fastdata access, the Fastdata register value shifted in specifies whether the Fastdata access should be completed or not. The value shifted out is a flag that indicates whether the Fastdata access was successful or not (if completion was requested).
OpenOCD reads/writes data to JTAG via mips_m4k_read_memory() and mips_m4k_write_memory() functions defined in src/target/mips_m4k.c. Internally, these functions call mips32_pracc_read_mem() and mips32_pracc_write_mem() defined in src/target/mips32_pracc.c
Let's take for example function mips32_pracc_read_mem32() which describes CPU reads (fetches) from dmseg (EJTAG memory) :
We have to pass this code to CPU via dongle via dmseg.
After debug exception CPU will find itself stalling at the beginning of the dmseg. It waits for the first instruction from dongle. This is MIPS32_MTC0(15,31,0), so CPU saves C0 and continues to addr 0xFF20 0001, which falls also to dmseg, so it stalls. Dongle proceeds giving to CPU one by one instruction in this manner.
However, things are not so simple. If you take a look at the program, you will see that some instructions take operands. If it has to take operand from the address in dmseg, CPU will stall waiting for the dongle to do the action of passing the operand and signal this by putting PrAcc to 0. If this operand is somewhere in RAM, CPU will not stall (it stalls only on dmseg), but it will just take it and proceed to next instruction. But since PC for next instruction points to dmseg, it will stall, so that dongle can pass next instruction.
Some instructions are jumps (if these are jumps in dmseg addr, CPU will jump and then stall. If this is jump to some address in RAM, CPU will jump and just proceed - will not stall on addresses in RAM).
To have information about CPU is currently (does it stalls wanting on operand or it jumped somewhere waiting for next instruction), OpenOCD has to call TAP ADDRESS instruction, which will ask CPU to give us his address within EJTAG memory :
And then, upon the results, we can conclude where it is in our code so far, so we can give it what it wants next :
i.e. if CPU is stalling on addresses in dmseg that are reserved for input parameters, we can conclude that it actually tried to take (read) parameter from there, and saw that address of parameter falls in dmseg, so it stopped. Obviously, now dongle have to give to it operand.
Similarly, mips32_pracc_exec_write() describes CPU writes into EJTAG memory (dmseg). Obviously, code is RO, and CPU can change only parameters :
CPU loops here :
and using presented R (mips32_pracc_exec_read()) and W (mips32_pracc_exec_write()) functions it reads in the code (RO) and reads and writes operands (RW).
OpenOCD FASTDATA write function, mips32_pracc_fastdata_xfer() is called from bulk_write_memory callback, which writes a count items of 4 bytes to the memory of a target at the an address given. Because it operates only on whole words, this should be faster than target_write_memory().
In order to implement FASTDATA write, mips32_pracc_fastdata_xfer() uses the following handler :
In the beginning and the end of the handler we have function prologue (save the regs that will be clobbered) and epilogue (restore regs), and in the very end, after all the xfer have been done, we do jump to the MIPS32_PRACC_TEXT address, i.e. Debug Exception Vector location. We will use this fact (that we came back to MIPS32_PRACC_TEXT) to verify later if all the handler is executed (because when in RAM, processor do not stall - it executes all instructions until one of them do not demand access to dmseg (if one of it's operands is there)).
This handler is put into the RAM and executed from there, and not instruction by instruction, like in previous simple write (mips_m4k_write_memory()) and read (mips_m4k_read_memory()) functions.
N.B. When it is executing this code in RAM, CPU will not stall on instructions, but execute all until it comes to the :
and there it will stall - because it will see that one of the operands have to be fetched from dmseg (EJTAG memory, in this case FASTDATA memory segment).
This handler is loaded in the RAM, at the reserved location "work_area". This work_area is configured in OpenOCD configuration script and should be selected in that way that it is not clobbered (overwritten) by data we want to write-in using FASTDATA.
What is executed instruction by instruction which is passed by dongle (via EJATG memory) is small jump code, which jumps at the handler in RAM. CPU stalls on dmseg when receiving these jmp_code instructions, but once it jumps in RAM, CPU do not stall anymore and executes bunch of handler instructions. Until it comes to the first instruction which has an operand in FASTDATA area. There it stalls and waits on action from probe. It happens actually when CPU comes to this loop :
and then it stalls because operand in r8 points to FASTDATA area.
OpenOCD first verifies that CPU came to this place by :
and then passes to CPU start and end address of the loop region for handler in RAM.
In the loop in handler, CPU sees that it has to take and operand from FSTDATA area (to write it to the dst in RAM after), and so it stalls, putting PrAcc to "1". OpenOCD fills the data via this loop :
Each time when OpenOCD fills data to CPU (via dongle, via dmseg), CPU takes it and proceeds to execute the handler. However, since the handler is in an assembly loop, CPU comes to next instruction which also fetches data from FASTDATA area. So it stalls. Then OpenOCD fills the data again, from its (OpenOCD's) loop. And this game continues until all the data has been filled.
After the last data has been given to CPU it sees that it reached the end address, so it proceeds with next instruction. However, this instruction do not point into dmseg, so CPU executes bunch of handler instructions (all prologue) and in the end jumps to MIPS32_PRACC_TEXT address.
On its side, OpenOCD checks in CPU has jumped back to MIPS32_PRACC_TEXT, which is the confirmation that it correctly executed all the rest of the handler in RAM, and that is not stuck somewhere in the RAM, or stalling on some access in dmesg - that would be an error:
The width of the Fastdata register is 1 bit. During a Fastdata access, the Fastdata register is written and read, i.e., a bit is shifted in and a bit is shifted out. During a Fastdata access, the Fastdata register value shifted in specifies whether the Fastdata access should be completed or not. The value shifted out is a flag that indicates whether the Fastdata access was successful or not (if completion was requested).
The FASTDATA access is used for efficient block transfers between dmseg (on the probe) and target memory (on the processor). An "upload" is defined as a sequence of processor loads from target memory and stores to dmseg. A "download" is a sequence of processor loads from dmseg and stores to target memory. The "Fastdata area" specifies the legal range of dmseg addresses (0xFF20.0000 - 0xFF20.000F) that can be used for uploads and downloads. The Data + Fastdata registers (selected with the FASTDATA instruction) allow efficient completion of pending Fastdata area accesses. During Fastdata uploads and downloads, the processor will stall on accesses to the Fastdata area. The PrAcc (processor access pending bit) will be 1 indicating the probe is required to complete the access. Both upload and download accesses are attempted by shifting in a zero SPrAcc value (to request access completion) and shifting out SPrAcc to see if the attempt will be successful (i.e., there was an access pending and a legal Fastdata area address was used). Downloads will also shift in the data to be used to satisfy the load from dmseg’s Fastdata area, while uploads will shift out the data being stored to dmseg’s Fastdata area. As noted above, two conditions must be true for the Fastdata access to succeed. These are:
Basically, because FASTDATA area in dmseg is 16 bytes, we transfer (0xFF20.0000 - 0xFF20.000F) FASTDATA scan TAP instruction selects the Data and the Fastdata registers at once.
They come in order : TDI -> | Data register| -> | Fastdata register | -> TDO
FASTDATA register is 1-bit width register. It takes in SPrAcc bit which should be shifted first, followed by 32 bit of data.
Scan width of FASTDTA is 33 bits in total : 33 bits are shifted in and 33 bits are shifted out.
First bit that is shifted out is SPrAcc that comes out of Fastdata register and should give us status on FATSDATA write we want to do.
Download flow (probe -> target block transfer) :
1) Probe transfer target execution to a loop in target memory doing a fixed number of "loads" to fastdata area of dmseg (and stores to the target download destination.)
2) Probe loops attempting to satisfy the loads "expected" from the target. On FASTDATA access "successful" move on to next "load". On FASTDATA access "failure" repeat until "successful" or timeout. (A "failure" is an attempt to satisfy an access when none are pending.)
Note: A failure may have a recoverable (and even expected) cause like slow target execution of the load loop. Other failures may be due to unexpected more troublesome causes like an exception while in debug mode or a target hang on a bad target memory access.
Shifted out SPrAcc bit inform us that there was CPU access pending and that it can be complete.
Basically, we should do following procedure :
If SPrAcc is 0 then no access was pending to the fastdata area. (Repeat attempt to complete the access you expect for this data word. Timeout if you think the access is "long overdue" as something unexpected has happened.)
If SPrAcc is 0 then no access was pending to the fastdata area. (Repeat attempt to complete the access you expect for this data word. Timeout if you think the access is "long overdue" as something unexpected has happened.)
Basically, if checking first (before scan) if CPU is pending on FASTDATA access (PrAcc is "1"), like this
which is inefficient, we should do it like this :
by checking SPrAcc that we shifted out.
If some FASTDATA write fails, OpenOCD will continue with it's loop (on the host side), but CPU will rest pending (on the target side) waiting for correct FASTDATA write.
Since OpenOCD goes ahead, it will eventually finish it's loop, and proceed to check if CPU took all the data. But since CPU did not took all the data, it is still turns in handler's loop in RAM, stalling on Fastdata area so this check :
fails, and that gives us enough information of the failure.
In this case, we can lower the JTAG frequency and try again, because most probable reason of this failure is that we tried FASTDATA upload before CPU arrived to rise PrAcc (i.e. before it was pending on access). However, the reasons for failure might be numerous : reset, exceptions which can occur in debug mode, bus hangs, etc.
If lowering the JTAG freq does not work either, we can fall back to more robust solution with patch posted below.
To summarize, FASTDATA communication goes as following :