CPU bug?

This section is for discussing everything about Next hardware and latest updates.
slingshot
Posts: 40
Joined: Mon Mar 22, 2021 12:21 pm

CPU bug?

Postby slingshot » Mon Mar 22, 2021 12:30 pm

During porting to MiST (not MiSTer), I discovered that in the game The Train: Escape from Normandy, the game crashes when you arrive at a bridge.
DannyPPC confirmed it's happening on real Next hardware, too.
Is this a known issue? I have an .SNA snapshot (created in Next OS) which reproduces it on a real machine (and also in other FPGA cores).

Ped7g
Posts: 256
Joined: Mon Jul 16, 2018 7:11 pm

Re: CPU bug?

Postby Ped7g » Mon Mar 22, 2021 1:53 pm

Not known.
(A CPU bug as in a "Z80 bug" is quite unlikely; more likely the game outputs to some I/O port which triggers some unexpected Next-specific functionality, or it hits one of the extended Next instructions by accident by using a reserved ED opcode wrongly... but anything is possible in the end.)

If you have the SNA, you can debug it and figure it out, or share it somewhere.

slingshot
Posts: 40
Joined: Mon Mar 22, 2021 12:21 pm

Re: CPU bug?

Postby slingshot » Mon Mar 22, 2021 2:16 pm

Yeah, my first thought was also that it hits some extended ED opcode; I just wanted to ask if anybody had met with this before. I haven't debugged it yet (I just discovered the issue), but here's the SNA to try.

BTW, the CPU used is still not 100% perfect regarding the undocumented flags, MEMPTR, and the address left on the bus in non-bus cycles; the latter affects the contention circuit. The one used for the MiST cores has all of these fixed and is available here: https://github.com/mist-devel/T80. I wonder if there is interest in merging these two forks (I can do some work in this direction if desired).
Attachments
train.zip
(30.48 KiB) Downloaded 65 times

Ped7g
Posts: 256
Joined: Mon Jul 16, 2018 7:11 pm

Re: CPU bug?

Postby Ped7g » Mon Mar 22, 2021 2:31 pm

Yeah, I'm pretty sure AA will read this and check the T80 from your repo.

I don't know VHDL, so I have no idea... I know the Next core has some fixes to make external peripherals work, fixing some cycle ordering and related details compared to the T80 source they started with, but I don't know which fork it was.

Also, I know AA is not completely happy with the way the Next extended opcodes were added to it, and he dreams about refactoring it one day to make it more coherent and integral with the rest of the T80, but there's a lot of more important stuff to do first, so this is more like a "dream"... maybe one day... (I'm mentioning it as a hint that he may be interested in checking and incorporating your fixes; the T80 part of the Next core is something that could get more attention later. It's rather a question of available dev time, not a question of whether it should be done :) ).

I will check the SNA in emulators if I can spot something weird.

Alcoholics Anonymous
Posts: 775
Joined: Mon May 29, 2017 7:00 pm

Re: CPU bug?

Postby Alcoholics Anonymous » Mon Mar 22, 2021 8:14 pm

Yeah it's very unlikely there is a bug in the cpu - thousands of programs have been tested.

There is a bug in this game where it copies a few bytes on top of its isr address to change behaviour:

Code:

F7F7:  C3 87 8A   JP 8A87
       FB         EI
       ED 4D      RETI

6BC4:  F3         DI
       2A CD 6B   LD HL,(6BCD)
       22 F7 F7   LD (F7F7),HL
       FB         EI
       C9         RET

6BCD:  FB         EI
       ED 4D      RETI
What it wants to do is occasionally overwrite the JP instruction at F7F7 with an "EI; RETI", but the programmer failed to realize that RETI takes two bytes, so in total three bytes need copying. Since he only copies two bytes, the ISR becomes EI followed by "ED 8A", and then "EI; RETI", the latter probably added to stop the game from crashing. Many undefined opcodes in the ED 8x opcode space can be hit, depending on what the JP instruction's destination address was. On FPGA Z80s these are effectively NOPs, and on real NMOS Z80s this is the case too.

However on the Next, ED 8A is a new instruction: "PUSH NNNN". And this is being created just as the bridge is crossed.

To fix this game, "EI RET" can be stored at 6BCD so that a two byte copy is sufficient to return from the interrupt.
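The effect of the two-byte copy can be sketched in Python. The addresses and bytes are taken from the disassembly above; the `patch_isr` helper is an illustrative model of the game's `LD HL,(6BCD); LD (F7F7),HL` sequence, not actual game code:

```python
# Model the ISR patch at F7F7 as plain byte copies.
# Original ISR: JP 8A87 (C3 87 8A) followed by EI; RETI (FB ED 4D).
mem = {0xF7F7: 0xC3, 0xF7F8: 0x87, 0xF7F9: 0x8A,
       0xF7FA: 0xFB, 0xF7FB: 0xED, 0xF7FC: 0x4D}

def patch_isr(mem, src_bytes):
    """LD HL,(src); LD (F7F7),HL copies exactly two bytes."""
    mem[0xF7F7] = src_bytes[0]
    mem[0xF7F8] = src_bytes[1]

# Buggy original: copies only "EI" (FB) and the first byte of RETI (ED),
# leaving the high byte of the old JP destination (8A) in place.
buggy = dict(mem)
patch_isr(buggy, [0xFB, 0xED])
# ISR is now FB ED 8A ...: EI, then "ED 8A" -- a harmless undefined opcode
# on a plain Z80, but "PUSH $nnnn" on the Next, run on every interrupt.
assert (buggy[0xF7F7], buggy[0xF7F8], buggy[0xF7F9]) == (0xFB, 0xED, 0x8A)

# Suggested fix: store "EI; RET" (FB C9) at 6BCD so two bytes suffice.
fixed = dict(mem)
patch_isr(fixed, [0xFB, 0xC9])
assert (fixed[0xF7F7], fixed[0xF7F8]) == (0xFB, 0xC9)  # EI; RET
```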

A patched SNA is attached.

This is the first time I have seen a program fail because it hit on a new instruction. All the new instructions are tucked away in safe opcode locations that should only be used deliberately. Here we have a random bug that manages to use one.

We would need to introduce a mode for the Z80 controlling whether the new instructions are active to fix this. This is something I would do if I rewrote the Z80 implementation from scratch, but as for the T80, changes are fairly low priority at this time except to fix clear bugs. I dislike the T80 not just because of how it's implemented, but also because introducing small changes often affects whether synthesis is going to be successful or not.
Attachments
train-fix.zip
(30.23 KiB) Downloaded 64 times
Last edited by Alcoholics Anonymous on Tue Mar 23, 2021 2:52 pm, edited 1 time in total.

slingshot
Posts: 40
Joined: Mon Mar 22, 2021 12:21 pm

Re: CPU bug?

Postby slingshot » Mon Mar 22, 2021 8:54 pm

Thanks for debugging it!

Yeah, it would be good to have a signal which disables the extra instructions for more compatibility, like the lock bit on the 128k Spectrum.
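Such a lock could gate the extended decode; a minimal Python sketch of the idea (the `NEXT_ED_OPCODES` table and the `next_enabled` flag are made-up names for illustration, not the actual core's signals):

```python
# Hypothetical Next-extension lock: when disabled, reserved ED opcodes
# fall through to NOP-like behaviour, as on an original NMOS Z80.
NEXT_ED_OPCODES = {0x8A: "PUSH $nnnn"}  # illustrative subset

def decode_ed(opcode, next_enabled=True):
    """Decode the byte following an ED prefix."""
    if next_enabled and opcode in NEXT_ED_OPCODES:
        return NEXT_ED_OPCODES[opcode]
    return "NOP"  # undefined ED opcodes act as NOPs here

assert decode_ed(0x8A, next_enabled=True) == "PUSH $nnnn"
assert decode_ed(0x8A, next_enabled=False) == "NOP"  # buggy game runs safely
```

With the lock cleared, the buggy ISR in The Train would hit a NOP instead of PUSH $nnnn, just as on an original Z80.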

I think it's not the T80 itself that makes the core unstable, but how it's used. The async top-level is not meant to be used in an SoC-like FPGA configuration; it's meant for applications replacing an original chip. It's not good design to use an async module with switchable clocks in the middle of a large design; using clock enables is the way to go. Originally there was no top-level for the T80 that fully mimicked the original Z80, e.g. allowing signals to change on both edges of the clock, but years ago one was created for the MiST(er) FPGAs by Sorgelig, called T80pa, using dual clock enables.
It's used in many cores already and can achieve ~64 MHz on the low-end Altera FPGAs. The drawback of using it is that it requires a 2x master clock because of the dual CE signals, e.g. 56 MHz to replace a 28 MHz T80a. And a lot of other code must be rewritten to use the clock enables (code which currently uses cpu_clk).
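The dual clock-enable scheme can be modelled behaviourally. This Python sketch (the names `cen_p`/`cen_n` are assumptions modelled on common MiST(er) conventions, not necessarily the actual T80pa port names) shows how a 2x master clock yields one rising-edge and one falling-edge enable per slow-clock period:

```python
# Model a 2x master clock driving two alternating clock enables:
# cen_p stands in for the rising edge of the slow clock, cen_n for the
# falling edge. Logic samples on master-clock ticks where its CE is high.
def clock_enables(master_ticks):
    """Yield (tick, cen_p, cen_n) for each tick of the 2x master clock."""
    for t in range(master_ticks):
        yield t, t % 2 == 0, t % 2 == 1

edges = [(p, n) for _, p, n in clock_enables(4)]
# At 56 MHz this gives one cen_p and one cen_n per 28 MHz period, so
# "both edge" behaviour is expressed in a single clock domain.
assert edges == [(True, False), (False, True), (True, False), (False, True)]
```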

BTW, I've sent a PR to eliminate the latch used for the NEXTREG instruction:
https://gitlab.com/SpectrumNext/ZX_Spec ... requests/4
Last edited by slingshot on Mon Mar 22, 2021 9:00 pm, edited 1 time in total.

Alcoholics Anonymous
Posts: 775
Joined: Mon May 29, 2017 7:00 pm

Re: CPU bug?

Postby Alcoholics Anonymous » Mon Mar 22, 2021 8:59 pm

slingshot wrote:
Mon Mar 22, 2021 2:16 pm
BTW, the CPU used is still not 100% perfect regarding the undocumented flags, MEMPTR, and the address left on the bus in non-bus cycles; the latter affects the contention circuit.
The address bus is always driven -- there shouldn't be any time when it's not.

We know about the undocumented flags and MEMPTR; these differences don't seem to affect anything except the test programs that look for them. The emulation community has taken them on, to behave exactly like a particular version of Z80 silicon, only (relatively) recently. I am not against implementing them, I just recognize the low importance.

The real issue is that we are at the limits of the fpga we have so we have to be careful about making changes that affect the size of the cpu on the fpga and that might affect propagation times. I personally dislike the T80 because it is not an efficient implementation in terms of gate size and that's what I feel like I am fighting all day long :D
The one used for the MiST cores has all of these fixed and is available here: https://github.com/mist-devel/T80. I wonder if there is interest in merging these two forks (I can do some work in this direction if desired).
I am aware of that and if you are interested in trying to improve the implementation we have, that is welcome. The size and speed constraints we have are real though so keep in mind that even if a better implementation comes about, it may not be possible to incorporate.

There are some known issues:

1. Proper bus cycles are very important as the cpu signals are used to drive real hardware. Several fixes have already been applied to the T80 to fix some problems. Signals must transition on the correct clock edges.

2. Memory read cycles for some instructions are known to be incorrect, "BIT n,(IX+d)" possibly being one of them. This forced us to introduce one wait state for all memory read cycles at 28MHz, and not just for the planned instruction fetch cycle. It's possible some instructions read data from the bus on the wrong clock edge.

3. The interrupt response in the T80 is not correct. I don't think IM0 is even there (but it's not essential), and the interrupt vector in IM2 mode is not read correctly. I suspect a couple of things: one is that the IM2 response does not take 19 cycles, and the other is that the vector is read in the wrong T-state during the IM2 response. You can see in the microcode that LDZ is used to load the vector, but LDZ gets its value from the bus at the end of T3, whereas the interrupt vector is supplied at the end of T2. I haven't had a chance to look into this yet, as interrupt vectors are a new feature in the Next.

4. I know the changes to the T80 have made it asynchronous. Synchronous would be better, as I am not sure whether additional pressure on routing is going to come about, and we already have problems there.

slingshot
Posts: 40
Joined: Mon Mar 22, 2021 12:21 pm

Re: CPU bug?

Postby slingshot » Mon Mar 22, 2021 9:17 pm

Alcoholics Anonymous wrote:
Mon Mar 22, 2021 8:59 pm
slingshot wrote:
Mon Mar 22, 2021 2:16 pm
BTW, the CPU used is still not 100% perfect regarding the undocumented flags, MEMPTR, and the address left on the bus in non-bus cycles; the latter affects the contention circuit.
The address bus is always driven -- there shouldn't be any time when it's not.
Yes, but it does matter what's on the bus, because the Speccy's contention handler actually spies on the bus in non-bus cycles.
It's not correct in the T80, but fixed in the one in the MiST repo. On a real Z80 the last address is left on the bus in a non-bus cycle, and that includes the refresh address, too. The T80 sometimes puts an internal value on the bus, confusing the ULA (and making timing tests fail).
We know about the undocumented flags and MEMPTR; these differences don't seem to affect anything except the test programs that look for them. The emulation community has taken them on, to behave exactly like a particular version of Z80 silicon, only (relatively) recently. I am not against implementing them, I just recognize the low importance.
It's quite understandable.
The real issue is that we are at the limits of the fpga we have so we have to be careful about making changes that affect the size of the cpu on the fpga and that might affect propagation times. I personally dislike the T80 because it is not an efficient implementation in terms of gate size and that's what I feel like I am fighting all day long :D
Maybe, but I'm not aware of any other really good Z80 core :) Maybe the A-Z80, but it's also not fully accurate.
The one used for the MiST cores has all of these fixed and is available here: https://github.com/mist-devel/T80. I wonder if there is interest in merging these two forks (I can do some work in this direction if desired).
I am aware of that and if you are interested in trying to improve the implementation we have, that is welcome. The size and speed constraints we have are real though so keep in mind that even if a better implementation comes about, it may not be possible to incorporate.
I think there are simple changes which don't make the core bigger.
There are some known issues:

1. Proper bus cycles are very important as the cpu signals are used to drive real hardware. Several fixes have already been applied to the T80 to fix some problems. Signals must transition on the correct clock edges.
Yes, true, T80pa does it correctly.
2. Memory read cycles for some instructions are known to be incorrect, "BIT n,(IX+d)" possibly being one of them. This forced us to introduce one wait state for all memory read cycles at 28MHz, and not just for the planned instruction fetch cycle. It's possible some instructions read data from the bus on the wrong clock edge.
AFAIK these are also fixed. If a bus cycle happens in a wrong M-state, then Timing_Tests-48k_v1.0 fails. And it passes all 37 tests (the last one even needs the AY chip removed from the machine).
3. The interrupt response in the T80 is not correct. I don't think IM0 is even there (but it's not essential), and the interrupt vector in IM2 mode is not read correctly. I suspect a couple of things: one is that the IM2 response does not take 19 cycles, and the other is that the vector is read in the wrong T-state during the IM2 response. You can see in the microcode that LDZ is used to load the vector, but LDZ gets its value from the bus at the end of T3, whereas the interrupt vector is supplied at the end of T2. I haven't had a chance to look into this yet, as interrupt vectors are a new feature in the Next.
Some interrupt-related things are surely fixed. The number of cycles should be accurate. Actually, the first bug I debugged was the NMI cycle length in the T80; the ZX81 has a visual glitch when it's incorrect. And given that loads of demos on the CPC pass (which uses a kind of raster interrupt), it should be OK.
4. I know the changes to the T80 have made it asynchronous. Synchronous would be better, as I am not sure whether additional pressure on routing is going to come about, and we already have problems there.
The only change which made it async (apart from the top-level) seems to be that latch. It was easy to fix.

Alcoholics Anonymous
Posts: 775
Joined: Mon May 29, 2017 7:00 pm

Re: CPU bug?

Postby Alcoholics Anonymous » Mon Mar 22, 2021 9:21 pm

slingshot wrote:
Mon Mar 22, 2021 8:54 pm
I think it's not the T80 itself that makes the core unstable, but how it's used. The async top-level is not meant to be used in an SoC-like FPGA configuration; it's meant for applications replacing an original chip. It's not good design to use an async module with switchable clocks in the middle of a large design; using clock enables is the way to go. Originally there was no top-level for the T80 that fully mimicked the original Z80, e.g. allowing signals to change on both edges of the clock, but years ago one was created for the MiST(er) FPGAs by Sorgelig, called T80pa, using dual clock enables.
It's used in many cores already and can achieve ~64 MHz on the low-end Altera FPGAs. The drawback of using it is that it requires a 2x master clock because of the dual CE signals, e.g. 56 MHz to replace a 28 MHz T80a. And a lot of other code must be rewritten to use the clock enables (code which currently uses cpu_clk).
Yes I agree one system clock with clock enables is the way to go. When I refactored the core a few years ago, I stopped short of that because there were so many changes taking place that there was a chance nothing would work even without a major step like changing the clocking. But we're taking that step in the next refactor because it's needed to tackle the issues we have with hdmi. I am hoping the effort to move to a single clock domain design will also result in better synthesis and a little more space to play with.

I am not sure if a 56MHz system clock is going to work; right now the synthesis is saying 40MHz is the max clock. But a lot of that could be margins to accommodate skew between clock domains and related issues.
BTW, I've sent a PR to eliminate the latch used for the NEXTREG instruction:
https://gitlab.com/SpectrumNext/ZX_Spec ... requests/4
Cheers!

slingshot
Posts: 40
Joined: Mon Mar 22, 2021 12:21 pm

Re: CPU bug?

Postby slingshot » Mon Mar 22, 2021 9:36 pm

Alcoholics Anonymous wrote:
Mon Mar 22, 2021 9:21 pm

Yes I agree one system clock with clock enables is the way to go. When I refactored the core a few years ago, I stopped short of that because there were so many changes taking place that there was a chance nothing would work even without a major step like changing the clocking. But we're taking that step in the next refactor because it's needed to tackle the issues we have with hdmi. I am hoping the effort to move to a single clock domain design will also result in better synthesis and a little more space to play with.

I am not sure if a 56MHz system clock is going to work; right now the synthesis is saying 40MHz is the max clock. But a lot of that could be margins to accommodate skew between clock domains and related issues.
But two things will happen with a switch to a single 56MHz clock:
- The clock mux for the CPU will be eliminated, and there's no need to check several clock-domain-crossing routes. It's a nightmare to optimize when a path can do a 28MHz->28MHz transition and then a 28MHz->3.5MHz one: if it's optimized for speed, setup times will be OK but hold times will suffer; if it's slowed down, hold times will be met but setup times will suck.
- Currently both edges of the 28MHz clock are used, which is effectively a 56MHz clock in some paths, when a signal is issued on one edge and latched on the other.
BTW, I've sent a PR to eliminate the latch used for the NEXTREG instruction:
https://gitlab.com/SpectrumNext/ZX_Spec ... requests/4
Cheers!
You're welcome!

