Some suggestions for core 2.00.xx changes

Postby Ped7g » Sun Dec 30, 2018 3:44 am

Hi, after spending a few days with the 2.00.xx cores theoretically (reading docs and updating the wiki), I have a few suggestions. I'm not really expecting any of them to happen, but I'm posting them in case some catch on with the core team, and to get them off my mind:

* IMO probably doable (from backward compatibility point of view and manual fixing, and maybe HW complexity)

Clip window control register $1C:
1) reads in L/S/U order (new feature of 2.00.xx), but writes in U/S/L order (OCD trigger)
2) on write, only the clip-register index is reset; how about adding another bit per layer to reset the actual coordinates to the default (full screen)?
(at least then read vs write would both use 6 bits)

For clip sprites in over-border mode: making sure invalid window 0,255,0,191 gracefully operates as 0,159,0,191 (i.e. extra range defined by too large X2 will not break the stuff)

Raster Line MSB $1E - reads in top bits the current horizontal position in terms of copper coordinates (actually how does it work now, when does the line number switch? Is it like copper, that the left border area belongs to previous line and the new line is returned after pixel 0 rendering starts? - that would be a bit unfortunate (lot more difficult to use), without that horizontal info)
EDIT - thinking about it, as the read of $1E and $1F is not atomic, polluting $1E with h-pos bits would probably make the reading too complicated, so rather adding one more reg with h-pos data (could have the bottom bit of $1E mirrored to save one read)?
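
The non-atomic read mentioned here can be worked around in software with the usual "read MSB, read LSB, re-read MSB" pattern. The following is a minimal sketch in C, not taken from any Next library: `read_nextreg` is a hypothetical callback standing in for whatever NextReg access routine you use, and the sketch assumes bit 0 of $1E holds bit 8 of the line number. `sim_line` and `sim_read` are only a host-side stand-in for the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor type: returns the current value of one NextReg. */
typedef uint8_t (*nextreg_reader)(uint8_t reg);

/* Read the 9-bit raster line from $1E (bit 0 = line bit 8, assumed) and
   $1F (line bits 7..0). The two reads are not atomic, so the MSB is read
   again afterwards and the whole read is retried if it changed. */
uint16_t read_raster_line(nextreg_reader read_nextreg)
{
    uint8_t msb, lsb, msb_again;

    assert(read_nextreg != NULL);
    do {
        msb       = read_nextreg(0x1E) & 1u;
        lsb       = read_nextreg(0x1F);
        msb_again = read_nextreg(0x1E) & 1u;
    } while (msb != msb_again);
    return (uint16_t)(((unsigned)msb << 8) | lsb);
}

/* Host-side stand-in for the hardware, for trying the routine off-target:
   the simulated line counter advances by one on every register read. */
uint16_t sim_line;

uint8_t sim_read(uint8_t reg)
{
    uint16_t line = sim_line++;
    return (reg == 0x1E) ? (uint8_t)(line >> 8) : (uint8_t)(line & 0xFFu);
}
```

With `sim_line` starting at 255, a naive MSB-then-LSB read would combine MSB 0 (taken at line 255) with LSB 0 (taken at line 256) and report line 0; the retry loop avoids that torn value at the cost of extra reads.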

With regard to the previous item... when exactly does the line interrupt happen? At the beginning of the pixel area too? The beginning of HBLANK would be a lot simpler to use (also allowing for border effects).

Global Transparency $14 - make the test 9-bit with blue LSB always zero, so only one of 512 colours is transparent (this is more of a question: isn't it already like this, allowing classic bright-magenta paper to survive the test against $E3?)
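
To make the difference between the two tests concrete, here is a small sketch in C. It is only an illustration under stated assumptions: a 9-bit colour is packed as RRRGGGBBB in bits 8..0, and the 8-bit test is assumed to compare the top 8 bits while ignoring the blue LSB. The type and function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A 9-bit colour packed as RRRGGGBBB in bits 8..0 (assumed layout). */
typedef uint16_t colour9;

/* 8-bit test: compare only RRRGGGBB and ignore the blue LSB, so two of
   the 512 colours match any given key. */
bool is_transparent_8bit(colour9 c, uint8_t key)
{
    assert(c < 512);
    return (uint8_t)(c >> 1) == key;
}

/* Suggested 9-bit test: the key is extended with blue LSB = 0, so exactly
   one of the 512 colours matches. */
bool is_transparent_9bit(colour9 c, uint8_t key)
{
    assert(c < 512);
    return c == (colour9)((unsigned)key << 1);
}
```

With key $E3, colours $1C6 and $1C7 both match the 8-bit test, while only $1C6 matches the 9-bit one.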

* IMO impossible due to breaking legacy SW, but just in case...

Sprite palette upload port $xx53 - does this one even exist in the latest cores? There's now a generic NextReg $4x mechanism to handle all palette needs; port $53 just adds confusion for beginners, plus it occupies 256 ports of I/O space.

ULANext Ink Color Mask $42:
the $00 value could have been used as general ULANext disabled flag (instead of the "disable FLASH" bit in $43) - feels to me more consistent, as enabled ULANext mode without setting ink-mask is sort of pointless (unless you rely on the default "15")

* IMO posing technical challenge

Timex 512x192 mode - having an extra bit-flag to reshuffle this mode, so that single bits (pixels) are interleaved instead of whole bytes (8 pixels). Possibly having two variants of this, one reshuffling always ABAB..., the other reshuffling ABAB.. on odd lines and BABA.. on even lines.
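
For comparison, the existing byte-interleaved layout and the suggested bit-interleaved one can be sketched as address calculations in C. This is a rough illustration only: the function names are hypothetical, only the plain ABAB variant is shown, and the bit-interleaved mapping is one possible reading of the suggestion rather than anything the core implements.

```c
#include <assert.h>
#include <stdint.h>

/* Standard ZX Spectrum screen address of the byte holding pixel (x, y),
   x in 0..255, y in 0..191, for a screen starting at `base`. */
uint16_t zx_byte_addr(uint16_t base, unsigned x, unsigned y)
{
    assert(x < 256 && y < 192);
    return (uint16_t)(base
        | ((y & 0xC0u) << 5)   /* screen third  -> address bits 12..11 */
        | ((y & 0x07u) << 8)   /* pixel row     -> address bits 10..8  */
        | ((y & 0x38u) << 2)   /* character row -> address bits 7..5   */
        | (x >> 3));           /* byte column   -> address bits 4..0   */
}

/* Existing Timex hi-res: 8-pixel columns alternate between $4000 and $6000. */
uint16_t hires_byte_interleaved(unsigned x, unsigned y, uint8_t *mask)
{
    assert(x < 512 && y < 192);
    uint16_t base = (x & 8u) ? 0x6000 : 0x4000;
    unsigned x256 = ((x >> 4) << 3) | (x & 7u);  /* drop the screen-select bit */
    *mask = (uint8_t)(0x80u >> (x & 7u));
    return zx_byte_addr(base, x256, y);
}

/* Suggested variant (assumed mapping): single pixels alternate A B A B ... */
uint16_t hires_bit_interleaved(unsigned x, unsigned y, uint8_t *mask)
{
    assert(x < 512 && y < 192);
    uint16_t base = (x & 1u) ? 0x6000 : 0x4000;
    unsigned x256 = x >> 1;                      /* position within one screen */
    *mask = (uint8_t)(0x80u >> (x256 & 7u));
    return zx_byte_addr(base, x256, y);
}
```

In the second function each half of the VRAM is an ordinary 256x192 bitmap of every other pixel, which is what makes the "draw into one half only" tricks possible.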

Clip window behaviour for X2 < X1 and Y2 < Y1: making the clipping inverted for that axis, i.e. the middle X2..X1 area is hidden (I already wrote that in another response here in the forum, including a math proposal for it; I can provide it again on request)
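
One way to express the inverted-window idea is the per-axis test below. It is a sketch written for this thread, not the math proposal mentioned above, and it assumes the window edges themselves stay visible in the inverted case (only the strip strictly between X2 and X1 is hidden).

```c
#include <assert.h>
#include <stdbool.h>

/* One axis of the clip test, with 8-bit register values.
   lo <= hi: the usual inclusive window lo..hi is visible.
   lo >  hi: inverted window, the strip strictly between hi and lo is hidden. */
bool axis_visible(unsigned pos, unsigned lo, unsigned hi)
{
    assert(pos < 256 && lo < 256 && hi < 256);
    if (lo <= hi)
        return pos >= lo && pos <= hi;
    return pos <= hi || pos >= lo;
}

/* A pixel is drawn only if it passes the test on both axes. */
bool pixel_visible(unsigned x, unsigned y,
                   unsigned x1, unsigned x2, unsigned y1, unsigned y2)
{
    return axis_visible(x, x1, x2) && axis_visible(y, y1, y2);
}
```

For example, with X1=200 and X2=50 the columns 51..199 are hidden while 0..50 and 200..255 remain visible.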

LDWS in a "dec" flavour (i.e. (de)=(hl), dec L, dec D) - that would set flag S=1 when using over-ROM mapping and reaching the top border, making the need-bank-change test a simple `jp m,...`/`call m,...` ;) :)

New barrel shifts affecting carry (from last bit going "out")

Adding an SP' register for an interrupt-only stack (im 3), which would make push/pop abused for data manipulation interrupt-safe

I did notice in the z88dk forum that `bool hl` is of considerable help to C compilers (not really sure why, but if there's any will to help the C compiler devs, this one seems to be at the top of the list)

Postby sol_hsa » Sun Dec 30, 2018 8:08 am

Ped7g wrote:
I did notice in the z88dk forum that `bool hl` is of considerable help to C compilers (not really sure why, but if there's any will to help the C compiler devs, this one seems to be at the top of the list)
In C, booleans are defined as 0 = false, any other value = true, so there's a lot of checking whether integers are zero or not.

Postby Ped7g » Sun Dec 30, 2018 11:37 am

Actually that would be `test hl,hl`, but they literally praise (!!hl), i.e. normalization to 0/1 (but including flags; that's an important point and probably the most challenging one with the current Z80N design). Anyway, if you write C like asm, you shouldn't depend so much on the true 0/1 conversion; at least I rarely need it in assembly code. Then again, those C compilers are based on quite legacy designs and their optimizers are nowhere near those for modern x86-64, and one cannot even expect that (considering the circumstances), so they may need that conversion often enough. Fair point.

Postby sol_hsa » Sun Dec 30, 2018 8:38 pm

foo = (bar && baz);
.. which sets foo to exactly 0 or 1, whatever bar and baz were.
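
A slightly fuller sketch of the same point in C. The function names are made up for illustration, and `normalise16` models only the value result of a `bool hl`-style operation (any non-zero value becomes 1), not its effect on flags.

```c
#include <assert.h>
#include <stdint.h>

/* Logical operators always yield exactly 0 or 1 in C, whatever the
   operands were. */
int logical_and(int bar, int baz)
{
    return bar && baz;
}

/* Value result of a "bool hl"-style normalisation: non-zero -> 1, zero -> 0. */
uint16_t normalise16(uint16_t hl)
{
    return (uint16_t)!!hl;
}

/* Small self-check of the claims above. */
void demo(void)
{
    assert(logical_and(5, -3) == 1);
    assert(logical_and(5, 0) == 0);
    assert(normalise16(0x1234) == 1);
    assert(normalise16(0) == 0);
}
```

A plain `if (hl)` needs only a zero test; it is storing or doing arithmetic with the result that forces the compiler to materialise the 0/1 value.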

Postby Alcoholics Anonymous » Mon Dec 31, 2018 5:42 pm

You can read about "bool hl" in the rabbit 2k manual, including uses from assembly language:
https://github.com/z88dk/techdocs/raw/m ... 0069_p.pdf

Section 3.4, page 31. Part of its utility comes from the fact they made it a first class instruction, as in it is single byte so takes only 2 cycles to execute in the rabbit architecture. Whether it still makes sense in the zx next I'm not sure. However an "add sp,d" instruction would work wonders for high level language compilers. That one showed up about 15000 times in regression tests. There are many other instructions that showed up in the z80 line that would help everyone, including things like loading a register pair from (ix+d), and so on.

Postby Ped7g » Mon Dec 31, 2018 6:01 pm

Alcoholics Anonymous wrote:
Mon Dec 31, 2018 5:42 pm
...
Yup, having it as 2T vs 8T would make a huge difference. And any direct `sp` manipulation would always help stack-based calling conventions, although I never really liked those: usually I find a compiler which allows me to use a register-based calling convention, or I simply write assembly. There you also need a true (x!=0) -> 0/1 conversion super rarely, as usually you need only flags, or even a non-bool result is usable. When one needs to test down to a final 0/1 value, most of the time it signals there's room for optimization on a higher abstraction level.

Anyway, this instruction mention sort of side-tracked the whole post; it was supposed to be buried at the bottom as one of the least important things. Personally I'm a lot more worried about those raster interrupts...

Postby Alcoholics Anonymous » Mon Dec 31, 2018 6:34 pm

Ped7g wrote:
Sun Dec 30, 2018 3:44 am
For clip sprites in over-border mode: making sure invalid window 0,255,0,191 gracefully operates as 0,159,0,191 (i.e. extra range defined by too large X2 will not break the stuff)
The hardware can't draw outside its area (256x192 in border mode, 320x256 in over-border mode) so no need to worry there.
Clip window control register $1C:
This one can't be changed without talking to others first as it's likely baked into nextzxos now.
Raster Line MSB $1E - reads in top bits the current horizontal position in terms of copper coordinates (actually how does it work now, when does the line number switch? Is it like copper, that the left border area belongs to previous line and the new line is returned after pixel 0 rendering starts? - that would be a bit unfortunate (lot more difficult to use), without that horizontal info)
EDIT - thinking about it, as the read of $1E and $1F is not atomic, polluting $1E with h-pos bits would probably make the reading too complicated, so rather adding one more reg with h-pos data (could have the bottom bit of $1E mirrored to save one read)?
The position is numbered the same as the copper.

Kev Brady wrote some comprehensive documentation about video timing here:
https://gitlab.com/thesmog358/tbblue/bl ... -v0.1c.TXT

Line 263 has a table that shows horizontal timing. "Standard" means pixel number, "Compare" means byte number (groups of 8 pixels) which is what the copper "wait" instruction waits for and is how the ula works (fetching a byte at a time).

Vertical timing is shown on line 333. The tbblue register returns video line (so vertical position) not horizontal position. The horizontal position changes at a 7MHz rate so even if the cpu runs at 14MHz, the value has changed by several pixels by the time you've read it. I think for precise horizontal positioning you still have to rely on timed instructions like nirvana, etc, do. Or use the copper - it was designed for that anyway :)
With regard to the previous item... when exactly does the line interrupt happen? At the beginning of the pixel area too? The beginning of HBLANK would be a lot simpler to use (also allowing for border effects).
The line interrupt happens during pixels 256-319 and is asserted on the line previous to the one programmed for. So the interrupt happens just after the previous line's display area is drawn. And I think I see a bug here when setting the line interrupt on line 0 :P
Global Transparency $14 - make the test 9-bit with blue LSB always zero, so only one of 512 colours is transparent (this is more of a question: isn't it already like this, allowing classic bright-magenta paper to survive the test against $E3?)
It's an 8-bit comparison and likely has to stay that way. Bright magenta is not $E3 in the next; nextzxos changes it to something else but I'm not sure off-hand what it is.
Sprite palette upload port $xx53 - does this one even exist in latest cores?
No that was removed when 9-bit palettes appeared.
ULANext Ink Color Mask $42:
the $00 value could have been used as general ULANext disabled flag (instead of the "disable FLASH" bit in $43) - feels to me more consistent, as enabled ULANext mode without setting ink-mask is sort of pointless (unless you rely on the default "15")
The naming on some of these things is pretty bad; "disable FLASH" should really be "enable ULANext mode".
The default is 4/4 when something other than the indicated values is written to $42. In hindsight maybe this could have been done but again making such changes for sw and hw that has been out for a while is disruptive. Maybe with $42 kept open other meanings could be attached to values other than the recommended in future.
Timex 512x192 mode - having an extra bit-flag to reshuffle this mode, so that single bits (pixels) are interleaved instead of whole bytes (8 pixels). Possibly having two variants of this, one reshuffling always ABAB..., the other reshuffling ABAB.. on odd lines and BABA.. on even lines.
It seems it would be hard to use Timex hi-res in this mode? As it is now, text printing is easy (its primary purpose I think) and pixel plotting is fairly easy too.
Clip window behaviour for X2 < X1 and Y2 < Y1: making the clipping inverted for that axis, i.e. the middle X2..X1 area is hidden (I already wrote that in another response here in the forum, including a math proposal for it; I can provide it again on request)
Yes I saw this one.
The instructions are probably not going to be touched for a while so maybe save those for a little later :)

Postby Alcoholics Anonymous » Mon Dec 31, 2018 6:46 pm

Ped7g wrote:
Mon Dec 31, 2018 6:01 pm
although I never really liked those: usually I find a compiler which allows me to use a register-based calling convention, or I simply write assembly. There you also need a true (x!=0) -> 0/1 conversion super rarely, as usually you need only flags, or even a non-bool result is usable. When one needs to test down to a final 0/1 value, most of the time it signals there's room for optimization on a higher abstraction level.
Testing for zero is frequent in asm as well. Usually you're doing "ld a,r1; or r2", which uses up the accumulator, takes two instructions and wipes out the carry flag. Being able to do that quickly with "bool hl" does free things up and will alter how the asm is written. In C you can use the result of a comparison as 0/1 in arithmetic, which is also useful. Compilers won't typically go to the trouble of generating a 0/1 except for these cases, and here "bool hl" does it in one shot.
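
As a small illustration of using a comparison result as 0/1 in arithmetic, here is a sketch in C (the function name is hypothetical). Each `v[i] != 0` has to be materialised as exactly 0 or 1 before it can be added, which is the case where a one-shot normalisation instruction would pay off.

```c
#include <assert.h>
#include <stddef.h>

/* Count non-zero entries: each comparison contributes exactly 0 or 1
   to the running total. */
size_t count_nonzero(const unsigned *v, size_t n)
{
    size_t count = 0;

    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        count += (v[i] != 0);
    return count;
}
```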

The z80 does not have many registers so it doesn't take long for argument lists to have to go on the stack. They're usually better off there too for long lists of params as using up all the z80 registers to pass params means it's paralyzed to do the actual function and may have to temporarily write things to the stack anyway. The optimizing C compilers on the z80 do allow some params to be passed by register but the way they optimize is to break the code into basic blocks and determine the best register allocation for the basic block. Anything not deemed best for the block will be in memory outside it so that the most frequently used vars are in registers and the less frequently used ones are referred to in memory less frequently. It's the same way good asm programmers operate but of course expert humans are better than the compilers at it.

Postby Ped7g » Mon Dec 31, 2018 7:43 pm

Alcoholics Anonymous wrote:
Mon Dec 31, 2018 6:34 pm
Timex 512x192 mode - having an extra bit-flag to reshuffle this mode, so that single bits (pixels) are interleaved instead of whole bytes (8 pixels). Possibly having two variants of this, one reshuffling always ABAB..., the other reshuffling ABAB.. on odd lines and BABA.. on even lines.
It seems it would be hard to use Timex hi-res in this mode? As it is now, text printing is easy (its primary purpose I think) and pixel plotting is fairly easy too.
It's actually simpler for some tasks.. for example:
- you can treat it as a slightly side-shifted 256x384 and draw line0 into $4000 and line1 into $6000 .. if you do sub-pixel accuracy, you will get chessboard dithering "for free" on lines with near-horizontal slopes.
- you can quickly create a darkening chessboard over a particular area by writing only into one half of the VRAM (fill with $FF vs $00 in the other)
- you can actually ignore the second half of VRAM and still get something printed, although it will be "in holes", so crude rendering done on a single half will be simpler
- the custom-size proportional font rendering may be more difficult.. then again, flipping between screen0/1 every 8 pixels isn't exactly convenient either, so in the end the difficulty may be quite similar.

Where the current Timex mode clearly wins is fixed-size 8x8 UDG/char-like graphics, which would be a lot more difficult. Then again, my suggestion is to check whether there's a way to add this as an "extra", not to modify the current mode, as backward compatibility makes that off-topic anyway.

But it keeps nagging me, because from my experience the hi-res modes add so much to the feel... for example I went for 1024x768 in my last DOS 256B intro, including fake blur-dithering, etc...: http://www.pouet.net/prod.php?which=74653

So I do believe that ultimately SW relying on a lot of text (adventure games, editors) needs a proportional custom-size font fine-tuned for the 512x192 pixels; it will improve the readability of text a lot, just like sub-pixel rendering did with fonts on LCDs. (And the proposed extra variants do NOT provide any significant advantage for that; they provide an advantage only for some very specialized cases of certain effects.)

EDIT: I've been thinking about it some more, and I'm less and less convinced the benefits of such modes are that great (to be worth some serious development or the sacrifice of other features), although I realized there is one more subtle advantage: the `pixelad` instruction is more "accurate" with the bit-interleaved variant. So these are "low priority suggestions", only worthwhile if they are really simple to implement (like a few more lines of VHDL added in some unobtrusive way).

Anyway, you answered some important bits for me, so I'm now updating the wiki :) ... My feeling is that it's getting to a point where it is almost on par with the real state of the 2.00.xx cores, but maybe I'm just overestimating it (and there are clearly still outdated pages, but I mean somebody digging deep for info should now find most of the important bits there - somewhere). If you ever notice something in the wiki being wrong and it's not a quick change to fix on the spot, just poke me about it and I will try to fix it.

Postby Ped7g » Tue Jan 01, 2019 2:47 pm

Alcoholics Anonymous wrote:
Mon Dec 31, 2018 6:34 pm
Raster Line MSB $1E + $1F ...
The position is numbered the same as the copper.

Kev Brady wrote some comprehensive documentation about video timing here:
https://gitlab.com/thesmog358/tbblue/bl ... -v0.1c.TXT

Vertical timing is shown on line 333. The tbblue register returns video line (so vertical position) not horizontal position. The horizontal position changes at a 7MHz rate so even if the cpu runs at 14MHz, the value has changed by several pixels by the time you've read it. I think for precise horizontal positioning you still have to rely on timed instructions like nirvana, etc, do. Or use the copper - it was designed for that anyway :)
The thing is that sometimes you may want to wait for some particular line even in a less accurate way, to do things which the copper is not capable of (*note at end). On the classic ZX you had to count timing precisely from VBLANK, so seeing the "scanline-read register" raised my hope for simple (rough) wait-for-h-blank code.

But now $1F changes to the new value only when the left border has already been output and pixel 0 is being processed, which means one would have to use either the line interrupt, or wait for (line-1) and then delay for some more time until h-blank starts.

Having a mirror of $1E that also provides the horizontal copper-compare value in the upper 6 bits would allow such crude wait code to be at least a bit more focused on a particular horizontal area of the display.

(EDIT: thinking about it, for this kind of reading it's actually better to have the MSBs together in one register and the LSB in another: waiting for lines 0, 2, 4, ... would then need to check only a single register, and waiting for a line like 5 could wait for 4 first and then read only the LSB register (assuming you are guaranteed to start the wait ahead of the desired line and not in the middle).)

IMHO this copper coordinate scheme is a bit unfortunate, as the dynamic timing causes the other visible parts (left and top borders) to have floating coordinates. I'm afraid those would be a lot simpler to use when mapped to blank areas and the starting/ending zones of the border (similar to how the sprite range [0,0 ... 319,255] is precisely centred, +-32 pixels around the pixel area; such a scheme would be simple enough to explain even to newcomers to the platform).

Anyway, it is what it is. The line interrupt happening near the h-blank start is very good and should in the end cover all needs (after you make sure line 0 works as expected :) ), once I figure out how to use it precisely and in a simple way; but it feels good.
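
Since the interrupt is described earlier in the thread as being asserted on the line before the programmed one, the line-0 case is where the arithmetic has to wrap. A minimal sketch in C of the expected behaviour follows. The function name is hypothetical, `total_lines` is the frame height for the current video timing (for example 312, which is an assumption here), and what the core really does for line 0 is not stated in the thread.

```c
#include <assert.h>

/* Line on which the interrupt is actually asserted, if it is raised on the
   line before the programmed one. Line 0 wraps to the last line of the
   previous frame. */
unsigned irq_assert_line(unsigned programmed, unsigned total_lines)
{
    assert(total_lines > 0 && programmed < total_lines);
    return (programmed + total_lines - 1) % total_lines;
}
```

Adding `total_lines` before subtracting keeps the unsigned arithmetic from underflowing when `programmed` is 0.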

It's just the non-interrupt reading of $1E + $1F which seems to me to be defined in a slightly less helpful way than I expected when I first heard about it.

*note, just occurred to me: the copper code may modify a particular palette index (or some other nextreg which has no side effect for the current code), and the code may instead read and wait on that one, making horizontal-precision scanline wait code possible... It's a somewhat convoluted scheme and involves extra work to set up the copper, but in some cases it may be a simple workaround (if you are programming the copper anyway, in such a fashion that the extra signalling move would fit in nicely).

