C API for Next hardware sprites

Alcoholics Anonymous
Posts: 513
Joined: Mon May 29, 2017 7:00 pm

Re: C API for Next hardware sprites

Post by Alcoholics Anonymous » Mon Jul 17, 2017 12:20 am

Stefan123 wrote:
Fri Jul 07, 2017 9:55 pm
Yes, inlining z80_otir() would be great :-)

Looking at inlining some of these instructions is still on the to-do list, but a couple of new intrinsics have been added tonight. These require a rebuild of zsdcc. I'm not sure when the win32 and macosx nightlies will be updated with new binaries, but up-to-date win32 binaries are always available in the patch file: https://github.com/z88dk/z88dk/blob/mas ... _patch.zip

Anyway:

Code:

#include <intrinsic.h>

void main(void)
{
   intrinsic_outi(0x8000, 0x5b, 256);  // (src address, 8-bit constant port number, constant byte count: 0-64, 128 or 256)
   intrinsic_ldi(0x4000, 0x8000, 32);  // (dst address, src address, constant byte count: 0-64, 128 or 256)
}
These call or jump into a wall of 64 outi / ldi instructions for fast transfer. The 128 and 256 cases are handled specially, but any other number outside the range 0-64 will cause a crash. This is implemented in the post-processing step, so the number can't be checked for out of bounds at compile time.
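Since the post-processing step cannot reject a bad count, one option is to validate the constant in C before it reaches the intrinsic. The following is only a sketch under assumptions: INTRINSIC_COUNT_OK and INTRINSIC_COUNT_CHECK are hypothetical names that are not part of z88dk's intrinsic.h, and the negative-array-size typedef is a generic C idiom for compile-time checks.

```c
/* Hypothetical helpers, not part of z88dk's <intrinsic.h>. */

/* Non-zero when n is a count the unrolled outi/ldi block accepts:
   0..64, or the specially handled 128 and 256. */
#define INTRINSIC_COUNT_OK(n) \
    (((n) >= 0 && (n) <= 64) || (n) == 128 || (n) == 256)

/* Compile-time check: the array size becomes -1 (a compile error)
   when the constant count is not acceptable. */
#define INTRINSIC_COUNT_CHECK(name, n) \
    typedef char name[INTRINSIC_COUNT_OK(n) ? 1 : -1]

/* Example: passes for 32, would fail to compile for 65. */
INTRINSIC_COUNT_CHECK(copy_count_is_valid, 32);
```

Placing an INTRINSIC_COUNT_CHECK next to each intrinsic call would turn the crash described above into a build error, at the cost of repeating the count.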

I think we can do a better job cleaning up the code around calls to ldi especially but I'll wait to do that until some real code examples come out.

Stefan123
Posts: 102
Joined: Mon Jun 05, 2017 9:38 pm

Post by Stefan123 » Tue Jul 18, 2017 6:21 pm

Thanks for the update :-)

Is intrinsic_outi() faster than z80_otir() when, for example, transferring 256 bytes? Is it faster even if z80_otir() would be inlined? Will using intrinsic_outi() make the code size larger?

I assume that intrinsic_ldi() would be a good fit for implementing a blit function on the layer 2 screen, at least for some edge cases like copying whole 256-pixel lines. What is the performance of using intrinsic_ldi() for copying data compared to memcpy()? I guess that memcpy() is also quite optimized?

Post by Alcoholics Anonymous » Wed Jul 19, 2017 5:30 am

Stefan123 wrote:
Tue Jul 18, 2017 6:21 pm
Is intrinsic_outi() faster than z80_otir() when, for example, transferring 256 bytes? Is it faster even if z80_otir() would be inlined? Will using intrinsic_outi() make the code size larger?

Yes, it's faster. intrinsic_outi() approaches 16 cycles per output byte whereas the best an otir instruction can do is 21. Options in the library will also allow z80_otir() to use something like intrinsic_outi(), and this will likely get it to 17/18/19 cycles per out. It is a bit slower because z80_otir() will not know ahead of time how many outs are being done, so it will have looping considerations to take care of.

And, yes, it will also make the output binary larger, but not by much. For now, it causes a 64 x outi block to be incorporated in the binary, which amounts to 128 bytes.

Stefan123 wrote:
I assume that intrinsic_ldi() would be a good fit for implementing a blit function on the layer 2 screen, at least for some edge cases like copying whole 256-pixel lines. What is the performance of using intrinsic_ldi() for copying data compared to memcpy()? I guess that memcpy() is also quite optimized?

intrinsic_ldi() is also faster, again approaching 16 cycles per copied byte. This is as fast as you can get without resorting to stack tricks or the DMA to move bytes around. By default the compiler inlines memcpy() as an ldir instruction, which is 21 cycles per byte. You will be able to turn off this inlining and instead get memcpy() to jump into the ldi block, where it should see 17/18/19 cycles per byte, with the slowdown compared to intrinsic_ldi() due to looping considerations.
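To put rough numbers on the cycle counts quoted above, here is a small back-of-the-envelope sketch. It assumes the nominal Z80 timings (16 T-states for outi/ldi, 21 for each otir/ldir iteration that repeats, 16 for the final one) and ignores contention and the call or setup code around the block, which is why the real figure only "approaches" 16. The function names are made up for illustration.

```c
/* Nominal Z80 T-states, ignoring memory/IO contention and setup code. */
#define TSTATES_UNROLLED   16u  /* one outi or ldi */
#define TSTATES_REPEATING  21u  /* one otir/ldir iteration that loops again */
#define TSTATES_REPEAT_END 16u  /* final otir/ldir iteration */

/* Cost of n back-to-back outi/ldi instructions. */
unsigned long unrolled_tstates(unsigned int n)
{
    return (unsigned long)n * TSTATES_UNROLLED;
}

/* Cost of one otir/ldir that transfers n bytes. */
unsigned long repeating_tstates(unsigned int n)
{
    if (n == 0) return 0;
    return (unsigned long)(n - 1) * TSTATES_REPEATING + TSTATES_REPEAT_END;
}

/* Size in bytes of a block of n two-byte instructions (outi and ldi
   are both ED-prefixed, two bytes each). */
unsigned int unrolled_block_bytes(unsigned int n)
{
    return n * 2u;
}
```

For 256 bytes this gives 4096 T-states unrolled against 5371 for the repeating instruction, roughly a quarter fewer, and unrolled_block_bytes(64) reproduces the 128 bytes mentioned for the 64 x outi block.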

For layer 2 software sprites, what's being considered is customized ldi blocks that copy pixels and skip over pixels that are transparent in the sprite. So sprites may occupy three bytes per pixel (ldi = 2 bytes + colour) instead of one (colour). For more restricted copying, there will be DMA task lists that can be fed to the DMA to simulate copies of 2D shapes on screen. The DMA cannot skip transparent areas, though, and neither can it perform logical operations on source or destination data.
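As a plain C reference for what "copy pixels and skip transparent ones" means, here is a minimal sketch. It is not the ldi-block implementation being considered, just the straightforward loop such a block would replace. The function name is hypothetical and 0xE3 is assumed here as the transparent colour index.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed transparent colour index for this sketch. */
#define TRANSPARENT_COLOUR 0xE3u

/* Copy n pixels from src to dst, leaving dst untouched wherever the
   sprite pixel is transparent. Returns the number of pixels written. */
size_t blit_row_transparent(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t written = 0;
    size_t i;

    for (i = 0; i < n; i++)
    {
        if (src[i] != TRANSPARENT_COLOUR)
        {
            dst[i] = src[i];
            written++;
        }
    }
    return written;
}
```

The loop tests every pixel at run time. A customized ldi block would make that decision once, when the sprite is converted, which is where the three bytes per opaque pixel come from.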

Post by Stefan123 » Wed Jul 19, 2017 10:09 am

Thanks for the detailed explanations. I will download the latest z88dk snapshot and test the intrinsic_outi() and intrinsic_ldi() functions for hardware sprites and layer 2 screen manipulations.

Post by Stefan123 » Tue Jul 25, 2017 2:56 pm

The C API for Next hardware sprites at https://github.com/stefanbylund/zxnext_sprite has been updated to use the revised sprite pattern port 0x5B and now uses the otir instruction in the set_sprite_pattern() function.

maniccyberdog
Posts: 14
Joined: Mon May 29, 2017 7:23 pm

Post by maniccyberdog » Thu Aug 17, 2017 2:34 pm

Hi all, I've just started playing with C and the above example; I only get a black square, not a sprite moving around on the screen.
I'm using:
Windows 10
z88dk build from last night
build cmd: zcc +zx -vn -O3 -startup=1 -clib=new zxnext_sprite_demo.c -o zxnext_sprite_demo -lzxnext_sprite -create-app -Cz"--sna"
CSpect version 0.6

Any help welcome :-)

Post by Stefan123 » Thu Aug 17, 2017 3:01 pm

I used ZEsarUX when developing and testing the sprite API. I have only lately started to use CSpect, for playing with layer 2 screen features not yet available in ZEsarUX. I will test the sprite API with CSpect tonight or tomorrow. If you only see a black rectangle, I would suspect that the problem is with the set_sprite_pattern() function and its use of z80_otir(). A workaround in the meantime is to use ZEsarUX 5.1 beta 2017-07-24. I will get back as soon as I know more.

Post by maniccyberdog » Thu Aug 17, 2017 3:22 pm

I can confirm it works with ZEsarUX :-)

Post by maniccyberdog » Thu Aug 17, 2017 3:24 pm

Still, it shouldn't be too much longer and we can use real hardware!

Post by Stefan123 » Thu Aug 17, 2017 8:37 pm

Now it works in both ZEsarUX and CSpect :) I have updated the source code and the prebuilt libraries on GitHub.

The problem was in the set_sprite_pattern() function. This function used z80_otir() for passing the sprite pattern to the sprite pattern port 0x5B. This works fine in ZEsarUX but not in CSpect. So I had to revert to outputting each byte in the sprite pattern separately to port 0x5B.

There are basically four ways of setting the sprite pattern, as shown by the set_sprite_pattern_v1() to set_sprite_pattern_v4() functions below, where v1 is the slowest version and v4 is the fastest. All four versions work fine in ZEsarUX but only the first two work in CSpect. Either there is a problem with the z80_otir() and intrinsic_outi() functions in Z88DK (the two used by v3 and v4) or CSpect fails to execute them correctly. The latter is more likely. Maybe Alvin (Z88DK lead developer) can shed some light on this issue?

Code:

/* v1: one z80_outp() call per byte (slowest). */
void set_sprite_pattern_v1(const void *sprite_pattern)
{
    uint16_t i;

    for (i = 0; i < 256; i++)
    {
        z80_outp(SPRITE_PATTERN_PORT, ((uint8_t *) sprite_pattern)[i]);
    }
}

Code:

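/* v2: declare the port as a special function register so that each
   assignment below is a port write. */
__sfr __at SPRITE_PATTERN_PORT IO_SPRITE_PATTERN_PORT;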

void set_sprite_pattern_v2(const void *sprite_pattern)
{
    uint16_t i;

    for (i = 0; i < 256; i++)
    {
        IO_SPRITE_PATTERN_PORT = ((uint8_t *) sprite_pattern)[i];
    }
}

Code:

void set_sprite_pattern_v3(const void *sprite_pattern)
{
    /* v3: a count of 0 stands for 256 bytes, since otir's 8-bit counter wraps. */
    z80_otir((void *) sprite_pattern, SPRITE_PATTERN_PORT, 0);
}
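A note on the count of 0 passed to z80_otir(): the otir instruction decrements its 8-bit B counter and only then tests it for zero, so starting at 0 yields 256 transfers, which is presumably why 0 is used for a 256-byte pattern. The sketch below is a host-side model of that behaviour, not Z88DK code. mock_out() and model_otir() are hypothetical names, and the port is replaced by a simple write counter.

```c
#include <stdint.h>

/* Hypothetical stand-in for the I/O port: counts bytes written. */
static unsigned int port_writes;

static void mock_out(uint8_t value)
{
    (void)value;
    port_writes++;
}

/* Host-side model of otir: output (HL), advance HL, decrement B,
   repeat until B reaches zero. B is tested after the decrement,
   so starting with B = 0 gives 256 iterations. */
const uint8_t *model_otir(const uint8_t *hl, uint8_t b)
{
    do
    {
        mock_out(*hl++);
        b--;
    } while (b != 0);
    return hl;
}
```

Calling model_otir(pattern, 0) on a 256-byte buffer performs 256 writes and returns a pointer just past the end of the buffer.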

Code:

void set_sprite_pattern_v4(const void *sprite_pattern)
{
    /* v4: unrolled outi block (fastest); the count must be a constant. */
    intrinsic_outi((void *) sprite_pattern, SPRITE_PATTERN_PORT, 256);
}
