Monday, May 18, 2015

Experimenting with web workers

Last week I was experimenting with web workers, trying to separate rendering-related tasks from the CPU emulation itself. It was a bittersweet experience.

A bit of context:

Web workers let you do multithreading without the complexities of traditional threads: mutexes, semaphores, locks. Doing multithreading properly is hard and error-prone, and it produces the worst kind of errors: ones that happen only sometimes and cannot be reproduced consistently. Sometimes you can reproduce them only when the system is overloaded, which is the worst moment for a bug to trigger.

So the open web's workers, ActionScript workers, and Dart's isolates try to avoid most of those problems. How? There is no shared memory: two workers can't access the same memory. Each one has its own heap, and they are isolated. Instead, workers pass messages to each other, or even transfer data without copying it; in the case of web workers, you can transfer ArrayBuffer objects without copying them.
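A minimal sketch of those transfer semantics, assuming a runtime with a global MessageChannel (browsers and recent Node.js versions). A MessagePort takes the same postMessage(message, transferList) arguments as a Worker, so it stands in for one here; the name transferDemo is only illustrative.

```javascript
// Illustrative helper (not part of jspspemu): transfers an ArrayBuffer
// through a MessagePort, which accepts the same
// postMessage(message, transferList) arguments as a Worker.
function transferDemo() {
  const { port1, port2 } = new MessageChannel();
  const buffer = new ArrayBuffer(1024);
  const sizeBefore = buffer.byteLength;

  // Listing the buffer in the transfer list moves it instead of copying it.
  port1.postMessage(buffer, [buffer]);

  // The sender's handle is now detached and reports a length of zero.
  const sizeAfter = buffer.byteLength;

  // Close both ports so this demo does not keep the event loop alive.
  port1.close();
  port2.close();
  return { sizeBefore, sizeAfter };
}
```

With a real worker the call has the same shape: worker.postMessage(buffer, [buffer]). After it, only the receiving side can use the buffer.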

Even though each worker can work on its own and emit messages whenever it wants, it is usually preferable to have a main worker that sends tasks to the others, for simplicity. Ideally you use workers for intensive tasks like decompression, decoding, and so on. The MediaEngine is ideal for this: you can decode audio and video there in a request/response task fashion. You can do the same for crypto and zlib compression.
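The request/response style described above can be sketched as a small dispatcher on the main side. Everything in it is hypothetical (the TaskClient name, the { id, type, payload } message shape, the callback style); it is not jspspemu's actual code, only one way to match responses to the tasks that produced them.

```javascript
// Hypothetical main-side dispatcher: tags each task with an id so the
// response coming back from the worker can be matched to its callback.
class TaskClient {
  // `post` is whatever sends a message to the worker,
  // e.g. (message) => worker.postMessage(message).
  constructor(post) {
    this.post = post;
    this.nextId = 1;
    this.pending = new Map(); // task id -> completion callback
  }

  // Sends a task and remembers who is waiting for its result.
  request(type, payload, onDone) {
    const id = this.nextId++;
    this.pending.set(id, onDone); // register before posting
    this.post({ id, type, payload });
    return id;
  }

  // Call this from worker.onmessage with event.data.
  // Returns false when the id is unknown (e.g. a duplicate response).
  handleResponse(message) {
    const onDone = this.pending.get(message.id);
    if (onDone === undefined) return false;
    this.pending.delete(message.id);
    onDone(message.result);
    return true;
  }
}
```

Wiring it to a real worker would look like worker.onmessage = (event) => client.handleResponse(event.data), with the worker replying { id, result } for every task it receives.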

Much more info after the jump.

Monday, May 11, 2015

Next version: HUGE graphics speedup: Valhalla Knights at 60fps + ffmpeg based media engine

I have been working on some optimizations. I have rewritten most of the GPU engine and improved performance a lot. On an i5 @ 2.2GHz with Intel HD Graphics 5000, Valhalla Knights runs at full speed most of the time. I have also been working on an ffmpeg-based media engine for decoding audio and video (WIP).

More information after the jump.

Monday, May 4, 2015

New added optimizations

I have already done some optimizations I planned some time ago.

Now the compiler is creating bigger functions. Also I have reduced the native function calling overhead, and the overhead of function lookup.

Here are the results:

Loops and function calls are now much faster than before, which means the emulator runs faster.
In Chrome the improvement is not as large as in Firefox. Firefox was performing badly because of the function lookup, which was already lightning fast in Chrome. Now that the function lookup is cached and less frequent, there is not much difference between Chrome and Firefox (as far as the CPU is concerned). Overall, Chrome is still faster than Firefox.
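The post does not show how the lookup is cached. One common approach, sketched here purely as an assumption, is to memoize compiled functions by guest address so the expensive path runs once per address. FunctionCache and the compile callback are hypothetical names.

```javascript
// Hypothetical cache: maps a guest PC address to the JavaScript function
// generated for it, so compilation/lookup work happens once per address.
class FunctionCache {
  // `compile` is any function that turns a PC address into a callable.
  constructor(compile) {
    this.compile = compile;
    this.byAddress = new Map();
    this.misses = 0; // how many times the slow path was taken
  }

  get(pc) {
    let fn = this.byAddress.get(pc);
    if (fn === undefined) {
      // Slow path: build the function and remember it.
      fn = this.compile(pc);
      this.byAddress.set(pc, fn);
      this.misses++;
    }
    return fn;
  }
}
```

Repeated calls to get() with the same address return the same function object and leave the miss counter unchanged.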

Also I have added a new benchmark with this (the v2):
You can still access the previous (v1):

Thursday, April 9, 2015

Next cpu optimizations

I have resumed work on the emulator. Now I plan to make some optimizations to reach a good speed on mobile.

Now most generated functions are like this:

function state(state) {
    var expectedRA = state.getRA();
    /*08804338*/ /*    lui */  state.gpr[8] = 0x08820000;
    /*0880433c*/ /*   addu */  state.gpr[6] = (0 + 0);
    /*08804340*/ /*   addu */  state.gpr[3] = (0 + 0);
    /*08804344*/ /*  addiu */  state.gpr[8] = (state.gpr[8] + -20752);
    /*08804348*/ /*  addiu */  state.gpr[7] = (0 + 5);
    /*0880434c*/ /*   addu */  state.gpr[4] = (state.gpr[8] + 0);
    /*08804350*/ /*    lui */  state.gpr[2] = 0x00500000;

    while (true) {
        /*08804354*/ /*     lh */  state.gpr[5] = state.lh(((state.gpr[4] + 0) | 0));
        /*08804358*/ /*  addiu */  state.gpr[2] = (state.gpr[2] + -1);
        /*0880435c*/ /*  addiu */  state.gpr[4] = (state.gpr[4] + 2);
        /*08804360*/ /*    bne */  state.BRANCHFLAG = (state.gpr[2] != 0);
        state.BRANCHPC = 0x08804354;
        /*08804364*/ /*   addu */  state.gpr[3] = (state.gpr[3] + state.gpr[5]);
        if (state.BRANCHFLAG) { state.PC = state.BRANCHPC; } else { state.PC = 0x08804368; }
        if (!state.BRANCHFLAG) return;
    }
}

It is just a single loop that uses the gpr Int32Array directly and gives up early: it returns as soon as the loop exits. It does not call or jump to other functions either. So it is slow: faster than a pure interpreter, but much slower than a proper dynarec.
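To make the shape of that generated code easier to follow, here is a small harness that runs only the loop part against a stand-in state object. FakeCpuState and sumHalfwords are hypothetical; the real emulator state has more members, and the flat little-endian DataView memory is an assumption made for the sketch.

```javascript
// Hypothetical stand-in for the emulator's CPU state; only the members
// used by the loop of the generated function are modelled.
class FakeCpuState {
  constructor(memorySize) {
    this.gpr = new Int32Array(32); // registers wrap to 32 bits on store
    this.memory = new DataView(new ArrayBuffer(memorySize));
    this.PC = 0;
    this.BRANCHFLAG = false;
    this.BRANCHPC = 0;
  }

  // lh: load a sign-extended 16-bit value (little-endian assumed here).
  lh(address) {
    return this.memory.getInt16(address, true);
  }
}

// Same statements as the generated loop: adds gpr[2] halfwords starting at
// the address in gpr[4] into gpr[3], then returns to the caller.
function sumHalfwords(state) {
  while (true) {
    state.gpr[5] = state.lh((state.gpr[4] + 0) | 0);
    state.gpr[2] = state.gpr[2] + -1;
    state.gpr[4] = state.gpr[4] + 2;
    state.BRANCHFLAG = state.gpr[2] != 0;
    state.BRANCHPC = 0x08804354;
    // The add runs after the branch decision because it sits in the
    // branch delay slot of the original MIPS code.
    state.gpr[3] = state.gpr[3] + state.gpr[5];
    if (state.BRANCHFLAG) { state.PC = state.BRANCHPC; } else { state.PC = 0x08804368; }
    if (!state.BRANCHFLAG) return;
  }
}
```

Once the counter in gpr[2] reaches zero, the function stores the fall-through address in state.PC and returns, so the caller has to look up and call whatever function covers that address next.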

How to optimize this?

Benchmark + headless running (using node.js)

I have created a benchmark that allows testing several JavaScript engines. You can check it here:

Also now you can run the benchmark from node:
node jspspemu_headless.js data/benchmark/benchmark.prx

This allows you to run any PSP program and see the output it produces on stdout; the host process also exits when the guest exits. This should run with any node version on any supported host architecture, including ARM, MIPS, or x86/x64.


Tuesday, July 22, 2014

Embedding + New GUI + encrypted executable loading + travis-ci

Since my last post, I have worked a bit more on the emulator. It now has a new GUI. I have also been working on decoding encrypted executables. And today I put jspspemu on travis-ci. I am using mocha-phantomjs for running tests. You can see the build status in a badge included on GitHub and on this blog.

I have also included a share button that lets you easily embed demos and games:

Thursday, May 22, 2014

Valhalla Knights working

Since last week I have been working on sasCore, gpu and vfpu. Today's version is capable of running Valhalla Knights. It is still not running at full speed, but it is getting closer. Once the webgl state is updated lazily and the vfpu gets some improvements, it should start running at full speed on modern computers.
  • Implemented sasCore (sound in games like Valhalla Knights and Lumines)
  • Gpu speedups and fixes
  • Vfpu fixes
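The post mentions lazily updating webgl state without showing how. One possible reading, sketched here as an assumption, is to record the requested state changes and issue a call only for values that actually differ from what was last applied. LazyState and its apply callback are hypothetical names, not jspspemu code.

```javascript
// Hypothetical lazy state tracker: set() only records the wanted value,
// flush() applies the values that differ from the last applied ones.
class LazyState {
  // `apply(key, value)` performs the real state change. With WebGL it
  // could be, for example:
  //   (key, on) => on ? gl.enable(gl[key]) : gl.disable(gl[key])
  constructor(apply) {
    this.apply = apply;
    this.applied = new Map(); // values already sent to the backend
    this.wanted = new Map();  // values requested since the last flush
  }

  set(key, value) {
    this.wanted.set(key, value);
  }

  // Call once before drawing. Returns how many changes were issued.
  flush() {
    let issued = 0;
    for (const [key, value] of this.wanted) {
      if (!this.applied.has(key) || this.applied.get(key) !== value) {
        this.apply(key, value);
        this.applied.set(key, value);
        issued++;
      }
    }
    this.wanted.clear();
    return issued;
  }
}
```

Setting the same value repeatedly, or toggling a value back before the next flush, results in no backend call at all.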