Progress on PIC32MX Video Player Performance

[Image: Benchmark]

I recently got a few inquiries regarding an old project: an AVI video player for the PIC32. I did the initial work in 2009 on one of the earliest prototypes of the PIC32 Multimedia board, and later updated it for the PIC32 Mikromedia in 2011 (blog). At the time I achieved a refresh rate of 15 frames per second with 16-bit colour on a 320*240 screen (upscaled from an uncompressed AVI stream of 160*120), with 8-bit mono audio. Considering I was using a “mobile” display (COG), an SD card in SPI mode and an 80MHz processor with 32K of RAM, this was pretty much the best that could be obtained, and even that took some significant software hacks (including mods to the MDD File System lib and the Graphics lib).

The bottleneck of the system was easily identified: the SD card interface used to read the AVI file. The SPI port at its maximum operating speed (given the processor's 80MHz clock) limited the peak data transfer to 20Mbit/s, but the SD card itself (seek time) and the (MDD) file system added overhead that kept the effective throughput below 9 Mbit/s.

A very simple back-of-the-envelope calculation shows that a (160*120) RGB video stream needs 8Mbit/s, and the audio stream must then be interspersed with it. Upscaling the stream to 320*240 was all that the PIC32 could do in the few ms left after each frame.

Replacing the SD card with a more ‘parallel’ Compact Flash card was an obvious path to higher performance, and one that I had been dreaming of for a while. In the past I had used Compact Flash cards extensively; in fact, that was the original media I had used (with a PIC18) to develop what eventually became the File System presented in my books for the PIC24 and PIC32. This was eventually adopted and greatly expanded in the MLA libraries as the MDD File System. In recent years, however, CF cards have fallen out of favour with the embedded control community, as SD cards became so inexpensive and their form factor is so much more convenient. A Compact Flash card can be accessed over an 8-bit or a 16-bit parallel bus, which in theory directly increases the bandwidth available to a clock-limited processor by a factor of 8x or 16x over the serial (SPI) interface used with SD cards.

To test this assumption I used a board developed to my specifications by Peter Szilard, a master of PIC32 applications and a friend. The board features a 4.2-inch mobile display (using a COG controller: IL9326) and, most importantly, has a Compact Flash slot. Both are interfaced to a PIC32MX795 via a shared 16-bit Parallel Master Port (PMP). I quickly ported the MLA display driver module to the IL9326 controller so as to take advantage of the graphics libraries (see picture above).

The first benchmark came from the graphics controller itself. Using sequential pixel access mode for fast line drawing, I could demonstrate that filling the 432 x 240 pixel screen (16-bit colour depth) required only 10 ms.

Porting the MDD File System driver for CF cards to the new hardware proved trickier than expected. All my previous examples had used only 8-bit mode, and in fact did not use the PMP either. After spending a few hours poring over the old CF card specs still in my possession, I ended up quite extensively rewriting the low-level sector read/write routines to perform full 16-bit access to the control register set and to get the fastest transfer possible. The results were quite rewarding: a random 512-byte sector could be read in less than 300 us on average, almost a 5x improvement over the SD card speed, but still far from the theoretical 8x/16x.

The next step in performance was achieved once I realised that most of the time spent reading a sector from the media went into waiting for the CF card to process the request (something comparable to the seek time of a hard drive) rather than into the raw data transfer. The standard ATA SectorRead command does in fact allow multiple sequential sectors to be requested at once and, when that is done, the seek time is incurred only once, at the beginning. Combine this with an understanding of how a FAT16/32 file system works, that is, by using contiguous groups of sectors known as ‘clusters’ to fragment files across the media, and it follows that the best performance is achieved by reading back a file by clusters rather than by individual sectors. This was easily verified by benchmarking the time to read a cluster of 8 KBytes of data at a time, which took less than 800 us on average. In other words, we are talking about a throughput of roughly 10 MByte/s (see in the figure above the 79ms indication for 100 repeats).
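The cluster-at-a-time idea can be sketched in a few lines of C. This is only an illustration, not the MDD File System code: `fat_volume`, `cluster_first_lba`, `read_cluster` and the `read_sectors_fn` driver hook are hypothetical names, and the sketch assumes the volume geometry has already been parsed from the boot sector. The only FAT facts it relies on are that data clusters are numbered from 2 and that each one occupies consecutive sectors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a mounted FAT volume (illustrative names). */
typedef struct {
    uint32_t data_start_lba;      /* first sector of the data region   */
    uint8_t  sectors_per_cluster; /* e.g. 16 for 8 KByte clusters      */
} fat_volume;

/* Hypothetical low-level driver hook: reads `count` consecutive sectors
 * starting at `lba` with a single command, so the card's request latency
 * is paid once. Returns 0 on success. */
typedef int (*read_sectors_fn)(uint32_t lba, uint8_t count, uint8_t *dst);

/* First sector of a data cluster. FAT numbers data clusters from 2. */
uint32_t cluster_first_lba(const fat_volume *vol, uint32_t cluster)
{
    assert(cluster >= 2u);
    return vol->data_start_lba + (cluster - 2u) * vol->sectors_per_cluster;
}

/* Read one whole cluster with a single multi-sector request.
 * `dst` must hold sectors_per_cluster * 512 bytes. */
int read_cluster(const fat_volume *vol, uint32_t cluster, uint8_t *dst,
                 read_sectors_fn read_sectors)
{
    return read_sectors(cluster_first_lba(vol, cluster),
                        vol->sectors_per_cluster, dst);
}
```

On the real board the driver hook would wrap the rewritten 16-bit PMP routines; a file reader would then follow the FAT chain and call `read_cluster` once per cluster instead of once per sector.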

With these results, we can see how a full 432 * 240 pixel frame (16-bit colour depth), consisting of 207 KBytes, would require about 20 ms to load. Added to the 10 ms for the display update, this would allow for up to 30 frames per second (fps) of continuous video (and audio) playback!
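For anyone who wants to redo the arithmetic with their own numbers, here is a small sketch in C. The function names are mine, and the inputs are the rounded measurements quoted above (8 KByte clusters, roughly 790 us per cluster, 10 ms per screen update), so treat the result as a ballpark figure rather than a measurement. Audio data and file system overhead are not included.

```c
#include <stdint.h>

/* Bytes in one uncompressed frame. */
uint32_t frame_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Time in microseconds to load `bytes` when reading whole clusters,
 * given the measured time for one cluster read. */
uint32_t load_time_us(uint32_t bytes, uint32_t cluster_bytes,
                      uint32_t cluster_read_us)
{
    uint32_t clusters = (bytes + cluster_bytes - 1u) / cluster_bytes; /* round up */
    return clusters * cluster_read_us;
}

/* Whole frames per second if loading and display update run back to back. */
uint32_t max_fps(uint32_t load_us, uint32_t display_us)
{
    return 1000000u / (load_us + display_us);
}
```

With 432 x 240 pixels at 2 bytes each, `frame_bytes` gives 207360 bytes, which is 26 clusters of 8192 bytes once rounded up. At 790 us per cluster that is 20540 us, and with a 10000 us display update `max_fps` returns 32.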

Next, putting it all together and actually playing back a full colour and resolution (432 x 240 x 16) 25/30 fps video…

P.S. If you are wondering why the MLA (MDD File System) does not use a cluster buffering mechanism by default, here is the quick answer: RAM scarcity and determinism. Each open file needs a buffer, and its size is dictated by the smallest block of data that can be retrieved from the media. For most media that number is 512 bytes if a single sector is fetched at a time. Using cluster buffering makes the buffers much larger (4K bytes, 8K bytes ... 64K bytes) but, even worse, makes their size media dependent, as each disk/card can be formatted with a different cluster size.
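To put numbers on the RAM cost, here is a tiny sketch in C. The names are hypothetical, and it assumes 512-byte sectors and one cluster-sized buffer per open file.

```c
#include <stdint.h>

#define SECTOR_BYTES 512u /* assumed sector size */

/* Size of one file buffer when a whole cluster is buffered. */
uint32_t cluster_buffer_bytes(uint32_t sectors_per_cluster)
{
    return sectors_per_cluster * SECTOR_BYTES;
}

/* Total RAM needed if every open file gets its own cluster buffer. */
uint32_t total_buffer_bytes(uint32_t open_files, uint32_t sectors_per_cluster)
{
    return open_files * cluster_buffer_bytes(sectors_per_cluster);
}
```

A card formatted with 16 sectors per cluster needs 8192 bytes per open file, while one formatted with 128 sectors per cluster needs 65536 bytes, twice the 32K of RAM mentioned at the start of this post. Since the cluster size is only known once the card is mounted, the buffer cannot be sized at compile time without planning for the worst case.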

 

 
