Building Blocks - Applications#

Here are some examples that use the basic PRU building blocks.

The following are resources used in this chapter.

Memory Allocation#

Problem#

I want to control where my variables are stored in memory.

Todo

Include a section on accessing DDR.

Solution#

Each PRU has its own 8KB of data memory (Data Mem0 and Mem1) and 12KB of shared memory (Shared RAM) as shown in PRU Block Diagram.

PRU Block diagram

Fig. 734 PRU Block Diagram#

Each PRU accesses its own DRAM starting at location 0x0000_0000. Each PRU can also access the other PRU’s DRAM starting at 0x0000_2000. Both PRUs access the shared RAM at 0x0001_0000. The compiler can control in which of these memories each variable is stored.
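The address map above can be sketched as a small host-side helper. This is only an illustration: the function name pru_local_to_arm is hypothetical, and the sketch assumes the PRU subsystem appears at 0x4A300000 on the ARM side (the base used by pwm-test.c and reported by prudebug later in this chapter), with DRAM0 at offset 0x0000 and DRAM1 at offset 0x2000.

```c
#include <assert.h>
#include <stdint.h>

#define PRUSS_ARM_BASE  0x4A300000u  /* PRU subsystem base as seen from the ARM */
#define PRU_DRAM_SIZE   0x2000u      /* 8KB of data memory per PRU */
#define PRU_SHARED_BASE 0x10000u     /* shared RAM, same offset for both PRUs */
#define PRU_SHARED_SIZE 0x3000u      /* 12KB of shared RAM */

/* Hypothetical helper: translate an address as seen by PRU `pru` (0 or 1)
 * into the address the ARM would use for the same byte.
 * Returns 0 when the address is outside the three data memories. */
uint32_t pru_local_to_arm(unsigned pru, uint32_t local)
{
    assert(pru <= 1);

    /* On the ARM side DRAM0 is at offset 0x0000 and DRAM1 at 0x2000. */
    uint32_t own   = (pru == 0) ? 0x0000u : PRU_DRAM_SIZE;
    uint32_t other = (pru == 0) ? PRU_DRAM_SIZE : 0x0000u;

    if (local < PRU_DRAM_SIZE)               /* the PRU's own DRAM */
        return PRUSS_ARM_BASE + own + local;
    if (local < 2 * PRU_DRAM_SIZE)           /* the other PRU's DRAM */
        return PRUSS_ARM_BASE + other + (local - PRU_DRAM_SIZE);
    if (local >= PRU_SHARED_BASE && local < PRU_SHARED_BASE + PRU_SHARED_SIZE)
        return PRUSS_ARM_BASE + local;       /* shared RAM */
    return 0;
}
```

For example, address 0x0000 as seen by PRU 1 is its own DRAM, which the ARM reaches at 0x4A302000.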

shared.pru0.c - Examples of Using Different Memory Locations shows how to allocate eight variables (shared_0 through shared_7) in six different locations.

Listing 102 shared.pru0.c - Examples of Using Different Memory Locations#
 1// From: http://git.ti.com/pru-software-support-package/pru-software-support-package/blobs/master/examples/am335x/PRU_access_const_table/PRU_access_const_table.c
 2#include <stdint.h>
 3#include <pru_cfg.h>
 4#include <pru_ctrl.h>
 5#include "resource_table_empty.h"
 6
 7#define PRU_SRAM  __far __attribute__((cregister("PRU_SHAREDMEM", near)))
 8#define PRU_DMEM0 __far __attribute__((cregister("PRU_DMEM_0_1",  near)))
 9#define PRU_DMEM1 __far __attribute__((cregister("PRU_DMEM_1_0",  near)))
10
11/* NOTE:  Allocating shared_x to PRU Shared Memory means that other PRU cores on
12 *        the same subsystem must take care not to allocate data to that memory.
13 *		  Users also cannot rely on where in shared memory these variables are placed
14 *        so accessing them from another PRU core or from the ARM is an undefined behavior.
15 */
16volatile uint32_t shared_0;
17PRU_SRAM  volatile uint32_t shared_1;
18PRU_DMEM0 volatile uint32_t shared_2;
19PRU_DMEM1 volatile uint32_t shared_3;
20#pragma DATA_SECTION(shared_4, ".bss")
21volatile uint32_t shared_4;
22
23/* NOTE:  Here we pick where in memory to store shared_5.  The stack and
 24 *		  heap take up the first 0x200 bytes, so we must start after that.
25 *		  Since we are hardcoding where things are stored we can share
26 *		  this between the PRUs and the ARM.
27*/
28#define PRU0_DRAM		0x00000			// Offset to DRAM
29// Skip the first 0x200 bytes of DRAM since the Makefile allocates
30// 0x100 for the STACK and 0x100 for the HEAP.
31volatile unsigned int *shared_5 = (unsigned int *) (PRU0_DRAM + 0x200);
32
33
34int main(void)
35{
36	volatile uint32_t shared_6;
37	volatile uint32_t shared_7;
38	/*****************************************************************/
39	/* Access PRU peripherals using Constant Table & PRU header file */
40	/*****************************************************************/
41
42	/* Clear SYSCFG[STANDBY_INIT] to enable OCP master port */
43	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
44
45	/*****************************************************************/
46	/* Access PRU Shared RAM using Constant Table                    */
47	/*****************************************************************/
48
49	/* C28 defaults to 0x00000000, we need to set bits 23:8 to 0x0100 in order to have it point to 0x00010000	 */
50	PRU0_CTRL.CTPPR0_bit.C28_BLK_POINTER = 0x0100;
51
52	shared_0 =  0xfeef;
53	shared_1 = 0xdeadbeef;
54	shared_2 = shared_2 + 0xfeed;
55	shared_3 = 0xdeed;
56	shared_4 = 0xbeed;
57	shared_5[0] = 0x1234;
58	shared_6 = 0x4321;
59	shared_7 = 0x9876;
60
61	/* Halt PRU core */
62	__halt();
63}

shared.pru0.c

Discussion#

Here’s the line-by-line explanation.

Table 147 Line-by-line for shared.pru0.c#

Line

Explanation

7

PRU_SRAM is defined here. It will be used later to declare variables in the Shared RAM location of memory. Section 5.5.2 on page 75 of the PRU Optimizing C/C++ Compiler, v2.2, User’s Guide gives details of the command. The PRU_SHAREDMEM refers to the memory section defined in am335x_pru.cmd on line 26.

8, 9

These are like the previous line except for the DMEM sections.

16

Variables declared outside of main() are put on the heap.

17

Adding PRU_SRAM has the variable stored in the shared memory.

18, 19

These are stored in the PRU’s local RAM.

20, 21

These lines are for storing in the .bss section as declared on line 74 of am335x_pru.cmd.

28-31

All the previous examples direct the compiler to an area in memory and the compiler figures out what to put where. With these lines we specify the exact location. Here we start with the PRU0_DRAM starting address and add 0x200 to it to avoid the stack and the heap. The advantage of this technique is you can easily share these variables between the ARM and the two PRUs.

36, 37

Variables declared inside main() go on the stack.

Caution

Using the technique of lines 28-31 you can put variables anywhere, even where the compiler has put them. Be careful; it’s easy to overwrite what the compiler has done.
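One way to guard against that mistake is to check a hand-picked range before using it. The sketch below is a host-side illustration only; the helper name dram_range_is_free is hypothetical, and it assumes the layout described above: 0x100 bytes of stack plus 0x100 bytes of heap at the bottom of an 8KB DRAM. If your linker command file or Makefile uses other sizes, the constants need to change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_SIZE 0x100u                     /* per the Makefile described above */
#define HEAP_SIZE  0x100u
#define DRAM_SIZE  0x2000u                    /* 8KB of data memory */
#define FIRST_FREE (STACK_SIZE + HEAP_SIZE)   /* 0x200 */

/* Hypothetical helper: true if [offset, offset + len) lies entirely in the
 * part of DRAM that is assumed not to be used by the stack or the heap. */
bool dram_range_is_free(uint32_t offset, uint32_t len)
{
    assert(FIRST_FREE < DRAM_SIZE);           /* sanity check on the constants */

    if (len == 0 || offset < FIRST_FREE)
        return false;                         /* empty, or overlaps stack/heap */
    if (offset >= DRAM_SIZE)
        return false;                         /* starts past the end of DRAM */
    return len <= DRAM_SIZE - offset;         /* must also end inside DRAM */
}
```

With these assumptions, the eight 32-bit words starting at 0x200 are fine, while anything starting at 0x1f0 would land in the heap.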

Compile and run the program.

bone$ *source shared_setup.sh*
TARGET=shared.pru0
Black Found
P9_31
Current mode for P9_31 is:     pruout
Current mode for P9_31 is:     pruout
P9_29
Current mode for P9_29 is:     pruout
Current mode for P9_29 is:     pruout
P9_30
Current mode for P9_30 is:     pruout
Current mode for P9_30 is:     pruout
P9_28
Current mode for P9_28 is:     pruout
Current mode for P9_28 is:     pruout
bone$ *make*
/opt/source/pru-cookbook-code/common/Makefile:29: MODEL=TI_AM335x_BeagleBone_Black,TARGET=shared.pru0
-    Stopping PRU 0
-     copying firmware file /tmp/vsx-examples/shared.pru0.out to /lib/firmware/am335x-pru0-fw
write_init_pins.sh
-    Starting PRU 0
MODEL   = TI_AM335x_BeagleBone_Black
PROC    = pru
PRUN    = 0
PRU_DIR = /sys/class/remoteproc/remoteproc1

Now check the symbol table to see where things are allocated.

bone$ *grep shared /tmp/vsx-examples/shared.pru0.map*
....
1     0000011c  shared_0
2     00010000  shared_1
1     00000000  shared_2
1     00002000  shared_3
1     00000118  shared_4
1     00000120  shared_5

We see that shared_0 had no directives and was placed in the heap, which spans 0x100 to 0x1ff. shared_1 was directed to SHAREDMEM, and shared_2 to the start of the local DRAM (which is also the top of the stack). shared_3 was placed in the DRAM of PRU 1, and shared_4 was placed in the .bss section, which is in the heap. Finally, shared_5 is a pointer to where the value is stored.

Where are shared_6 and shared_7? They are declared inside main() and are therefore placed on the stack at run time. The shared.pru0.map file shows the compile-time allocations. We have to look in the memory itself to see what happens at run time.

Let’s fire up prudebug (prudebug - A Simple Debugger for the PRU) to see where things are.

bone$ *sudo ./prudebug*
PRU Debugger v0.25
(C) Copyright 2011, 2013 by Arctica Technologies.  All rights reserved.
Written by Steven Anderson

Using /dev/mem device.
Processor type                AM335x
PRUSS memory address  0x4a300000
PRUSS memory length   0x00080000

        offsets below are in 32-bit byte addresses (not ARM byte addresses)
        PRU            Instruction    Data         Ctrl
        0              0x00034000     0x00000000   0x00022000
        1              0x00038000     0x00002000   0x00024000

PRU0> *d 0*
Absolute addr = 0x0000, offset = 0x0000, Len = 16
[0x0000] 0x0000feed 0x00000000 0x00000000 0x00000000
[0x0010] 0x00000000 0x00000000 0x00000000 0x00000000
[0x0020] 0x00000000 0x00000000 0x00000000 0x00000000
[0x0030] 0x00000000 0x00000000 0x00000000 0x00000000

The value of shared_2 is in memory location 0.

PRU0> *dd 0x100*
Absolute addr = 0x0100, offset = 0x0000, Len = 16
[0x0100] 0x00000000 0x00000001 0x00000000 0x00000000
[0x0110] 0x00000000 0x00000000 0x0000beed 0x0000feef
[0x0120] 0x00000200 0x3ec71de3 0x1a013e1a 0xbf2a01a0
[0x0130] 0x111110b0 0x3f811111 0x55555555 0xbfc55555

There are shared_0 and shared_4 in the heap, but where are shared_6 and shared_7? They are supposed to be on the stack that starts at 0.

PRU0> *dd 0xc0*
Absolute addr = 0x00c0, offset = 0x0000, Len = 16
[0x00c0] 0x00000000 0x00000000 0x00000000 0x00000000
[0x00d0] 0x00000000 0x00000000 0x00000000 0x00000000
[0x00e0] 0x00000000 0x00000000 0x00000000 0x00000000
[0x00f0] 0x00000000 0x00000000 0x00004321 0x00009876

There they are; the stack grows from the top. (The heap grows from the bottom.)

PRU0> *dd 0x2000*
Absolute addr = 0x2000, offset = 0x0000, Len = 16
[0x2000] 0x0000deed 0x00000001 0x00000000 0x557fcfb5
[0x2010] 0xce97bd0f 0x6afb2c8f 0xc7f35df4 0x5afb6dcb
[0x2020] 0x8dec3da3 0xe39a6756 0x642cb8b8 0xcb6952c0
[0x2030] 0x2f22ebda 0x548d97c5 0x9241786f 0x72dfeb86

And there is PRU 1’s memory with shared_3. And finally the shared memory.

PRU0> *dd 0x10000*
Absolute addr = 0x10000, offset = 0x0000, Len = 16
[0x10000] 0xdeadbeef 0x0000feed 0x00000000 0x68c44f8b
[0x10010] 0xc372ba7e 0x2ffa993b 0x11c66da5 0xfbf6c5d7
[0x10020] 0x5ada3fcf 0x4a5d0712 0x48576fb7 0x1004796b
[0x10030] 0x2267ebc6 0xa2793aa1 0x100d34dc 0x9ca06d4a

The compiler offers great control over where variables are stored. If you hand-pick where things are put, just be sure not to put them in places used by the compiler.

Auto Initialization of built-in LED Triggers#

Problem#

I see the built-in LEDs blink to their own patterns. How do I turn this off? Can this be automated?

Solution#

Each built-in LED has a default action (trigger) when the Bone boots up. This is controlled by /sys/class/leds.

bone$ *cd /sys/class/leds*
bone$ *ls*
beaglebone:green:usr0  beaglebone:green:usr2
beaglebone:green:usr1  beaglebone:green:usr3

Here you see a directory for each of the LEDs. Let’s pick USR1.

bone$ *cd beaglebone\:green\:usr1*
bone$ *ls*
brightness  device  max_brightness  power  subsystem  trigger  uevent
bone$ *cat trigger*
none usb-gadget usb-host rfkill-any rfkill-none kbd-scrolllock kbd-numlock
kbd-capslock kbd-kanalock kbd-shiftlock kbd-altgrlock kbd-ctrllock kbd-altlock
kbd-shiftllock kbd-shiftrlock kbd-ctrlllock kbd-ctrlrlock *[mmc0]* timer
oneshot disk-activity disk-read disk-write ide-disk mtd nand-disk heartbeat
backlight gpio cpu cpu0 activity default-on panic netdev phy0rx phy0tx
phy0assoc phy0radio rfkill0

Notice [mmc0] is in brackets. This means it’s the current trigger; it flashes when the built-in flash memory is in use. You can turn this off using:

bone$ *echo none > trigger*
bone$ *cat trigger*
*[none]* usb-gadget usb-host rfkill-any rfkill-none kbd-scrolllock kbd-numlock
kbd-capslock kbd-kanalock kbd-shiftlock kbd-altgrlock kbd-ctrllock kbd-altlock
kbd-shiftllock kbd-shiftrlock kbd-ctrlllock kbd-ctrlrlock mmc0 timer
oneshot disk-activity disk-read disk-write ide-disk mtd nand-disk heartbeat
backlight gpio cpu cpu0 activity default-on panic netdev phy0rx phy0tx
phy0assoc phy0radio rfkill0

Now it is no longer flashing.
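The same write can be done from a C program on the ARM. The following is a minimal sketch, not part of the cookbook code: the function name write_sysfs_value is hypothetical, and it simply does what `echo none > trigger` does using standard C file I/O.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `value` to the file at `path`,
 * much like `echo value > path` in the shell.
 * Returns 0 on success and -1 on failure. */
int write_sysfs_value(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;                 /* no such file, or no permission */

    size_t len = strlen(value);
    int ok = (fwrite(value, 1, len, f) == len);
    if (fclose(f) != 0)            /* the write may only fail at close time */
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling write_sysfs_value("/sys/class/leds/beaglebone:green:usr1/trigger", "none") should have the same effect as the echo above, provided the program has permission to write the file.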

How can this be automated so when code is run that needs the trigger off, it’s turned off automatically? Here’s a trick. Include the following in your code.

1#pragma DATA_SECTION(init_pins, ".init_pins")
2#pragma RETAIN(init_pins)
3const char init_pins[] =
4        "/sys/class/leds/beaglebone:green:usr3/trigger\0none\0" \
5        "\0\0";

Lines 3-5 declare the array init_pins to have an entry which is the path to trigger and the value that should be ‘echoed’ into it. Both are null-terminated. Line 1 says to put this in a section called .init_pins and line 2 says to RETAIN it, that is, don’t throw it away if it appears to be unused.
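To make the layout of the array concrete, here is a host-side sketch that walks such a blob and pulls out the path/value pairs. It is only an illustration of the format (path, NUL, value, NUL, then an empty string to end the list); the names example_pins, pin_pair and parse_init_pins are hypothetical and are not part of the cookbook code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same layout as the init_pins array above. */
static const char example_pins[] =
    "/sys/class/leds/beaglebone:green:usr3/trigger\0none\0"
    "\0\0";

struct pin_pair {
    const char *path;   /* file to write to */
    const char *value;  /* string to write into it */
};

/* Hypothetical helper: walk `blob` (of `len` bytes) and store up to `max`
 * path/value pairs in `out`. Stops at the first empty path.
 * Returns the number of pairs found. */
size_t parse_init_pins(const char *blob, size_t len,
                       struct pin_pair *out, size_t max)
{
    assert(blob != NULL && out != NULL);

    size_t n = 0, pos = 0;
    while (pos < len && blob[pos] != '\0' && n < max) {
        const char *path = blob + pos;
        const char *pend = memchr(path, '\0', len - pos);
        if (pend == NULL)
            break;                          /* path is not terminated */

        size_t vpos = (size_t)(pend - blob) + 1;
        if (vpos >= len)
            break;                          /* no room for a value */
        const char *value = blob + vpos;
        const char *vend = memchr(value, '\0', len - vpos);
        if (vend == NULL)
            break;                          /* value is not terminated */

        out[n].path = path;
        out[n].value = value;
        n++;
        pos = (size_t)(vend - blob) + 1;    /* continue after the value */
    }
    return n;
}
```

Run on example_pins, this finds a single pair: the usr3 trigger path and the value none.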

Discussion#

The above code stores this array in the .out file that’s created, but that’s not enough. You need to run write_init_pins.sh on the .out file to make the code work. Fortunately the Makefile always runs it.

Listing 103 write_init_pins.sh#
1#!/bin/bash
2init_pins=$(readelf -x .init_pins $1 | grep 0x000 | cut -d' ' -f4-7 | xxd -r -p | tr '\0' '\n' | paste - -)
3while read -a line; do
4    if [ ${#line[@]} == 2 ]; then
5        echo writing \"${line[1]}\" to \"${line[0]}\"
6        echo ${line[1]} > ${line[0]}
7        sleep 0.1
8    fi
9done <<< "$init_pins"

write_init_pins.sh

The readelf command extracts the path and value from the .out file.

bone$ *readelf -x .init_pins /tmp/pru0-gen/shared.out*

Hex dump of section '.init_pins':
  0x000000c0 2f737973 2f636c61 73732f6c 6564732f /sys/class/leds/
  0x000000d0 62656167 6c65626f 6e653a67 7265656e beaglebone:green
  0x000000e0 3a757372 332f7472 69676765 72006e6f :usr3/trigger.no
  0x000000f0 6e650000 0000                       ne....

The rest of the command formats it. Finally line 6 echoes the none into the path.

This can be generalized to initialize other things. The point is, the .out file contains everything needed to run the executable.

PWM Generator#

One of the simplest things a PRU can do is generate a simple signal. We start with a single-channel PWM that has a fixed frequency and duty cycle, and end with a multichannel PWM whose frequency and duty cycle the ARM can change on the fly.

Problem#

I want to generate a PWM signal that has a fixed frequency and duty cycle.

Solution#

The solution is fairly easy, but be sure to check the Discussion section for details on making it work.

pwm1.pru0.c shows the code.

Warning

This code is for the BeagleBone Black. See pwm1.pru1_1.c for an example that works on the AI.

Listing 104 pwm1.pru0.c#
 1#include <stdint.h>
 2#include <pru_cfg.h>
 3#include "resource_table_empty.h"
 4#include "prugpio.h"
 5
 6volatile register uint32_t __R30;
 7volatile register uint32_t __R31;
 8
 9void main(void)
10{
11	uint32_t gpio = P9_31;	// Select which pin to toggle.;
12
13	/* Clear SYSCFG[STANDBY_INIT] to enable OCP master port */
14	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
15
16	while(1) {
17		__R30 |= gpio;		// Set the GPIO pin to 1
18		__delay_cycles(100000000);
19		__R30 &= ~gpio;		// Clear the GPIO pin
20		__delay_cycles(100000000);
21	}
22}

pwm1.pru0.c

To run this code you need to configure the pin mux to route the PRU output to the pin. If you are on the Black, run

bone$ config-pin P9_31 pruout

On the Pocket run

bone$ config-pin P1_36 pruout

Note

See Configuring pins on the AI via device trees for configuring pins on the AI.

Then, tell the Makefile which PRU you are compiling for and what your target file is

bone$ export TARGET=pwm1.pru0

Now you are ready to compile

bone$ make
/opt/source/pru-cookbook-code/common/Makefile:29: MODEL=TI_AM335x_BeagleBone_Black,TARGET=pwm1.pru0
-    Stopping PRU 0
-     copying firmware file /tmp/vsx-examples/pwm1.pru0.out to /lib/firmware/am335x-pru0-fw
write_init_pins.sh
-    Starting PRU 0
MODEL   = TI_AM335x_BeagleBone_Black
PROC    = pru
PRUN    = 0
PRU_DIR = /sys/class/remoteproc/remoteproc1

Now attach an LED (or oscilloscope) to P9_31 on the Black or P1.36 on the Pocket. You should see a square wave.

Discussion#

Since this is our first example we’ll discuss the many parts in detail.

Listing 105 pwm1.pru0.c#
 1#include <stdint.h>
 2#include <pru_cfg.h>
 3#include "resource_table_empty.h"
 4#include "prugpio.h"
 5
 6volatile register uint32_t __R30;
 7volatile register uint32_t __R31;
 8
 9void main(void)
10{
11	uint32_t gpio = P9_31;	// Select which pin to toggle.;
12
13	/* Clear SYSCFG[STANDBY_INIT] to enable OCP master port */
14	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
15
16	while(1) {
17		__R30 |= gpio;		// Set the GPIO pin to 1
18		__delay_cycles(100000000);
19		__R30 &= ~gpio;		// Clear the GPIO pin
20		__delay_cycles(100000000);
21	}
22}

pwm1.pru0.c

Line-by-line of pwm1.pru0.c is a line-by-line explanation of the C code.

Table 148 Line-by-line of pwm1.pru0.c#

Line

Explanation

1

Standard C header include

2

Include for the PRU. The compiler knows where to find this since the Makefile says to look for includes in /usr/lib/ti/pru-software-support-package

3

The file resource_table_empty.h is used by the PRU loader. Generally we’ll use the same file, and don’t need to modify it.

4

This include has addresses for the GPIO ports and some bit positions for some of the headers.

Here’s what’s in resource_table_empty.h

Listing 106 resource_table_empty.h#
 1/*
 2 *  ======== resource_table_empty.h ========
 3 *
 4 *  Define the resource table entries for all PRU cores. This will be
 5 *  incorporated into corresponding base images, and used by the remoteproc
 6 *  on the host-side to allocated/reserve resources.  Note the remoteproc
 7 *  driver requires that all PRU firmware be built with a resource table.
 8 *
 9 *  This file contains an empty resource table.  It can be used either as:
10 *
11 *        1) A template, or
12 *        2) As-is if a PRU application does not need to configure PRU_INTC
13 *                  or interact with the rpmsg driver
14 *
15 */
16
17#ifndef _RSC_TABLE_PRU_H_
18#define _RSC_TABLE_PRU_H_
19
20#include <stddef.h>
21#include <rsc_types.h>
22
23struct my_resource_table {
24	struct resource_table base;
25
26	uint32_t offset[1]; /* Should match 'num' in actual definition */
27};
28
29#pragma DATA_SECTION(pru_remoteproc_ResourceTable, ".resource_table")
30#pragma RETAIN(pru_remoteproc_ResourceTable)
31struct my_resource_table pru_remoteproc_ResourceTable = {
32	1,	/* we're the first version that implements this */
33	0,	/* number of entries in the table */
34	0, 0,	/* reserved, must be zero */
35	0,	/* offset[0] */
36};
37
38#endif /* _RSC_TABLE_PRU_H_ */

resource_table_empty.h

Table 149 Line-by-line (continued)#

Line

Explanation

6-7

__R30 and __R31 are two variables that refer to the PRU output (__R30) and input (__R31) registers. When you write something to __R30 it will show up on the corresponding output pins. When you read from __R31 you read the data on the input pins. NOTE: Both names begin with two underscores. Section 5.7.2 of the PRU Optimizing C/C++ Compiler, v2.2, User’s Guide gives more details.

11

This line selects which GPIO pin to toggle. The table below shows which bits in __R30 map to which pins

14

CT_CFG.SYSCFG_bit.STANDBY_INIT is set to 0 to enable the OCP master port. For more details on this and thousands of other registers, see the TI AM335x TRM. Section 4 is on the PRU and section 4.5 gives details for all the registers.

Bit 0 is the LSB.

Todo

fill in Blue

Table 150 Mapping bit positions to pin names#

PRU  Bit  Black pin              Pocket pin
0    0    P9_31                  P1.36
0    1    P9_29                  P1.33
0    2    P9_30                  P2.32
0    3    P9_28                  P2.30
0    4    P9_42b                 P1.31
0    5    P9_27                  P2.34
0    6    P9_41b                 P2.28
0    7    P9_25                  P1.29
0    14   P8_12(out) P8_16(in)   P2.24
0    15   P8_11(out) P8_15(in)   P2.33
1    0    P8_45                  -
1    1    P8_46                  -
1    2    P8_43                  -
1    3    P8_44                  -
1    4    P8_41                  -
1    5    P8_42                  -
1    6    P8_39                  -
1    7    P8_40                  -
1    8    P8_27                  P2.35
1    9    P8_29                  P2.01
1    10   P8_28                  P1.35
1    11   P8_30                  P1.04
1    12   P8_21                  -
1    13   P8_20                  -
1    14   -                      P1.32
1    15   -                      P1.30
1    16   P9_26(in)              -

Note

See Configuring pins on the AI via device trees for all the PRU pins on the AI.

Since we are running on PRU 0, and we’re using 0x0001, that is bit 0, we’ll be toggling P9_31.

Table 151 Line-by-line (continued again)#

Line

Explanation

17

Here is where the action is. This line reads __R30 and then ORs it with gpio, setting the bits where there is a 1 in gpio and leaving the bits where there is a 0. Thus we are setting the bit we selected. Finally the new value is written back to __R30.

18

__delay_cycles is an intrinsic function that delays for the number of cycles passed to it. Each cycle is 5ns, and we are delaying 100,000,000 cycles which is 500,000,000ns, or 0.5 seconds.

19

This is like line 17, but ~gpio inverts all the bits in gpio so that where we had a 1, there is now a 0. This 0 is then ANDed with __R30 setting the corresponding bit to 0. Thus we are clearing the bit we selected.
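The set and clear operations on lines 17 and 19 can be tried out on an ordinary variable. The sketch below is a host-side illustration in which a plain uint32_t stands in for __R30; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with only bit `bit` set (bit 0 is the LSB). */
uint32_t pru_bit_mask(unsigned bit)
{
    assert(bit < 32);
    return (uint32_t)1 << bit;
}

/* What `reg |= mask` does: set the masked bits, leave the others alone. */
uint32_t set_bits(uint32_t reg, uint32_t mask)
{
    return reg | mask;
}

/* What `reg &= ~mask` does: clear the masked bits, leave the others alone. */
uint32_t clear_bits(uint32_t reg, uint32_t mask)
{
    return reg & ~mask;
}
```

Starting from 0x00A0, setting bit 0 gives 0x00A1 and clearing it again gives back 0x00A0; the other bits are never disturbed.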

Tip

You can read more about intrinsics in section 5.11 of the PRU Optimizing C/C++ Compiler, v2.2, User’s Guide.

When you run this code and look at the output you will see something like the following figure.

pwm1.pru0.c output

Fig. 735 Output of pwm1.pru0.c with 100,000,000 delay cycles giving a 1s period#

Notice the on time (+Width(1)) is 500ms, just as we predicted. The off time is 498ms, which is only 2ms off from our prediction. The standard deviation is 0, or only 380as, which is 380 * 10^-18 seconds!
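The 500ms prediction is just the cycle count multiplied by the 5ns cycle time. As a quick sketch (the helper names are hypothetical, and the 5ns figure is the one used in this chapter), the conversion in both directions looks like this:

```c
#include <assert.h>
#include <stdint.h>

#define PRU_NS_PER_CYCLE 5u   /* one PRU cycle is 5ns, as stated above */

/* Time spent by __delay_cycles(cycles), in nanoseconds. */
uint64_t pru_cycles_to_ns(uint32_t cycles)
{
    return (uint64_t)cycles * PRU_NS_PER_CYCLE;
}

/* Number of whole cycles that fit in `ns` nanoseconds. */
uint32_t pru_ns_to_cycles(uint64_t ns)
{
    assert(ns / PRU_NS_PER_CYCLE <= UINT32_MAX);
    return (uint32_t)(ns / PRU_NS_PER_CYCLE);
}
```

Here pru_cycles_to_ns(100000000) is 500,000,000ns, the half second seen on the scope.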

You can see how fast the PRU can run by setting both of the __delay_cycles to 0. This results in the next figure.

pwm1.pru0.c output with 0 delay

Fig. 736 Output of pwm1.pru0.c with 0 delay cycles#

Notice the period is 15ns, which gives us a frequency of about 67MHz. At this high frequency the breadboard that I’m using distorts the waveform so it’s no longer a square wave. The on time is 5.3ns and the off time is 9.8ns. That means __R30 |= gpio took only one 5ns cycle and __R30 &= ~gpio also took only one cycle, but there is also an extra cycle needed for the loop. This means the compiler was able to implement the while loop in just three 5ns instructions! Not bad.

We want a square wave, so we need to add a delay to correct for the delay of looping back.

Here’s the code that does just that.

Listing 107 pwm2.pru0.c#
 1#include <stdint.h>
 2#include <pru_cfg.h>
 3#include "resource_table_empty.h"
 4#include "prugpio.h"
 5
 6volatile register uint32_t __R30;
 7volatile register uint32_t __R31;
 8
 9void main(void)
10{
11	uint32_t gpio = P9_31;	// Select which pin to toggle.;
12
13	/* Clear SYSCFG[STANDBY_INIT] to enable OCP master port */
14	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
15
16	while (1) {
17		__R30 |= gpio;		// Set the GPIO pin to 1
18		__delay_cycles(1);	// Delay one cycle to correct for loop time
19		__R30 &= ~gpio;		// Clear the GPIO pin
20		__delay_cycles(0);
21	}
22}

pwm2.pru0.c

The output now looks like:

pwm2.c corrected delay

Fig. 737 Output of pwm2.pru0.c corrected delay#

It’s not hard to adjust the two __delay_cycles to get the desired frequency and duty cycle.

Controlling the PWM Frequency#

Problem#

You would like to control the frequency and duty cycle of the PWM without recompiling.

Solution#

Have the PRU read the on and off times from a shared memory location. Each PRU has its own 8KB of data memory (DRAM) and 12KB of shared memory (SHAREDMEM) that the ARM processor can also access. See PRU Block Diagram.

The DRAM 0 address is 0x0000 for PRU 0. The same DRAM appears at address 0x4A300000 as seen from the ARM processor.

Tip

See page 184 of the AM335x TRM.

We take the previous PRU code and add the lines

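#define PRU0_DRAM             0x00000                 // Offset to DRAM
volatile unsigned int *pru0_dram = (unsigned int *) (PRU0_DRAM + 0x200);

to define a pointer into the DRAM, just past the first 0x200 bytes that are used by the stack and the heap.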

Note

The volatile keyword is used here to tell the compiler the value this points to may change, so don’t make any assumptions while optimizing.

Later in the code we use

pru0_dram[2*ch  ] = on[ch];         // Copy to DRAM0 so the ARM can change it
pru0_dram[2*ch+1] = off[ch];        // Interleave the on and off values

to write the on and off times to the DRAM. Then inside the while loop we use

onCount[ch] = pru0_dram[2*ch];          // Read from DRAM0
offCount[ch]= pru0_dram[2*ch+1];

to read from the DRAM when resetting the counters. Now, while the PRU is running, the ARM can write values into the DRAM and change the PWM on and off times. pwm4.pru0.c is the whole code.

Listing 108 pwm4.pru0.c#
 1// This code does MAXCH parallel PWM channels.
 2// Its period is 3 us
 3#include <stdint.h>
 4#include <pru_cfg.h>
 5#include "resource_table_empty.h"
 6
 7#define PRU0_DRAM		0x00000			// Offset to DRAM
 8// Skip the first 0x200 byte of DRAM since the Makefile allocates
 9// 0x100 for the STACK and 0x100 for the HEAP.
10volatile unsigned int *pru0_dram = (unsigned int *) (PRU0_DRAM + 0x200);
11
12#define MAXCH	4	// Maximum number of channels per PRU
13
14volatile register uint32_t __R30;
15volatile register uint32_t __R31;
16
17void main(void)
18{
19	uint32_t ch;
20	uint32_t on[]  = {1, 2, 3, 4};	// Number of cycles to stay on
21	uint32_t off[] = {4, 3, 2, 1};	// Number to stay off
22	uint32_t onCount[MAXCH];		// Current count
23	uint32_t offCount[MAXCH];
24
25	/* Clear SYSCFG[STANDBY_INIT] to enable OCP master port */
26	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
27
28	// Initialize the channel counters.
29	for(ch=0; ch<MAXCH; ch++) {
30		pru0_dram[2*ch  ] = on[ch];		// Copy to DRAM0 so the ARM can change it
31		pru0_dram[2*ch+1] = off[ch];	// Interleave the on and off values
32		onCount[ch] = on[ch];
33		offCount[ch]= off[ch];
34	}
35
36	while (1) {
37		for(ch=0; ch<MAXCH; ch++) {
38			if(onCount[ch]) {
39				onCount[ch]--;
40				__R30 |= 0x1<<ch;		// Set the GPIO pin to 1
41			} else if(offCount[ch]) {
42				offCount[ch]--;
43				__R30 &= ~(0x1<<ch);	// Clear the GPIO pin
44			} else {
45				onCount[ch] = pru0_dram[2*ch];		// Read from DRAM0
46				offCount[ch]= pru0_dram[2*ch+1];
47			}
48		}
49	}
50}

pwm4.pru0.c

Here is code that runs on the ARM side to set the on and off time values.

Listing 109 pwm-test.c#
/*
 *
 *  pwm tester
 *	The on cycle and off cycles are stored in each PRU's Data memory
 *
 */

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>		// For close()
#include <sys/mman.h>

#define MAXCH 4

#define PRU_ADDR		0x4A300000		// Start of PRU memory Page 184 am335x TRM
#define PRU_LEN			0x80000			// Length of PRU memory
#define PRU0_DRAM		0x00000			// Offset to DRAM
#define PRU1_DRAM		0x02000
#define PRU_SHAREDMEM	0x10000			// Offset to shared memory

unsigned int	*pru0DRAM_32int_ptr;		// Points to 0x200 of PRU 0's local DRAM
unsigned int	*pru1DRAM_32int_ptr;		// Points to 0x200 of PRU 1's local DRAM
unsigned int	*prusharedMem_32int_ptr;	// Points to the start of the shared memory

/*******************************************************************************
* int start_pwm_count(int ch, int countOn, int countOff)
*
* Starts a pwm pulse on for countOn and off for countOff to a single channel (ch)
*******************************************************************************/
int start_pwm_count(int ch, int countOn, int countOff) {
	unsigned int *pruDRAM_32int_ptr = pru0DRAM_32int_ptr;

	printf("countOn: %d, countOff: %d, count: %d\n",
		countOn, countOff, countOn+countOff);
	// Write to PRU 0's data memory (DRAM0), interleaving on and off values
	pruDRAM_32int_ptr[2*(ch)+0] = countOn;	// On time
	pruDRAM_32int_ptr[2*(ch)+1] = countOff;	// Off time
	return 0;
}

int main(int argc, char *argv[])
{
	unsigned int	*pru;		// Points to start of PRU memory.
	int	fd;
	printf("Servo tester\n");

	fd = open ("/dev/mem", O_RDWR | O_SYNC);
	if (fd == -1) {
		printf ("ERROR: could not open /dev/mem.\n\n");
		return 1;
	}
	pru = mmap (0, PRU_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, PRU_ADDR);
	if (pru == MAP_FAILED) {
		printf ("ERROR: could not map memory.\n\n");
		return 1;
	}
	close(fd);
	printf ("Using /dev/mem.\n");

	pru0DRAM_32int_ptr =     pru + PRU0_DRAM/4 + 0x200/4;	// Points to 0x200 of PRU0 memory
	pru1DRAM_32int_ptr =     pru + PRU1_DRAM/4 + 0x200/4;	// Points to 0x200 of PRU1 memory
	prusharedMem_32int_ptr = pru + PRU_SHAREDMEM/4;	// Points to start of shared memory

	int i;
	for(i=0; i<MAXCH; i++) {
		start_pwm_count(i, i+1, 20-(i+1));
	}

	if(munmap(pru, PRU_LEN)) {
		printf("munmap failed\n");
	} else {
		printf("munmap succeeded\n");
	}
}

pwm-test.c
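Both sides have to agree on where each on and off value lives. The following sketch spells out the indexing used by pwm4.pru0.c and pwm-test.c: the table starts 0x200 bytes into DRAM0, and channel ch uses word 2*ch for its on count and word 2*ch+1 for its off count. The helper names are hypothetical and exist only to illustrate the arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXCH            4
#define PWM_TABLE_OFFSET 0x200u   /* bytes skipped for the stack and the heap */

/* Word index of a channel's on count within the table. */
size_t pwm_on_index(unsigned ch)
{
    assert(ch < MAXCH);
    return 2u * ch;
}

/* Word index of a channel's off count within the table. */
size_t pwm_off_index(unsigned ch)
{
    assert(ch < MAXCH);
    return 2u * ch + 1u;
}

/* Byte offset from the start of DRAM0 of a given 32-bit word in the table. */
uint32_t pwm_slot_byte_offset(size_t word_index)
{
    return PWM_TABLE_OFFSET + (uint32_t)(word_index * sizeof(uint32_t));
}
```

So channel 0's on count sits at byte offset 0x200 of DRAM0 and channel 3's off count at 0x21C, which the ARM sees at 0x4A300200 and 0x4A30021C.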

A quick check on the ‘scope shows Four Channel PWM with ARM control.

pwm4.png

Fig. 738 Four Channel PWM with ARM control#

From the ‘scope you see that a 1 cycle on time results in a 450ns wide pulse, and the 3.06us period corresponds to 326KHz, much slower than the 10ns pulse we saw before. But it may be more than fast enough for many applications. For example, most servos run at 50Hz.

But we can do better.

Loop Unrolling for Better Performance#

Problem#

The ARM controlled PRU code runs too slowly.

Solution#

Simple loop unrolling can greatly improve the speed. pwm5.pru0.c is our unrolled version.

Listing 110 pwm5.pru0.c Unrolled#
 1// This code does MAXCH parallel PWM channels.
 2// Its period is 510ns.
 3#include <stdint.h>
 4#include <pru_cfg.h>
 5#include "resource_table_empty.h"
 6
 7#define PRU0_DRAM		0x00000			// Offset to DRAM
 8// Skip the first 0x200 byte of DRAM since the Makefile allocates
 9// 0x100 for the STACK and 0x100 for the HEAP.
10volatile unsigned int *pru0_dram = (unsigned int *) (PRU0_DRAM + 0x200);
11
12#define MAXCH	4	// Maximum number of channels per PRU
13
14#define update(ch) \
15			if(onCount[ch]) {			\
16				onCount[ch]--;			\
17				__R30 |= 0x1<<ch;		\
18			} else if(offCount[ch]) {	\
19				offCount[ch]--;			\
20				__R30 &= ~(0x1<<ch);	\
21			} else {					\
22				onCount[ch] = pru0_dram[2*ch];	\
23				offCount[ch]= pru0_dram[2*ch+1];	\
24			}
25
26volatile register uint32_t __R30;
27volatile register uint32_t __R31;
28
29void main(void)
30{
31	uint32_t ch;
32	uint32_t on[]  = {1, 2, 3, 4};
33	uint32_t off[] = {4, 3, 2, 1};
34	uint32_t onCount[MAXCH], offCount[MAXCH];
35
36	/* Clear SYSCFG[STANDBY_INIT] to enable OCP master port */
37	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
38
39#pragma UNROLL(MAXCH)
40	for(ch=0; ch<MAXCH; ch++) {
41		pru0_dram[2*ch  ] = on[ch];		// Copy to DRAM0 so the ARM can change it
42		pru0_dram[2*ch+1] = off[ch];	// Interleave the on and off values
43		onCount[ch] = on[ch];
44		offCount[ch]= off[ch];
45	}
46
47	while (1) {
48		update(0)
49		update(1)
50		update(2)
51		update(3)
52	}
53}

pwm5.pru0.c

The output of pwm5.pru0.c is in the figure below.

pwm5.pru0.c Unrolled version of pwm4.pru0.c

Fig. 739 pwm5.pru0.c Unrolled version of pwm4.pru0.c#

It’s running about 6 times faster than pwm4.pru0.c.

Table 152 pwm4.pru0.c vs. pwm5.pru0.c#

Measure  pwm4.pru0.c time  pwm5.pru0.c time  Speedup  pwm5.pru0.c w/o UNROLL  Speedup
Period   3.06μs            510ns             6x       1.81μs                  ~1.7x
Width+   450ns             70ns              ~6x      1.56μs                  ~0.3x

Not a bad speed up for just a couple of simple changes.

Discussion#

Here’s how it works. First look at line 39. You see #pragma UNROLL(MAXCH) which is a pragma that tells the compiler to unroll the loop that follows. We are unrolling it MAXCH times (four times in this example). Just removing the pragma causes the speedup compared to the pwm4.pru0.c case to drop from 6x to only 1.7x.

We also have our for loop inside the while loop that can be unrolled. Unfortunately UNROLL() doesn’t work on it, therefore we have to do it by hand. We could take the loop and just copy it three times, but that would make it harder to maintain the code. Instead I converted the loop into a #define (lines 14-24) and invoked update() as needed (lines 48-51). This is not a function call. Whenever the preprocessor sees update() it copies the code in and then it’s compiled.

This unrolling gets us an impressive 6x speedup.

Making All the Pulses Start at the Same Time#

Problem#

I have a multichannel PWM working, but the pulses aren’t synchronized; that is, they don’t all start at the same time.

Solution#

pwm5.pru0 Zoomed In is a zoomed in version of the previous figure. Notice the pulse in each channel starts about 15ns later than the channel above it.

pwm5.pru0 zoomed.png

Fig. 740 pwm5.pru0 Zoomed In#

The solution is to declare Rtmp (line 35) which holds the value for __R30.

Listing 111 pwm6.pru0.c Sync’ed Version of pwm5.pru0.c#
 1// This code does MAXCH parallel PWM channels.
 2// All channels start at the same time. Its period is 510ns
 3#include <stdint.h>
 4#include <pru_cfg.h>
 5#include "resource_table_empty.h"
 6
 7#define PRU0_DRAM		0x00000			// Offset to DRAM
 8// Skip the first 0x200 byte of DRAM since the Makefile allocates
 9// 0x100 for the STACK and 0x100 for the HEAP.
10volatile unsigned int *pru0_dram = (unsigned int *) (PRU0_DRAM + 0x200);
11
12#define MAXCH	4	// Maximum number of channels per PRU
13
14#define update(ch) \
15			if(onCount[ch]) {			\
16				onCount[ch]--;			\
17				Rtmp |= 0x1<<ch;		\
18			} else if(offCount[ch]) {	\
19				offCount[ch]--;			\
20				Rtmp &= ~(0x1<<ch);	\
21			} else {					\
22				onCount[ch] = pru0_dram[2*ch];	\
23				offCount[ch]= pru0_dram[2*ch+1];	\
24			}
25
26volatile register uint32_t __R30;
27volatile register uint32_t __R31;
28
29void main(void)
30{
31	uint32_t ch;
32	uint32_t on[]  = {1, 2, 3, 4};
33	uint32_t off[] = {4, 3, 2, 1};
34	uint32_t onCount[MAXCH], offCount[MAXCH];
35	register uint32_t Rtmp;
36
37	/* Clear SYSCFG[STANDBY_INIT] to enable OCP master port */
38	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
39
40#pragma UNROLL(MAXCH)
41	for(ch=0; ch<MAXCH; ch++) {
42		pru0_dram[2*ch  ] = on[ch];		// Copy to DRAM0 so the ARM can change it
43		pru0_dram[2*ch+1] = off[ch];	// Interleave the on and off values
44		onCount[ch] = on[ch];
45		offCount[ch]= off[ch];
46	}
47	Rtmp = __R30;
48
49	while (1) {
50		update(0)
51		update(1)
52		update(2)
53		update(3)
54		__R30 = Rtmp;
55	}
56}

Each channel writes its value to Rtmp (lines 17 and 20), and after every channel has updated, Rtmp is copied to __R30 (line 54).

Discussion#

The following figure shows the channels are sync’ed, though the period is slightly longer than before.

pwm6.pru0 Synchronized Channels

Fig. 741 pwm6.pru0 Synchronized Channels#

Adding More Channels via PRU 1#

Problem#

You need more output channels, or you need to shorten the period.

Solution#

PRU 0 can drive up to eight output pins (see Mapping bit positions to pin names). The code presented so far can be easily extended to use all eight.

But what if you need more channels? You can always use PRU 1, which has 14 output pins.

Or, what if four channels are enough, but you need a shorter period? Every time you add a channel, the overall period gets longer. Twice as many channels means roughly twice as long a period. If you move half the channels to PRU 1, you will shorten the period.

Here’s the code (pwm7.pru0.c)

Listing 112 pwm7.pru0.c Using Both PRUs#
 1// This code does MAXCH parallel PWM channels on both PRU 0 and PRU 1
 2// All channels start at the same time. But the PRU 1 ch have a different period
 3// Its period is 370ns
 4#include <stdint.h>
 5#include <pru_cfg.h>
 6#include "resource_table_empty.h"
 7
 8#define PRUNUM 0
 9
10#define PRU0_DRAM		0x00000			// Offset to DRAM
11// Skip the first 0x200 byte of DRAM since the Makefile allocates
12// 0x100 for the STACK and 0x100 for the HEAP.
13volatile unsigned int *pru0_dram = (unsigned int *) (PRU0_DRAM + 0x200);
14
15#define MAXCH	2	// Maximum number of channels per PRU
16
17#define update(ch) \
18			if(onCount[ch]) {			\
19				onCount[ch]--;			\
20				Rtmp |= 0x1<<ch;		\
21			} else if(offCount[ch]) {	\
22				offCount[ch]--;			\
23				Rtmp &= ~(0x1<<ch);	\
24			} else {					\
25				onCount[ch] = pru0_dram[2*ch];	\
26				offCount[ch]= pru0_dram[2*ch+1];	\
27			}
28
29volatile register uint32_t __R30;
30volatile register uint32_t __R31;
31
32void main(void)
33{
34	uint32_t ch;
35	uint32_t on[]  = {1, 2, 3, 4};
36	uint32_t off[] = {4, 3, 2, 1};
37	uint32_t onCount[MAXCH], offCount[MAXCH];
38	register uint32_t Rtmp;
39
40	/* Clear SYSCFG[STANDBY_INIT] to enable OCP master port */
41	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
42
43#pragma UNROLL(MAXCH)
44	for(ch=0; ch<MAXCH; ch++) {
45		pru0_dram[2*ch  ] = on [ch+PRUNUM*MAXCH];	// Copy to DRAM0 so the ARM can change it
46		pru0_dram[2*ch+1] = off[ch+PRUNUM*MAXCH];	// Interleave the on and off values
47		onCount[ch] = on [ch+PRUNUM*MAXCH];
48		offCount[ch]= off[ch+PRUNUM*MAXCH];
49	}
50	Rtmp = __R30;
51
52	while (1) {
53		update(0)
54		update(1)
55		__R30 = Rtmp;
56	}
57}

Be sure to run pwm7_setup.sh to get the correct pins configured.

Listing 113 pwm7_setup.sh#
 1#!/bin/bash
 2#
 3export TARGET=pwm7.pru0
 4echo TARGET=$TARGET
 5
 6# Configure the PRU pins based on which Beagle is running
 7machine=$(awk '{print $NF}' /proc/device-tree/model)
 8echo -n $machine
 9if [ $machine = "Black" ]; then
10    echo " Found"
11    pins="P9_31 P9_29 P8_45 P8_46"
12elif [ $machine = "Blue" ]; then
13    echo " Found"
14    pins=""
15elif [ $machine = "PocketBeagle" ]; then
16    echo " Found"
17    pins="P1_36 P1_33"
18else
19    echo " Not Found"
20    pins=""
21fi
22
23for pin in $pins
24do
25    echo $pin
26    config-pin $pin pruout
27    config-pin -q $pin
28done

This makes sure the PRU 1 pins are properly configured.

Here we have a second pwm7 file. pwm7.pru1.c is identical to pwm7.pru0.c except PRUNUM is set to 1, instead of 0.

Compile and run the two files with:

bone$ make TARGET=pwm7.pru0; make TARGET=pwm7.pru1
/opt/source/pru-cookbook-code/common/Makefile:29: MODEL=TI_AM335x_BeagleBone_Black,TARGET=pwm7.pru0
-    Stopping PRU 0
-     copying firmware file /tmp/vsx-examples/pwm7.pru0.out to /lib/firmware/am335x-pru0-fw
write_init_pins.sh
-    Starting PRU 0
MODEL   = TI_AM335x_BeagleBone_Black
PROC    = pru
PRUN    = 0
PRU_DIR = /sys/class/remoteproc/remoteproc1
/opt/source/pru-cookbook-code/common/Makefile:29: MODEL=TI_AM335x_BeagleBone_Black,TARGET=pwm7.pru1
-    Stopping PRU 1
-     copying firmware file /tmp/vsx-examples/pwm7.pru1.out to /lib/firmware/am335x-pru1-fw
write_init_pins.sh
-    Starting PRU 1
MODEL   = TI_AM335x_BeagleBone_Black
PROC    = pru
PRUN    = 1
PRU_DIR = /sys/class/remoteproc/remoteproc2

This will first stop, compile and start PRU 0, then do the same for PRU 1.

Moving half of the channels to PRU1 dropped the period from 510ns to 370ns, so we gained a bit.

Discussion#

There weren’t many changes to be made. On line 15 we set MAXCH to 2. Lines 44-48 are where the big change is.

pru0_dram[2*ch  ] = on [ch+PRUNUM*MAXCH];       // Copy to DRAM0 so the ARM can change it
pru0_dram[2*ch+1] = off[ch+PRUNUM*MAXCH];       // Interleave the on and off values
onCount[ch] = on [ch+PRUNUM*MAXCH];
offCount[ch]= off[ch+PRUNUM*MAXCH];

If we are compiling for PRU 0, on[ch+PRUNUM*MAXCH] becomes on[ch+0*2], which is on[ch], just what we had before. But if we are on PRU 1 it becomes on[ch+1*2], which is on[ch+2]. That means we are picking up the second half of the on and off arrays. The first half goes to PRU 0, the second to PRU 1. So the same code can be used for both PRUs, but we get slightly different behavior.

Running the code you will see the next figure.

pwm7.pru0 Two PRUs running

Fig. 742 pwm7.pru0 Two PRUs running#

What’s going on there? The first channels look fine, but the PRU 1 channels are blurred. To see what’s happening, let’s stop the oscilloscope.

pwm7 Two PRUs stopped

Fig. 743 pwm7.pru0 Two PRUs stopped#

The stopped display shows that the four channels are doing what we wanted, except the PRU 0 channels have a period of 370ns while the PRU 1 channels have a period of 330ns. It appears the compiler has optimized the two PRUs slightly differently.

Synchronizing Two PRUs#

Problem#

I need to synchronize the two PRUs so they run together.

Solution#

Use the Interrupt Controller (INTC). It allows one PRU to signal the other. Page 225 of the AM335x TRM has details of how it works. Here’s the code for PRU 0, which at the end of the while loop signals PRU 1 to start (pwm8.pru0.c).

Listing 114 pwm8.pru0.c PRU 0 using INTC to send a signal to PRU 1#
 1// This code does MAXCH parallel PWM channels on both PRU 0 and PRU 1
 2// All channels start at the same time. 
 3// Its period is 430ns
 4#include <stdint.h>
 5#include <pru_cfg.h>
 6#include <pru_intc.h>
 7#include <pru_ctrl.h>
 8#include "resource_table_empty.h"
 9
10#define PRUNUM 0
11
12#define PRU0_DRAM		0x00000			// Offset to DRAM
13// Skip the first 0x200 byte of DRAM since the Makefile allocates
14// 0x100 for the STACK and 0x100 for the HEAP.
15volatile unsigned int *pru0_dram = (unsigned int *) (PRU0_DRAM + 0x200);
16
17#define MAXCH	2	// Maximum number of channels per PRU
18
19#define update(ch) \
20			if(onCount[ch]) {			\
21				onCount[ch]--;			\
22				Rtmp |= 0x1<<ch;		\
23			} else if(offCount[ch]) {	\
24				offCount[ch]--;			\
25				Rtmp &= ~(0x1<<ch);	\
26			} else {					\
27				onCount[ch] = pru0_dram[2*ch];	\
28				offCount[ch]= pru0_dram[2*ch+1];	\
29			}
30
31volatile register uint32_t __R30;
32volatile register uint32_t __R31;
33
34// Initialize interrupts so the PRUs can be synchronized.
35// PRU1 is started first and then waits for PRU0
36// PRU0 is then started and tells PRU1 when to start going
37void configIntc(void) {	
38	__R31 = 0x00000000;					// Clear any pending PRU-generated events
39	CT_INTC.CMR4_bit.CH_MAP_16 = 1;		// Map event 16 to channel 1
40	CT_INTC.HMR0_bit.HINT_MAP_1 = 1;	// Map channel 1 to host 1
41	CT_INTC.SICR = 16;					// Ensure event 16 is cleared
42	CT_INTC.EISR = 16;					// Enable event 16
43	CT_INTC.HIEISR |= (1 << 0);			// Enable Host interrupt 1
44	CT_INTC.GER = 1; 					// Globally enable host interrupts
45}
46
47void main(void)
48{
49	uint32_t ch;
50	uint32_t on[]  = {1, 2, 3, 4};
51	uint32_t off[] = {4, 3, 2, 1};
52	uint32_t onCount[MAXCH], offCount[MAXCH];
53	register uint32_t Rtmp;
54
55	CT_CFG.GPCFG0 = 0x0000;				// Configure GPI and GPO as Mode 0 (Direct Connect)
56	configIntc();						// Configure INTC
57
58	/* Clear SYSCFG[STANDBY_INIT] to enable OCP master port */
59	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
60
61#pragma UNROLL(MAXCH)
62	for(ch=0; ch<MAXCH; ch++) {
63		pru0_dram[2*ch  ] = on [ch+PRUNUM*MAXCH];	// Copy to DRAM0 so the ARM can change it
64		pru0_dram[2*ch+1] = off[ch+PRUNUM*MAXCH];	// Interleave the on and off values
65		onCount[ch] = on [ch+PRUNUM*MAXCH];
66		offCount[ch]= off[ch+PRUNUM*MAXCH];
67	}
68	Rtmp = __R30;
69
70	while (1) {
71		__R30 = Rtmp;
72		update(0)
73		update(1)
74#define PRU0_PRU1_EVT 16
75		__R31 = (PRU0_PRU1_EVT-16) | (0x1<<5);	//Tell PRU 1 to start
76		__delay_cycles(1);
77	}
78}

PRU 1’s code waits for PRU 0 before going.

Listing 115 pwm8.pru1.c PRU 1 waiting for INTC from PRU 0#
 1// This code does MAXCH parallel PWM channels on both PRU 0 and PRU 1
 2// All channels start at the same time. 
 3// Its period is 430ns
 4#include <stdint.h>
 5#include <pru_cfg.h>
 6#include <pru_intc.h>
 7#include <pru_ctrl.h>
 8#include "resource_table_empty.h"
 9
10#define PRUNUM 1
11
12#define PRU0_DRAM		0x00000			// Offset to DRAM
13// Skip the first 0x200 byte of DRAM since the Makefile allocates
14// 0x100 for the STACK and 0x100 for the HEAP.
15volatile unsigned int *pru0_dram = (unsigned int *) (PRU0_DRAM + 0x200);
16
17#define MAXCH	2	// Maximum number of channels per PRU
18
19#define update(ch) \
20			if(onCount[ch]) {			\
21				onCount[ch]--;			\
22				Rtmp |= 0x1<<ch;		\
23			} else if(offCount[ch]) {	\
24				offCount[ch]--;			\
25				Rtmp &= ~(0x1<<ch);	\
26			} else {					\
27				onCount[ch] = pru0_dram[2*ch];	\
28				offCount[ch]= pru0_dram[2*ch+1];	\
29			}
30
31volatile register uint32_t __R30;
32volatile register uint32_t __R31;
33
34// Initialize interrupts so the PRUs can be synchronized.
35// PRU1 is started first and then waits for PRU0
36// PRU0 is then started and tells PRU1 when to start going
37
38void main(void)
39{
40	uint32_t ch;
41	uint32_t on[]  = {1, 2, 3, 4};
42	uint32_t off[] = {4, 3, 2, 1};
43	uint32_t onCount[MAXCH], offCount[MAXCH];
44	register uint32_t Rtmp;
45
46	/* Clear SYSCFG[STANDBY_INIT] to enable OCP master port */
47	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
48
49#pragma UNROLL(MAXCH)
50	for(ch=0; ch<MAXCH; ch++) {
51		pru0_dram[2*ch  ] = on [ch+PRUNUM*MAXCH];	// Copy to DRAM0 so the ARM can change it
52		pru0_dram[2*ch+1] = off[ch+PRUNUM*MAXCH];	// Interleave the on and off values
53		onCount[ch] = on [ch+PRUNUM*MAXCH];
54		offCount[ch]= off[ch+PRUNUM*MAXCH];
55	}
56	Rtmp = __R30;
57
58	while (1) {
59		while((__R31 & (0x1<<31))==0) {		// Wait for PRU 0
60		}
61		CT_INTC.SICR = 16;					// Clear event 16
62		__R30 = Rtmp;
63		update(0)
64		update(1)
65	}
66}

In pwm8.pru1.c PRU 1 waits for a signal from PRU 0, so be sure to start PRU 1 first.

bone$ make TARGET=pwm8.pru1; make TARGET=pwm8.pru0

Discussion#

The figure below shows the two PRUs are synchronized, though there is some extra overhead in the process so the period is longer.

pwm8.pru0 PRUs synced

Fig. 744 pwm8.pru0 PRUs synced#

This isn’t much different from the previous examples.

Table 153 pwm8.pru0.c changes from pwm7.pru0.c#

PRU   Line    Change
0     37-45   For PRU 0 these define configIntc(), which initializes the interrupts. See page 226 of the AM335x TRM for a diagram explaining events, channels, hosts, etc.
0     55-56   Set a configuration register and call configIntc().
1     59-61   PRU 1 then waits for PRU 0 to signal it. Bit 31 of __R31 corresponds to the Host-1 channel which configIntc() set up. We also clear event 16 so PRU 0 can set it again.
0     74-75   On PRU 0 this generates the interrupt to send to PRU 1. I found PRU 1 was slow to respond to the interrupt, so I put this code at the end of the loop to give time for the signal to get to PRU 1.

This ends the multipart pwm example.

Reading an Input at Regular Intervals#

Problem#

You have an input pin that needs to be read at regular intervals.

Solution#

You can use the __R31 register to read an input pin. Let’s use the following pins.

Table 154 Input/Output pins#

Direction   Bit number   Black   AI (ICSS2)   Pocket
out         0            P9_31   P8_44        P1.36
in          7            P9_25   P8_36        P1.29

These values came from Mapping bit positions to pin names.

Configure the pins with input_setup.sh.

Listing 116 input_setup.sh#
 1#!/bin/bash
 2#
 3export TARGET=input.pru0
 4echo TARGET=$TARGET
 5
 6# Configure the PRU pins based on which Beagle is running
 7machine=$(awk '{print $NF}' /proc/device-tree/model)
 8echo -n $machine
 9if [ $machine = "Black" ]; then
10    echo " Found"
11    config-pin P9_31 pruout
12    config-pin -q P9_31
13    config-pin P9_25 pruin
14    config-pin -q P9_25
15elif [ $machine = "Blue" ]; then
16    echo " Found"
17    pins=""
18elif [ $machine = "PocketBeagle" ]; then
19    echo " Found"
20    config-pin P1_36 pruout
21    config-pin -q P1_36
22    config-pin P1_29 pruin
23    config-pin -q P1_29
24else
25    echo " Not Found"
26    pins=""
27fi

The following code reads the input pin and writes its value to the output pin.

Listing 117 input.pru0.c#
 1#include <stdint.h>
 2#include <pru_cfg.h>
 3#include "resource_table_empty.h"
 4
 5volatile register uint32_t __R30;
 6volatile register uint32_t __R31;
 7
 8void main(void)
 9{
10	uint32_t led;
11	uint32_t sw;
12
13	/* Clear SYSCFG[STANDBY_INIT] to enable OCP master port */
14	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
15
16	led = 0x1<<0;	// P9_31 or P1_36
17	sw  = 0x1<<7;	// P9_25 or P1_29
18		
19	while (1) {
20		if((__R31&sw) == sw) {
21			__R30 |= led;		// Turn on LED
22		} else
23			__R30 &= ~led;		// Turn off LED
24	}
25}
26

Discussion#

Just remember that __R30 is for outputs and __R31 is for inputs.

Analog Wave Generator#

Problem#

I want to generate an analog output, but only have GPIO pins.

Solution#

The Beagle doesn’t have a built-in digital-to-analog converter. You could get a USB audio dongle, which costs under $10. But here we’ll take another approach.

Earlier we generated a PWM signal. Here we’ll generate a PWM whose duty cycle changes with time: a small duty cycle when the output signal is small and a large duty cycle when it is large.

This example was inspired by A PRU Sin Wave Generator in chapter 13 of Exploring BeagleBone by Derek Molloy.

Here’s the code.

Listing 118 sine.pru0.c#
 1// Generate an analog waveform and use a filter to reconstruct it.
 2#include <stdint.h>
 3#include <pru_cfg.h>
 4#include "resource_table_empty.h"
 5#include <math.h>
 6
 7#define MAXT	100	// Maximum number of time samples
 8#define SAWTOOTH	// Pick which waveform
 9
10volatile register uint32_t __R30;
11volatile register uint32_t __R31;
12
13void main(void)
14{
15	uint32_t onCount;		// Current count for 1 out
16	uint32_t offCount;		// count for 0 out
17	uint32_t i;
18	uint32_t waveform[MAXT]; // Waveform to be produced
19
20	// Generate a periodic wave in an array of MAXT values
21#ifdef SAWTOOTH
22	for(i=0; i<MAXT; i++) {
23		waveform[i] = i*100/MAXT;
24	}
25#endif
26#ifdef TRIANGLE
27	for(i=0; i<MAXT/2; i++) {
28		waveform[i]        = 2*i*100/MAXT;
29		waveform[MAXT-i-1] = 2*i*100/MAXT;
30	}
31#endif
32#ifdef SINE
33	float gain = 50.0f;
34	float bias = 50.0f;
35	float freq = 2.0f * 3.14159f / MAXT;
36	for (i=0; i<MAXT; i++){
37		waveform[i] = (uint32_t)(bias+gain*sin(i*freq));
38	}
39#endif
40
41	/* Clear SYSCFG[STANDBY_INIT] to enable OCP master port */
42	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
43
44	while (1) {
45		// Generate a PWM signal whose duty cycle matches
46		// the amplitude of the signal.
47		for(i=0; i<MAXT; i++) {
48			onCount = waveform[i];
49			offCount = 100 - onCount;
50			while(onCount--) {
51				__R30 |= 0x1;		// Set the GPIO pin to 1
52			}
53			while(offCount--) {
54				__R30 &= ~(0x1);	// Clear the GPIO pin
55			}
56		}
57	}
58}

Set the #define at line 7 to the number of samples in one cycle of the waveform and set the #define at line 8 to which waveform and then run make.

Discussion#

The code has two parts. The first part (lines 21 to 39) generates the waveform to be output. The #defines let you select which waveform you want to generate. Since the output is a percent duty cycle, the values in waveform[] must be between 0 and 100 inclusive. The waveform is only generated once, so this part of the code isn’t time critical.

The second part (lines 44 to 57) uses the generated data to set the duty cycle of the PWM on a cycle-by-cycle basis. This part is time critical; the faster we can output the values, the higher the frequency of the output signal.

Suppose you want to generate a sawtooth waveform like the one shown in Continuous Sawtooth Waveform.

Continuous Sawtooth Waveform

Fig. 745 Continuous Sawtooth Waveform#

You need to sample the waveform and store one cycle. Sampled Sawtooth Waveform shows a sampled version of the sawtooth. You need to generate MAXT samples; here we show 20 samples, which may be enough. In the code MAXT is set to 100.

Sampled Sawtooth Waveform

Fig. 746 Sampled Sawtooth Waveform#

There’s a lot going on here; let’s take it line by line.

Table 155 Line-by-line of sine.pru0.c#

Line     Explanation
2-5      Standard C header includes.
7        Number of samples in one cycle of the analog waveform.
8        Which waveform to use. We’ve defined SAWTOOTH, TRIANGLE and SINE, but you can define your own too.
10-11    Declaring registers __R30 and __R31.
15-16    onCount counts how many cycles the PWM should be 1 and offCount counts how many it should be 0.
18       waveform[] stores the analog waveform being output.
21-24    SAWTOOTH is the simplest of the waveforms. Each sample is the duty cycle at that time and must therefore be between 0 and 100.
26-31    TRIANGLE is also a simple waveform.
32-39    SINE generates a sine wave and also introduces floating point. Yes, you can use floating point, but the PRUs don’t have floating-point hardware; it’s all done in software. This means using floating point will make your code much bigger and slower. Slower doesn’t matter in this part, and bigger still fits in our instruction memory, so we’re OK.
47       Here the for loop looks up each value of the generated waveform.
48,49    onCount is the number of cycles to be at 1 and offCount is the number of cycles to be 0. The two add to 100, one full cycle.
50-52    Stay on for onCount cycles.
53-55    Now turn off for offCount cycles, then loop back and look up the next cycle count.

Unfiltered Sawtooth Waveform shows the output of the code.

Unfiltered Sawtooth Waveform

Fig. 747 Unfiltered Sawtooth Waveform#

It doesn’t look like a sawtooth, but if you look at the left side you will see each cycle has a longer and longer on time. The duty cycle is increasing. Once it’s almost 100% duty cycle, it switches to a very small duty cycle. So it’s outputting what we programmed, but what we want is the average of the signal. The left-hand side has a large (and increasing) average, which corresponds to the top of the sawtooth. The right-hand side has a small average, which is what you want for the start of the sawtooth.

A simple low-pass filter, built with one resistor and one capacitor will do it. Low-Pass Filter Wiring Diagram shows how to wire it up.

Low-Pass Filter Wiring Diagram

Fig. 748 Low-Pass Filter Wiring Diagram#

Note

I used a 10K variable resistor and a 0.022uF capacitor. Probe the circuit between the resistor and the capacitor and adjust the resistor until you get a good looking waveform.

Reconstructed Sawtooth Waveform shows the results of filtering the SAWTOOTH waveform.

Reconstructed Sawtooth Waveform

Fig. 749 Reconstructed Sawtooth Waveform#

Now that looks more like a sawtooth wave. The top plot is the time-domain plot of the output of the low-pass filter. The bottom plot is the FFT of the top plot, so it shows the frequency domain. We are getting a sawtooth with a frequency of about 6.1 kHz. You can see the fundamental frequency on the bottom plot along with several harmonics.

The top looks like a sawtooth wave, but there is a high frequency superimposed on it. We are only using a simple first-order filter. You could lower the cutoff frequency by adjusting the resistor. You’ll see something like Reconstructed Sawtooth Waveform with Lower Cutoff Frequency.

Reconstructed Sawtooth Waveform with Lower Cutoff Frequency

Fig. 750 Reconstructed Sawtooth Waveform with Lower Cutoff Frequency#

The high frequencies have been reduced, but the corner of the waveform has been rounded. You can also adjust the cutoff to a higher frequency and you’ll get a sharper corner, but you’ll also get more high frequencies. See Reconstructed Sawtooth Waveform with Higher Cutoff Frequency.

Reconstructed Sawtooth Waveform with Higher Cutoff Frequency

Fig. 751 Reconstructed Sawtooth Waveform with Higher Cutoff Frequency#

Adjust to taste, though the real solution is to build a higher order filter. Search for second order filter and you’ll find some nice circuits.

You can adjust the frequency of the signal by adjusting MAXT. A smaller MAXT will give a higher frequency. I’ve gotten good results with MAXT as small as 20.

You can also get a triangle waveform by setting the #define. Reconstructed Triangle Waveform shows the output signal.

Reconstructed Triangle Waveform

Fig. 752 Reconstructed Triangle Waveform#

You can also generate the sine wave, as shown in Reconstructed Sinusoid Waveform.

Reconstructed Sinusoid Waveform

Fig. 753 Reconstructed Sinusoid Waveform#

Notice on the bottom plot the harmonics are much more suppressed.

Generating the sine waveform uses floats. This requires much more code. You can look in /tmp/vsx-examples/sine.pru0.map to see how much memory is being used. /tmp/vsx-examples/sine.pru0.map for Sine Wave shows the first few lines for the sine wave.

Listing 119 /tmp/vsx-examples/sine.pru0.map for Sine Wave#
  1******************************************************************************
  2PRU Linker Unix v2.1.5                    
  3******************************************************************************
  4>> Linked Fri Jun 29 13:58:08 2018
  5
  6OUTPUT FILE NAME:   </tmp/pru0-gen/sine1.out>
  7ENTRY POINT SYMBOL: "_c_int00_noinit_noargs_noexit"  address: 00000000
  8
  9
 10MEMORY CONFIGURATION
 11
 12         name            origin    length      used     unused   attr    fill
 13----------------------  --------  ---------  --------  --------  ----  --------
 14PAGE 0:
 15  PRU_IMEM              00000000   00002000  000018c0  00000740  RWIX
 16
 17PAGE 1:
 18  PRU_DMEM_0_1          00000000   00002000  00000154  00001eac  RWIX
 19  PRU_DMEM_1_0          00002000   00002000  00000000  00002000  RWIX
 20
 21PAGE 2:
 22  PRU_SHAREDMEM         00010000   00003000  00000000  00003000  RWIX
 23  PRU_INTC              00020000   00001504  00000000  00001504  RWIX
 24  PRU_CFG               00026000   00000044  00000044  00000000  RWIX
 25  PRU_UART              00028000   00000038  00000000  00000038  RWIX
 26  PRU_IEP               0002e000   0000031c  00000000  0000031c  RWIX
 27  PRU_ECAP              00030000   00000060  00000000  00000060  RWIX
 28  RSVD27                00032000   00000100  00000000  00000100  RWIX
 29  RSVD21                00032400   00000100  00000000  00000100  RWIX
 30  L3OCMC                40000000   00010000  00000000  00010000  RWIX
 31  MCASP0_DMA            46000000   00000100  00000000  00000100  RWIX
 32  UART1                 48022000   00000088  00000000  00000088  RWIX
 33  UART2                 48024000   00000088  00000000  00000088  RWIX
 34  I2C1                  4802a000   000000d8  00000000  000000d8  RWIX
 35  MCSPI0                48030000   000001a4  00000000  000001a4  RWIX
 36  DMTIMER2              48040000   0000005c  00000000  0000005c  RWIX
 37  MMCHS0                48060000   00000300  00000000  00000300  RWIX
 38  MBX0                  480c8000   00000140  00000000  00000140  RWIX
 39  SPINLOCK              480ca000   00000880  00000000  00000880  RWIX
 40  I2C2                  4819c000   000000d8  00000000  000000d8  RWIX
 41  MCSPI1                481a0000   000001a4  00000000  000001a4  RWIX
 42  DCAN0                 481cc000   000001e8  00000000  000001e8  RWIX
 43  DCAN1                 481d0000   000001e8  00000000  000001e8  RWIX
 44  PWMSS0                48300000   000002c4  00000000  000002c4  RWIX
 45  PWMSS1                48302000   000002c4  00000000  000002c4  RWIX
 46  PWMSS2                48304000   000002c4  00000000  000002c4  RWIX
 47  RSVD13                48310000   00000100  00000000  00000100  RWIX
 48  RSVD10                48318000   00000100  00000000  00000100  RWIX
 49  TPCC                  49000000   00001098  00000000  00001098  RWIX
 50  GEMAC                 4a100000   0000128c  00000000  0000128c  RWIX
 51  DDR                   80000000   00000100  00000000  00000100  RWIX
 52
 53
 54SECTION ALLOCATION MAP
 55
 56 output                                  attributes/
 57section   page    origin      length       input sections
 58--------  ----  ----------  ----------   ----------------
 59.text:_c_int00* 
 60*          0    00000000    00000014     
 61                  00000000    00000014     rtspruv3_le.lib : boot_special.obj (.text:_c_int00_noinit_noargs_noexit)
 62
 63.text      0    00000014    000018ac     
 64                  00000014    00000374     rtspruv3_le.lib : sin.obj (.text:sin)
 65                  00000388    00000314                     : frcmpyd.obj (.text:__TI_frcmpyd)
 66                  0000069c    00000258                     : frcaddd.obj (.text:__TI_frcaddd)
 67                  000008f4    00000254                     : mpyd.obj (.text:__pruabi_mpyd)
 68                  00000b48    00000248                     : addd.obj (.text:__pruabi_addd)
 69                  00000d90    000001c8                     : mpyf.obj (.text:__pruabi_mpyf)
 70                  00000f58    00000100                     : modf.obj (.text:modf)
 71                  00001058    000000b4                     : gtd.obj (.text:__pruabi_gtd)
 72                  0000110c    000000b0                     : ged.obj (.text:__pruabi_ged)
 73                  000011bc    000000b0                     : ltd.obj (.text:__pruabi_ltd)
 74                  0000126c    000000b0     sine1.obj (.text:main)
 75                  0000131c    000000a8     rtspruv3_le.lib : frcmpyf.obj (.text:__TI_frcmpyf)
 76                  000013c4    000000a0                     : fixdu.obj (.text:__pruabi_fixdu)
 77                  00001464    0000009c                     : round.obj (.text:__pruabi_nround)
 78                  00001500    00000090                     : eqld.obj (.text:__pruabi_eqd)
 79                  00001590    0000008c                     : renormd.obj (.text:__TI_renormd)
 80                  0000161c    0000008c                     : fixdi.obj (.text:__pruabi_fixdi)
 81                  000016a8    00000084                     : fltid.obj (.text:__pruabi_fltid)
 82                  0000172c    00000078                     : cvtfd.obj (.text:__pruabi_cvtfd)
 83                  000017a4    00000050                     : fltuf.obj (.text:__pruabi_fltuf)
 84                  000017f4    0000002c                     : asri.obj (.text:__pruabi_asri)
 85                  00001820    0000002c                     : subd.obj (.text:__pruabi_subd)
 86                  0000184c    00000024                     : mpyi.obj (.text:__pruabi_mpyi)
 87                  00001870    00000020                     : negd.obj (.text:__pruabi_negd)
 88                  00001890    00000020                     : trunc.obj (.text:__pruabi_trunc)
 89                  000018b0    00000008                     : exit.obj (.text:abort)
 90                  000018b8    00000008                     : exit.obj (.text:loader_exit)
 91
 92.stack     1    00000000    00000100     UNINITIALIZED
 93                  00000000    00000004     rtspruv3_le.lib : boot.obj (.stack)
 94                  00000004    000000fc     --HOLE--
 95
 96.cinit     1    00000000    00000000     UNINITIALIZED
 97
 98.fardata   1    00000100    00000040     
 99                  00000100    00000040     rtspruv3_le.lib : sin.obj (.fardata:R$1)
100
101.resource_table 
102*          1    00000140    00000014     
103                  00000140    00000014     sine1.obj (.resource_table:retain)
104
105.creg.PRU_CFG.noload.near 
106*          2    00026000    00000044     NOLOAD SECTION
107                  00026000    00000044     sine1.obj (.creg.PRU_CFG.noload.near)
108
109.creg.PRU_CFG.near 
110*          2    00026044    00000000     UNINITIALIZED
111
112.creg.PRU_CFG.noload.far 
113*          2    00026044    00000000     NOLOAD SECTION
114
115.creg.PRU_CFG.far 
116*          2    00026044    00000000     UNINITIALIZED
117
118
119SEGMENT ATTRIBUTES
120
121    id tag      seg value
122    -- ---      --- -----
123     0 PHA_PAGE 1   1    
124     1 PHA_PAGE 2   1    
125
126
127GLOBAL SYMBOLS: SORTED ALPHABETICALLY BY Name 
128
129page  address   name                         
130----  -------   ----                         
1310     000018b8  C$$EXIT                      
1322     00026000  CT_CFG                       
133abs   481cc000  __PRU_CREG_BASE_DCAN0        
134abs   481d0000  __PRU_CREG_BASE_DCAN1        
135abs   80000000  __PRU_CREG_BASE_DDR          
136abs   48040000  __PRU_CREG_BASE_DMTIMER2     
137abs   4a100000  __PRU_CREG_BASE_GEMAC        
138abs   4802a000  __PRU_CREG_BASE_I2C1         
139abs   4819c000  __PRU_CREG_BASE_I2C2         
140abs   40000000  __PRU_CREG_BASE_L3OCMC       
141abs   480c8000  __PRU_CREG_BASE_MBX0         
142abs   46000000  __PRU_CREG_BASE_MCASP0_DMA   
143abs   48030000  __PRU_CREG_BASE_MCSPI0       
144abs   481a0000  __PRU_CREG_BASE_MCSPI1       
145abs   48060000  __PRU_CREG_BASE_MMCHS0       
146abs   00026000  __PRU_CREG_BASE_PRU_CFG      
147abs   00000000  __PRU_CREG_BASE_PRU_DMEM_0_1 
148abs   00002000  __PRU_CREG_BASE_PRU_DMEM_1_0 
149abs   00030000  __PRU_CREG_BASE_PRU_ECAP     
150abs   0002e000  __PRU_CREG_BASE_PRU_IEP      
151abs   00020000  __PRU_CREG_BASE_PRU_INTC     
152abs   00010000  __PRU_CREG_BASE_PRU_SHAREDMEM
153abs   00028000  __PRU_CREG_BASE_PRU_UART     
154abs   48300000  __PRU_CREG_BASE_PWMSS0       
155abs   48302000  __PRU_CREG_BASE_PWMSS1       
156abs   48304000  __PRU_CREG_BASE_PWMSS2       
157abs   48318000  __PRU_CREG_BASE_RSVD10       
158abs   48310000  __PRU_CREG_BASE_RSVD13       
159abs   00032400  __PRU_CREG_BASE_RSVD21       
160abs   00032000  __PRU_CREG_BASE_RSVD27       
161abs   480ca000  __PRU_CREG_BASE_SPINLOCK     
162abs   49000000  __PRU_CREG_BASE_TPCC         
163abs   48022000  __PRU_CREG_BASE_UART1        
164abs   48024000  __PRU_CREG_BASE_UART2        
165abs   0000000e  __PRU_CREG_DCAN0             
166abs   0000000f  __PRU_CREG_DCAN1             
167abs   0000001f  __PRU_CREG_DDR               
168abs   00000001  __PRU_CREG_DMTIMER2          
169abs   00000009  __PRU_CREG_GEMAC             
170abs   00000002  __PRU_CREG_I2C1              
171abs   00000011  __PRU_CREG_I2C2              
172abs   0000001e  __PRU_CREG_L3OCMC            
173abs   00000016  __PRU_CREG_MBX0              
174abs   00000008  __PRU_CREG_MCASP0_DMA        
175abs   00000006  __PRU_CREG_MCSPI0            
176abs   00000010  __PRU_CREG_MCSPI1            
177abs   00000005  __PRU_CREG_MMCHS0            
178abs   00000004  __PRU_CREG_PRU_CFG           
179abs   00000018  __PRU_CREG_PRU_DMEM_0_1      
180abs   00000019  __PRU_CREG_PRU_DMEM_1_0      
181abs   00000003  __PRU_CREG_PRU_ECAP          
182abs   0000001a  __PRU_CREG_PRU_IEP           
183abs   00000000  __PRU_CREG_PRU_INTC          
184abs   0000001c  __PRU_CREG_PRU_SHAREDMEM     
185abs   00000007  __PRU_CREG_PRU_UART          
186abs   00000012  __PRU_CREG_PWMSS0            
187abs   00000013  __PRU_CREG_PWMSS1            
188abs   00000014  __PRU_CREG_PWMSS2            
189abs   0000000a  __PRU_CREG_RSVD10            
190abs   0000000d  __PRU_CREG_RSVD13            
191abs   00000015  __PRU_CREG_RSVD21            
192abs   0000001b  __PRU_CREG_RSVD27            
193abs   00000017  __PRU_CREG_SPINLOCK          
194abs   0000001d  __PRU_CREG_TPCC              
195abs   0000000b  __PRU_CREG_UART1             
196abs   0000000c  __PRU_CREG_UART2             
1971     00000100  __TI_STACK_END               
198abs   00000100  __TI_STACK_SIZE              
1990     0000069c  __TI_frcaddd                 
2000     00000388  __TI_frcmpyd                 
2010     0000131c  __TI_frcmpyf                 
2020     00001590  __TI_renormd                 
203abs   ffffffff  __binit__                    
204abs   ffffffff  __c_args__                   
2050     00000b48  __pruabi_addd                
2060     000017f4  __pruabi_asri                
2070     0000172c  __pruabi_cvtfd               
2080     00001500  __pruabi_eqd                 
2090     0000161c  __pruabi_fixdi               
2100     000013c4  __pruabi_fixdu               
2110     000016a8  __pruabi_fltid               
2120     000017a4  __pruabi_fltuf               
2130     0000110c  __pruabi_ged                 
2140     00001058  __pruabi_gtd                 
2150     000011bc  __pruabi_ltd                 
2160     000008f4  __pruabi_mpyd                
2170     00000d90  __pruabi_mpyf                
2180     0000184c  __pruabi_mpyi                
2190     00001870  __pruabi_negd                
2200     00001464  __pruabi_nround              
2210     00001820  __pruabi_subd                
2220     00001890  __pruabi_trunc               
2230     00000000  _c_int00_noinit_noargs_noexit
2241     00000000  _stack                       
2250     000018b0  abort                        
226abs   ffffffff  binit                        
2270     0000126c  main                         
2280     00000f58  modf                         
2291     00000140  pru_remoteproc_ResourceTable 
2300     00000014  sin                          
231
232
233GLOBAL SYMBOLS: SORTED BY Symbol Address 
234
235page  address   name                         
236----  -------   ----                         
2370     00000000  _c_int00_noinit_noargs_noexit
2380     00000014  sin                          
2390     00000388  __TI_frcmpyd                 
2400     0000069c  __TI_frcaddd                 
2410     000008f4  __pruabi_mpyd                
2420     00000b48  __pruabi_addd                
2430     00000d90  __pruabi_mpyf                
2440     00000f58  modf                         
2450     00001058  __pruabi_gtd                 
2460     0000110c  __pruabi_ged                 
2470     000011bc  __pruabi_ltd                 
2480     0000126c  main                         
2490     0000131c  __TI_frcmpyf                 
2500     000013c4  __pruabi_fixdu               
2510     00001464  __pruabi_nround              
2520     00001500  __pruabi_eqd                 
2530     00001590  __TI_renormd                 
2540     0000161c  __pruabi_fixdi               
2550     000016a8  __pruabi_fltid               
2560     0000172c  __pruabi_cvtfd               
2570     000017a4  __pruabi_fltuf               
2580     000017f4  __pruabi_asri                
2590     00001820  __pruabi_subd                
2600     0000184c  __pruabi_mpyi                
2610     00001870  __pruabi_negd                
2620     00001890  __pruabi_trunc               
2630     000018b0  abort                        
2640     000018b8  C$$EXIT                      
2651     00000000  _stack                       
2661     00000100  __TI_STACK_END               
2671     00000140  pru_remoteproc_ResourceTable 
2682     00026000  CT_CFG                       
269abs   00000000  __PRU_CREG_BASE_PRU_DMEM_0_1 
270abs   00000000  __PRU_CREG_PRU_INTC          
271abs   00000001  __PRU_CREG_DMTIMER2          
272abs   00000002  __PRU_CREG_I2C1              
273abs   00000003  __PRU_CREG_PRU_ECAP          
274abs   00000004  __PRU_CREG_PRU_CFG           
275abs   00000005  __PRU_CREG_MMCHS0            
276abs   00000006  __PRU_CREG_MCSPI0            
277abs   00000007  __PRU_CREG_PRU_UART          
278abs   00000008  __PRU_CREG_MCASP0_DMA        
279abs   00000009  __PRU_CREG_GEMAC             
280abs   0000000a  __PRU_CREG_RSVD10            
281abs   0000000b  __PRU_CREG_UART1             
282abs   0000000c  __PRU_CREG_UART2             
283abs   0000000d  __PRU_CREG_RSVD13            
284abs   0000000e  __PRU_CREG_DCAN0             
285abs   0000000f  __PRU_CREG_DCAN1             
286abs   00000010  __PRU_CREG_MCSPI1            
287abs   00000011  __PRU_CREG_I2C2              
288abs   00000012  __PRU_CREG_PWMSS0            
289abs   00000013  __PRU_CREG_PWMSS1            
290abs   00000014  __PRU_CREG_PWMSS2            
291abs   00000015  __PRU_CREG_RSVD21            
292abs   00000016  __PRU_CREG_MBX0              
293abs   00000017  __PRU_CREG_SPINLOCK          
294abs   00000018  __PRU_CREG_PRU_DMEM_0_1      
295abs   00000019  __PRU_CREG_PRU_DMEM_1_0      
296abs   0000001a  __PRU_CREG_PRU_IEP           
297abs   0000001b  __PRU_CREG_RSVD27            
298abs   0000001c  __PRU_CREG_PRU_SHAREDMEM     
299abs   0000001d  __PRU_CREG_TPCC              
300abs   0000001e  __PRU_CREG_L3OCMC            
301abs   0000001f  __PRU_CREG_DDR               
302abs   00000100  __TI_STACK_SIZE              
303abs   00002000  __PRU_CREG_BASE_PRU_DMEM_1_0 
304abs   00010000  __PRU_CREG_BASE_PRU_SHAREDMEM
305abs   00020000  __PRU_CREG_BASE_PRU_INTC     
306abs   00026000  __PRU_CREG_BASE_PRU_CFG      
307abs   00028000  __PRU_CREG_BASE_PRU_UART     
308abs   0002e000  __PRU_CREG_BASE_PRU_IEP      
309abs   00030000  __PRU_CREG_BASE_PRU_ECAP     
310abs   00032000  __PRU_CREG_BASE_RSVD27       
311abs   00032400  __PRU_CREG_BASE_RSVD21       
312abs   40000000  __PRU_CREG_BASE_L3OCMC       
313abs   46000000  __PRU_CREG_BASE_MCASP0_DMA   
314abs   48022000  __PRU_CREG_BASE_UART1        
315abs   48024000  __PRU_CREG_BASE_UART2        
316abs   4802a000  __PRU_CREG_BASE_I2C1         
317abs   48030000  __PRU_CREG_BASE_MCSPI0       
318abs   48040000  __PRU_CREG_BASE_DMTIMER2     
319abs   48060000  __PRU_CREG_BASE_MMCHS0       
320abs   480c8000  __PRU_CREG_BASE_MBX0         
321abs   480ca000  __PRU_CREG_BASE_SPINLOCK     
322abs   4819c000  __PRU_CREG_BASE_I2C2         
323abs   481a0000  __PRU_CREG_BASE_MCSPI1       
324abs   481cc000  __PRU_CREG_BASE_DCAN0        
325abs   481d0000  __PRU_CREG_BASE_DCAN1        
326abs   48300000  __PRU_CREG_BASE_PWMSS0       
327abs   48302000  __PRU_CREG_BASE_PWMSS1       
328abs   48304000  __PRU_CREG_BASE_PWMSS2       
329abs   48310000  __PRU_CREG_BASE_RSVD13       
330abs   48318000  __PRU_CREG_BASE_RSVD10       
331abs   49000000  __PRU_CREG_BASE_TPCC         
332abs   4a100000  __PRU_CREG_BASE_GEMAC        
333abs   80000000  __PRU_CREG_BASE_DDR          
334abs   ffffffff  __binit__                    
335abs   ffffffff  __c_args__                   
336abs   ffffffff  binit                        
337
338[100 symbols]


Notice line 15 shows 0x18c0 bytes are being used for instructions. That’s 6336 in decimal.

Now compile for the sawtooth and you see only 444 bytes are used. Floating-point requires over 5K more bytes. Use with care. If you are short on instruction space, you can move the table generation to the ARM and just copy the table to the PRU.
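One way to follow that advice is to build the waveform table on the ARM. The sketch below is only illustrative: the function name sine_table, the table length of 100 samples and the 0 to 255 output range are assumptions, not values taken from the PRU example, and the step that copies the result into PRU memory is left out.

```python
import math

def sine_table(samples=100, max_value=255):
    """Return one period of a sine wave scaled to integers in 0..max_value."""
    table = []
    for i in range(samples):
        # Shift sin() from -1..1 to 0..1, then scale to the integer range.
        level = (math.sin(2 * math.pi * i / samples) + 1) / 2
        table.append(round(level * max_value))
    return table

table = sine_table()
print(len(table), min(table), max(table))
```

Because the table holds only integers, the PRU code that plays it back needs no floating-point library.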

WS2812 (NeoPixel) driver#

Problem#

You have an Adafruit NeoPixel LED string or Adafruit NeoPixel LED matrix and want to light it up.

Solution#

NeoPixel is Adafruit’s name for the WS2812 intelligent control LED. Each NeoPixel contains a red, green and blue LED with a PWM controller that can dim each one individually, making a rainbow of colors possible. The NeoPixel is driven by a single serial line. The timing on the line is very sensitive, which makes the PRU a perfect candidate for driving it.

Wire the input to P9_29, power to 3.3V and ground to ground as shown in NeoPixel Wiring.

NeoPixel Wiring

Fig. 754 NeoPixel Wiring#

Test your wiring with the simple code in neo1.pru0.c - Code to turn all NeoPixels white, which turns all pixels white.

Listing 120 neo1.pru0.c - Code to turn all NeoPixels white#
 1// Control a ws2812 (NeoPixel) display, All on or all off
 2#include <stdint.h>
 3#include <pru_cfg.h>
 4#include "resource_table_empty.h"
 5#include "prugpio.h"
 6
 7#define STR_LEN 24
 8#define	oneCyclesOn		700/5	// T1H: stay on 700ns
 9#define oneCyclesOff	600/5	// T1L: stay off 600ns
10#define zeroCyclesOn	350/5	// T0H: stay on 350ns
11#define zeroCyclesOff	800/5	// T0L: stay off 800ns
12#define resetCycles		60000/5	// Must be at least 50us, use 60us
13#define gpio P9_29				// output pin
14
15#define ONE
16
17volatile register uint32_t __R30;
18volatile register uint32_t __R31;
19
20void main(void)
21{
22	/* Clear SYSCFG[STANDBY_INIT] to enable OCP master port */
23	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
24
25	uint32_t i;
26	for(i=0; i<STR_LEN*3*8; i++) {
27#ifdef ONE
28		__R30 |= gpio;		// Set the GPIO pin to 1
29		__delay_cycles(oneCyclesOn-1);
30		__R30 &= ~gpio;		// Clear the GPIO pin
31		__delay_cycles(oneCyclesOff-2);
32#else
33		__R30 |= gpio;		// Set the GPIO pin to 1
34		__delay_cycles(zeroCyclesOn-1);
35		__R30 &= ~gpio;		// Clear the GPIO pin
36		__delay_cycles(zeroCyclesOff-2);
37#endif
38	}
39	// Send Reset
40	__R30 &= ~gpio;	// Clear the GPIO pin
41	__delay_cycles(resetCycles);
42	
43	__halt();
44}

neo1.pru0.c

Discussion#

NeoPixel bit sequence (taken from the WS2812 Data Sheet) shows the waveforms used to send each bit of data.

NeoPixel bit sequence

Fig. 755 NeoPixel bit sequence#

Table 156 Where the times are:#

Label     Time in ns
------    ----------
T0H       350
T0L       800
T1H       700
T1L       600
Treset    >50,000

The code in neo1.pru0.c - Code to turn all NeoPixels white defines these times in lines 8-11. The /5 is there because each PRU cycle takes 5 ns (the PRU runs at 200 MHz). Lines 28-31 set the output to 1 for the desired time and then to 0, and the loop repeats this for every bit in the string. NeoPixel zero timing shows the waveform for sending a 0 value. Note the times are spot on.
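The division by 5 can be checked with a few lines of Python. This is only a sketch: the helper name ns_to_cycles and the dictionary are made up for illustration, and the times are the ones quoted in the table above, with 60,000 ns standing in for Treset as in the listing.

```python
CYCLE_NS = 5  # one PRU cycle at 200 MHz

def ns_to_cycles(ns):
    """Convert a time in nanoseconds to a whole number of 5 ns PRU cycles."""
    return ns // CYCLE_NS

# Times in nanoseconds; Treset uses the 60 us value from the listing.
timing_ns = {"T0H": 350, "T0L": 800, "T1H": 700, "T1L": 600, "Treset": 60000}
cycles = {label: ns_to_cycles(ns) for label, ns in timing_ns.items()}
print(cycles)
```

These are the counts the C macros hand to __delay_cycles(), before the small adjustments of 1 or 2 cycles made in the listing.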

NeoPixel zero timing

Fig. 756 NeoPixel zero timing#

Each NeoPixel listens for an RGB value. Once a value has arrived, all the values that follow are passed on to the next NeoPixel, which does the same thing. That way you can individually control all of the NeoPixels.

Lines 39-41 send out a reset pulse. When a NeoPixel sees a reset pulse, it grabs the next value for itself and starts over again.
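The pass-along behavior can be modeled in a few lines of Python. This is a host-side sketch for building intuition, not a model of the electrical signal; the function name shift_through is hypothetical.

```python
def shift_through(colors, num_pixels):
    """Model a NeoPixel chain: each pixel keeps the first value it receives
    and forwards everything after it to the next pixel."""
    latched = [None] * num_pixels
    stream = list(colors)
    for position in range(num_pixels):
        if not stream:
            break                      # nothing left; later pixels are unchanged
        latched[position] = stream[0]  # this pixel keeps the first value
        stream = stream[1:]            # the rest is passed downstream
    return latched

print(shift_through([0x0f0000, 0x000f00, 0x00000f], 5))
```

Pixels that never receive a value are reported as None here, which stands for "keeps whatever it was showing before".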

Setting NeoPixels to Different Colors#

Problem#

I want to set the LEDs to different colors.

Solution#

Wire your NeoPixels as shown in NeoPixel Wiring then run the code in neo2.pru0.c - Code to turn on green, red, blue.

Listing 121 neo2.pru0.c - Code to turn on green, red, blue#
 1// Control a ws2812 (neo pixel) display, green, red, blue, green, ...
 2#include <stdint.h>
 3#include <pru_cfg.h>
 4#include "resource_table_empty.h"
 5#include "prugpio.h"
 6
 7#define STR_LEN 3
 8#define	oneCyclesOn		700/5	// T1H: stay on 700ns
 9#define oneCyclesOff	600/5	// T1L: stay off 600ns
10#define zeroCyclesOn	350/5	// T0H: stay on 350ns
11#define zeroCyclesOff	800/5	// T0L: stay off 800ns
12#define resetCycles		60000/5	// Must be at least 50us, use 60us
13#define gpio P9_29				// output pin
14
15volatile register uint32_t __R30;
16volatile register uint32_t __R31;
17
18void main(void)
19{
20	/* Clear SYSCFG[STANDBY_INIT] to enable OCP master port */
21	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
22	
23	uint32_t color[STR_LEN] = {0x0f0000, 0x000f00, 0x00000f};	// green, red, blue
24	int i, j;
25
26	for(j=0; j<STR_LEN; j++) {
27		for(i=23; i>=0; i--) {
28			if(color[j] & (0x1<<i)) {
29				__R30 |= gpio;		// Set the GPIO pin to 1
30				__delay_cycles(oneCyclesOn-1);
31				__R30 &= ~gpio;		// Clear the GPIO pin
32				__delay_cycles(oneCyclesOff-2);
33			} else {
34				__R30 |= gpio;		// Set the GPIO pin to 1
35				__delay_cycles(zeroCyclesOn-1);
36				__R30 &= ~gpio;		// Clear the GPIO pin
37				__delay_cycles(zeroCyclesOff-2);
38			}
39		}
40	}
41	// Send Reset
42	__R30 &= ~gpio;	// Clear the GPIO pin
43	__delay_cycles(resetCycles);
44	
45	__halt();
46}

neo2.pru0.c

This will make the first LED green, the second red and the third blue.

Discussion#

NeoPixel data sequence shows the sequence of bits used to control the green, red and blue values.

NeoPixel data sequence

Fig. 757 NeoPixel data sequence#

Note

The usual order for colors is RGB (red, green, blue), but the NeoPixels use GRB (green, red, blue).
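To keep the byte order straight, it can help to pack the colors in one place. The helper below is a sketch with a made-up name, pack_grb; it produces the same 24-bit layout as the constants in the listing above.

```python
def pack_grb(red, green, blue):
    """Pack 8-bit red, green and blue values into one 24-bit GRB word."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each color component must be in 0..255")
    return (green << 16) | (red << 8) | blue

print(hex(pack_grb(0, 0x0f, 0)))  # green only
```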

Line-by-line for neo2.pru0.c walks through the code.

Table 157 Line-by-line for neo2.pru0.c#

Line     Explanation
-----    -----------
23       Define the string of colors to be output. Here the ordering of the bits is the same as NeoPixel data sequence, GRB.
26       Loop for each color to output.
27       Loop for each bit in a GRB color.
28       Get the jth color and mask off all but the ith bit. (0x1<<i) takes the value 0x1 and shifts it left i bits. When ANDed (&) with color[j] it zeros out all but the ith bit. If the result is nonzero, the if branch runs; otherwise the else branch runs.
29-32    Send a 1.
34-37    Send a 0.
42-43    Send a reset pulse once all the colors have been sent.
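The masking on line 28 can be tried out on the host. This sketch uses a hypothetical helper, color_bits, that walks a color from bit 23 down to bit 0 just as the inner loop does and records whether a 1 or a 0 would be sent.

```python
def color_bits(color):
    """Return the 24 bits of a GRB color, most significant bit first."""
    return [1 if color & (0x1 << i) else 0 for i in range(23, -1, -1)]

bits = color_bits(0x0f0000)
print("".join(str(b) for b in bits))
```

The first eight bits are the green byte, the next eight red and the last eight blue.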

Note

This will only change the first STR_LEN LEDs. The LEDs that follow will not be changed.

Controlling Arbitrary LEDs#

Problem#

I want to change the 10th LED and not have to change the others.

Solution#

You need to keep an array of colors for the whole string in the PRU. Change the color of any pixels you want in the array and then send out the whole string to the LEDs. neo3.pru0.c - Code to animate a red pixel running around a ring of blue shows an example that animates a red pixel running around a blue background. Neo3 Video shows the code in action.

Listing 122 neo3.pru0.c - Code to animate a red pixel running around a ring of blue#
 1// Control a ws2812 (neo pixel) display, red pixel running around a ring of blue
 2#include <stdint.h>
 3#include <pru_cfg.h>
 4#include "resource_table_empty.h"
 5#include "prugpio.h"
 6
 7#define STR_LEN 24
 8#define	oneCyclesOn		700/5	// T1H: stay on 700ns
 9#define oneCyclesOff	600/5	// T1L: stay off 600ns
10#define zeroCyclesOn	350/5	// T0H: stay on 350ns
11#define zeroCyclesOff	800/5	// T0L: stay off 800ns
12#define resetCycles		60000/5	// Must be at least 50us, use 60us
13#define gpio P9_29				// output pin
14
15#define SPEED 20000000/5		// Time to wait between updates
16
17volatile register uint32_t __R30;
18volatile register uint32_t __R31;
19
20void main(void)
21{
22	uint32_t background = 0x00000f;
23	uint32_t foreground = 0x000f00;
24
25	/* Clear SYSCFG[STANDBY_INIT] to enable OCP master port */
26	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
27	
28	uint32_t color[STR_LEN];	// green, red, blue
29	int i, j;
30	int k, oldk = 0;
31	// Set everything to background
32	for(i=0; i<STR_LEN; i++) {
33		color[i] = background;
34	}
35	
36	while(1) {
37		// Move forward one position
38		for(k=0; k<STR_LEN; k++) {
39			color[oldk] = background;
40			color[k]    = foreground;
41			oldk=k;
42
43			// Output the string
44			for(j=0; j<STR_LEN; j++) {
45				for(i=23; i>=0; i--) {
46					if(color[j] & (0x1<<i)) {
47						__R30 |= gpio;		// Set the GPIO pin to 1
48						__delay_cycles(oneCyclesOn-1);
49						__R30 &= ~gpio;		// Clear the GPIO pin
50						__delay_cycles(oneCyclesOff-2);
51					} else {
52						__R30 |= gpio;		// Set the GPIO pin to 1
53						__delay_cycles(zeroCyclesOn-1);
54						__R30 &= ~gpio;		// Clear the GPIO pin
55						__delay_cycles(zeroCyclesOff-2);
56					}
57				}
58			}
59			// Send Reset
60			__R30 &= ~gpio;	// Clear the GPIO pin
61			__delay_cycles(resetCycles);
62
63			// Wait
64			__delay_cycles(SPEED);
65		}
66	}
67}

neo3.pru0.c

Neo3 Video#

neo3.pru0.c - Simple animation

Discussion#

Table 158 Here are the highlights.#

Line     Explanation
-----    -----------
32-33    Initialize the array of colors.
38-41    Update the array.
44-58    Send the array to the LEDs.
60-61    Send a reset.
64       Wait a bit.
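The array update on lines 38-41 is easy to follow in a host-side model. The sketch below copies the same erase-then-draw logic into Python; the generator name frames is made up, and a short string of 4 pixels is printed so the output stays readable.

```python
STR_LEN = 24
BACKGROUND = 0x00000f   # blue in GRB order
FOREGROUND = 0x000f00   # red in GRB order

def frames(length=STR_LEN):
    """Yield the color array after each update step of one full lap."""
    color = [BACKGROUND] * length
    oldk = 0
    for k in range(length):
        color[oldk] = BACKGROUND   # erase the previous position
        color[k] = FOREGROUND      # draw the new position
        oldk = k
        yield list(color)          # copy, so later steps do not alter it

for frame in frames(4):
    print(["R" if c == FOREGROUND else "b" for c in frame])
```

Each frame holds exactly one foreground pixel, and the whole array is sent every time even though only two entries changed.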

Controlling NeoPixels Through a Kernel Driver#

Problem#

You want to control your NeoPixels through a kernel driver so you can control them through a /dev interface.

Solution#

The rpmsg_pru driver provides a way to pass data between the ARM processor and the PRUs. It’s already included on current images. neo4.pru0.c - Code to talk to the PRU via rpmsg_pru shows an example.

Listing 123 neo4.pru0.c - Code to talk to the PRU via rpmsg_pru#
  1// Use rpmsg to control the NeoPixels via /dev/rpmsg_pru30
  2#include <stdint.h>
  3#include <stdio.h>
  4#include <stdlib.h>			// atoi
  5#include <string.h>
  6#include <pru_cfg.h>
  7#include <pru_intc.h>
  8#include <rsc_types.h>
  9#include <pru_rpmsg.h>
 10#include "resource_table_0.h"
 11#include "prugpio.h"
 12
 13volatile register uint32_t __R30;
 14volatile register uint32_t __R31;
 15
 16/* Host-0 Interrupt sets bit 30 in register R31 */
 17#define HOST_INT			((uint32_t) 1 << 30)	
 18
 19/* The PRU-ICSS system events used for RPMsg are defined in the Linux device tree
 20 * PRU0 uses system event 16 (To ARM) and 17 (From ARM)
 21 * PRU1 uses system event 18 (To ARM) and 19 (From ARM)
 22 */
 23#define TO_ARM_HOST			16	
 24#define FROM_ARM_HOST		17
 25
 26/*
 27* Using the name 'rpmsg-pru' will probe the rpmsg_pru driver found
 28* at linux-x.y.z/drivers/rpmsg/rpmsg_pru.c
 29*/
 30#define CHAN_NAME			"rpmsg-pru"
 31#define CHAN_DESC			"Channel 30"
 32#define CHAN_PORT			30
 33
 34/*
 35 * Used to make sure the Linux drivers are ready for RPMsg communication
 36 * Found at linux-x.y.z/include/uapi/linux/virtio_config.h
 37 */
 38#define VIRTIO_CONFIG_S_DRIVER_OK	4
 39
 40char payload[RPMSG_BUF_SIZE];
 41
 42#define STR_LEN 24
 43#define	oneCyclesOn		700/5	// Stay on for 700ns
 44#define oneCyclesOff	600/5
 45#define zeroCyclesOn	350/5
 46#define zeroCyclesOff	800/5
 47#define resetCycles		51000/5	// Must be at least 50u, use 51u
 48#define out P9_29				// Bit number to output on
 49
 50#define SPEED 20000000/5		// Time to wait between updates
 51
 52uint32_t color[STR_LEN];	// green, red, blue
 53
 54/*
 55 * main.c
 56 */
 57void main(void)
 58{
 59	struct pru_rpmsg_transport transport;
 60	uint16_t src, dst, len;
 61	volatile uint8_t *status;
 62	
 63	uint8_t r, g, b;
 64	int i, j;
 65	// Set everything to background
 66	for(i=0; i<STR_LEN; i++) {
 67		color[i] = 0x010000;
 68	}
 69
 70	/* Allow OCP master port access by the PRU so the PRU can read external memories */
 71	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
 72
 73	/* Clear the status of the PRU-ICSS system event that the ARM will use to 'kick' us */
 74#ifdef CHIP_IS_am57xx
 75	CT_INTC.SICR_bit.STATUS_CLR_INDEX = FROM_ARM_HOST;
 76#else
 77	CT_INTC.SICR_bit.STS_CLR_IDX = FROM_ARM_HOST;
 78#endif
 79
 80	/* Make sure the Linux drivers are ready for RPMsg communication */
 81	status = &resourceTable.rpmsg_vdev.status;
 82	while (!(*status & VIRTIO_CONFIG_S_DRIVER_OK));
 83
 84	/* Initialize the RPMsg transport structure */
 85	pru_rpmsg_init(&transport, &resourceTable.rpmsg_vring0, &resourceTable.rpmsg_vring1, TO_ARM_HOST, FROM_ARM_HOST);
 86
 87	/* Create the RPMsg channel between the PRU and ARM user space using the transport structure. */
 88	while (pru_rpmsg_channel(RPMSG_NS_CREATE, &transport, CHAN_NAME, CHAN_DESC, CHAN_PORT) != PRU_RPMSG_SUCCESS);
 89	while (1) {
 90		/* Check bit 30 of register R31 to see if the ARM has kicked us */
 91		if (__R31 & HOST_INT) {
 92			/* Clear the event status */
 93#ifdef CHIP_IS_am57xx
 94			CT_INTC.SICR_bit.STATUS_CLR_INDEX = FROM_ARM_HOST;
 95#else
 96			CT_INTC.SICR_bit.STS_CLR_IDX = FROM_ARM_HOST;
 97#endif
 98			/* Receive all available messages, multiple messages can be sent per kick */
 99			while (pru_rpmsg_receive(&transport, &src, &dst, payload, &len) == PRU_RPMSG_SUCCESS) {
100			    char *ret;	// rest of payload after front character is removed
101			    int index;	// index of LED to control
102			    // Input format is:  index red green blue
103			    index = atoi(payload);	
104			    // Update the array, but don't write it out.
 105			    if((index >= 0) && (index < STR_LEN)) {
106			    	ret = strchr(payload, ' ');	// Skip over index
107				    r = strtol(&ret[1], NULL, 0);
108				    ret = strchr(&ret[1], ' ');	// Skip over r, etc.
109				    g = strtol(&ret[1], NULL, 0);
110				    ret = strchr(&ret[1], ' ');
111				    b = strtol(&ret[1], NULL, 0);
112
113				    color[index] = (g<<16)|(r<<8)|b;	// String wants GRB
114			    }
115			    // When index is -1, send the array to the LED string
116			    if(index == -1) {
117				    // Output the string
118					for(j=0; j<STR_LEN; j++) {
119						// Cycle through each bit
120						for(i=23; i>=0; i--) {
121							if(color[j] & (0x1<<i)) {
122								__R30 |= out;		// Set the GPIO pin to 1
123								__delay_cycles(oneCyclesOn-1);
124								__R30 &= ~out;	// Clear the GPIO pin
125								__delay_cycles(oneCyclesOff-14);
126							} else {
127								__R30 |= out;		// Set the GPIO pin to 1
128								__delay_cycles(zeroCyclesOn-1);
129								__R30 &= ~(out);	// Clear the GPIO pin
130								__delay_cycles(zeroCyclesOff-14);
131							}
132						}
133					}
134					// Send Reset
135					__R30 &= ~out;	// Clear the GPIO pin
136					__delay_cycles(resetCycles);
137		
138					// Wait
139					__delay_cycles(SPEED);
140			    }
141
142			}
143		}
144	}
145}

neo4.pru0.c

Run the code as usual.

bone$ make TARGET=neo4.pru0
/opt/source/pru-cookbook-code/common/Makefile:29: MODEL=TI_AM335x_BeagleBone_Black,TARGET=neo4.pru0
-    Stopping PRU 0
-     copying firmware file /tmp/vsx-examples/neo4.pru0.out to /lib/firmware/am335x-pru0-fw
write_init_pins.sh
-    Starting PRU 0
MODEL   = TI_AM335x_BeagleBone_Black
PROC    = pru
PRUN    = 0
PRU_DIR = /sys/class/remoteproc/remoteproc1

bone$ echo 0 0xff 0 127 > /dev/rpmsg_pru30
bone$ echo -1 > /dev/rpmsg_pru30

Todo

get this working on the 5.10 kernel

/dev/rpmsg_pru30 is the device file the rpmsg_pru driver provides so the ARM can talk to the PRU. The first echo sets the 0th LED to the RGB value 0xff 0 127. (Note: you can mix hex and decimal.) The second echo, with an index of -1, tells the PRU to send the data to the LEDs. Your 0th LED should now be lit.
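The message format is simple enough to model on the host. The sketch below is not the PRU code; it is a Python approximation of how the text is split, with a made-up function name, parse_command. Base 0 in int() plays the role that base 0 plays in strtol(), which is why hex and decimal can be mixed.

```python
def parse_command(message):
    """Parse 'index red green blue' as sent by the echo commands.

    Returns (index, None) for a bare index such as '-1',
    otherwise (index, (red, green, blue)).
    """
    fields = message.split()
    index = int(fields[0])            # the index is plain decimal
    if len(fields) < 4:
        return index, None
    # Base 0 accepts both decimal ("127") and hex ("0xff") text.
    red, green, blue = (int(field, 0) for field in fields[1:4])
    return index, (red, green, blue)

print(parse_command("0 0xff 0 127"))
print(parse_command("-1"))
```

One difference worth knowing: Python’s int(text, 0) rejects a value such as 010, which strtol() with base 0 reads as octal.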

Discussion#

There’s a lot here. I’ll just hit some of the highlights in Line-by-line for neo4.pru0.c.

Table 159 Line-by-line for neo4.pru0.c#

Line       Explanation
-------    -----------
30         The CHAN_NAME of rpmsg-pru matches the rpmsg_pru driver that is already installed. This connects this PRU to the driver.
32         The CHAN_PORT tells it to use port 30. That’s why we use /dev/rpmsg_pru30.
40         payload[] is the buffer that receives the data from the ARM.
42-48      Same as the previous NeoPixel examples.
52         color[] is the state to be sent to the LEDs.
66-68      color[] is initialized.
70-85      Here are a number of details needed to set up the channel between the PRU and the ARM.
88-91      Create the channel, then wait until the ARM sends us some numbers.
99         Receive all the data from the ARM and store it in payload[].
101-111    The data sent is: index red green blue. Pull off the index. If it’s in the right range, pull off the red, green and blue values.
113        The NeoPixels want the data in GRB order. Shift and OR everything together.
116-133    If the index is -1, send the contents of color[] to the LEDs. This code is the same as before.

You can now use programs running on the ARM to send colors to the PRU.

neo-rainbow.py - A python program using /dev/rpmsg_pru30 shows an example.

Listing 124 neo-rainbow.py - A python program using /dev/rpmsg_pru30#
 1#!/usr/bin/python3
 2from time import sleep
 3import math
 4
 5len = 24
 6amp = 12
 7f = 25
 8shift = 3
 9phase = 0
10
11# Open a file
12fo = open("/dev/rpmsg_pru30", "wb", 0)
13
14while True:
15    for i in range(0, len):
16        r = (amp * (math.sin(2*math.pi*f*(i-phase-0*shift)/len) + 1)) + 1
17        g = (amp * (math.sin(2*math.pi*f*(i-phase-1*shift)/len) + 1)) + 1
18        b = (amp * (math.sin(2*math.pi*f*(i-phase-2*shift)/len) + 1)) + 1
19        fo.write(b"%d %d %d %d\n" % (i, r, g, b))
20        # print("0 0 127 %d" % (i))
21
22    fo.write(b"-1 0 0 0\n")
23    phase = phase + 1
24    sleep(0.05)
25
26# Close opened file
27fo.close()

neo-rainbow.py

Line 19 writes the data to the PRU. Be sure to have a newline or space after the last number, or your numbers will run together.
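Wrapping the writes in small helpers makes it harder to forget the trailing newline. This is a sketch; the names set_pixel and show are hypothetical, and an in-memory buffer stands in for the device so the code runs on any machine.

```python
import io

def set_pixel(dev, index, red, green, blue):
    """Write one 'index red green blue' line; the newline ends the last number."""
    dev.write(b"%d %d %d %d\n" % (index, red, green, blue))

def show(dev):
    """Ask the PRU to send its color array to the LED string."""
    dev.write(b"-1 0 0 0\n")

# On the Beagle this would be open("/dev/rpmsg_pru30", "wb", 0);
# an in-memory buffer is used here instead.
dev = io.BytesIO()
set_pixel(dev, 0, 255, 0, 127)
show(dev)
print(dev.getvalue())
```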

Switching from pru0 to pru1 with rpmsg_pru#

There are three things you need to change when switching from pru0 to pru1 when using rpmsg_pru.

  1. The include on line 10 is switched to #include "resource_table_1.h" (0 is switched to a 1)

  2. Line 17 is switched to #define HOST_INT ((uint32_t) 1 << 31) (30 is switched to 31.)

  3. Lines 23 and 24 are switched to:

#define TO_ARM_HOST			18
#define FROM_ARM_HOST		19

These changes select the resource table, host interrupt bit and system events that pru1 uses instead of those for pru0.
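The differences between the two PRUs fit in a small lookup table. The sketch below only restates the values from the list above and from the comments in neo4.pru0.c; the dictionary and the helper name host_int_mask are made up for illustration.

```python
# Per-PRU settings for rpmsg_pru, as listed in the text
RPMSG_SETTINGS = {
    0: {"resource_table": "resource_table_0.h", "host_int_bit": 30,
        "to_arm_event": 16, "from_arm_event": 17},
    1: {"resource_table": "resource_table_1.h", "host_int_bit": 31,
        "to_arm_event": 18, "from_arm_event": 19},
}

def host_int_mask(pru):
    """Return the R31 mask tested for the ARM's kick on the given PRU."""
    return 1 << RPMSG_SETTINGS[pru]["host_int_bit"]

for pru, settings in RPMSG_SETTINGS.items():
    print(pru, settings["resource_table"], hex(host_int_mask(pru)))
```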

RGB LED Matrix - No Integrated Drivers#

Problem#

You have an RGB LED matrix (RGB LED Matrix – No Integrated Drivers (Falcon Christmas)) and want to know at a low level how the PRU works.

Solution#

Here is the datasheet, but the best description I’ve found for the RGB Matrix is from Adafruit. I’ve reproduced it here, with adjustments for the 64x32 matrix we are using.

information

There’s zero documentation out there on how these matrices work, and no public datasheets or spec sheets so we are going to try to document how they work.

First thing to notice is that there are 2048 RGB LEDs in a 64x32 matrix. Like pretty much every matrix out there, you can’t drive all 2048 at once. One reason is that would require a lot of current, another reason is that it would be really expensive to have so many pins. Instead, the matrix is divided into 16 interleaved sections/strips. The first section is the 1st ‘line’ and the 17th ‘line’ (64 x 2 RGB LEDs = 128 RGB LEDs), the second is the 2nd and 18th line, etc. until the last section, which is the 16th and 32nd line. You might be asking, why are the lines paired this way? Wouldn’t it be nicer to have the first section be the 1st and 2nd line, then 3rd and 4th, until the 15th and 16th? The reason they do it this way is so that the lines are interleaved and look better when refreshed, otherwise we’d see the stripes more clearly.

So, on the PCB are 24 LED driver chips. These are like 74HC595s but they have 16 outputs and they are constant current. 16 outputs * 24 chips = 384 LEDs that can be controlled at once, and 128 * 3 (R, G and B) = 384. So now the design comes together: you have 384 outputs that can control one line at a time, with each of 384 R, G and B LEDs either on or off. The controller (say an FPGA or microcontroller) selects which section to currently draw (using LA, LB, LC and LD address pins - 4 bits can have 16 values). Once the address is set, the controller clocks out 384 bits of data (48 bytes) and latches it. Then it increments the address and clocks out another 384 bits, etc. until it gets to address #15, then it sets the address back to #0.

https://cdn-learn.adafruit.com/downloads/pdf/32x16-32x32-rgb-led-matrix.pdf
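The numbers in that description all follow from the matrix size, which a few lines of Python can confirm. The variable names are made up for this sketch; the values come from the quoted text.

```python
COLUMNS, ROWS = 64, 32
LINES_PER_SECTION = 2
CHIPS, OUTPUTS_PER_CHIP = 24, 16

sections = ROWS // LINES_PER_SECTION             # sections scanned one at a time
leds_per_section = COLUMNS * LINES_PER_SECTION   # RGB LEDs lit together
bits_per_section = leds_per_section * 3          # one bit each for R, G and B
address_bits = (sections - 1).bit_length()       # pins LA, LB, LC and LD

print(sections, leds_per_section, bits_per_section,
      bits_per_section // 8, address_bits)
```

The bit count per section matches the number of driver outputs, CHIPS * OUTPUTS_PER_CHIP, which is why one section can be shown at a time.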

That gives a good overview, but there are a few details missing. rgb_python.py - Python code for driving RGB LED matrix is a functioning Python program that gives a nice high-level view of how to drive the display.

Todo

Test this

Listing 125 rgb_python.py - Python code for driving RGB LED matrix#
 1#!/usr/bin/env python3
 2import Adafruit_BBIO.GPIO as GPIO
 3
 4# Define which functions are connected to which pins
 5OE="P1_29"      # Output Enable, active low
 6LAT="P1_36"     # Latch, toggle after clocking in a row of pixels
 7CLK="P1_33"     # Clock, toggle after each pixel
 8
 9# Input data pins 
10R1="P2_10"  # R1, G1, B1 are for the top rows (1-16) of pixels
11G1="P2_8"
12B1="P2_6"
13
14R2="P2_4"   # R2, G2, B2 are for the bottom rows (17-32) of pixels
15G2="P2_2"
16B2="P2_1"
17
18LA="P2_32"  # Address lines for which row (1-16 or 17-32) to update
19LB="P2_30"
20LC="P1_31"
21LD="P2_34"
22
23# Set everything as output ports
24GPIO.setup(OE,  GPIO.OUT)
25GPIO.setup(LAT, GPIO.OUT)
26GPIO.setup(CLK, GPIO.OUT)
27
28GPIO.setup(R1, GPIO.OUT)
29GPIO.setup(G1, GPIO.OUT)
30GPIO.setup(B1, GPIO.OUT)
31GPIO.setup(R2, GPIO.OUT)
32GPIO.setup(G2, GPIO.OUT)
33GPIO.setup(B2, GPIO.OUT)
34
35GPIO.setup(LA, GPIO.OUT)
36GPIO.setup(LB, GPIO.OUT)
37GPIO.setup(LC, GPIO.OUT)
38GPIO.setup(LD, GPIO.OUT)
39
40GPIO.output(OE,  0)     # Enable the display
41GPIO.output(LAT, 0)     # Set latch to low
42
43while True:
44    for bank in range(16):              # 4 address lines select 16 banks
45        GPIO.output(LA, bank>>0&0x1)    # Select rows
46        GPIO.output(LB, bank>>1&0x1)
47        GPIO.output(LC, bank>>2&0x1)
48        GPIO.output(LD, bank>>3&0x1)
49        
50        # Shift the colors out.  Here we only have four different 
51        # colors to keep things simple.
52        for i in range(16):
53            GPIO.output(R1,  1)     # Top row, white
54            GPIO.output(G1,  1)
55            GPIO.output(B1,  1)
56            
57            GPIO.output(R2,  1)     # Bottom row, red
58            GPIO.output(G2,  0)
59            GPIO.output(B2,  0)
60
61            GPIO.output(CLK, 0)     # Toggle clock
62            GPIO.output(CLK, 1)
63    
64            GPIO.output(R1,  0)     # Top row, black
65            GPIO.output(G1,  0)
66            GPIO.output(B1,  0)
67
68            GPIO.output(R2,  0)     # Bottom row, green
69            GPIO.output(G2,  1)
70            GPIO.output(B2,  0)
71    
72            GPIO.output(CLK, 0)     # Toggle clock
73            GPIO.output(CLK, 1)
74    
75        GPIO.output(OE,  1)     # Disable display while updating
76        GPIO.output(LAT, 1)     # Toggle latch
77        GPIO.output(LAT, 0)
78        GPIO.output(OE,  0)     # Enable display

rgb_python.py

Be sure to run the rgb_python_setup.sh script before running the python code.

Listing 126 rgb_python_setup.sh#
 1#!/bin/bash
 2# Setup for 64x32 RGB Matrix
 3export TARGET=rgb1.pru0
 4echo TARGET=$TARGET
 5
 6# Configure the PRU pins based on which Beagle is running
 7machine=$(awk '{print $NF}' /proc/device-tree/model)
 8echo -n "$machine"
 9if [ "$machine" = "Black" ]; then
10    echo " Found"
11    pins=""
12elif [ "$machine" = "Blue" ]; then
13    echo " Found"
14    pins=""
15elif [ "$machine" = "PocketBeagle" ]; then
16    echo " Found"
17    prupins="P2_32 P1_31 P1_33 P1_29 P2_30 P2_34 P1_36"
18    gpiopins="P2_10 P2_06 P2_04 P2_01 P2_08 P2_02"
19    # Uncomment for J2
20    # gpiopins="$gpiopins P2_27 P2_25 P2_05 P2_24 P2_22 P2_18"
21else
22    echo " Not Found"
23    pins=""
24fi
25
26for pin in $prupins
27do
28    echo $pin
29    # config-pin $pin pruout
30    config-pin $pin gpio
31    config-pin $pin out
32    config-pin -q $pin
33done
34
35for pin in $gpiopins
36do
37    echo $pin
38    config-pin $pin gpio
39    config-pin $pin out
40    config-pin -q $pin
41done

rgb_python_setup.sh

Make sure line 29 is commented out and lines 30 and 31 are uncommented. Later we’ll configure for pruout, but for now the Python code doesn’t use the PRU outs.

# config-pin $pin pruout
config-pin $pin gpio
config-pin $pin out

Your display should look like Display running rgb_python.py.

Display running rgb_python.py

Fig. 758 Display running rgb_python.py#

So why do only two lines appear at a time? That’s how the display works. Currently lines 6 and 22 are showing, then a moment later 7 and 23 show, etc. The display can only display two lines at a time, so it cycles through all the lines. Unfortunately, Python is too slow to make the display appear all at once. Here’s where the PRU comes in.
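The pairing of lines can be written down directly. The sketch below assumes that address 0 selects lines 1 and 17, which is how the Adafruit description numbers the sections; the function name lines_for_address is hypothetical.

```python
SECTIONS = 16   # number of row addresses selected by LA, LB, LC and LD

def lines_for_address(address):
    """Return the two 1-based display lines lit when a 0-based address is selected."""
    if not 0 <= address < SECTIONS:
        raise ValueError("address must be in 0..15")
    return address + 1, address + 1 + SECTIONS

for address in range(SECTIONS):
    print(address, lines_for_address(address))
```

Cycling the address from 0 to 15 fast enough is what makes all 32 lines appear lit at once.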

PRU code for driving the RGB LED matrix shows the code that drives the display from the PRU. Be sure to run bone$ source rgb_setup.sh first.

Listing 127 PRU code for driving the RGB LED matrix#
// This code drives the RGB LED Matrix on the 1st Connector
#include <stdint.h>
#include <pru_cfg.h>
#include "resource_table_empty.h"
#include "prugpio.h"
#include "rgb_pocket.h"

#define DELAY 10    // Number of cycles (5ns each) to wait after a write

volatile register uint32_t __R30;
volatile register uint32_t __R31;

void main(void)
{
    // Set up the pointers to each of the GPIO ports
    uint32_t *gpio[] = {
        (uint32_t *) GPIO0,
        (uint32_t *) GPIO1,
        (uint32_t *) GPIO2,
        (uint32_t *) GPIO3
    };

    uint32_t i, row;

    while(1) {
        for(row=0; row<16; row++) {
            // Set the row address
            // Here we take advantage of the select bits (LA,LB,LC,LD)
            // being sequential in the R30 register (bits 2,3,4,5)
            // We shift row over so it lines up with the select bits
            // ORing (|=) with R30 sets bits to 1 and
            // ANDing (&=) clears bits to 0, the 0xffc3 mask makes sure the
            // other bits aren't changed.
            __R30 |=  row<<pru_sel0;
            __R30 &= (row<<pru_sel0)|0xffc3;

            for(i=0; i<64; i++) {
                // Top row white
                // Combining these to one write works because they are all in
                // the same gpio port
                gpio[r11_gpio][GPIO_SETDATAOUT] = r11_pin | g11_pin | b11_pin;
                __delay_cycles(DELAY);

                // Bottom row red
                gpio[r12_gpio][GPIO_SETDATAOUT]   = r12_pin;
                __delay_cycles(DELAY);
                gpio[r12_gpio][GPIO_CLEARDATAOUT] = g12_pin | b12_pin;
                __delay_cycles(DELAY);

                __R30 |=  pru_clock;    // Toggle clock
                __delay_cycles(DELAY);
                __R30 &= ~pru_clock;
                __delay_cycles(DELAY);

                // Top row black
                gpio[r11_gpio][GPIO_CLEARDATAOUT] = r11_pin | g11_pin | b11_pin;
                __delay_cycles(DELAY);

                // Bottom row green
                gpio[r12_gpio][GPIO_CLEARDATAOUT] = r12_pin | b12_pin;
                __delay_cycles(DELAY);
                gpio[r12_gpio][GPIO_SETDATAOUT]   = g12_pin;
                __delay_cycles(DELAY);

                __R30 |=  pru_clock;    // Toggle clock
                __delay_cycles(DELAY);
                __R30 &= ~pru_clock;
                __delay_cycles(DELAY);
            }
            __R30 |=  pru_oe;        // Disable display
            __delay_cycles(DELAY);
            __R30 |=  pru_latch;     // Toggle latch
            __delay_cycles(DELAY);
            __R30 &= ~pru_latch;
            __delay_cycles(DELAY);
            __R30 &= ~pru_oe;        // Enable display
            __delay_cycles(DELAY);
        }
    }
}

rgb1.pru0.c

The results are shown in Display running rgb1.pru0.c on PRU 0.

Display running rgb1.pru0.c on PRU 0

Fig. 759 Display running rgb1.pru0.c on PRU 0#

The PRU is fast enough to quickly write to the display so that it appears as if all the LEDs are on at once.
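The two-step row-address update in rgb1.pru0.c (OR, then AND with a mask) can be checked off the board with a small Python model. This is only a sketch: it treats R30 as a plain integer and assumes pru_sel0 is 2, following the code comment that places LA through LD at bits 2 to 5. The names SEL0, KEEP_MASK and set_row are hypothetical.

```python
SEL0 = 2            # assumed bit position of LA in R30 (LA..LD occupy bits 2-5)
KEEP_MASK = 0xFFC3  # all of the low 16 bits except the four select bits


def set_row(r30, row):
    """Model the row-address update on an integer copy of R30."""
    r30 |= row << SEL0                # set the select bits that must be 1
    r30 &= (row << SEL0) | KEEP_MASK  # clear the select bits that must be 0
    return r30


# Start with row 15 selected and bits 0, 1 and 7 set, then select row 5.
before = 0xBF
after = set_row(before, 5)
selected_row = (after >> SEL0) & 0xF
untouched_bits = after & KEEP_MASK
```

The OR alone could never clear a select bit left over from the previous row, which is why the AND follows it. Note that the mask is only 16 bits wide, so in this model any bits above bit 15 are cleared as well.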

Discussion#

There are a lot of details needed to make this simple display work. Let’s go over some of them.

First, the connector looks like RGB Matrix J1 connector.

RGB Matrix J1 connector

Fig. 760 RGB Matrix J1 connector#

Notice that the labels on the connector match the labels in the code. PocketScroller pin table shows how the pins on the display are mapped to the pins on the PocketBeagle.

Todo

Make a mapping table for the Black

FalconChristmas/fpp

Table 160 PocketScroller pin table#

J1 Connector Pin  Pocket Headers  gpio port and bit number  Linux gpio number  PRU R30 bit number
----------------  --------------  ------------------------  -----------------  ------------------
R1                P2_10           1-20                      52
B1                P2_06           1-25                      57
R2                P2_04           1-26                      58
B2                P2_01           1-18                      50
LA                P2_32           3-16                      112                PRU0.2
LC                P1_31           3-18                      114                PRU0.4
CLK               P1_33           3-15                      111                PRU0.1
OE                P1_29           3-21                      117                PRU0.7
G1                P2_08           1-28                      60
G2                P2_02           1-27                      59
LB                P2_30           3-17                      113                PRU0.3
LD                P2_34           3-19                      115                PRU0.5
LAT               P1_36           3-14                      110                PRU0.0

The J1 mapping to gpio port and bit number comes from FalconChristmas/fpp. The gpio port and bit number mapping to Pocket Headers comes from https://docs.google.com/spreadsheets/d/1FRGvYOyW1RiNSEVprvstfJAVeapnASgDXHtxeDOjgqw/edit#gid=0.
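The Linux gpio number column follows directly from the port and bit columns: each gpio port contributes 32 lines, so the number is the port times 32 plus the bit. A minimal sketch, with a hypothetical helper name, that reproduces a few rows of the table:

```python
BITS_PER_PORT = 32  # each gpio port on the AM335x has 32 lines


def linux_gpio_number(port, bit):
    """Convert a gpio port and bit number (e.g. 1-20) to a Linux gpio number."""
    if not 0 <= bit < BITS_PER_PORT:
        raise ValueError("bit must be in the range 0..31")
    return port * BITS_PER_PORT + bit


r1 = linux_gpio_number(1, 20)   # R1 on P2_10
la = linux_gpio_number(3, 16)   # LA on P2_32
oe = linux_gpio_number(3, 21)   # OE on P1_29
```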

Oscilloscope display of CLK, OE, LAT and R1 shows four of the signal waveforms driving the RGB LED matrix.

Oscilloscope display of CLK, OE, LAT and R1

Fig. 761 Oscilloscope display of CLK, OE, LAT and R1#

The top waveform is the CLK, the next is OE, followed by LAT and finally R1. The OE (output enable) is active low, so most of the time the display is visible. The sequence is:

  • Put data on the R1, G1, B1, R2, G2 and B2 lines.

  • Toggle the clock.

  • Repeat the first two steps until one pair of rows has been transferred. There are 384 LEDs (2 rows of 64 RGB LEDs times 3 LEDs per RGB), but we are clocking in six bits (R1, G1, etc.) at a time, so 384/6=64 values need to be clocked in.

  • Once all the values are in, disable the display (OE goes high).

  • Then toggle the latch (LAT) to latch the new data.

  • Turn the display back on.

  • Increment the address lines (LA, LB, LC and LD) to point to the next rows.

  • Keep repeating the above to keep the display lit.
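The steps above can be sketched as a small Python simulation that records signal events in a list instead of driving pins. It is an illustration of the ordering only, not the PRU code: the function name, the event tuples and the 64-column, 16-address geometry are assumptions chosen to match the description.

```python
COLUMNS = 64     # values clocked in per pair of rows (384 / 6)
ADDRESSES = 16   # row addresses selectable with LA, LB, LC and LD


def refresh_row_pair(address, pixels, log):
    """Append the events for one pair of rows to log; return the next address.

    pixels holds one (r1, g1, b1, r2, g2, b2) tuple per column.
    """
    for value in pixels:
        log.append(("DATA", value))  # put data on R1, G1, B1, R2, G2, B2
        log.append(("CLK", 1))       # toggle the clock
        log.append(("CLK", 0))
    log.append(("OE", 1))            # disable the display (OE is active low)
    log.append(("LAT", 1))           # toggle the latch
    log.append(("LAT", 0))
    log.append(("OE", 0))            # turn the display back on
    next_address = (address + 1) % ADDRESSES
    log.append(("ADDR", next_address))  # point to the next rows
    return next_address


log = []
white_over_red = [(1, 1, 1, 1, 0, 0)] * COLUMNS
next_address = refresh_row_pair(0, white_over_red, log)
clock_pulses = sum(1 for name, level in log if name == "CLK" and level == 1)
```

Calling refresh_row_pair in a loop, feeding each returned address back in, corresponds to the last step of the list.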

Using the PRU we are able to run the clock at about 2.9 MHz. FPP waveforms shows that the optimized assembler code used by FPP clocks data in at some 6.3 MHz. So the compiler is doing a pretty good job, but you can run some two times faster if you want to use assembly code. In fairness to FPP, it has to pull its data out of RAM to display it, so this is not a direct comparison.
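To put those clock rates in perspective, here is a rough estimate of how many full-panel refreshes per second they allow. It is a back-of-the-envelope sketch that assumes 64 values per pair of rows and 16 pairs of rows, and it ignores the latch and output-enable overhead, so treat the results as upper bounds. The function name is hypothetical.

```python
def frame_rate_hz(clock_hz, values_per_row_pair=64, row_pairs=16):
    """Upper bound on full-panel refreshes per second for a given CLK rate."""
    clocks_per_frame = values_per_row_pair * row_pairs
    return clock_hz / clocks_per_frame


c_rate = frame_rate_hz(2.9e6)    # compiled C on the PRU
asm_rate = frame_rate_hz(6.3e6)  # FPP's assembly code
speedup = 6.3e6 / 2.9e6          # the "some two times faster" in the text
```

Under these assumptions the C version redraws the panel a few thousand times per second, far more than is needed for a flicker-free static image.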

FPP waveforms

Fig. 762 FPP waveforms#

Getting More Colors#

The Adafruit description goes on to say:

information

The only downside of this technique is that despite being very simple and fast, it has no PWM control built-in! The controller can only set the LEDs on or off. So what do you do when you want full color? You actually need to draw the entire matrix over and over again at very high speeds to PWM the matrix manually. For that reason, you need to have a very fast controller (50 MHz is a minimum) if you want to do a lot of colors and motion video and have it look good.

https://cdn-learn.adafruit.com/downloads/pdf/32x16-32x32-rgb-led-matrix.pdf

This is what FPP does, but it’s beyond the scope of this project.
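To make the idea of manual PWM concrete, here is a minimal sketch of the simplest possible scheme: an LED with brightness level k is switched on for k out of every N redraws of the matrix. This is an illustration only, not necessarily how FPP or the Adafruit library schedules its frames, and the function name is made up.

```python
def pwm_schedule(level, steps):
    """Return the on/off state of one LED for each of `steps` redraws.

    level is the desired brightness, from 0 (off) to steps (fully on).
    """
    if not 0 <= level <= steps:
        raise ValueError("level must be between 0 and steps")
    return [frame < level for frame in range(steps)]


dim = pwm_schedule(3, 8)     # on for 3 of every 8 redraws
bright = pwm_schedule(7, 8)  # on for 7 of every 8 redraws
duty_cycle = sum(dim) / len(dim)
```

Every extra brightness step costs another redraw of the whole matrix, which is why the quote asks for such a fast controller.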

Compiling and Inserting rpmsg_pru#

Problem#

Your Beagle doesn’t have rpmsg_pru.

Solution#

Build the module from source and insert it into the running kernel.

bone$ cd code/05blocks/module
bone$ sudo apt install linux-headers-`uname -r`
bone$ wget https://github.com/beagleboard/linux/raw/4.9/drivers/rpmsg/rpmsg_pru.c
bone$ make
make -C /lib/modules/4.9.88-ti-r111/build M=$PWD
make[1]: Entering directory '/usr/src/linux-headers-4.9.88-ti-r111'
  LD      /home/debian/PRUCookbook/docs/code/05blocks/module/built-in.o
  CC [M]  /home/debian/PRUCookbook/docs/code/05blocks/module/rpmsg_client_sample.o
  CC [M]  /home/debian/PRUCookbook/docs/code/05blocks/module/rpmsg_pru.o
  Building modules, stage 2.
  MODPOST 2 modules
  CC      /home/debian/PRUCookbook/docs/code/05blocks/module/rpmsg_client_sample.mod.o
  LD [M]  /home/debian/PRUCookbook/docs/code/05blocks/module/rpmsg_client_sample.ko
  CC      /home/debian/PRUCookbook/docs/code/05blocks/module/rpmsg_pru.mod.o
  LD [M]  /home/debian/PRUCookbook/docs/code/05blocks/module/rpmsg_pru.ko
make[1]: Leaving directory '/usr/src/linux-headers-4.9.88-ti-r111'
bone$ sudo insmod rpmsg_pru.ko
bone$ lsmod | grep rpm
rpmsg_pru               5799  2
virtio_rpmsg_bus       13620  0
rpmsg_core              8537  2 rpmsg_pru,virtio_rpmsg_bus

It’s now installed and ready to go.
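If you would rather check for the module from a script than read the lsmod output by eye, a small parser is enough. The sketch below works on lsmod-style text and is exercised here with the output shown above; the function name is hypothetical.

```python
def loaded_modules(lsmod_text):
    """Return the set of module names found in lsmod-style output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        if fields and fields[0] != "Module":  # skip blank lines and the header
            names.add(fields[0])
    return names


SAMPLE = """\
rpmsg_pru               5799  2
virtio_rpmsg_bus       13620  0
rpmsg_core              8537  2 rpmsg_pru,virtio_rpmsg_bus
"""

have_rpmsg_pru = "rpmsg_pru" in loaded_modules(SAMPLE)
```

On the Beagle itself, the text could come from subprocess.run(["lsmod"], capture_output=True, text=True).stdout instead of the sample string.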