OpenWrt Forum Archive

Topic: Compiler Optimization Tweaks

The content of this topic has been archived between 27 Apr 2018 and 29 Apr 2018. There are no obvious gaps in this topic, but there may still be some posts missing at the end.

See below for results from my recent tests. Clearly there's some difference in OpenSSL performance, but has anyone seen any improvements in real life use?

Tested with trunk r33210. Flash space usage was ~15% higher with -O3 than with -Os. Between tests entire source tree was deleted and recompiled from scratch to ensure compiler flags were used and no old binaries were mixed in.

system type     : Atheros AR7241 rev 1
machine         : TP-LINK TL-MR3220
processor       : 0
cpu model       : MIPS 24Kc V7.4
BogoMIPS        : 265.42
CPU frequency   : 400 MHz
Baseline: -Os -pipe -mips32r2 -mtune=mips32r2 -fno-caller-saves (OpenWrt default)
Test 1:   -O3 -pipe -march=24kc -mtune=24kc -fno-caller-saves
Test 2:   -Os -pipe -march=24kc -mtune=24kc -fno-caller-saves
Test 3:   Base system built with:    -O3 -pipe -march=24kc -mtune=24kc -fno-caller-saves
          OpenSSL, zlib, Strongswan: -Os -pipe -mips32r2 -mtune=mips32r2 -fno-caller-saves
OpenSSL benchmark:

                          MD5       SHA1     SHA256     SHA512        DES       3DES     AES128     AES192     AES256  RSAsign  RSAvrfy DSAsign DSAvrfy
Baseline | 1.0.1c |  21565440 |  6903470 |  4688820 |  2866180 |  2922150 |  1043180 |  4944900 |  4292270 |  3793580 |  3.6 |  122.2 |  12.3 |  10.0 |
Test 1   | 1.0.1c |  20801880 |  7393690 |  5376000 |  2405720 |  3456000 |  1242790 |  5537110 |  4813480 |  4258820 |  4.4 |  154.0 |  15.4 |  12.7 |
   vs. baseline         -3,5%      +7,1%     +14,6%     -16,1%     +18,3%     +19,1%     +12,0%     +12,1%     +12,3%  +22,2%  +26,0%   +25,2%  +27,0%
Test 2   | 1.0.1c |  21723870 |  6949550 |  4764330 |  2896550 |  3103400 |  1107630 |  5391020 |  4696750 |  4161190 |  4.4 |  152.9 |  15.3 |  12.6 |
   vs. baseline         +0,7%      +0,7%      +1,6%      +1,1%      +6,2%      +6,2%      +9,0%      +9,4%      +9,7%  +22,2%  +25,1%   +24,4%  +26,0%
Test 3   | 1.0.1c |  21845330 |  6928380 |  4676270 |  2876100 |  2921470 |  1039700 |  4945920 |  4288510 |  3793580 |  3.6 |  122.4 |  12.2 |  10.0 |
   vs. baseline         +1,3%      +0,4%      -0,3%      +0,3%         0%      -0,3%         0%      -0,1%         0%      0%   +0,2%    -0,8%      0%
Network tests: (wget http://remote-server/100M -O /dev/null)

Strongswan IPSEC (ike:aes256-sha1-modp1536, esp:aes128-sha1):
Baseline: 2,3MB/s
Test 1:   2,3MB/s
Test 2:   2,3MB/s
Test 3:   2,3MB/s

Strongswan IPSEC (ike:aes256-sha1-modp1536, esp:null-null):
Baseline: 9,1MB/s
Test 1:   9,1MB/s
Test 2:   9,1MB/s
Test 3:   9,1MB/s

Between same hosts just L3 routed, no nat, no ipsec
Baseline: 9,1MB/s
Test 1:   9,1MB/s
Test 2:   9,1MB/s
Test 3:   9,1MB/s

There was no difference in numbers if wget was executed on OpenWrt router or on PC behind it.

Strongswan tests used built-in crypto libs. Router didn't have enough flash to install both openssl and strongswan.

Tests performed with snapshots from downloads.openwrt.org on 680MHz AR7161 resulted 3,5MB/s with Strongswan AES128-SHA1. Performance was identical regardless what underlying crypto lib was used.

Attempts to use kernel ciphers (af-alg) with Strongswan resulted incorrectly encrypted packets. This works on i386 PC so something seems to be broken with kernel crypto or Strongswan on OpenWrt. Not sure if this affects other applications utilizing kernel ciphers on OpenWrt.

(Last edited by jr on 19 Aug 2012, 01:32)

I successfully compiled a firmware for my WDR4300 using the following CFLAGs:
-Ofast -march=74kc -mdspr2
But I couldn't prove if this caused a notable improvement in Samba file transfer speeds (which was my goal). I measured higher speeds after the update but these measurements have very bad repeatability.

So, I just wished to share that using this "ultimate speed" setup is possible in case you use uclibc and you don't need any software which requires strictly standard floating point math (like SQLitev3 -> which I don't need but some may do), but you normally get a message about that.
And I wanted to add that -fno-caller-saves should be removed and that mtune is redundant if you set march for your CPU - which is theoretically the best ; but you should add the extra instruction sets manually (like dsp).

I am not sure if mips16 could improve anything but I couldn't test this with the current toolchain. I guess it requires GCC 4.7.1_final or later (so the current pre-release 4.7.x Linaro compiler won't do it).

EDIT: No, updating the compiler didn't help with -mips16. (But thank you for that changset anyway, florian.)

(Last edited by janos666 on 14 Oct 2012, 05:11)

eigma wrote:

https://dev.openwrt.org/changeset/33531/ should have helped a bit with OpenSSL SHA-1 and AES on MIPS.

It seems it helps a lot... SHA-1 scores are more than doubled, SHA-256 more than 30% up! AES went down actually, the rest increased a little bit.

mysys-30909-tune24kc  | 1.0.0g |  37290670 |  11891070 |  8070760 |  4896020 |  5278750 |  1886500 |  9166520 |  8014780 |  7127520 |  7.5 |  261.2 |  26.1 |  21.5 |
mysys-32165-tune24kc  | 1.0.1c |  36653400 |  11742550 |  8041130 |  5162190 |  5237420 |  1870850 |  9093460 |  7920640 |  7285640 |  7.5 |  258.5 |  26.7 |  22.5 |
mysys-34054-24kc-gcc47| 1.0.1c |  36851370 |  23923030 | 11097900 |  4999850 |  5333390 |  1905530 |  8781480 |  7592330 |  6651560 |  8.1 |  281.7 |  28.3 |  23.1 |

I do have to note that my last result (r34054) was obtained using gcc 4.7.3, where the other scores used gcc 4.6.x. All results on a Netgear WNDR3700v1 using my own compilations, using mtune=24kc (not default).

Has anyone actually seen real-world improvements using -O3 vs default -Os flags? I do see better results with "openssl speed" benchmarks, but file transfer or bandwidth tests via SSL is still limited by actual bandwidth caps.

You can set it in menuconfig:

Advanced configuration option -> Target Options -> Target Optimizations

But I think you just have to enable it, or maybe the settings shown there, are already the default settings (?)

This is what I found on BB for the TL-WDR4300:

 -Os -pipe -mno-branch-likely -mips32r2 -mtune=34kc 

And this looks like quite optimized. Runable on all mips32r2, optimized for 34kc-CPUs, and optimized for minimal (flash-)space usage.

(Last edited by eleon216 on 17 Feb 2015, 11:25)

I don't think the global CFLAG set (the one discussed here which can be modified in the buildroot menuconfig and stored in the top level .config) gets appended to the end of the cflag list used when building the Linux kernel. I tried to change between -Os and -O2 but the uncompressed kernel binaries came in at the exact same size (the packages aren't).

Could somebody confirm this? Should this be reported as a bug or merely posted on the mailing list as suggestion/request to get this sorted out?
I think it's inconsistent behavior if kernel modules are deliberately built with CFLAG included but the kernel itself intentionally isn't.

Ps: I had manually edited the target kernel config template earlier to disabled the optimize_for_size option, so I think my kernel is always built with O2 now. Although, this option would in theory always force Os if enabled but not force O2 or anything if disabled.

(Last edited by janos666 on 21 Jun 2015, 10:38)

janos666 wrote:

I don't think the global CFLAG set (the one discussed here which can be modified in the buildroot menuconfig and stored in the top level .config) gets appended to the end of the cflag list used when building the Linux kernel. I tried to change between -Os and -O2 but the uncompressed kernel binaries came in at the exact same size (the packages aren't).

You are right. I have the following patch file for AA that will modify the kernel Makefile to optimise for 24kc.

$ cat 999-kernel_makefile.patch
--- a/arch/mips/Makefile    2013-03-06 21:54:18.825913173 +0000
+++ b/arch/mips/Makefile    2013-03-06 21:55:30.441916721 +0000
@@ -130,8 +130,8 @@
 cflags-$(CONFIG_CPU_TX49XX)    += -march=r4600 -Wa,--trap
 cflags-$(CONFIG_CPU_MIPS32_R1)    += $(call cc-option,-march=mips32,-mips32 -U_MIPS_ISA -D_MIPS_ISA=_MIPS_ISA_MIPS32) \
             -Wa,-mips32 -Wa,--trap
-cflags-$(CONFIG_CPU_MIPS32_R2)    += $(call cc-option,-march=mips32r2,-mips32r2 -U_MIPS_ISA -D_MIPS_ISA=_MIPS_ISA_MIPS32) \
-            -Wa,-mips32r2 -Wa,--trap
+cflags-$(CONFIG_CPU_MIPS32_R2)    += $(call cc-option,-march=mips32r2,-mtune=24kc -U_MIPS_ISA -D_MIPS_ISA=_MIPS_ISA_MIPS32) \
+            -Wa,-mips32r2 -mtune=24kc -Wa,--trap
 cflags-$(CONFIG_CPU_MIPS64_R1)    += $(call cc-option,-march=mips64,-mips64 -U_MIPS_ISA -D_MIPS_ISA=_MIPS_ISA_MIPS64) \
             -Wa,-mips64 -Wa,--trap
 cflags-$(CONFIG_CPU_MIPS64_R2)    += $(call cc-option,-march=mips64r2,-mips64r2 -U_MIPS_ISA -D_MIPS_ISA=_MIPS_ISA_MIPS64) \

I didn't continue doing this for BB and CC but you can create a similar patch file for the current builds.

Thank you for the answer.
I started to peak into some Linux related Makefiles in the trunk sources before I noted the issue here but I couldn't find the right one to fiddle with (hence the ask for some insight).
I did find some interesting stuff though. It seems like there are some -mtune=xy (and even -mdsp[r2])  presets ready at place for several kind of MIPS CPUs, although these don't seem to be actually used anywhere at the moment (unless for the kernel and I just failed to see it through? I mean both that .nk file and the V=s output?).

janos666 wrote:

Thank you for the answer.
I started to peak into some Linux related Makefiles in the trunk sources before I noted the issue here but I couldn't find the right one to fiddle with (hence the ask for some insight).
I did find some interesting stuff though. It seems like there are some -mtune=xy (and even -mdsp[r2])  presets ready at place for several kind of MIPS CPUs, although these don't seem to be actually used anywhere at the moment (unless for the kernel and I just failed to see it through? I mean both that .nk file and the V=s output?).


Openwrt's trunk or branches doesn't have any "Linux Makefiles". It has the patches to the actual Linux kernel source codes. During build process, the scripts will download the real Linux source, run those patches on the source codes. My patch file above is applied the Linux kernel's Makefiles after it is downloaded.


If you want compile the Linux kernel with -O2, you'll need to:

1. Find "Makefile" of the kernel version that Openwrt will using.
2. Create a patch file to modify the compile-time CFLAGs in that Makefile
3. Put that patch file inside "target/linux/<arch>/patches-<kernel-version>/". For common Atheros routers on CC, that path would be "target/linux/ar71xx/patches-3.18/"


That being said, I was trying to increase WAN-LAN routing speed. But I didn't find any performance difference between -O2 and -Os.

It looks like the only difference regarding MIPS32_R2 since you updated your patch are the line numbers. I tried to use this for my WDR4300:

--- a/arch/mips/Makefile    Sun Jun 14 18:19:31 2015
+++ b/arch/mips/Makefile    Tue Jun 23 20:24:48 2015
@@ -154,8 +154,8 @@
 cflags-$(CONFIG_CPU_TX49XX)    += -march=r4600 -Wa,--trap
 cflags-$(CONFIG_CPU_MIPS32_R1)    += $(call cc-option,-march=mips32,-mips32 -U_MIPS_ISA -D_MIPS_ISA=_MIPS_ISA_MIPS32) \
             -Wa,-mips32 -Wa,--trap
-cflags-$(CONFIG_CPU_MIPS32_R2)    += $(call cc-option,-march=mips32r2,-mips32r2 -U_MIPS_ISA -D_MIPS_ISA=_MIPS_ISA_MIPS32) \
-            -Wa,-mips32r2 -Wa,--trap
+cflags-$(CONFIG_CPU_MIPS32_R2)    += $(call cc-option,-march=74kc ,-march=74kc -U_MIPS_ISA -D_MIPS_ISA=_MIPS_ISA_MIPS32) \
+            -Wa,-march=74kc -Wa,--trap
 cflags-$(CONFIG_CPU_MIPS64_R1)    += $(call cc-option,-march=mips64,-mips64 -U_MIPS_ISA -D_MIPS_ISA=_MIPS_ISA_MIPS64) \
             -Wa,-mips64 -Wa,--trap
 cflags-$(CONFIG_CPU_MIPS64_R2)    += $(call cc-option,-march=mips64r2,-mips64r2 -U_MIPS_ISA -D_MIPS_ISA=_MIPS_ISA_MIPS64) \

along with this patch for the kernel config:

--- a/target/linux/ar71xx/config-3.18
+++ b/target/linux/ar71xx/config-3.18
@@ -155,7 +155,6 @@ CONFIG_ATH79_NVRAM=y
 CONFIG_ATH79_PCI_ATH9K_FIXUP=y
 # CONFIG_ATH79_ROUTERBOOT is not set
 CONFIG_ATH79_WDT=y
-CONFIG_CC_OPTIMIZE_FOR_SIZE=y
 CONFIG_CEVT_R4K=y
 CONFIG_CLKDEV_LOOKUP=y
 CONFIG_CLONE_BACKWARDS=y

and my buildroot config CFGLAG are:

-march=74kc -O2 -fomit-frame-pointer

I didn't include this last flag in the kernel Makefile patch because this is controlled by a kernel config parameter (just like the default->size optimization switch). I am not really sure about the default of this option for MIPS or the Linux kernel when configured for MIPS. I think it is -fomit for the kernel (at least on AMD64) but not for GCC. I will check this later.

Unfortunately, my board fails to boot with the resulting binary image. sad
It looks like I have to use mtune with the default march. I will try that later.


Another note:
Including the -mips16 cflag for the kernel results in build errors, even though I used the MIPS16 checkbox in the buildroot menuconfig without any problems (so packages which do not individually disable this cflag were theoretically built with -mips16 [seemingly without errors], thus the toolchain should be ready to handle the corresponding library calls):

CC [M]  crypto/ablkcipher.o
{standard input}: Assembler messages:
{standard input}:941: Error: invalid operands `lbu $1,($3)'
{standard input}:943: Error: invalid operands `sb $1,($2)'
{standard input}:944: Error: invalid operands `beqz $1,2f'
{standard input}:964: Error: invalid operands `lbu $1,($24)'
{standard input}:966: Error: invalid operands `sb $1,($3)'
{standard input}:967: Error: invalid operands `beqz $1,2f'
{standard input}:970: Error: invalid operands `addiu $24,1'
{standard input}:1032: Error: invalid operands `lbu $1,($3)'
{standard input}:1034: Error: invalid operands `sb $1,($2)'
{standard input}:1035: Error: invalid operands `beqz $1,2f'
{standard input}:1055: Error: invalid operands `lbu $1,($24)'
{standard input}:1057: Error: invalid operands `sb $1,($3)'
{standard input}:1058: Error: invalid operands `beqz $1,2f'
{standard input}:1061: Error: invalid operands `addiu $24,1'
{standard input}:1563: Error: unrecognized opcode `tne $0,$24,12'
scripts/Makefile.build:257: recipe for target 'crypto/ablkcipher.o' failed
make[6]: *** [crypto/ablkcipher.o] Error 1
Makefile:937: recipe for target 'crypto' failed

I doubt the -mdspr2 could be actually useful (for my OpenWRT package set in general, let alone for the Linux kernel).

(Last edited by janos666 on 24 Jun 2015, 07:13)

IMO, the different categories of compiler options must be clearly kept, discussed and tested separately, as they control very different aspects of the generated binaries (not just "faster").


1. Optimisation levels: -Ox

For storage space and main memory constrained devices as routers, IMO only -Os makes sense as a general optimisation.

For individual builds, other optimisation levels may be useful, but the binaries will become larger (some much larger; in FLASH memory, but also in main memory, when executed) and high optimisation levels may result in binaries, which exhibit different run time behavior.


2. Compiler options, which control specific features: e.g. --fomit-frame-pointer, etc.

These are frequently discussed and debated controversially in various places.
IMHO following the best practices of experienced people (e.g. Linux kernel hackers) and fully understanding, what you are doing (RTFM: gcc's documentation is good!), is an appropriate approach.


3. Architectural compiler options: -march and -mtune

-march tells the compiler, which CPU instruction set it has to use.
Binaries generated with instructions the actual CPU does not support, will not run (at least not correctly).
OTOH, specific CPU features cannot be used, if the compiler is not told to generate CPU instructions for them.  These CPU features are designed to enhance performance vastly in specific corner cases, e.g. a FPU (floating point unit) for operations on floating point numbers, cryptographic accelerators for specific crypto operations and DSP extensions (originally intended for audio and video processing) may be used to accelerate various operations.
Note that if binaries compiled with too different -march settings are used together (think of e.g. base image versus downloaded packages), the "calling conventions" may differ, hence the binaries may not work together, even though they are all executable on a specific CPU.

-mtune controls the scheduling of the CPU instructions, i.e. in which order they are put into a binary.
The gains in performance are rather small and when optimising for a specific CPU type, the binaries will run slower on other, compatible CPU types.  Thus this comes for free (i.e. no disadvantages) for board specific builds, but is not useful for generic builds.

The specific options, which may be used for 24K*, 34K*, 74K* and 1074K* MIPS CPUs are discussed in my next posting.

(Last edited by olf on 4 Oct 2015, 14:19)

All variants of the 4K, 24K, 34K, 74K, 1004K and 1074K MIPS CPU cores (and maybe 14K, too, although its brother MicroAptiv uses mips32r3) use mips32r2 as their base instruction set (ISA).  Hence all these MIPS CPU cores run fine with binaries compiled with -march=mips32r2.
NB: The next generation of MIPS cores, the ProAptiv, InterAptiv and MicroAptiv series uses mips32r3, and the recent generation, the MIPS Series 5 "Warrior(s)" uses mips32r5 (P- and M-series) rsp. mips64r5 (I-series) as their base ISA.  IIRC r4 was omitted due to the 4 being seen as an unlucky number in Asia.

The suffix "f" in the name of a x4K* MIPS core denotes the presence of an optional floating point unit (e.g. 74Kf), which is not particularly useful for routers, i.e. the floating point instructions do not accelerate typical use cases of routers.

The "MIPS DSP Module" aka "DSP16 ASE" (Application Specific Extension) is an optional instruction set extension (compiler option -mdsp) for 24K and 34K.  It is sometimes indicated by an "E" in the name, e.g. 24KEc.
The 74K and 1074K seem to always include a "MIPS DSP Module Rev2" (-mdspr2 compiler option) instruction set extension.  It seems that binaries compiled for the original MIPS DSP ASE are executable by MIPS cores with the DSPr2 ASE.

The 4K has the MIPS 16e ASE built in (compiler option -mips16, additionally to e.g. -march=mips32r2 or -march=74kc), it is an option for other mips32r2 cores (e.g. often / always present in 74K, sometimes in 24K).  This instruction set extension is aimed at code density (i.e. small size of binaries), not at speed.
MicroMIPS is a different, incompatible extension with the same goal built into 14K, MicroAptiv and the M-series.

34K, 1004K and InterAptiv optionally support the MIPS multi-threading ("MT"; compiler option -mmt) extension, the equivalent to Intel's "hyperthreading", exhibiting multiple virtual cores per physical core.
IIRC, to use the MIPS multi-threading extension, the Linux kernel has to be configured and compiled with explicit support for it.

The 1004K and the 1074K are variants of the 34K (!) and the 74K for multi-core designs, which do not differ otherwise from their respective base model.
Newer MIPS cores are either multi-core capable (ProAptiv, P-series, I-series) or not, without specifically denoting that in their name.
IIRC, to use MIPS multi-cores, the Linux kernel also has to be configured and compiled with explicit support for it.

NB: The 4K and the 14K seem to lack a proper MMU.


Consequently, e.g. the compiler may automatically use the "MIPS DSP Extensions Rev 2" instructions in binaries compiled with -march=74kc (or explicitly with -mdspr2), and those will not run on any 24K and 34K CPUs (among many more).

But as OpenWRT images are created for specific routers models, a mechanism to specify a particular CPU core in the board description to be used for -march and -mtune (plus the ability to set other board specific compiler and kernel options) when compiling the binaries for an image would be very useful for making the OpenWRT binaries faster and often also smaller on the same hardware.
Software compiled for a broad range of CPU cores (e.g. post-install downloadable binary packages) have to use the smallest common denominator, which is the bare mips32r2 instruction set for x4K* MIPS cores.
WRT the calling conventions, I believe all binaries compiled for any specific mips32r2 core and / or extension(s) are compatible, but have not checked that explicitly for the various extensions (DSP16 ASE, DSPr2 ASE, MT ASE, MIPS 16e ASE etc.).


WRT -mtune, the implementations of the various MIPS CPU cores differ greatly, hence an optimised instruction scheduling may make some difference:
- 4K, 14K, 24K, MicroAptiv and the M-series are little pipelined (typically 5 stages), in-order designs.
- 34K, 1004K and InterAptiv are medium pipelined (9 stages), in-order designs, optionally capable of multi-threading.
- 74K, 1074K, ProAptiv and the P-Series are deeply pipelined (e.g. 15 stages in 74K / 1074K), out-of-order designs.


HTH
olf

(Last edited by olf on 4 Oct 2015, 14:23)

Thank you olf. Your posts helped me solve the problem.
I had to explicitly add the -mno-dspr2 option after -march=74kc. It seem the latter automatically enables the former but it produces broken binaries for my board.
A "cat /proc/cpuid" lists the dspr2 capability but either it's broken on the SoC level (may be a microdoce bug on this particular board?) or it's a GCC 4.8.x bug (or a plain incompatibility from some kernel or software code?).

janos666 wrote:

Thank you olf. Your posts helped me solve the problem.

1. I had to explicitly add the -mno-dspr2 option after -march=74kc. It seem the latter automatically enables the former ...

2. but it produces broken binaries for my board.

3. A "cat /proc/cpuid" lists the dspr2 capability but either it's broken on the SoC level (may be a microdoce bug on this particular board?) or ...

4. it's a GCC 4.8.x bug (or a plain incompatibility from some kernel or software code?).

@1. -march=74kc is supposed to include -mdspr2, as it is an ASE always present in MIPS74Kc cores.
        I am not sure about -mips16 being implied by -march=74kc, though.

@2. That is interesting: The DSPr2 instructions actually seem to be used, when building an OpenWRT image.

@3. https://wikidevi.com/wiki/MIPS_74K does not indicate anything such for Revision 2.0.0 or greater, which probably all production SoCs use.

@4. I assume a compiler toolchain (gcc, binutils et al) bug.

But then (using -march=74kc -mno-dspr2) the only difference (relative to -march=mips32r2 -mtune=74kc) may be the implied -mips16, rather leading to smaller binaries than more speed.
Maybe you want to check that (and also with -march=74kc -mno-mips16 -mno-dspr2), e.g. just simply by comparing the sizes of the generated binaries or images, while leaving all other parameters (compiler etc.) the same?

(Last edited by olf on 16 Oct 2015, 02:45)

olf wrote:
janos666 wrote:

[...]

4. it's a GCC 4.8.x bug (or a plain incompatibility from some kernel or software code?).

[...]

@4. I assume a compiler toolchain (gcc, binutils et al) bug.

See post 2012-03-12 by alphasparc in this thread: Back then at least -mdsp (for a 24KEc) was generating a working image.

The following posts there (2012-03-22 ff.) are quite interesting, providing 0% to 25% speedup across different SoCs and MIPS cores just by these options on older MIPS32R2 cores.  I would expect higher gains for e.g. a 74Kc.
NB: Sad, that no results for the 34Kc with -mt and the right kernel options have been posted (2012-03-17).

(Last edited by olf on 21 Oct 2015, 01:07)

darkwin wrote:

ok i found the multi threading support. What is the right option?

(X) Disable multithreading support.                 
( ) Use 1 TC on each available VPE for SMP     
( ) SMTC: Use all TCs on all VPEs for SMP

For an explanation of VPE and TC, see http://imgtec.com/mips/architectures/multithreading/, paragraph "Multi-Threaded Technology".

olf wrote:

[...]
But as OpenWRT images are created for specific routers models, a mechanism to specify a particular CPU core in the board description to be used for -march and -mtune (plus the ability to set other board specific compiler and kernel options) when compiling the binaries for an image would be very useful for making the OpenWRT binaries faster and often also smaller on the same hardware.

Well, if looks like that mechanism is provided by quilt patches, see http://wiki.openwrt.org/doc/devel/patches.

olf wrote:

Software compiled for a broad range of CPU cores (e.g. post-install downloadable binary packages) have to use the smallest common denominator, which is the bare mips32r2 instruction set for x4K* MIPS cores.
WRT the calling conventions, I believe all binaries compiled for any specific mips32r2 core and / or extension(s) are compatible, but have not checked that explicitly for the various extensions (DSP16 ASE, DSPr2 ASE, MT ASE, MIPS 16e ASE etc.).

WRT the package feeds, I have  not yet found an easy way to control the compiler options for a feed.

The discussion might have continued from here.