Yeah, that probably has something to do with it. It's not so much the affinity for which processor takes the interrupt as which one does the packet processing. RPS is at work there, so which processor takes the (hardware) interrupt isn't that important. If you look at /proc/softirqs while doing transfers, sometimes the load seems to be distributed between the two cores, while other times it mostly triggers on one or the other. I'm not sure whether that's expected behavior.
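If anyone wants to put numbers on that instead of eyeballing the file, here's a rough Python sketch that pulls the NET_RX row out of /proc/softirqs and shows how much each core's counter grew over an interval. The function names (net_rx_counts, delta, sample_live) are made up for this example, and it assumes the usual layout of a CPU header row followed by one row per softirq type.

```python
import time


def net_rx_counts(softirqs_text):
    """Return {cpu_name: count} for the NET_RX row of /proc/softirqs text."""
    lines = softirqs_text.strip().splitlines()
    cpus = lines[0].split()  # header row, e.g. CPU0 CPU1
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        if name.strip() == "NET_RX":
            counts = [int(x) for x in rest.split()]
            return dict(zip(cpus, counts))
    raise ValueError("no NET_RX row found")


def delta(before, after):
    """Per-CPU growth of the counters between two snapshots."""
    return {cpu: after[cpu] - before[cpu] for cpu in after}


def sample_live(seconds=5, path="/proc/softirqs"):
    """Snapshot the file twice, `seconds` apart, and return the per-CPU delta."""
    with open(path) as f:
        before = net_rx_counts(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = net_rx_counts(f.read())
    return delta(before, after)
```

Calling sample_live() on the router while an iperf3 run is in progress should show whether the receive softirqs land on one core or both during that run.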
You can set affinity specifically for RPS. I haven't experimented with that yet. Its default is set to 3, which is a CPU bitmask covering both cores, so it'll distribute over both. If you set it to 0 you get the old pre-RPS behavior, where the core that takes the hardware interrupt also processes the packet. That brings back some of the bad, old, shaky behavior. (edit: I should say I've only experimented with setting it to zero, since I'm describing it
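For reference, the RPS setting is a hex CPU bitmask written to the rps_cpus file under /sys/class/net for each receive queue. A small Python sketch of how the values map to cores follows; the helper names and the "eth0" interface name are assumptions for illustration, not something taken from the router's config, and writing the file needs root.

```python
def rps_mask(cpus):
    """Build the hex bitmask string for a list of CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # bit N set means CPU N may process packets
    return format(mask, "x")


def rps_path(iface, queue=0):
    """Path of the per-queue RPS mask in sysfs."""
    return f"/sys/class/net/{iface}/queues/rx-{queue}/rps_cpus"


def set_rps(iface, cpus, queue=0):
    """Write the mask for the given CPUs (requires root)."""
    with open(rps_path(iface, queue), "w") as f:
        f.write(rps_mask(cpus))


# Example (interface name is an assumption):
#   set_rps("eth0", [0, 1])  writes "3", i.e. both cores
#   set_rps("eth0", [])      writes "0", i.e. RPS off
```

So 3 is both cores, 1 or 2 pins processing to a single core, and 0 turns RPS off for that queue.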
FWIW, I was able to test a vanilla OpenWrt CC image and config (except for the static IP on the WAN, as before) between my old laptop and my wife's 2011 laptop. My laptop was still the iperf server on the WAN port of the router. In the "upstream" direction (the iperf3 default, where the client sends to the server) I could not get it to go beyond 600 Mbps. In the other direction (iperf3 -R) I got 920 Mbps over several tests, although some were a bit shaky and moved between 800 and 900 Mbps.
The exact same config with the client switched out for my desktop went up to 930 Mbps in both directions, as before.
(Last edited by leitec on 24 Sep 2015, 01:09)