OpenWrt Forum Archive

Topic: MWAN3 - loadbalancing not switching continuous ping after WAN switch

The content of this topic has been archived on 20 Apr 2018. There are no obvious gaps in this topic, but there may still be some posts missing at the end.

Hi All,

I have installed MWAN3 on my WR1043NDv4 router, using the latest stable release of OpenWrt/LEDE. I have one cable connection from my home LAN plugged into the router's WAN interface, and I have also installed a 3G USB stick. Both work, and as you would expect, my 3G interface shows a slower response:

root@LEDE:~# ping -c 1 -I eth0.2 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=57 time=10.267 ms

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 10.267/10.267/10.267 ms
root@LEDE:~# ping -c 1 -I 3g-lte 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=56 time=290.559 ms

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 290.559/290.559/290.559 ms

My MWAN3 config is set up so that everything is routed via the cabled interface, and if that fails, the 3G interface takes over. That works as well: when I pull the cable, I am connected again after 15 seconds, and on sites such as ip4.me I see the IP address of my UMTS provider.

However, when I start a continuous ping from my laptop, which sits behind the router, to an address such as 8.8.8.8 and then trigger a failover, the ping keeps showing:

C:\Users\nielsl>ping -t 8.8.8.8

Pinging 8.8.8.8 with 32 bytes of data:
Reply from 192.168.178.183: Destination host unreachable.
Reply from 192.168.178.183: Destination host unreachable.

Any other address that I haven't pinged recently gives a proper response:

C:\Users\nielsl>ping -t 8.8.4.4

Pinging 8.8.4.4 with 32 bytes of data:
Reply from 8.8.4.4: bytes=32 time=40ms TTL=55
Reply from 8.8.4.4: bytes=32 time=40ms TTL=55
Reply from 8.8.4.4: bytes=32 time=39ms TTL=55
Reply from 8.8.4.4: bytes=32 time=37ms TTL=55
Reply from 8.8.4.4: bytes=32 time=45ms TTL=55

Ping statistics for 8.8.4.4:
    Packets: Sent = 5, Received = 5, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 37ms, Maximum = 45ms, Average = 40ms

Is there some routing table maintained for a ping that is running continuously? Is this expected behaviour, or am I missing something in my config?

/etc/config/network

root@LEDE:~# cat /etc/config/network

config interface 'loopback'
        option ifname 'lo'
        option proto 'static'
        option ipaddr '192.168.1.1'
        option netmask '255.255.255.255'
        option gateway '192.168.1.1'

config globals 'globals'
        option ula_prefix 'fd14:70f1:ea27::/48'

config interface 'lan'
        option type 'bridge'
        option ifname 'eth0.1'
        option proto 'static'
        option ipaddr '192.168.1.1'
        option netmask '255.255.255.0'
        option ip6assign '60'

config device 'lan_dev'
        option name 'eth0.1'
        option macaddr '84:16:f9:9b:af:cc'

config interface 'wan'
        option ifname 'eth0.2'
        option proto 'dhcp'
        option defaultroute '1'
        option metric '10'

config device 'wan_dev'
        option name 'eth0.2'
        option macaddr '84:16:f9:9b:af:cd'

config interface 'wan6'
        option ifname 'eth0.2'
        option proto 'dhcpv6'

config switch
        option name 'switch0'
        option reset '1'
        option enable_vlan '1'

config switch_vlan
        option device 'switch0'
        option vlan '1'
        option ports '1 2 3 4 0t'

config switch_vlan
        option device 'switch0'
        option vlan '2'
        option ports '5 0t'

config interface 'lte'
        option proto '3g'
        option device '/dev/ttyUSB0'
        option service 'umts'
        option apn 'live.vodafone.com'
        option pincode '0000'
        option dialnumber '*99#'
        option ipv6 'auto'
        option defaultroute '0'

config route 'lte_route'
        option interface 'lte'
        option target '0.0.0.0'
        option gateway '10.64.64.64'
        option netmask '0.0.0.0'
        option metric '50'

/etc/config/mwan3

root@LEDE:~# cat /etc/config/mwan3

config interface 'wan'
        option enabled '1'
        option timeout '2'
        list track_ip '208.67.222.222'
        list track_ip '208.67.220.220'
        list track_ip '8.8.4.4'
        list track_ip '8.8.8.8'
        option reliability '3'
        option up '10'
        option count '1'
        option interval '3'
        option down '2'

config interface 'lte'
        option enabled '1'
        option reliability '1'
        option count '1'
        option timeout '2'
        option down '5'
        option up '10'
        list track_ip '208.67.220.220'
        list track_ip '208.67.222.222'
        list track_ip '8.8.4.4'
        list track_ip '8.8.8.8'
        option interval '30'

config policy 'wan_lte'
        list use_member 'wan_m1_w1'
        list use_member 'lte_m2_w2'
        option last_resort 'unreachable'

config member 'wan_m1_w1'
        option interface 'wan'
        option metric '1'
        option weight '1'

config member 'lte_m2_w2'
        option interface 'lte'
        option metric '2'
        option weight '2'

config rule 'default'
        option proto 'all'
        option sticky '0'
        option use_policy 'wan_lte'
        option dest_ip '0.0.0.0/0'
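
As a rough illustration of why the policy above behaves as failover rather than load balancing: in mwan3, only the online members with the lowest metric carry traffic, and weights only split traffic between members that share that metric. The sketch below models that selection in plain shell and awk. The function name `active_members` and the four-column input format are made up for this example and are not part of mwan3.

```shell
#!/bin/sh
# Hypothetical helper, not part of mwan3: model which policy members carry traffic.
# Input on stdin, one member per line: <member> <metric> <weight> <online|offline>
active_members() {
    awk '
        $4 == "online" {
            n++
            name[n] = $1; metric[n] = $2 + 0; weight[n] = $3 + 0
            # Track the lowest metric among the online members.
            if (n == 1 || metric[n] < best) best = metric[n]
        }
        END {
            # Weights only matter between members that share the lowest metric.
            for (i = 1; i <= n; i++)
                if (metric[i] == best) total += weight[i]
            for (i = 1; i <= n; i++)
                if (metric[i] == best)
                    printf "%s %d%%\n", name[i], 100 * weight[i] / total
        }'
}

# Members from the config above, both links up: only the metric-1 member is used.
printf 'wan_m1_w1 1 1 online\nlte_m2_w2 2 2 online\n' | active_members
# prints: wan_m1_w1 100%
```

With `wan_m1_w1` marked offline, the same call prints `lte_m2_w2 100%`. Giving both members the same metric would instead split traffic by weight.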

(Last edited by nielsl on 28 Jan 2018, 17:39)

Hi,
I have a similar problem.
In my case the mwan3.sh script is failing: it can't create the rules in the routing policy database that control route selection. That's why my router just sends all the traffic out the WAN with the lower metric (failover works correctly, but there is no load balancing at all).

When I run the script (try this from an SSH console to see the errors):

#mwan3 restart
ip: invalid argument '0xfd00/0xff00' to 'fwmark'
ip: invalid argument '0xfe00/0xff00' to 'fwmark'
ip: invalid argument '0xfd00/0xff00' to 'fwmark'
ip: invalid argument '0xfe00/0xff00' to 'fwmark'
ip: invalid argument '0x100/0xff00' to 'fwmark'
ip: invalid argument '0xfd00/0xff00' to 'fwmark'
ip: invalid argument '0xfe00/0xff00' to 'fwmark'
ip: invalid argument '0xfd00/0xff00' to 'fwmark'
ip: invalid argument '0xfe00/0xff00' to 'fwmark'
ip: invalid argument '0x200/0xff00' to 'fwmark'

The script uses the command "ip rule add ..." to try to create the rules in the RPDB (routing policy database).

Debugging the script:

+ ip -4 rule add pref 2253 fwmark 0xfd00/0xff00 blackhole

error-> ip: invalid argument '0xfd00/0xff00' to 'fwmark'

(see lines 142, 381, 395, etc. in github.com/openwrt/packages/blob/master/net/mwan3/files/lib/mwan3/mwan3.sh)


It looks like the ip command in BusyBox v1.27.2 does not handle the fwmark mask (/0xff00) and the "blackhole" action as it should.

This happens 10 times over the course of the script, for different rules.

see man7.org/linux/man-pages/man8/ip-rule.8.html
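
A quick way to tell which `ip` implementation is first on the PATH is to look at the banner it prints. This is only a sketch: the function name `classify_ip_banner` is hypothetical, and the two marker strings are assumptions about what each implementation prints (iproute2 in its `ip -V` version line, BusyBox in the usage text it shows for an option it does not recognise).

```shell
#!/bin/sh
# Hypothetical helper: guess the ip implementation from its banner text.
# Assumption: iproute2 mentions "iproute2" in its version line, and BusyBox
# prints its own name in the usage text shown for an unknown option.
classify_ip_banner() {
    case "$1" in
        *iproute2*) echo "iproute2" ;;
        *BusyBox*)  echo "busybox" ;;
        *)          echo "unknown" ;;
    esac
}

# Capture whatever the local ip prints (stdout and stderr) and classify it.
classify_ip_banner "$(ip -V 2>&1)"
```

If this reports `busybox`, masked marks such as `0xfd00/0xff00` may be rejected as in the errors above.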

I don't know whether the problem is a change in the mwan3.sh script or a change in the ip command in BusyBox.
I did a dirty fix in the script, and now some ip rules are created in the RPDB.

I'm a novice with Linux (and with the English language too).

I hope this information is useful for someone else.

I'm running the LEDE/OpenWrt SNAPSHOT r5977-a9c65c22a1 on a TP-Link WR-841N V13.

I installed the ip-full package and... voilà (see github.com/openwrt/packages/issues/5058):
no more errors! The rules are now there.

~# ip rule show
0:      from all lookup local
1001:   from all iif eth0.2 lookup main
1002:   from all iif eth0.3 lookup main
2001:   from all fwmark 0x100/0xff00 lookup 1
2002:   from all fwmark 0x200/0xff00 lookup 2
2253:   from all fwmark 0xfd00/0xff00 blackhole
2254:   from all fwmark 0xfe00/0xff00 unreachable
32766:  from all lookup main
32767:  from all lookup default

The load balancing is working properly!
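
In the `ip rule show` listing above, each mark and rule preference appears to follow from a small id: the mark is the id shifted left by 8 bits (within the 0xff00 mask), and the preference is 2000 plus the id. That relationship is inferred only from the values in the listing, not taken from the mwan3 source, and the function names below are made up for this example.

```shell
#!/bin/sh
# Assumption: the relationships below are read off the `ip rule show` output
# in this thread (ids 1, 2, 253, 254), not taken from mwan3.sh itself.
MASK=0xff00

# fwmark value for an id: the id shifted into bits 8..15.
mark_for_id() {
    printf '0x%x\n' $(( $1 << 8 ))
}

# Rule preference that appears alongside that id in the listing.
pref_for_id() {
    echo $(( 2000 + $1 ))
}

# Recover the id from a packet mark by applying the mask.
id_for_mark() {
    echo $(( ($1 & $MASK) >> 8 ))
}

for id in 1 2 253 254; do
    echo "id=$id mark=$(mark_for_id "$id")/$MASK pref=$(pref_for_id "$id")"
done
```

For id 253 this gives mark 0xfd00 and preference 2253, matching the blackhole rule in the listing.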

I've installed the ip-full package, and still no luck. I have configured the cabled connection to carry 100% of the traffic and the 3G connection as a backup only, so there is no load balancing. When I start a ping to 8.8.8.8 on my laptop and pull the cable, I would expect the ping to fail over from the cabled connection (primary) to the UMTS connection.

The odd thing is this: if I start a ping while on the backup interface (cable pulled, MWAN3 showing the primary link as down) and then plug the cable back in, my ping switches back to the cabled connection as soon as MWAN3 sees the cable as "online".

To summarize:

Failover from Cabled WAN to UMTS WAN -> Not working
Failover BACK from UMTS WAN to Cabled WAN -> Working

Who can help me?
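
One thing that may be worth checking, as an assumption this thread does not confirm: mwan3 stores its routing mark per tracked connection, so a ping that never stops could keep reusing the connection-tracking entry it got before the failover. That would also fit the observation that a fresh ping to 8.8.4.4 works. A minimal sketch for listing the entries for one destination follows. The function name `conntrack_for_dst` is made up, and it assumes the kernel exposes the table at /proc/net/nf_conntrack.

```shell
#!/bin/sh
# Hypothetical helper: print connection-tracking entries that contain "dst=<address>".
# $1 = destination IP, $2 = table file (assumed default: /proc/net/nf_conntrack).
conntrack_for_dst() {
    # -F matches the text literally; the trailing space stops 8.8.8.8 from
    # also matching longer addresses such as 8.8.8.80.
    grep -F -- "dst=$1 " "${2:-/proc/net/nf_conntrack}"
}
```

On the router, run `conntrack_for_dst 8.8.8.8` before and after pulling the cable while the ping is still going. If the same ICMP entry is still listed after the failover, stale per-flow state is a plausible suspect.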

The discussion might have continued from here.