Hi guys! Nice to know there is a thread for discussing mwan3.
Well, I've decided to use mwan3 to replace a very old script I wrote for dual WAN.
Unfortunately I am running into all kinds of problems, and maybe someone can help me. A few of them look like bugs, so I am going to describe them.
Environment: CHAOS CALMER (Chaos Calmer, r47004) running on a Buffalo AG-300H
Problems:
#1 The moment the interface that holds the 'default gateway' goes down, all locally generated traffic stops being routed. For example, dnsmasq can no longer resolve names, so the network appears to be down for the users relying on the dnsmasq server.
How to replicate:
Let's stop both wan and wan2:
# ifdown wan ; ifdown wan2
# ip route
192.168.254.0/24 dev br-lan proto kernel scope link src 192.168.254.1
# ip rule
0: from all lookup 128
1: from all lookup local
2253: from all fwmark 0xfd00/0xff00 blackhole
2254: from all fwmark 0xfe00/0xff00 unreachable
32766: from all lookup main
32767: from all lookup default
Now let's enable wan:
# ifup wan
# ip route
default via 187.115.211.13 dev pppoe-wan proto static
187.115.211.13 dev pppoe-wan proto kernel scope link src 177.135.34.67
192.168.254.0/24 dev br-lan proto kernel scope link src 192.168.254.1
# ip rule
0: from all lookup 128
1: from all lookup local
1001: from all iif pppoe-wan lookup main
2001: from all fwmark 0x100/0xff00 lookup 1
2253: from all fwmark 0xfd00/0xff00 blackhole
2254: from all fwmark 0xfe00/0xff00 unreachable
32766: from all lookup main
32767: from all lookup default
We can see that the pppoe-wan gateway became the default gateway.
Now let's enable wan2:
# ip route
default via 187.60.72.1 dev eth0.2 proto static src 187.60.72.136
187.60.72.0/22 dev eth0.2 proto kernel scope link src 187.60.72.136
187.60.72.1 dev eth0.2 proto static scope link src 187.60.72.136
187.115.211.13 dev pppoe-wan proto kernel scope link src 177.135.34.67
192.168.254.0/24 dev br-lan proto kernel scope link src 192.168.254.1
# ip rule
0: from all lookup 128
1: from all lookup local
1001: from all iif pppoe-wan lookup main
1002: from all iif eth0.2 lookup main
2001: from all fwmark 0x100/0xff00 lookup 1
2002: from all fwmark 0x200/0xff00 lookup 2
2253: from all fwmark 0xfd00/0xff00 blackhole
2254: from all fwmark 0xfe00/0xff00 unreachable
32766: from all lookup main
32767: from all lookup default
Now the default gateway is the gateway of the eth0.2 interface. No problem so far, since all the routing is done by the fwmark rules.
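To make the fwmark rules a bit more concrete: a rule such as "fwmark 0x100/0xff00 lookup 1" matches a packet when the packet mark and the rule value agree on every bit selected by the mask. Below is a minimal sketch of that comparison in shell. mark_matches is a hypothetical helper name used only for illustration; it is not part of mwan3 or iproute2.

```shell
#!/bin/sh
# Sketch only: mimics how an "ip rule" fwmark selector is compared.
# mark_matches is a hypothetical helper, not an mwan3 or iproute2 function.
# $1 = packet mark, $2 = rule value, $3 = rule mask (hex is accepted)
mark_matches() {
    # The rule matches when mark and value agree on every bit set in the mask.
    if [ $(( ($1 ^ $2) & $3 )) -eq 0 ]; then
        echo match
    else
        echo no-match
    fi
}

mark_matches 0x100 0x100 0xff00   # packet marked for table 1 against rule 2001
mark_matches 0x200 0x100 0xff00   # packet marked for table 2 against rule 2001
```

The first call prints "match" and the second "no-match", which is why a packet marked 0x200 falls through rule 2001 and is picked up by rule 2002 instead.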
The problems start if I disable the wan2 interface, which holds the current default gateway.
# ifdown wan2
# ip route
187.115.211.13 dev pppoe-wan proto kernel scope link src 177.135.34.67
192.168.254.0/24 dev br-lan proto kernel scope link src 192.168.254.1
# ip rule
0: from all lookup 128
1: from all lookup local
1001: from all iif pppoe-wan lookup main
2001: from all fwmark 0x100/0xff00 lookup 1
2253: from all fwmark 0xfd00/0xff00 blackhole
2254: from all fwmark 0xfe00/0xff00 unreachable
32766: from all lookup main
32767: from all lookup default
Now Linux does not have a 'global' default gateway, so dnsmasq can't resolve names:
# ping www.google.com
ping: bad address 'www.google.com'
If I add a dummy default gateway (any IP from my network, even an unused one), it works again:
# ip route add default via 192.168.254.111 (ip not in use)
# ping www.google.com -c1 -w2
PING www.google.com (177.43.170.110): 56 data bytes
64 bytes from 177.43.170.110: seq=0 ttl=60 time=8.351 ms
This shows that Linux needs SOME default gateway in the main table for locally generated traffic to work, even one that is never actually used.
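A likely explanation, as far as I understand it, is that locally generated packets get a first route lookup before the firewall has a chance to mark them, so that lookup has to succeed in the main table. A script can check for this condition by looking for a default entry there. The following is only a sketch: has_default_route is a made-up helper name, and it just inspects text in the format printed by ip route.

```shell
#!/bin/sh
# Sketch only: report whether a routing-table listing contains a default route.
# has_default_route is a hypothetical helper; it reads "ip route" style
# output on stdin and succeeds (exit status 0) if a default entry is present.
has_default_route() {
    grep -q '^default '
}

# Usage on the router (assumes iproute2's "ip" is installed; not run here):
#   ip route show table main | has_default_route \
#       || echo "main table has no default route"
```

Run against the routing table shown after "ifdown wan2" above, the check fails, which matches the moment name resolution stopped working.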
I was able to fix this problem with the patch below, which I created.
It fixes two problems:
1) When an interface goes down and the current default gateway is lost, it sets the default gateway to the first default gateway it finds in any other table.
2) It stops the repeated "Could not find gateway for interface..." complaints.
--- 15-mwan3
+++ 15-mwan3.new
@@ -1,5 +1,11 @@
#!/bin/sh
+
+source /usr/share/libubox/jshn.sh
+source /lib/functions/network.sh
+
+
+
mwan3_get_iface_id()
{
let iface_count++
@@ -363,16 +369,11 @@
if [ $ACTION == "ifup" ]; then
[ "$enabled" -eq 1 ] || return 0
- while [ -z "$($IP route list dev $DEVICE default | head -1)" -a "$counter" -lt 10 ]; do
- sleep 1
- let counter++
- if [ "$counter" -ge 10 ]; then
- $LOG warn "Could not find gateway for interface $INTERFACE ($DEVICE)" && return 0
- fi
- done
+ json_load "$(ifstatus $INTERFACE)"
+ network_get_gateway IFGW $INTERFACE
- route_args=$($IP route list dev $DEVICE default | head -1 | sed '/.*via \([^ ]*\) .*$/!d;s//via \1/;q' | egrep '[0-9]{1,3}(\.[0-9]{1,3}){3}')
- route_args="$route_args dev $DEVICE"
+ $LOG info gateway for interface $INTERFACE \(${DEVICE:-unknown}\) is $IFGW
+ route_args="via $IFGW dev $DEVICE"
fi
while [ "$(pgrep -f -o hotplug-call)" -ne $$ -a "$counter" -lt 60 ]; do
@@ -391,7 +392,14 @@
mwan3_set_iface_route
mwan3_set_iface_rules
- [ $ACTION == "ifdown" ] && mwan3_set_iface_ipset
+ if [ $ACTION == "ifdown" ] ; then
+ mwan3_set_iface_ipset
+ if [ $(ip route | grep -c '^default via') == 0 ] ; then # NO DEFAULT GATEWAY FOR LOCAL TRAFFIC
+ NEW_GW=$(ip route show table all | grep '^default via' | head -1 | cut -d ' ' -f 3)
+ [ -n "$NEW_GW" ] && ip route add default via $NEW_GW || $LOG err No default gateway available
+ fi
+
+ fi
[ $ACTION == "ifup" ] && mwan3_track
config_foreach mwan3_set_policies_iptables policy
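The fallback from the ifdown part of the patch can be tried on its own before touching the hotplug script. This sketch uses awk instead of the grep/head/cut chain from the patch, and first_default_gw is a hypothetical helper name. It reads "ip route show table all" style output on stdin and prints the first gateway it finds, or nothing if there is none.

```shell
#!/bin/sh
# Sketch only: pick the first default gateway from a routing-table listing.
# first_default_gw is a hypothetical helper, not part of mwan3.
first_default_gw() {
    # Print the address after "default via" on the first matching line.
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Usage on the router (not run here):
#   NEW_GW=$(ip route show table all | first_default_gw)
#   [ -n "$NEW_GW" ] && ip route add default via "$NEW_GW"
```

Like the patch, this takes whichever default route is listed first, so it does not take the mwan3 policy or the interface metrics into account.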
#2 This is not exactly a bug, but maybe a feature request
# Locally generated traffic should honor the interface it is bound to.
How to replicate:
Let's stop both the wan and wan2 interfaces and test each one separately using the speedtest console client, binding the source address to the right interface, just to make clear what's happening.
# ifdown wan
# ifdown wan2
# ifup wan
# speedtest_cli --server 5135 --source $(getip pppoe-wan)
Running speedtest.net console client
Retrieving speedtest.net configuration...
Retrieving speedtest.net server list...
Testing from Global Village Telecom (177.135.36.245)...
Testing download speed........................................
Download: 48.05 Mbits/s
Testing upload speed..................................................
Upload: 5320.10 Kbits/s
# ifdown wan
# ifup wan2
# speedtest_cli --server 5135 --source $(getip eth0.2)
Running speedtest.net console client
Retrieving speedtest.net configuration...
Retrieving speedtest.net server list...
Testing from Cabo Servicos De Telecomunicacoes Ltda (187.60.72.136)...
Testing download speed........................................
Download: 5.92 Mbits/s
Testing upload speed..................................................
Upload: 402.48 Kbits/s
The results are as expected: my main link is 50Mbit/5Mbit and the backup link is 6Mbit/512kbit.
# ifup wan (both interfaces are now up)
# speedtest_cli --server 5135 --source $(getip pppoe-wan)
Running speedtest.net console client
Retrieving speedtest.net configuration...
Retrieving speedtest.net server list...
Testing from Cabo Servicos De Telecomunicacoes Ltda (187.60.72.136)... -> I asked to bind to the main link (getip pppoe-wan), but it appears to have bound to the backup IP
Testing download speed........................................
Download: 21.09 Mbits/s
Testing upload speed..................................................
Upload: 504.31 Kbits/s
Very strange speeds.
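To see which way the kernel would send traffic bound to a given source address without running a full speed test, ip route get can be asked directly. A rough sketch follows. route_dev is a hypothetical helper that only pulls the device name out of one line of ip route get output, and the sample line in the comment is illustrative, not captured from this router.

```shell
#!/bin/sh
# Sketch only: extract the outgoing device from "ip route get" output.
# route_dev is a hypothetical helper; it prints the word that follows "dev".
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage on the router (not run here), asking how traffic sourced from the
# pppoe-wan address would leave the box:
#   ip route get 8.8.8.8 from 177.135.34.67 | route_dev
# Illustrative input line:
#   8.8.8.8 from 177.135.34.67 via 187.60.72.1 dev eth0.2
```

If the device printed is not the one owning the source address, the bound traffic is following the main table default route rather than the interface it was bound to.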
I think this behaviour for locally generated traffic is causing the next bug.
# The interface not holding the default gateway changes to offline after a while.
After a while, the backup interface, which does not hold the default gateway, just stops answering pings and is put in the offline state.
# ip route
default via 187.115.211.13 dev pppoe-wan proto static
# ping -I pppoe-wan 8.8.8.8 -c1
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=53 time=53.102 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 53.102/53.102/53.102 ms
# ping -I eth0.2 8.8.8.8 -w 2 -c1
PING 8.8.8.8 (8.8.8.8): 56 data bytes
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss
I know that the interface is up, because it answers pings to its gateway:
# ping -I eth0.2 187.60.72.1 -w 2 -c1
PING 187.60.72.1 (187.60.72.1): 56 data bytes
64 bytes from 187.60.72.1: seq=0 ttl=255 time=9.696 ms
--- 187.60.72.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 9.696/9.696/9.696 ms
If I add the following rules, the interface works correctly:
ip rule add pref 501 oif pppoe-wan lookup 1
ip rule add pref 502 oif eth0.2 lookup 2
# ip rule
0: from all lookup 128
1: from all lookup local
501: from all oif pppoe-wan lookup 1
502: from all oif eth0.2 lookup 2
1001: from all iif pppoe-wan lookup main
1002: from all iif eth0.2 lookup main
2001: from all fwmark 0x100/0xff00 lookup 1
2002: from all fwmark 0x200/0xff00 lookup 2
2253: from all fwmark 0xfd00/0xff00 blackhole
2254: from all fwmark 0xfe00/0xff00 unreachable
32766: from all lookup main
32767: from all lookup default
# ping -I eth0.2 8.8.8.8 -w 2 -c1
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=54 time=55.013 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 55.013/55.013/55.013 ms
# ping -I pppoe-wan 8.8.8.8 -w 2 -c1
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=53 time=53.251 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 53.251/53.251/53.251 ms
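The two oif rules above follow a simple pattern: preference 500 plus the interface number, pointing at the routing table with the same number. Here is a minimal sketch for generating them, assuming the mwan3 table numbers follow the order in which the devices are listed (as they do in the ip rule output above). print_oif_rules is a hypothetical helper, and it only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Sketch only: print one "oif" policy rule per WAN device.
# print_oif_rules is a hypothetical helper, not part of mwan3.
# Assumption: the Nth device given uses routing table N.
print_oif_rules() {
    id=0
    for dev in "$@"; do
        id=$((id + 1))
        echo "ip rule add pref $((500 + id)) oif $dev lookup $id"
    done
}

print_oif_rules pppoe-wan eth0.2
```

For these two devices it prints the same two commands I typed by hand. Piping the output to sh would apply them, but they are not persistent and would have to be re-added after mwan3 rebuilds its rules.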
I don't have the LuCI interface installed; all the configuration was done by editing the config files.
Hope someone can help!