OpenWrt Forum Archive

Topic: Kernel Bug : NAT does not work correctly after a PPPOE disconnect.

The content of this topic has been archived on 24 Apr 2018. There are no obvious gaps in this topic, but there may still be some posts missing at the end.

Hi. I'm using OpenWrt White Russian 0.9.

After trying to solve a registration problem with Asterisk, I think I've finally tracked it down to a problem with iptables and NAT.

I have an Asterisk server on the local LAN, not on the OpenWrt router itself. This server registers with another Asterisk server located on the Internet, through the ppp0 WAN link. The link uses PPPoE with a static IP address. This detail is important.



After a PPPoE disconnect/reconnect, I lose UDP connectivity from the internal LAN Asterisk (UDP 4569) to the Internet Asterisk peer (also UDP 4569).


To track down the problem, I took two tcpdump captures on the ppp0 interface: one before the PPPoE disconnect, one after.


And I discovered something that looks like a strange NAT bug:


Before the PPPoE disconnect, all is OK and IAX registration works well; we see requests from the local Asterisk machine and answers from the Internet Asterisk peer:

tcpdump -i ppp0 host asterisk.external.peer # (addresses have been deliberately changed for privacy)

15:18:56.107906 IP my.external.IP.4569 > asterisk.external.peer.4569: UDP, length 12
15:18:56.148521 IP asterisk.external.peer.4569 > my.external.IP.4569: UDP, length 12
15:18:56.149290 IP my.external.IP.4569 > asterisk.external.peer.4569: UDP, length 12
15:18:57.024791 IP my.external.IP.4569 > asterisk.external.peer.4569: UDP, length 28
15:18:57.065587 IP asterisk.external.peer.4569 > my.external.IP.4569: UDP, length 39
15:18:57.066404 IP my.external.IP.4569 > asterisk.external.peer.4569: UDP, length 62
15:18:57.110189 IP asterisk.external.peer.4569 > my.external.IP.4569: UDP, length 56


But after the PPPoE reconnect:

tcpdump -i ppp0 host asterisk.external.peer


15:22:27.178244 IP 192.168.15.100.4569 > asterisk.external.peer.4569: UDP, length 12
15:22:29.178923 IP 192.168.15.100.4569 > asterisk.external.peer.4569: UDP, length 12
15:22:29.184854 IP 192.168.15.100.4569 > asterisk.external.peer.4569: UDP, length 28
15:22:30.175913 IP 192.168.15.100.4569 > asterisk.external.peer.4569: UDP, length 12
15:22:31.178632 IP 192.168.15.100.4569 > asterisk.external.peer.4569: UDP, length 12
15:22:37.178865 IP 192.168.15.100.4569 > asterisk.external.peer.4569: UDP, length 12
15:22:37.184855 IP 192.168.15.100.4569 > asterisk.external.peer.4569: UDP, length 12
15:22:39.179526 IP 192.168.15.100.4569 > asterisk.external.peer.4569: UDP, length 12
15:22:39.180041 IP 192.168.15.100.4569 > asterisk.external.peer.4569: UDP, length 12
15:22:39.184878 IP 192.168.15.100.4569 > asterisk.external.peer.4569: UDP, length 12
15:22:39.185024 IP 192.168.15.100.4569 > asterisk.external.peer.4569: UDP, length 28


We can see that the LAN address of the Asterisk server is sent over the WAN!

So we lose communication with the Internet Asterisk peer. The reason seems evident: the peer cannot reply to a non-routable LAN address.
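To spot this condition without reading the capture by eye, the tcpdump output can be filtered for private source addresses. This is only a rough sketch: the function name leaked_sources is made up for illustration, and it assumes tcpdump is run with -n so that addresses are printed numerically in the same line layout as the captures above.

```shell
#!/bin/sh
# Print only the tcpdump lines whose source address is in private
# (RFC 1918) space. Such packets should never leave the WAN interface.
# Expects lines of the form "HH:MM:SS.us IP src.port > dst.port: ...".
# leaked_sources is a hypothetical helper name, not an existing tool.
leaked_sources() {
    awk '$2 == "IP" && $3 ~ /^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)/'
}

# On the router (needs root), something like:
#   tcpdump -n -l -i ppp0 udp port 4569 | leaked_sources
```

If this prints anything while capturing on ppp0, packets are leaving without being masqueraded.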


This is the second time I've seen this problem. The first time, I discovered it with ez-ipupdate, which was sending the local LAN address of the router instead of the public address to DynDNS. I didn't think it could be an OpenWrt or Linux problem, so I simply replaced ez-ipupdate with updatedd, which solved that problem for good.

This is a very annoying NAT bug, because as long as we keep trying to send data through the NAT-opened IAX2 port, the abnormal state is held. So this UDP connection is lost for good.


To get things back to normal, we need to stop sending UDP data through the NAT for more than 30 seconds (longer than the NAT UDP session timeout). After this delay, a new session is opened with masquerading working again.


Unfortunately, when Asterisk loses its registered state with its peer, it tries to send data almost every second, so connectivity never comes back.


It seems that the problem comes from conntrack. There is no way to flush the connection (there is no userland tool for conntrack) and no possibility of reloading a module, as conntrack is built into the kernel.
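Even without a userland tool to delete entries, the connection tracking table can at least be read. The sketch below assumes the kernel exposes it as /proc/net/ip_conntrack in the usual ip_conntrack text format; the function name unnatted_udp is hypothetical. It prints UDP entries for a given port whose reply-direction destination is still the original LAN source, which is what an entry tracked without masquerading looks like.

```shell
#!/bin/sh
# List UDP conntrack entries for a port where no source NAT was applied.
# In a masqueraded entry the reply tuple's dst= is the public WAN address;
# if it equals the original src=, the flow is tracked without NAT.
# Usage: unnatted_udp PORT [FILE]   (FILE defaults to /proc/net/ip_conntrack)
# unnatted_udp is a made-up name for this sketch.
unnatted_udp() {
    port="$1"
    file="${2:-/proc/net/ip_conntrack}"
    awk -v port="$port" '
    $1 == "udp" {
        osrc = ""; rdst = ""; nsrc = 0; ndst = 0; hit = 0
        for (i = 1; i <= NF; i++) {
            # first src= belongs to the original direction
            if ($i ~ /^src=/ && ++nsrc == 1) osrc = substr($i, 5)
            # second dst= belongs to the reply direction
            if ($i ~ /^dst=/ && ++ndst == 2) rdst = substr($i, 5)
            if ($i == ("dport=" port)) hit = 1
        }
        if (hit && osrc != "" && osrc == rdst) print
    }' "$file"
}

# On the router: unnatted_udp 4569
```

Whether the stuck entry on this router actually looks like that has not been checked here; the script is just one way to test the conntrack theory.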



I would be happy to hear someone come up with a clean fix for this problem. A simple addition to the ifdown/ifup scripts to disable traffic crossing the NAT should be enough to solve the problem.


It seems that the Linux 2.6.20 kernel has the same problem, so this needs to be fixed with a workaround.


I will add that the problem seems to show up only when using a static-IP PPPoE connection. So far I have had no problem with dynamic PPPoE DSL links.


Thanks a lot for your help,


Olivier.

Perhaps try setting /proc/sys/net/ipv4/ip_dynaddr to 1?
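For reference, a minimal sketch of how that setting could be applied from a shell on the router. The helper name set_dynaddr and its optional second argument (a file to write to instead of the real sysctl, for a dry run) are assumptions made for illustration, and the change does not survive a reboot unless it is added to a startup script.

```shell
#!/bin/sh
# Write a value to the ip_dynaddr sysctl and echo back what was stored.
# Usage: set_dynaddr [VALUE] [FILE]
#   VALUE defaults to 1, FILE defaults to the real sysctl path.
# set_dynaddr is a hypothetical helper name.
set_dynaddr() {
    value="${1:-1}"
    file="${2:-/proc/sys/net/ipv4/ip_dynaddr}"
    echo "$value" > "$file" && cat "$file"
}

# On the router (as root): set_dynaddr 1
```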

Thanks, I will try this, but this parameter normally concerns only TCP connections. Asterisk IAX2 uses UDP connections.


Olivier.

The discussion might have continued from here.