OpenWrt Forum Archive

Topic: Urgent problem - Unexplained network slowdowns !

The content of this topic has been archived on 1 Apr 2018. There are no obvious gaps in this topic, but there may still be some posts missing at the end.

I'm running whiterussian RC3 on many WRT54G V2.2 units.

One unit has been showing unexplained network slowdowns since yesterday.

This ping is from another WRT box, directly connected "back-to-back" via an Ethernet cable to the unit in question (192.168.100.2).

After about a minute of "slowdown", everything goes back to normal:


64 bytes from 192.168.100.2: icmp_seq=181 ttl=64 time=1.3 ms
64 bytes from 192.168.100.2: icmp_seq=182 ttl=64 time=1.4 ms
64 bytes from 192.168.100.2: icmp_seq=183 ttl=64 time=1.9 ms
64 bytes from 192.168.100.2: icmp_seq=184 ttl=64 time=97500.5 ms
64 bytes from 192.168.100.2: icmp_seq=185 ttl=64 time=96501.0 ms
64 bytes from 192.168.100.2: icmp_seq=186 ttl=64 time=95501.4 ms
64 bytes from 192.168.100.2: icmp_seq=187 ttl=64 time=94501.7 ms
64 bytes from 192.168.100.2: icmp_seq=188 ttl=64 time=93502.0 ms
64 bytes from 192.168.100.2: icmp_seq=189 ttl=64 time=92502.4 ms
64 bytes from 192.168.100.2: icmp_seq=190 ttl=64 time=91502.7 ms
64 bytes from 192.168.100.2: icmp_seq=191 ttl=64 time=90502.9 ms
64 bytes from 192.168.100.2: icmp_seq=192 ttl=64 time=89503.3 ms
64 bytes from 192.168.100.2: icmp_seq=193 ttl=64 time=88503.6 ms
64 bytes from 192.168.100.2: icmp_seq=194 ttl=64 time=87504.0 ms
64 bytes from 192.168.100.2: icmp_seq=195 ttl=64 time=86504.3 ms
64 bytes from 192.168.100.2: icmp_seq=196 ttl=64 time=85504.6 ms
64 bytes from 192.168.100.2: icmp_seq=197 ttl=64 time=84504.9 ms
64 bytes from 192.168.100.2: icmp_seq=198 ttl=64 time=83504.4 ms
64 bytes from 192.168.100.2: icmp_seq=199 ttl=64 time=82505.6 ms
64 bytes from 192.168.100.2: icmp_seq=282 ttl=64 time=1.3 ms
64 bytes from 192.168.100.2: icmp_seq=283 ttl=64 time=2.0 ms
64 bytes from 192.168.100.2: icmp_seq=284 ttl=64 time=1.3 ms
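
A quick sanity check on the numbers above, as a rough sketch only. It assumes ping was sending one request per second (the usual default; the exact command line is not shown), so request N left at roughly N seconds and its reply arrived at roughly N + RTT. The sample pairs are copied from the output; the function name is just for illustration.

```python
# (icmp_seq, RTT in ms) copied from the slow replies in the ping output above
samples = [
    (184, 97500.5),
    (185, 96501.0),
    (190, 91502.7),
    (195, 86504.3),
    (199, 82505.6),
]

# Assumption: one echo request per second, so request N is sent at about N seconds.
SEND_INTERVAL_S = 1.0


def arrival_times(pairs, interval=SEND_INTERVAL_S):
    """Estimate when each reply arrived: send time plus round-trip time."""
    return [seq * interval + rtt_ms / 1000.0 for seq, rtt_ms in pairs]


arrivals = arrival_times(samples)
spread = max(arrivals) - min(arrivals)

for (seq, _), t in zip(samples, arrivals):
    print(f"seq {seq}: reply arrived at about {t:.1f} s")
print(f"spread between earliest and latest arrival: {spread:.3f} s")
```

Under that assumption, all of the slow replies land within a few milliseconds of each other at about 281.5 s, right before seq 282 answers normally. That would be consistent with the packets being held somewhere and released in one burst, rather than each one being slow on its own. Replies for seq 200 to 281 do not appear in the pasted output.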

Here are some details about the machine:


root@TCV-M-07:~# wget http://openwrt.org/wp/go.php?http://openwrt.inf.fh-brs.de/~nbd/linksys-fixup.sh
Connecting to openwrt.org[195.56.146.238]:80
linksys-fixup.sh     100% |******************************************************|  1808       00:00 ETA
root@TCV-M-07:~# chmod 755 linksys-fixup.sh
root@TCV-M-07:~# ./linksys-fixup.sh
HW type: BCM4712+BCM5325E
nvram set pa0itssit=62
nvram set pa0b0=0x15eb
nvram set pa0b1=0xfa82
nvram set pa0b2=0xfe66
nvram set pa0maxpwr=0x4e
root@TCV-M-07:~# nvram commit
root@TCV-M-07:~# uname -a
Linux TCV-M-07 2.4.30 #1 Wed Sep 14 17:49:26 CEST 2005 mips unknown
root@TCV-M-07:~# lsmod
Module                  Size  Used by    Tainted: P 
wlcompat               14688   0 (unused)
wl                    423640   0 (unused)
et                     32064   0 (unused)
diag                    2560   0 (unused)
root@TCV-M-07:~#
root@TCV-M-07:~# ps -ef
  PID  Uid     VmSize Stat Command
    1 root        392 S   init      
    2 root            SW  [keventd]
    3 root            RWN [ksoftirqd_CPU0]
    4 root            SW  [kswapd]
    5 root            SW  [bdflush]
    6 root            SW  [kupdated]
    7 root            SW  [mtdblockd]
   24 root            SWN [jffs2_gcd_mtd4]
   44 root        392 S   init      
   45 root        428 S   syslogd -C 16 -L -R 192.168.111.10
   47 root        348 S   klogd
  475 root       1588 S   snmpd -Lf /dev/null -p /var/run/snmpd.pid
  478 root        396 S   /usr/sbin/crond -c /etc/crontabs
  484 root        420 S   /usr/sbin/dropbear
  488 root        636 S   /usr/sbin/olsrd -f /etc/olsrd.conf
  505 root        644 S   /usr/sbin/dropbear
  506 root        468 S   -ash
  509 root        384 R   ps -ef
root@TCV-M-07:~#

Any ideas ? suggestions ?

Could this be some sort of DoS attack on the router ? Via the router ?

MRTG traffic graphs do not show any sudden increase in traffic (but I check every 5 minutes - so it might not be noticeable).
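
To illustrate why a 5-minute poll can hide a short burst, here is a minimal sketch. The rates are made-up figures, not measurements from this network, and `polled_average` is a hypothetical helper, not part of MRTG.

```python
def polled_average(burst_rate, burst_seconds, base_rate, poll_seconds=300):
    """Average rate seen by a poller over one interval that contains a single burst."""
    burst_seconds = min(burst_seconds, poll_seconds)
    total = burst_rate * burst_seconds + base_rate * (poll_seconds - burst_seconds)
    return total / poll_seconds


# Hypothetical numbers: 1 Mbit/s baseline with a 60-second burst at 10 Mbit/s.
avg = polled_average(burst_rate=10.0, burst_seconds=60, base_rate=1.0)
print(f"5-minute average: {avg:.1f} Mbit/s")
```

With these assumed figures a burst at ten times the baseline shows up as an average under three times the baseline, and a shorter burst would be flattened even more.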

The problematic box is the head (internet gateway) of a large Mesh (olsr) network.
Our non-profit community network has been suffering a great deal since this started.

Thanks,


Yahel.
http://www.TibTec.Org/

Hi,

Did you try rebooting after that commit?

Sure did ...

The fix was applied long ago...
I just ran it again for the post (and, out of habit, did the commit again :-)


Any other idea ?


Thanks,

My first suspects would be an infected client somewhere or a routing loop. I had this happen once (with a Cisco Aironet, not a WRT54G) when a client associated with an unexpected AP: until the switch MAC table timed out, the traffic for that client would loop (no STP).

- DL

Interesting...
Yet, since this is not an AP, but a Mesh router, it rules out the possibility of a bad client.
Moreover, the wireless interface is WEP encrypted.

Routing loop ? Where  ? why ?
I'm pinging from a directly connected wired Ethernet interface ?

Thanks anyhow.

yahel wrote:

Interesting...
Yet, since this is not an AP, but a Mesh router, it rules out the possibility of a bad client.
Moreover, the wireless interface is WEP encrypted.

Not necessarily. What I meant was: consider the possibility of a virus-infected PC connected to your network. It could create bursts of high-rate traffic that would bog down your network.

Routing loop ? Where  ? why ?
I'm pinging from a directly connected wired Ethernet interface ?

But if the router is overloaded with traffic from somewhere else it could cause slow ping responses on a directly connected wired interface.

I'm not sure at all that either of these scenarios is your problem - you'd think such things would affect more than the one router. I'm assuming this was working correctly at one point which leads me to suspect traffic issues. Perhaps it would be worth swapping out the router to eliminate the possibility of an issue with this particular unit?

- DL

I also suspect some traffic issues - more like a specific DoS attack on the router.
We have 2000 computers surfing the net via this router!
It is the head of the mesh pyramid.

Yes it was working fine until 2 days ago, and still works fine most of the time.
Every now and then (no pattern) the problem comes again.
It can be fine for 10 hours, then start again and again for 20 minutes.

We are going to replace the router in two days.
I already have a duplicate ready - only it runs White Russian RC4 (the current one runs RC3).
It's on a very high mast - so not an easy task.

In the meantime, I'm going to install a firewall on the machine, preventing any access to it.
(It has open SSH, SNMP and ICMP, which I'll close to everything other than my IP.)

I don't think it's a hardware problem.
The weather is fine (not cold yet, nor hot) and it's been working for 6 months without a problem.
The power supply is also fine; the current drain over the PoE is normal.

thanks.

yahel wrote:

Yes it was working fine until 2 days ago, and still works fine most of the time. Every now and then (no pattern) the problem comes again. It can be fine for 10 hours, then start again and again for 20 minutes.

This is a pattern I've seen before with a certain virus (can't remember which). The infected machine would be quiet most of the time but randomly launch massive traffic for 10 or 20 minutes. IIRC it was a distributed DoS against some website in Japan or Korea. The timing randomness and duration made it difficult to track down, because shortly after you started tracing it, it would stop. The easiest way to track these, assuming you have traffic counters somewhere, is to look for an imbalance with outbound higher than inbound. For most of our clients inbound is typically 5 to 10 times higher than outbound, but again, because the attacks are so short, they may not stand out in the stats the way constant ones do.
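
As a rough sketch of that counter check: given per-interval inbound and outbound byte counts, flag the intervals where outbound exceeds inbound. The sample figures and the `flag_outbound_heavy` helper are hypothetical, only there to show the idea; real numbers would come from whatever traffic counters you already collect.

```python
def flag_outbound_heavy(samples, threshold=1.0):
    """Return labels of intervals whose outbound/inbound ratio exceeds threshold."""
    flagged = []
    for label, inbound, outbound in samples:
        if inbound == 0:
            # No inbound traffic at all: any outbound traffic counts as an imbalance.
            if outbound > 0:
                flagged.append(label)
            continue
        if outbound / inbound > threshold:
            flagged.append(label)
    return flagged


# Hypothetical (interval, inbound bytes, outbound bytes) samples.
samples = [
    ("10:00", 8_000_000, 1_000_000),  # typical: inbound several times outbound
    ("10:05", 7_500_000, 1_200_000),
    ("10:10", 6_000_000, 9_000_000),  # outbound higher than inbound: suspicious
    ("10:15", 8_200_000, 1_100_000),
]

print(flag_outbound_heavy(samples))
```

The shorter the sampling interval, the better the chance that a 10 to 20 minute burst shows up as a clear imbalance instead of being averaged away.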

PS: IIRC it was MyDoom. The original one expired last February, but there have been a number of variants since with no expiration, plus there are a number of other DDoS viruses out there.

- Don

(Last edited by dl on 8 Dec 2005, 07:28)

One way to at least determine whether this is a DoS targeting you or an infected machine on your network targeting the outside is to simply look at your border router traffic during one of the attacks and see whether it's inbound or outbound that jumps. The next thing would be to capture some of the traffic and look at the source/dest IPs, but bear in mind that the source may be forged, and the dest may be broadcast.

If swapping the hardware involves a tower climb, I think I'd only do this as a last resort - imo it's almost certainly a worm issue given the symptoms.

- DL

(Last edited by dl on 8 Dec 2005, 09:44)
