Topic: Bug reports for Experimental

The content of this topic has been archived between 2 Apr 2018 and 6 May 2018. There are no obvious gaps in this topic, but there may still be some posts missing at the end.

Page 2 of 3

Post #26

Kurgan

6 Apr 2005, 17:55

dsouth,

I have seen such a behaviour on a WRT54G with original firmware from Linksys... maybe it's a bug in some driver. I was copying lots of files from the LAN to a wireless notebook (Intel Centrino) and the router rebooted several times in about 30 minutes, while copying about 8 gig of data.

Post #27

TheRoDent

7 Apr 2005, 00:40

The rebooting during heavy traffic has been discussed (albeit without much result) on a number of other forums, including the Sveasoft and linksysinfo.org sites.

It happens not just with OpenWRT but with other firmware too. Either there's a bad batch of WRTs out there, or there is some kind of driver problem in the latest firmware that most builds are now based on.

Post #28

manoj

7 Apr 2005, 01:28

So, since no one's had time to debug my traffic shaping issue above, any
suggestions for how I can go about debugging this? Is there a System.map
for the current openwrt experimental kernel build so I can try pulling
out a stack trace at least?

I have a tiny bit of experience with kernel debugging, and none with
cross-compilation.
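
For anyone in the same position, here is a rough sketch of what a System.map lookup does with a raw address from a trace. It is only an illustration: the sample entries are back-computed from symbol+offset pairs in the ksymoops output posted later in this thread, the "T" type letter is an assumption, and the function names (load_map, resolve) are made up for the example.

```python
import bisect

# Sample System.map-style entries ("address type name"). The addresses are
# back-computed from symbol+offset pairs in a ksymoops trace; the "T" type
# letter is an assumption.
SAMPLE_MAP = """\
80016560 T do_softirq
80016924 T tasklet_action
800d5c80 T netif_rx
800d666c T net_rx_action
"""

def load_map(text):
    """Return a sorted list of (address, name) pairs."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            entries.append((int(parts[0], 16), parts[2]))
    entries.sort()
    return entries

def resolve(entries, addr):
    """Map an address to 'symbol+0xoffset' using the nearest lower symbol."""
    addrs = [a for a, _ in entries]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address lies below every known symbol
    base, name = entries[i]
    return f"{name}+0x{addr - base:x}"

entries = load_map(SAMPLE_MAP)
print(resolve(entries, 0x800d5cfc))
```

Note that this only covers symbols in the kernel image itself. Addresses inside loadable modules would need the module symbols as well, otherwise they resolve (wrongly) to the last kernel symbol in the map.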

Post #29

trevorj

7 Apr 2005, 10:00

TheRoDent wrote:

The rebooting during heavy traffic has been discussed (albeit without much result) on a number of other forums, including the Sveasoft and linksysinfo.org sites.
It happens not just with OpenWRT but with other firmware too. Either there's a bad batch of WRTs out there, or there is some kind of driver problem in the latest firmware that most builds are now based on.

Oh man am I glad I'm not the only person having this rebooting issue with wireless. It only happens on the experimental builds for me. Stable works fine. wrt54g 1.0.

It seemed to happen when the speed on a sustained download went over 300k/s, which would agree with the large packet size. Smaller operations worked just fine.

I went back to stable, and it works great as it always did, although I don't get that warm and fuzzy experimental feeling. Maybe it's a bug in the new kernel/drivers? Makes me wish I had a serial console for my WRT.

Post #30

TheRoDent

7 Apr 2005, 20:44

Alright, I'm not entirely sure if this is related to the 'reboot-during-heavy-file-transfers' problem, but I managed to get a kernel oops on my development WRT v2.2 today, repeatedly, while trying to compile perl using the uclibc mipsel rootfs.

virtual address 00000150, epc == 800d5cfc, ra == 800d5ca0
Oops in fault.c::do_page_fault, line 206:
$0 : 00000000 80210000 00000000 00000000 80191180 028f5c29 0000270f 1000fc01
$8 : 00000045 00000001 00000002 214fc000 00000000 80ff6800 00000000 00000000
$16: 80ab62c0 803e3a60 80a71810 1000fc01 00000074 00000074 80d1c930 00000000
$24: 00000000 0047ac34                   80800000 80801a60 8073ca20 800d5ca0
Hi : 000012b9
Lo : 63d75523
epc   : 800d5cfc    Tainted: P
Status: 1000fc02
Cause : 00000008
PrId  : 00029007
Process sh (pid: 4756, stackpage=80800000)
Stack:    80ab68c0 00000000 80ab62c0 803e3a60 80ab62c0 c00c68c8 800b4fbc
 00000000 00000000 80191078 80250e00 801911a8 80cb4720 c00cb70c 80cb4720
 c00c8218 00000000 0047ac34 803e3a60 c00cb70c 80cb4720 c00c6704 00000074
 00000074 80d1c930 800d671c 00000000 80191078 fffffff3 800169b8 00000128
 00000000 00000001 80191070 00000000 80016610 00000003 80b109ac 00000001
 c01420a4 ...
Call Trace:   [<c00c68c8>] [<800b4fbc>] [<c00cb70c>] [<c00c8218>] [<c00cb70c>]
 [<c00c6704>] [<800d671c>] [<800169b8>] [<80016610>] [<c01420a4>] [<c0144258>]
 [<c0143e30>] [<800eced8>] [<800ecc88>] [<80143780>] [<800eced8>] [<800b4fbc>]
 [<c00cb70c>] [<c0143c94>] [<c01442f0>] [<c0141214>] [<c0140b88>] [<c0145d14>]
 [<c0145d14>] [<c0171aa8>] [<c01460d0>] [<c0140950>] [<c0171e44>] [<c0171e34>]
 [<c0171f00>] [<c0172010>] [<c01720c8>] [<c01720a4>] [<c01719e8>] [<c0175a68>]
 [<c01724fc>] [<c017249c>] [<80025588>] [<8002fdf4>] [<80025684>] ...

Code: 10400012  00000000  3c028021 <8c42cf64> 1040000e  00000000  40016000  30e70001  34210001
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
 <0>Rebooting in 3 seconds..Please stand by while rebooting the system...

Will do ksymoops analysis shortly.

Post #31

TheRoDent

7 Apr 2005, 21:07

OK, a ksymoops analysis of the kernel dump indicates that et.o (the ethernet driver) is most likely to blame.

Oops in fault.c::do_page_fault, line 206:
$0 : 00000000 80210000 00000000 00000000 80191180 028f5c29 0000270f 1000fc01
$8 : 00000045 00000001 00000002 214fc000 00000000 80ff6800 00000000 00000000
$16: 80ab62c0 803e3a60 80a71810 1000fc01 00000074 00000074 80d1c930 00000000
$24: 00000000 0047ac34                   80800000 80801a60 8073ca20 800d5ca0
Hi : 000012b9
Lo : 63d75523
epc   : 800d5cfc    Tainted: P
Using defaults from ksymoops -a i386
Status: 1000fc02
Cause : 00000008
PrId  : 00029007
Process sh (pid: 4756, stackpage=80800000)
Stack:    80ab68c0 00000000 80ab62c0 803e3a60 80ab62c0 c00c68c8 800b4fbc
 00000000 00000000 80191078 80250e00 801911a8 80cb4720 c00cb70c 80cb4720
 c00c8218 00000000 0047ac34 803e3a60 c00cb70c 80cb4720 c00c6704 00000074
 00000074 80d1c930 800d671c 00000000 80191078 fffffff3 800169b8 00000128
 00000000 00000001 80191070 00000000 80016610 00000003 80b109ac 00000001
 c01420a4 ...
Call Trace:   [<c00c68c8>] [<800b4fbc>] [<c00cb70c>] [<c00c8218>] [<c00cb70c>]
 [<c00c6704>] [<800d671c>] [<800169b8>] [<80016610>] [<c01420a4>] [<c0144258>]
 [<c0143e30>] [<800eced8>] [<800ecc88>] [<80143780>] [<800eced8>] [<800b4fbc>]
 [<c00cb70c>] [<c0143c94>] [<c01442f0>] [<c0141214>] [<c0140b88>] [<c0145d14>]
 [<c0145d14>] [<c0171aa8>] [<c01460d0>] [<c0140950>] [<c0171e44>] [<c0171e34>]
 [<c0171f00>] [<c0172010>] [<c01720c8>] [<c01720a4>] [<c01719e8>] [<c0175a68>]
 [<c01724fc>] [<c017249c>] [<80025588>] [<8002fdf4>] [<80025684>] ...
Warning (Oops_trace_line): garbage '...' at end of trace line ignored

Code: 10400012  00000000  3c028021 <8c42cf64> 1040000e  00000000  40016000  30e70001  3421
/usr/bin/objdump: /tmp/ksymoops.58kTzK: File format not recognized
Error (pclose_local): Oops_decode_part pclose failed 0x100
Error (Oops_decode_part): no objdump lines read for /tmp/ksymoops.58kTzK


>>$1; 80210000 <ip_nat_irc_helpers+10c/1e0>
>>$4; 80191180 <softnet_data+0/180>
>>$13; 80ff6800 <_end+dd87f0/3fe8e8a0>
>>$16; 80ab62c0 <_end+8982b0/3fe8e8a0>
>>$17; 803e3a60 <_end+1c5a50/3fe8e8a0>
>>$18; 80a71810 <_end+853800/3fe8e8a0>
>>$22; 80d1c930 <_end+afe920/3fe8e8a0>
>>$28; 80800000 <_end+5e1ff0/3fe8e8a0>
>>$29; 80801a60 <_end+5e3a50/3fe8e8a0>
>>$30; 8073ca20 <_end+51ea10/3fe8e8a0>
>>$31; 800d5ca0 <netif_rx+20/318>

>>???; 800d5cfc <netif_rx+7c/318>   <=====

Trace; c00c68c8 <[et]et_sendup+fc/140>
Trace; 800b4fbc <dma_rx+bc/e4>
Trace; c00cb70c <[et]bcm47xx_et_chops+0/5c>
Trace; c00c8218 <[et]chiprx+20/48>
Trace; c00cb70c <[et]bcm47xx_et_chops+0/5c>
Trace; c00c6704 <[et]et_dpc+54/11c>
Trace; 800d671c <net_rx_action+b0/1dc>
Trace; 800169b8 <tasklet_action+94/104>
Trace; 80016610 <do_softirq+b0/170>
Trace; c01420a4 <[sunrpc]rpc_restart_call+16c0/1878>
Trace; c0144258 <[sunrpc]xprt_transmit+7a8/7d8>
Trace; c0143e30 <[sunrpc]xprt_transmit+380/7d8>
Trace; 800eced8 <ip_rcv_finish+0/2e8>
Trace; 800ecc88 <ip_rcv+51c/590>
Trace; 80143780 <vlan_skb_recv+320/56c>
Trace; 800eced8 <ip_rcv_finish+0/2e8>
Trace; 800b4fbc <dma_rx+bc/e4>
Trace; c00cb70c <[et]bcm47xx_et_chops+0/5c>
Trace; c0143c94 <[sunrpc]xprt_transmit+1e4/7d8>
Trace; c01442f0 <[sunrpc]xprt_reserve+68/1c8>
Trace; c0141214 <[sunrpc]rpc_restart_call+830/1878>
Trace; c0140b88 <[sunrpc]rpc_restart_call+1a4/1878>
Trace; c0145d14 <[sunrpc]rpc_delay+158/490>
Trace; c0145d14 <[sunrpc]rpc_delay+158/490>
Trace; c0171aa8 <[nfs]nfs_destroy_nfspagecache+1760/1b4c>
Trace; c01460d0 <[sunrpc]rpc_execute+84/29c>
Trace; c0140950 <[sunrpc]rpc_call_setup+50/98>
Trace; c0171e44 <[nfs]nfs_destroy_nfspagecache+1afc/1b4c>
Trace; c0171e34 <[nfs]nfs_destroy_nfspagecache+1aec/1b4c>
Trace; c0171f00 <[nfs]nfs_pagein_list+6c/b8>
Trace; c0172010 <[nfs]nfs_scan_lru_read+78/d0>
Trace; c01720c8 <[nfs]nfs_pagein_inode+60/3c4>
Trace; c01720a4 <[nfs]nfs_pagein_inode+3c/3c4>
Trace; c01719e8 <[nfs]nfs_destroy_nfspagecache+16a0/1b4c>
Trace; c0175a68 <[nfs]nfs_sync_file+94/bc>
Trace; c01724fc <[nfs]nfs_readpage+d0/124>
Trace; c017249c <[nfs]nfs_readpage+70/124>
Trace; 80025588 <add_to_page_cache_unique+144/15c>
Trace; 8002fdf4 <_alloc_pages+24/30>
Trace; 80025684 <page_cache_read+e4/130>

0001
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
 <0>Rebooting in 3 seconds..Please stand by while rebooting the system...

Unfortunately I don't have a mipsel-targeted ksymoops to get an instruction dump, but the symbol names definitely look correct.

This _does_ appear to be related to the problem others are experiencing, seeing as I was doing a massive perl compile over an NFS-mounted volume on another host.

Looks like the new ethernet driver dies under high load.
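
For anyone who wants to repeat this kind of reading of a decoded trace, here is a small sketch that tallies ksymoops "Trace;" lines by module. The sample is a subset of the lines above; the regex and the function name parse_trace are just illustrative, and frames without a [module] prefix are labelled "kernel" by assumption.

```python
import re
from collections import Counter

# Matches lines such as: Trace; c00c68c8 <[et]et_sendup+fc/140>
TRACE_RE = re.compile(
    r"^Trace; ([0-9a-f]{8}) <(?:\[(\w+)\])?([\w.]+)\+([0-9a-f]+)/([0-9a-f]+)>$"
)

# A subset of the decoded trace lines from the ksymoops output.
SAMPLE = """\
Trace; c00c68c8 <[et]et_sendup+fc/140>
Trace; 800b4fbc <dma_rx+bc/e4>
Trace; c00cb70c <[et]bcm47xx_et_chops+0/5c>
Trace; c00c8218 <[et]chiprx+20/48>
Trace; c00c6704 <[et]et_dpc+54/11c>
Trace; 800d671c <net_rx_action+b0/1dc>
Trace; c0144258 <[sunrpc]xprt_transmit+7a8/7d8>
Trace; c01724fc <[nfs]nfs_readpage+d0/124>
"""

def parse_trace(text):
    """Return (module, symbol, offset) tuples in trace order."""
    frames = []
    for line in text.splitlines():
        m = TRACE_RE.match(line.strip())
        if m:
            _addr, module, symbol, offset, _size = m.groups()
            frames.append((module or "kernel", symbol, int(offset, 16)))
    return frames

frames = parse_trace(SAMPLE)
by_module = Counter(module for module, _, _ in frames)
print(frames[0])                 # innermost frame listed by ksymoops
print(by_module.most_common())   # how many frames each module contributes
```

A frame count is only a hint. As far as I know, this style of trace can include stale addresses left on the stack, so the frames nearest the faulting epc (here netif_rx called from et_sendup) matter more than the totals.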

Post #32

TheRoDent

7 Apr 2005, 23:42

I've dropped the MTU to 1024 on my NFS host and my WRT, and the crashes appear to have gone away... :?

Post #33

dsouth

8 Apr 2005, 04:16

I'd love to say that dropping the mtu to 1024 on my hosts cured the
issue, but it didn't. It did make the wrt hang in there longer during
a heavy transfer before finally rebooting though.

On the good side, if it's a bug in the linksys driver, there's the possibility
of a fix in a future code drop.

--
Dale

Post #34

Kurgan

9 Apr 2005, 13:39

Some tests about the reset under heavy traffic issue:

I am running the tests on GS v1.1 with the previous experimental version (14/3/2005), with WPA enabled. I am transferring data between the LAN ports and the wireless port, so I am just bridging and not routing. Client is win2000 with a Netgear MiniPci b+g card.

I have tried using FTP, and transferring groups of files (60 files about 2 megs each) and single big files (80 megs) back and forth. I am using the serial port to monitor the console. I had no issues at all, with a sustained transfer rate of about 15 Mb/sec.

I then tried using SMB transfers, with a sustained rate of about 13 Mb/sec, and I experienced just one glitch. The SMB session stopped transferring data and got disconnected, but pinging across the WRT did not fail, and just trying again to connect to the SMB share worked without resetting anything. No messages were printed in the WRT's syslog or console. Trying again allowed me to transfer 2 GB of data in about 10 files without any issue.

Clearly I have lots of big (1500 bytes) packets, since I transfer big files.

So, it seems that my WRT does not reset on heavy wireless to wired (bridging) load. I'll have to try with routing, now.

UPDATE
I have tried routing from WAN to LAN (wireless) and from WAN to LAN (wired) and even if I had no resets (and nothing in the syslog) I have found two problems:

1- maximum attainable speed (using 100 megabit connections) if the connection goes through the WRT (with default NAT rules and nothing more) is about 20 megabit instead of the expected 70 (70 is what I get if I don't go through the WRT). During this test I ran "top" and noticed something strange. While "load average" stays under 1%, the process "ksoftirqd_CPU0" uses up more than 90% cpu time. Does this set the speed limit of the WRT to a sad 20 Mbit? I know that ordinary users don't have a 20 Mbit internet connection, but I thought I could get more from a 200 MHz CPU. I have tried removing every firewall rule other than the masquerade one, and still I get 20 Mbit. What's worse is that wireless connection speed is limited to about 13 Mbit/sec, with ksoftirqd_CPU0 at 75% load. This is true even for LAN-to-WLAN transfers, without routing involved.

2- lots of times I see my SMB session die on me while I transfer the same file, so there is some pattern in that file that systematically makes the WRT drop the (TCP, I suppose) connection. The WRT does not reboot or show any other misbehaviour; if I ping continuously, I can see pings go through, even when the SMB session dies.

If I connect the WRT just as a switch (that is, the server and the client for the tests are both connected on wired LAN ports) I get 70 Mbit as expected, and I can also transfer the "faulty" file without the SMB session dying, so it's not a switch chip issue.

Post #35

dsouth

9 Apr 2005, 17:00

As a datapoint, my 2.2 resets with heavy scp/rsync transfers between
two wireless hosts (b to b). Heavy transfers from wan to wireless or
vice-versa are fine, it's just the wireless to wireless lan stuff that causes
a reset.

I have another 54g in the box, so I should _really_ get around to
trying some experiments on the second router. [So many projects, so
little time...]

--
Dale

Post #36

TheRoDent

12 Apr 2005, 12:00

I think I may have found an ugly workaround.

I insmod et.o, let it configure the Broadcom switch, and then rmmod it again. This is the new 3.60 driver.

Then I insert the old ethernet driver v3.50 (which I renamed to old_et.o), do the normal vconfig stuff again, and it actually works. :shock: Things are pingable etc...

I'm now doing some nfs based compiles again, to see if I can get the same driver oops as I did with the 3.60 driver.

Summary: Use the new driver to initialize the switch, remove it, and then use the old driver.
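
Spelled out as a sequence, the order of operations looks roughly like this. It is a dry-run sketch only: it prints the commands rather than running them, the function name workaround_commands is made up, the file locations are assumed to be the current directory, and the vconfig step is left as a comment because the exact VLAN commands are not given here.

```python
def workaround_commands(new_driver="et.o", old_driver="old_et.o"):
    """Return the module commands in the order described above (nothing is run)."""
    # rmmod takes the module name, not the file name
    new_name = new_driver.rsplit(".", 1)[0]
    return [
        f"insmod {new_driver}",   # 3.60 driver: load it so it configures the switch
        f"rmmod {new_name}",      # unload it again
        f"insmod {old_driver}",   # 3.50 driver, renamed to old_et.o
        # the "normal vconfig stuff" would follow here; the exact VLAN
        # commands are not stated in the post, so none are invented
    ]

for cmd in workaround_commands():
    print(cmd)
```

As noted further down the thread, doing this over the network can cut off your own connection, so a serial console is advisable before trying it for real.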

Post #37

jobster

15 Apr 2005, 09:37

I bought a WRT54G v2.2 two weeks ago and experienced the "high-load & reboot" issues that people are talking about, so I convinced the store to let me exchange the WRT54G for an Asus WL500g (about the same hardware as a WRT54). Same problems here: it works fine with the obsolete stable release of OpenWrt and with the original firmware, but with the experimental release of OpenWrt it dies after just a couple of megabytes of transferred data at speeds of just 1MiB/sec.

Post #38

sendo

15 Apr 2005, 20:47

Well, as someone might have noticed, I overlooked the latest posts in this thread and opened a new one concerning the high-load issue.
I have an Asus WL-500g here and see the router crash under heavy WLAN load with WPA, with WEP, and without encryption.

I hope someone comes up with new ideas on how to fix this, 'cause I don't want to revert to the snapshot releases.

Cheers,
Sendo

Post #39

sendo

15 Apr 2005, 20:58

TheRoDent wrote:

I think I may have found an ugly workaround.
I insmod et.o, let it configure the Broadcom switch, and then rmmod it again. This is the new 3.60 driver.
Then I insert the old ethernet driver v3.50 (which I renamed to old_et.o), do the normal vconfig stuff again, and it actually works. :shock: Things are pingable etc...
I'm now doing some nfs based compiles again, to see if I can get the same driver oops as I did with the 3.60 driver.
Summary: Use the new driver to initialize the switch, remove it, and then use the old driver.

Well, I'd like to test that as well, but would you mind posting the 3.50 driver in this forum?

Post #40

disq

16 Apr 2005, 07:13

i'm having the same high load & reboot problem you guys are having, but it's on a microsoft mn700 flashed with PMON (as the bootloader) and openwrt.

crashes with downloading FROM the wireless link (over wds, that is)
no crashes if i just download from the ethernet (from the local httpd, made up a cgi script that would cat /usr/bin/kismet_drone 20 times)

my two other wrt54g's (which are v2's) are fine with the same firmware though.

also sometimes i'm experiencing random reboots on the mn700 (even with no traffic) and trying to find a way to attach an UART chip to the uart header so i can get the serial port working and read the kernel panic messages. (if there are any, of course)

Post #41

sendo

17 Apr 2005, 21:31

Me again. I looked up where et.o and wl.o come from and found out that they come precompiled from openwrt.openbsd-geek.de. The problem is that I do not understand which source code these drivers were compiled from, or how I can compile them on my own to test whether they are the cause of my bandwidth problem.
Would someone lend me a hand so I can test this in the experimental build?

BTW: In the meantime I compiled and tested the old snapshot and also a precompiled version of Oleg's WL500g firmware; neither had the problem of rebooting (and, as a result, a broken connection) at high WLAN bandwidth.

Post #42

davygrvy

20 Apr 2005, 08:43

for RADVD, please change the default pid file location from '/tmp/run' to '/var/run'

Post #43

tosuja

20 Apr 2005, 12:06

Kurgan wrote:

1- maximum attainable speed (using 100 megabit connections) if the connection goes through the WRT (with default NAT rules and nothing more) is about 20 megabit instead of the expected 70 (70 is what I get if I don't go through the WRT). During this test I ran "top" and noticed something strange. While "load average" stays under 1%, the process "ksoftirqd_CPU0" uses up more than 90% cpu time. Does this set the speed limit of the WRT to a sad 20 Mbit? I know that ordinary users don't have a 20 Mbit internet connection, but I thought I could get more from a 200 MHz CPU. I have tried removing every firewall rule other than the masquerade one, and still I get 20 Mbit. What's worse is that wireless connection speed is limited to about 13 Mbit/sec, with ksoftirqd_CPU0 at 75% load. This is true even for LAN-to-WLAN transfers, without routing involved.

To be honest, I was disappointed with the speed of the WRT too. When I connected from LAN to WAN with only the necessary NAT rules I got 34 Mbit/s (which is quite slow); when I added 512 rules for IP accounting and bandwidth shaping I was not able to get more than 13 Mbit/s.

The ksoftirqd_CPU0 CPU usage is normal, since the CPU obviously has to service a lot (thousands) of interrupts from the NICs, and it is a weak CPU despite its 200 MHz clock. Clock speed is just a number; it has almost nothing to do with real performance (you see it on desktops: AMD CPUs are as fast as Intel ones running at 1.5 times the frequency....). This CPU is VERY simple and slow....
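
A rough back-of-the-envelope sketch of the packet rates involved, for the throughput figures quoted in this thread (13, 20 and 34 Mbit/s). It assumes every packet is a full 1500-byte frame and ignores ACKs, headers and small packets, so the real packet rate is higher and the per-packet budget lower than what it prints.

```python
CPU_HZ = 200_000_000      # 200 MHz clock, as quoted in the thread
PACKET_BYTES = 1500       # assumption: every packet is a full-size frame

def packets_per_second(mbit_per_s, packet_bytes=PACKET_BYTES):
    """Packets per second needed to carry the given throughput."""
    return mbit_per_s * 1_000_000 / (packet_bytes * 8)

def cycles_per_packet(mbit_per_s, cpu_hz=CPU_HZ):
    """Clock cycles available per packet if the CPU did nothing else."""
    return cpu_hz / packets_per_second(mbit_per_s)

for rate in (13, 20, 34):
    print(rate, round(packets_per_second(rate)), round(cycles_per_packet(rate)))
```

Whatever the budget works out to, it has to cover the receive interrupt, NAT and firewall rule matching, any copying, and the transmit on the other interface, which is why adding hundreds of rules costs so much.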

(Last edited by tosuja on 20 Apr 2005, 13:23)

Post #44

acidbits

20 Apr 2005, 12:17

I have some WRT54Gs in a WLAN (http://santafe1.dyndns.org/public/SFprov.pdf); most of them (the ones that handle the main traffic) run OpenWRT. I'm having big trouble with "Congo" (look at my network diagram): it crashes every 8-10 hours and I have to reset it manually by unplugging the power supply and plugging it in again. It is located at the top of a building. I don't know exactly whether it is version 1.1 or 2.0.

It crashes; it doesn't reset itself. It stops responding to anything, and the whole WLAN loses access to the Internet.

Congo is wired to "Xina", a Debian Linux box. WLAN users get Internet access through it, plus a sort of FTP account linked with a multiuser P2P. India's P2P is used by WLAN users to download stuff to the server; then they FTP it to their computers through the WLAN.

Yesterday I did some testing and it seems I found what could be the problem. I thought the problem could be that India is sending much more traffic than Congo can handle and send on to the WLAN. What I did was limit the bandwidth Xina sends to Corea to 500 KBytes/sec. I went to different users' places and started downloading. I transferred over 2 GB and it didn't crash. The first users I tried had a good link to Congo, so they downloaded at full speed (500 KBytes/sec).

Then I went to a user who has a poor link between his WRT54G and "India" (the main AP for users); there I could only download at 300 KBytes/sec. So Xina was sending traffic at 500 KBytes/sec and the user was receiving at 300 KBytes/sec: Congo was receiving more traffic than it could forward to the user. After a while Congo crashed.

This happened before with Sveasoft's firmware but not so often, maybe 2-3 times per month. Could you point me to some "stable" OpenWRT firmware that I could use at Congo?

Hope it helps somehow,

aCiDBiTS

(Last edited by acidbits on 20 Apr 2005, 13:24)

Post #45

_marc_

21 Apr 2005, 11:50

tosuja wrote:

The ksoftirqd_CPU0 CPU usage is normal, since the CPU obviously has to service a lot (thousands) of interrupts from the NICs, and it is a weak CPU despite its 200 MHz clock. Clock speed is just a number; it has almost nothing to do with real performance (you see it on desktops: AMD CPUs are as fast as Intel ones running at 1.5 times the frequency....). This CPU is VERY simple and slow....

Yes, I can confirm this, because I have written a while(...) loop in an interpreted language (ember). On my Linux desktop it reaches 1 million iterations in 1 second, but on the Linksys 200 MHz CPU I can only loop between 1,000 and 10,000 times in one second.

Maybe we could run a MIPS benchmark (nsieve) or BogoMIPS.

Post #46

Kurgan

22 Apr 2005, 09:23

Acidbits,

These crashes under heavy load are a mystery which I can't understand, but we have some clues.

You have a 54G that crashes when the traffic coming in from the LAN arrives at a faster rate than it goes out of the WLAN. Maybe that's an overflow in some buffer in the Broadcom driver that makes the router crash because it leaks out of its allocated space and overwrites something? We should maybe think about it... maybe it's not the traffic itself, but the difference between "incoming" and "outgoing" speed.

A friend of mine has a 54G (don't know the revision) with original firmware that crashes when I download something from the LAN to the WLAN, and this seems to be the same issue as yours (traffic comes in from the LAN at more than 70 Mbit/sec, and goes out of the WLAN at no more than 10 Mbit/sec). I have not tried reversing the traffic direction, uploading from the WLAN to the LAN. If the "buffer full" theory is right, this way it should not crash. I will try to test it as soon as I can.
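
To picture the "buffer full" theory, here is a minimal sketch of how quickly a fixed-size queue fills when data arrives faster than it leaves. The 64 KB queue size is purely hypothetical (not taken from any driver), the rates are the 500 and 300 KBytes/sec figures reported above, and the function name time_to_fill is made up for the example.

```python
def time_to_fill(buffer_bytes, in_rate, out_rate):
    """Seconds until a queue of buffer_bytes is full, or None if it never fills.

    Rates are in bytes per second.
    """
    surplus = in_rate - out_rate
    if surplus <= 0:
        return None  # the queue drains at least as fast as it fills
    return buffer_bytes / surplus

BUFFER_BYTES = 64 * 1024   # hypothetical queue size, not from any driver

# LAN in at 500 KBytes/sec, WLAN out at 300 KBytes/sec
print(time_to_fill(BUFFER_BYTES, 500_000, 300_000))
# Reversed direction: slow side feeding the fast side
print(time_to_fill(BUFFER_BYTES, 300_000, 500_000))
```

This is only a picture of the theory. A correct driver simply drops packets once its queue is full, and TCP then slows the sender down, so a crash would need an actual bug in how the full queue is handled.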

I also have two 54GS v1.1 (with Broadcom chips) with experimental OpenWrt that I use for tests, and I have experienced low speed with high cpu load, and even two disconnections in SMB session while testing (maybe packets get lost and the SMB protocol gets badly confused? I should TCPdump everything and then go through it), but not a single (visible) glitch in the WRTs. No reboots, no kernel messages, nothing on the serial console. This setup seems not to have the supposed "buffer problem", even if it uses the same experimental build that other users say crashes frequently under heavy load.

Maybe the issue is caused by a combination of new driver (experimental) and old hardware?

We still have lots of questions and no answers.

Post #47

acidbits

22 Apr 2005, 11:04

I'm not a Linux guru and don't have enough skills to trace & search for the bug. The WRT has now been running for 2 days without crashing, because I disabled the "FTP service" that produced the high traffic (LAN->WLAN) and crashed it (even with it limited to 500 KBytes/sec).

Users want that service, so I'll have to put the Linksys firmware back until the bug is fixed. If someone has a provisional fix or an experimental build that fixes this, I can be your tester ... Right now I'm going to try the 1024 MTU solution; I've set the MTU to 1024 on "both sides of the wire", at the Linux server and the WRT.

I'm not an expert on traffic, but I agree that something is happening with buffers or with dropping packets when the buffers are full. It seems to work right for a while; after a certain time (10-20 minutes) it crashes.

(Last edited by acidbits on 22 Apr 2005, 11:14)

Post #48

Kurgan

23 Apr 2005, 07:58

Acidbits,

What is the model and hardware revision of your crashing WRT? I'd like to see if my friend's WRT and yours have the same HW revision.

If it's an old one (which it should be, or else it should not work with stable version of OpenWrt) maybe you could try using the old ethernet driver with the unstable openwrt, as explained before in this thread. You even don't need to load the new driver first and then the old one, because you have the old chipset so you don't need the new driver to set up the vlans in the switch.

If you try this, anyway, you'd better have a serial port, because you could easily break ethernet connection.

(Last edited by Kurgan on 23 Apr 2005, 07:59)

Post #49

acidbits

23 Apr 2005, 14:30

The WLAN has 25-30 WRT54Gs; since it worked really badly we decided to switch to OpenWrt, and it is now in a "testing" phase. Every time we try some modifications we move them from place to place and replace them, so we really don't know which one is which. Some of them are placed on top of buildings and are dangerous to reach. I will write down exactly what we have once everything works right. The oldest version I remember seeing is 1.1.

Tested the "mtu 1024" setting and it also crashed; the next step is to install the Linksys firmware to see how it works.

Post #50

dsouth

26 Apr 2005, 00:40

No specific info, but the 24 Apr experimental release is "crash-o-matic" on
my v2.2 g. Runs fine as long as I don't do anything, but any amount of
serious traffic across the wireless causes it to lock up.

I'll try running some regression tests on my non-production wrt. [Both
are v2.2 hardware, so even the production one has to run either experimental
or therodent.]

Dale
