OpenWrt Forum Archive

Topic: White Russian 0.9 on WRT54GL often "crashing" req. reboot

The content of this topic has been archived between 6 Apr 2018 and 4 May 2018. Unfortunately there are posts – most likely complete pages – missing.

gmayer wrote:

Good to see a second developer here. Thanks for the info mbm, though I still don't understand why the sdram_ncdl timing values would be different for the same router, model and revision. Also why doesn't one just set it to zero to be safe?

Perhaps I was unclear -- you can set the value to 0 for automatic calibration, but it doesn't stay at 0; the value gets overwritten by the new calibration.

Is the /etc/init.d/S05nvram you mentioned any different from nbd's nvram-fixup.sh? Cause the latter actually makes no difference on one of my production boxes, in fact it now seems to reboot more often than before (I've since reverted to my previous config of course), though it only touched the pa0b<n> and pa0maxpwr variables.

They're variations on the same concept; nbd's above post was to determine if we need to add more to the S05nvram script in the next release. The pa0 variables control the wifi power levels.

I do use standard configuration images that were created on one specific router and then applied to a whole lot of others (makes installation and configuration a hell of a lot easier), but scanning through them in a hex editor yields no reference to the sdram_* and pa0* parameters so I think this is unlikely to be the problem. Or am I mistaken?

I'm not talking about the firmware image (kernel, filesystem), I mean the actual nvram content should not be copied from one device to another since as pointed out above, some values are unique to that exact device.

Ok, cool, never mind about setting sdram_ncdl to 0. The question remains: Why would they all be different on identical hardware?

So your (I presume svn trunk) /etc/init.d/S05nvram is a variation of the same concept, but are there any settings in there other than pa0* that could be relevant to the problems that we're seeing? It doesn't hurt to try...

When I said "configuration images" I meant the images created using the Backup feature of DD-WRT (sorry, I'm not on OpenWRT yet, reason I'm using this forum is that there's a lot more help and expertise here). What it does essentially is to take a snapshot of the relevant nvram settings that the firmware uses upon boot up and operation to run the router as you intend to. Afaics it does not backup the entire nvram, just the firmware specific values, but I guess I should go and find out what exactly gets backed up and restored...

gmayer wrote:

When I said "configuration images" I meant the images created using the Backup feature of DD-WRT (sorry, I'm not on OpenWRT yet, reason I'm using this forum is that there's a lot more help and expertise here).

Well, that just screwed things up.

I want to make it ABSOLUTELY clear that these forums are for issues relating to OpenWrt ONLY. You're perfectly welcome to read all about OpenWrt but I must ask that you refrain from posting any information which may contaminate our efforts. We do not want to waste several days debugging only to throw out all our work and start over because we were given false information about the problem. As I said before, just because the symptoms are the same doesn't mean we're dealing with the same bug.

sad

mbm wrote:
lc wrote:

Hi mbm,

thanks for the explanation. What about the crashes that only occur when wireless radio is turned on? Is this an sdram issue?

Actually that may be a problem with the power: turning on the wireless increases the draw on the power supply and may push the limits of what it can provide. Try replacing the power adapter.

I use the standard Linksys power supplies. They supply 1000mA (12 VA) at 12 Volts (this is what the label says). The latest shipments included new power supplies. Those are switching power supplies (old ones are linear), but the output specs are the same. So far I only have "old" power supplies on the sites - I will replace a few. Maybe this is the solution - let's hope for the best. I will keep the forum updated.

To summarise/iterate my own experiences:

From the tests we have carried out, the rebooting issue occurs on the Buffalo WHR-G54S, Belkin F5D7231-4P and WRT54GL v1.1 (the WRT54GL v1.1 often crashes rather than reboots). Both OpenWrt and DD-WRT seem to have the same problem; I have tested both firmwares on all the above routers and similar rebooting occurs.

When Wi-Fi is disabled and a separate access point is connected to one of the router's Ethernet ports, both OpenWrt and DD-WRT are completely stable.

In all cases, I have NOT copied NVRAM settings between routers.

I have also tested the default Linksys firmware on a WRT54GL v1.1; this also rebooted in a similar way.

If the power supply limits are possibly being reached when wireless is turned on, what would be the best pa0/nvram settings to minimise the current draw? We have used a variety of wl0_txpwr settings, but the rebooting issue still occurs.

(Last edited by kebab on 10 Apr 2007, 20:04)

kebab wrote:

To summarise/iterate my own experiences:

From the tests we have carried out, the rebooting issue occurs on the Buffalo WHR-G54S, Belkin F5D7231-4P and WRT54GL v1.1 (the WRT54GL v1.1 often crashes rather than reboots). Both OpenWrt and DD-WRT seem to have the same problem; I have tested both firmwares on all the above routers and similar rebooting occurs.

When Wi-Fi is disabled and a separate access point is connected to one of the router's Ethernet ports, both OpenWrt and DD-WRT are completely stable.

In all cases, I have NOT copied NVRAM settings between routers.

I have also tested the default Linksys firmware on a WRT54GL v1.1; this also rebooted in a similar way.

If the power supply limits are possibly being reached when wireless is turned on, what would be the best pa0/nvram settings to minimise the current draw? We have used a variety of wl0_txpwr settings, but the rebooting issue still occurs.

I use pa0maxpwr to regulate tx-power. 0x20 is a value I use often, also in environments that reboot. From what I see I think it is not so much the fact that radio itself is turned on but the radio activity. It doesn't matter how many people are associated - I see a site serving 10-15 clients without problems and I see the same site reboot with only 2-3 associated clients (of course it also happens with more clients, but since it cannot be made a rule I am pretty sure it is independent from the number of clients).

@lc:
I think the new power supplies are worth a try. If this does not work for you, do you have another universal power supply with more than 1A at 12V or can you get one? Maybe the quality of the original power supplies is not the best and they don't supply a solid current flow at the higher ampere rates.

@kebab:
I'm sure that reducing transmit power doesn't have any significant effect on the overall power usage, although enabling the wireless hardware itself does. One of my crashing access points has a transmit power of 13 dBm because I use 7 dBi antennas here; 13 dBm is about 20 mW (http://home.in.tum.de/~prilmeie/wlan/db-umrechnung.php). If I recall correctly the power supplies are DC, so with P = V x I we have a maximum of 12 W output in total; I don't think that 20 mW or 100 mW of transmit power is the problem. If there is a problem with the power supplies, it's more likely to do with delivering a stable current near the upper limit.
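To put numbers on the argument above, here is a minimal Python sketch of the dBm-to-milliwatt conversion and of how small the radiated power is next to the adapter's rating. The 12 V / 1000 mA figures are the label values quoted earlier in the thread; the function name is made up for illustration. Note that this only compares radiated RF power with the rating; the DC power the radio hardware actually draws is higher and is not something this thread gives figures for.

```python
def dbm_to_mw(dbm: float) -> float:
    """Convert a power level in dBm to milliwatts (0 dBm = 1 mW)."""
    return 10 ** (dbm / 10)


SUPPLY_VOLTS = 12.0  # adapter label value quoted in the thread
SUPPLY_AMPS = 1.0    # 1000 mA, also from the label
supply_mw = SUPPLY_VOLTS * SUPPLY_AMPS * 1000  # P = V * I, in milliwatts

for dbm in (13, 20):
    mw = dbm_to_mw(dbm)
    share = 100 * mw / supply_mw
    print(f"{dbm} dBm = {mw:.1f} mW, {share:.2f}% of the adapter rating")
```

Under these assumptions 13 dBm comes out at roughly 20 mW and 20 dBm at 100 mW, both well under 1% of the 12 W rating.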

I'm not 100% convinced this is a power supply problem, but I am more than happy to be proved wrong.

The reason I am not convinced is that I have seen the problem with three different manufacturers' routers:

Linksys WRT54GL v1.1
Buffalo WHR-G54S
Belkin F5D7231-4P

I don't think it is the power supply, since I use lots of WRT54GL v1.1 units, some with their normal Linksys wall adapter and some with a homebrew battery-based power supply; all are rock stable. I have not noticed differences relative to the v1.0 and the 54G v3.1, 2.2 and 2.0, other than the radio being more sensitive in the models since 3.1.
Moreover, the power setting does very little to overall power consumption; you are talking tens of milliwatts there, relative to several watts of total consumption.
Have you tried removing all non-essential applications? E.g. I don't use webif.
Could it be the unit running out of resources during associations while also having to deal with lots of traffic?

I'm not convinced about this power thing either, and kebab mentioned the best argument for why it is likely not the case, but I think we should try everything to find the cause.
Most of the evidence we found points towards the binary-only module and some special traffic patterns: the dumps from the kernel panics, doddel's installation without any problems where Linksys routers just talk to other Linksys routers, etc.

Btw, I already use a custom build of White Russian done with Imagebuilder without webif and other unneeded packages.

I have very similar problems with different routers/chipsets/manufacturers/firmwares - but reboots only happen with WPA encryption.

Except for the very first posts in this thread, nothing is said about encryption - so maybe the problem is WPA-related with particular clients.
/ropf

(Last edited by ropf on 11 Apr 2007, 18:41)

I have tested with firmware images downloaded from the OpenWrt web site and with custom firmware images with non-essential applications removed. With the custom firmware installed, free RAM was in excess of 5 MB and load average was in the range of 0.00 to 0.10. The reboot/crash problem still occurred regardless of which images I used, so it has proven difficult to pin the cause down to resource usage issues.
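For anyone wanting to log the same two figures (free RAM and load average) over time rather than eyeballing them, here is a minimal Python sketch that parses the text of /proc/loadavg and /proc/meminfo. It assumes the text has already been fetched from the router onto another machine by some means not shown; the function names and the sample strings are made up for illustration and are not output from any router in this thread.

```python
def parse_loadavg(text: str) -> float:
    """Return the 1-minute load average from the content of /proc/loadavg."""
    return float(text.split()[0])


def parse_memfree_kb(text: str) -> int:
    """Return the MemFree value (in kB) from the content of /proc/meminfo."""
    for line in text.splitlines():
        if line.startswith("MemFree:"):
            return int(line.split()[1])
    raise ValueError("no MemFree line found")


# Made-up sample content, for illustration only.
sample_loadavg = "0.05 0.03 0.00 1/25 1234\n"
sample_meminfo = "MemTotal:  14500 kB\nMemFree:  5300 kB\n"

print(parse_loadavg(sample_loadavg), parse_memfree_kb(sample_meminfo))
```

Logging these alongside reboot times would show whether memory or load actually trends in any direction before a crash.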

@ropf:
I don't use encryption on any of my access points.

Ok, enough theories and speculation; let's try something else.

Can anyone find a way to easily reproduce the problem?

The closest I've come to replicating the error is as follows, but this is not always reliable:

1. Switch on my PDA and connect to the WRT54GL 1.1; my PDA is an HP iPAQ with an 802.11b wireless card.
2. Switch on my PC and connect to the WRT54GL 1.1; I use a Netgear WG111v2 USB wireless adapter with my PC.
3. Browse the internet/check email for 3 or 4 minutes.
4. Both PDA and PC lose their wireless connection as the router either reboots or crashes.

Only the PDA and PC are connected to the WRT54GL 1.1 when I test with this scenario.
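When running a procedure like the one above repeatedly, it helps to record exactly when the router went down. Here is a minimal Python sketch that spots reboots in a series of uptime readings. It assumes you poll the router's uptime periodically from another machine (how the readings are collected is left out), and the function name and sample data are made up for illustration.

```python
from typing import List, Tuple


def find_reboots(samples: List[Tuple[float, float]]) -> List[float]:
    """Return the poll times at which a reboot was first noticed.

    Each sample is (poll_time_seconds, router_uptime_seconds). Uptime only
    grows while the router stays up, so a drop between two consecutive
    samples means it rebooted somewhere in between.
    """
    reboots = []
    for (_, prev_up), (now, cur_up) in zip(samples, samples[1:]):
        if cur_up < prev_up:
            reboots.append(now)
    return reboots


# Made-up readings taken once a minute; uptime drops at the third poll.
samples = [(0, 500), (60, 560), (120, 15), (180, 75), (240, 135)]
print(find_reboots(samples))
```

This only catches reboots. A hard crash where the router hangs would show up as polls that fail to return at all, which would need to be logged separately.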


This may be a red herring, but most of my sites have a mixture of 802.11g and 802.11b clients connecting to them. My test network typically has old Netgear MA111 (802.11b) clients and newer Netgear WG111 (802.11g) clients connecting to it. Could this mix be causing our problem?

The point is to come up with a way that we can easily reproduce the error here to gather more information about it, hence we need as generic of an example as possible.

mbm wrote:

The point is to come up with a way that we can easily reproduce the error here to gather more information about it, hence we need as generic of an example as possible.

Unfortunately I am not able to reproduce the issue. I can see it happening at various locations, and from that I can tell that load, number of clients, uptime and amount of available memory have nothing to do with it. I once had a unit with a serial port on a site to get the console output. A small selection of the results is in this thread. I spent weeks trying to find the exact reason and came to the same conclusion as others: once the radio is turned off and an external AP is used, the WRTs run rock-stable. To be more concrete: it is not a matter of whether the radio is on or not - it is a question of whether it is being used or not. What I mean is that I have a WRT which bridges certain ports of the switch with eth1. Other APs are connected to it through the switch. Clients associate with those APs and almost never with the WRT. As a result this WRT runs like a charm without any problems.

That's why in the end we tried (unsuccessfully) to play with other wl.o versions as we thought that certain radio conditions cause wl.o to mess up memory. Especially the reports that this issue also occurs with linksys stock firmware point into this direction.

What I can offer is to put my WRT with the serial port back together with a notebook on a site which usually reboots 2-3 times a day (student place, high frequency of various clients). Maybe you have a program which writes kind of "system trace" to the console to find out what causes the memory corruption.

So far kebab is the only one who has found a procedure to make the issue occur on purpose. This is not much to go on for finding and fixing it, but it is already much better than what most of us have achieved.

(Last edited by lc on 11 Apr 2007, 21:28)

Has anyone else resolved the problem? At our university we have 15 WRT54GL and 4 WRT54G units with high-gain antennas, all running White Russian 0.9 and working as bridges to our LAN. They have only the basics needed to function, plus logging to an external server.

Our problem is the same as other people's here: they crash without apparent reason, and some of them reboot every few minutes. We tried everything (RTS/CTS settings, noise, range, switching from B/G to B only, etc.) until we ended up reading this thread and noticed that some people have the same problem.

We are desperate because our users can't keep a steady connection for more than a few minutes when we have high loads (approx. 20 or more clients per AP).

Any help would be deeply appreciated!!!

As a Solaris sys-admin, when I get a problem with a server panicking or hanging the first thing I am asked by Sun is where is the crash-dump file?

Is there some similar facility in OpenWRT we could make use of?  Even if it involved fitting a GL with an SD-card mod that might be useful.

I have a WRTSL54GS running as my house-router with v0.9 and it's been rock-solid with weeks of uptime and both B and G clients in the house.

(Last edited by vincentfox on 16 Apr 2007, 06:21)

You can find some kernel dumps, or links to the corresponding dumps, on the first pages of this thread.

lc wrote:

I once had a unit with a serial port on a site to get the console output. A small selection of the results are in this thread.

lc: I can't find your post with the serial console output, could you repost?  I have many WRTs with the same problem but none with serial console, what we need is the kernel oops message to find _where_ the crash is happening, I think your serial console is key now or if we could find a proper way to easily trigger the problem as developers want.

lc: waiting (impatiently) for your console output.

solca wrote:
lc wrote:

I once had a unit with a serial port on a site to get the console output. A small selection of the results are in this thread.

lc: I can't find your post with the serial console output, could you repost?  I have many WRTs with the same problem but none with serial console, what we need is the kernel oops message to find _where_ the crash is happening, I think your serial console is key now or if we could find a proper way to easily trigger the problem as developers want.

lc: waiting (impatiently) for your console output.

Hi! haye has pointed out the links (thanks haye!). Is there anything you can see from those kernel panic messages which goes beyond that something messed up the memory?

lc wrote:
solca wrote:
lc wrote:

I once had a unit with a serial port on a site to get the console output. A small selection of the results are in this thread.

lc: I can't find your post with the serial console output, could you repost?  I have many WRTs with the same problem but none with serial console, what we need is the kernel oops message to find _where_ the crash is happening, I think your serial console is key now or if we could find a proper way to easily trigger the problem as developers want.

lc: waiting (impatiently) for your console output.

Hi! haye has pointed out the links (thanks haye!). Is there anything you can see from those kernel panic messages which goes beyond that something messed up the memory?

Ok thanks.

The oops from candlerb is for the wlc process, which may be another issue; the same goes for your wifi process, so let's discard both.
Let's concentrate on the swapper process. Now we need to put those through ksymoops, I think. Beware, I am not an expert in kernel debugging; maybe the OpenWrt developers will give us better advice (or a firmware with full debugging and System.map).

hi there,

i have one of two wrt54gs v1.0 routers with the same symptoms: sporadic reboots.

my config:

clients -- (lan) -- wrt54gs 1.0 "A" -- (wds) -- wrt54gs 1.0 "B" -- (wds) -- asus wl-500gP "C" -- (lan) -- clients
                          (ap)                        (ap)                         (ap)
                        clients                     clients                      clients

- most clients attach to "A"
- "A" runs pppoe, dnsmasq, firewall
- all wireless links are wpa enabled, wds and clients
- all units run whiterussian RC6

- unit "B" is solid with an uptime of 113 days
- unit "A" reboots

nothing usable in syslog...

The discussion might have continued from here.