I'm posting this information here in case it's useful to anyone else.
I was having reliability problems with Soekris boxes running Kamikaze 7.09, with miniPCI Atheros cards, making WPA2 client connections to a Cisco 877W access point. Quite often they would not establish the connection, and wpa_supplicant had to be restarted by hand to make them work. Sometimes they would also fail after running for a period of time.
In summary, adding the following options to /etc/config/wireless seems to make things work much better:
option agmode bg
option bgscan 0
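For anyone unsure where these two lines go, here is a rough sketch of how they might sit in /etc/config/wireless. It assumes the usual Kamikaze layout, in which agmode is read from the wifi-device section and bgscan from the wifi-iface section; please check that against /lib/wifi/madwifi.sh on your own build. The device name wifi0, the encryption setting and the key are placeholders for illustration, not values from the setup described here.

```
config wifi-device wifi0
	option type atheros
	# assumption: agmode is a per-radio (wifi-device) option; restricts the card to 802.11b/g
	option agmode bg

config wifi-iface
	option device wifi0
	option mode sta
	option ssid PrivateSSID
	# placeholder encryption settings, not from the original post
	option encryption psk2
	option key 'placeholder-passphrase'
	# assumption: bgscan is a per-interface (wifi-iface) option; disables background scanning
	option bgscan 0
```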
A bit more detail: I had three Soekris boxes on my desk for testing, one with a Compex bg card and two with Aries abg cards. Over a number of power cycles I found that quite often the unit with the bg card came up when the two abg units didn't. Disabling 802.11a (using "option agmode bg") seemed to improve things a lot.
I think what's happening is that wpa_supplicant's scan for access points is conflicting with the Atheros' own AP scanning. I modified /lib/wifi/madwifi.sh to log wpa_supplicant's output to a file. This meant I could compare the failing authentication with a successful one (after restarting wpa_supplicant).
When it was failing, it got into an infinite loop of trying to associate and then disconnecting:
$ grep -- 'State.*->\|[0-9] of [0-9]\|[0-9]/[0-9]' wpa.log.failing | head -30
State: DISCONNECTED -> SCANNING
State: SCANNING -> ASSOCIATING
State: ASSOCIATING -> ASSOCIATED
State: ASSOCIATED -> DISCONNECTED
State: DISCONNECTED -> 4WAY_HANDSHAKE
WPA: RX message 1 of 4-Way Handshake from 00:17:df:11:f5:91 (ver=2)
RSN: msg 1/4 key data - hexdump(len=22): dd 14 00 0f ac 04 37 97 bd 1f ad 6a 80 51 53 19 75 79 2f fd 74 e7
WPA: WPA IE for msg 2/4 - hexdump(len=22): 30 14 01 00 00 0f ac 04 01 00 00 0f ac 04 01 00 00 0f ac 02 00 00
WPA: Sending EAPOL-Key 2/4
State: 4WAY_HANDSHAKE -> 4WAY_HANDSHAKE
WPA: RX message 1 of 4-Way Handshake from 00:17:df:11:f5:91 (ver=2)
RSN: msg 1/4 key data - hexdump(len=22): dd 14 00 0f ac 04 37 97 bd 1f ad 6a 80 51 53 19 75 79 2f fd 74 e7
WPA: WPA IE for msg 2/4 - hexdump(len=22): 30 14 01 00 00 0f ac 04 01 00 00 0f ac 04 01 00 00 0f ac 02 00 00
WPA: Sending EAPOL-Key 2/4
State: 4WAY_HANDSHAKE -> 4WAY_HANDSHAKE
WPA: RX message 1 of 4-Way Handshake from 00:17:df:11:f5:91 (ver=2)
RSN: msg 1/4 key data - hexdump(len=22): dd 14 00 0f ac 04 37 97 bd 1f ad 6a 80 51 53 19 75 79 2f fd 74 e7
WPA: WPA IE for msg 2/4 - hexdump(len=22): 30 14 01 00 00 0f ac 04 01 00 00 0f ac 04 01 00 00 0f ac 02 00 00
WPA: Sending EAPOL-Key 2/4
State: 4WAY_HANDSHAKE -> ASSOCIATING
State: ASSOCIATING -> DISCONNECTED
State: DISCONNECTED -> SCANNING
State: SCANNING -> ASSOCIATING
State: ASSOCIATING -> DISCONNECTED
State: DISCONNECTED -> SCANNING
State: SCANNING -> ASSOCIATING
State: ASSOCIATING -> DISCONNECTED
State: DISCONNECTED -> SCANNING
State: SCANNING -> ASSOCIATING
State: ASSOCIATING -> DISCONNECTED
...etc
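If you want to check your own log for the same pattern, the following is a minimal Python sketch that tallies the "State: X -> Y" lines, intended to be run on a workstation against a copy of the log. The function names are made up for illustration. It also assumes that a healthy connection eventually reaches wpa_supplicant's COMPLETED state, which is not shown in the excerpt above.

```python
import re
from collections import Counter

# Matches wpa_supplicant debug lines such as "State: SCANNING -> ASSOCIATING".
# search() is used so that an optional timestamp prefix does not break matching.
STATE_RE = re.compile(r"State: (\S+) -> (\S+)")


def count_transitions(lines):
    """Count each (old_state, new_state) pair found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


def reached_completed(counts):
    """True if any transition ended in COMPLETED.

    Assumption: COMPLETED marks a successfully established connection.
    """
    return any(new == "COMPLETED" for (_old, new) in counts)


def summarise(path):
    """Print the transitions in a log file, most frequent first (hypothetical helper)."""
    with open(path) as f:
        counts = count_transitions(f)
    for (old, new), n in counts.most_common():
        print(f"{n:5d}  {old} -> {new}")
    if not reached_completed(counts):
        print("log never reaches COMPLETED")
```

Calling summarise("wpa.log.failing") on a log like the one above would be expected to show the SCANNING -> ASSOCIATING and ASSOCIATING -> DISCONNECTED pairs dominating, with no COMPLETED transition at all.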
Looking in more detail it appears to fail here:
RTM_NEWLINK: operstate=0 ifi_flags=0x1003 ([UP])
Wireless event: cmd=0x8b06 len=8
RTM_NEWLINK: operstate=0 ifi_flags=0x1003 ([UP])
Wireless event: cmd=0x8b04 len=12
RTM_NEWLINK: operstate=0 ifi_flags=0x1003 ([UP])
Wireless event: cmd=0x8b1a len=29
RTM_NEWLINK: operstate=0 ifi_flags=0x1003 ([UP])
Wireless event: cmd=0x8b19 len=8
Scan results did not fit - trying larger buffer (8192 bytes)
Received 4331 bytes of scan results (21 BSSes)
Scan results: 21
Selecting BSS from priority group 0
0: 00:19:07:c5:53:20 ssid='XXXXXXXX' wpa_ie_len=30 rsn_ie_len=28 caps=0x11
skip - SSID mismatch
1: 00:19:07:c5:53:23 ssid='XXXXXXXX' wpa_ie_len=30 rsn_ie_len=28 caps=0x11
skip - SSID mismatch
2: 00:19:07:c5:53:24 ssid='XXXXXXXX' wpa_ie_len=30 rsn_ie_len=28 caps=0x11
skip - SSID mismatch
3: 00:17:df:11:f5:91 ssid='PrivateSSID' wpa_ie_len=0 rsn_ie_len=22 caps=0x11
selected based on RSN IE
Already associated with the selected AP.
Authentication with 00:00:00:00:00:00 timed out.
Added BSSID 00:17:df:11:f5:91 into blacklist
State: ASSOCIATING -> DISCONNECTED
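To find the same failure point in a longer log, a small Python sketch along these lines could pull out the timeout and blacklist messages and flag timeouts reported against the all-zero BSSID, as in the excerpt above. The function names and the tuple layout are illustrative assumptions; the two message formats are taken directly from the log lines shown.

```python
import re

# Message formats copied from the log excerpt above.
TIMEOUT_RE = re.compile(r"Authentication with ([0-9a-fA-F:]{17}) timed out")
BLACKLIST_RE = re.compile(r"Added BSSID ([0-9a-fA-F:]{17}) into blacklist")
ZERO_BSSID = "00:00:00:00:00:00"


def find_failures(lines):
    """Return (line_number, kind, bssid) tuples for timeout and blacklist events."""
    events = []
    for lineno, line in enumerate(lines, start=1):
        m = TIMEOUT_RE.search(line)
        if m:
            events.append((lineno, "timeout", m.group(1)))
            continue
        m = BLACKLIST_RE.search(line)
        if m:
            events.append((lineno, "blacklist", m.group(1)))
    return events


def zero_bssid_timeouts(events):
    """Timeouts reported against 00:00:00:00:00:00 rather than a real AP address."""
    return [e for e in events if e[1] == "timeout" and e[2] == ZERO_BSSID]
```

A timeout against the all-zero address immediately followed by the real AP being blacklisted is the signature to look for; whether it always indicates the scanning conflict suspected here is not something this sketch can confirm.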
Given that something strange seemed to be going on with AP scanning, disabling the background AP scan (option bgscan 0) seemed to be a sensible thing to do as well, since these are non-mobile clients. I now can't replicate the problem on my desk by rebooting the clients or the AP, although that doesn't mean it's gone away completely!
Regards,
Brian.