[chirp_devel] [PATCH] [btech] Delayed retry on writing to radio in case of errornous response. Needed mostly on linux. Fixes issue #3993
# HG changeset patch # User Michael Wagner michael.wagner@gmx.at # Date 1474015004 -7200 # Fri Sep 16 10:36:44 2016 +0200 # Node ID d6ee4a5066611703778f4b341b0c8b3d5c51e1b4 # Parent a1b8b53606f6025fc1ad727331837cfc7759f178 [btech] Delayed retry on writing to radio in case of errornous response. Needed mostly on linux. Fixes issue #3993 If radio responds with the infamous '0x05', driver now retries, but delays after each sent byte. Happens mostly on linux, and on several radios of this family (found on a KT-8900R). Might also affect/fix also #3587 and #3635. Many Thanks to Pavel Milanes for his support (analysis, explainations of the driver, sharing his knowledge about this and similar bugs, helping me with python-codestyle), and for encouraging me to contribute this patch.
Michael Wagner, OE4AMW
diff -r a1b8b53606f6 -r d6ee4a506661 chirp/drivers/btech.py --- a/chirp/drivers/btech.py Sat Sep 10 11:34:01 2016 -0400 +++ b/chirp/drivers/btech.py Fri Sep 16 10:36:44 2016 +0200 @@ -21,6 +21,7 @@
LOG = logging.getLogger(__name__)
+from time import sleep from chirp import chirp_common, directory, memmap from chirp import bitwise, errors, util from chirp.settings import RadioSettingGroup, RadioSetting, \ @@ -236,7 +237,7 @@
# this var controls the verbosity in the debug and by default it's low (False) # make it True and you will to get a very verbose debug.log -debug = False +debug = True
# Power Levels NORMAL_POWER_LEVELS = [chirp_common.PowerLevel("High", watts=25), @@ -334,6 +335,9 @@ # magic string for all other models MSTRING = "\x55\x20\x15\x09\x20\x45\x4d\x02"
+# this variables controls the forced delay and retry on Linux OS mainly. Added by OE4AMW to workaround Issue 3993 +NEEDS_DELAY = False +RETRY_DELAYED = False
def _clean_buffer(radio): """Cleaning the read serial buffer, hard timeout to survive an infinite @@ -383,6 +387,11 @@ if len(data) < amount: LOG.warn("Short reading %d bytes from the %d requested." % (len(data), amount)) + # This problem can be and expression of the MCU getting stuck + # so from now own we must delay the write operations. + global NEEDS_DELAY + NEEDS_DELAY = True + LOG.debug("Delaying future writes.")
except: raise errors.RadioError("Error reading data from radio") @@ -396,11 +405,26 @@ try: for byte in data: radio.pipe.write(byte) + # Some OS (mainly Linux ones) are two fast on the serial and + # get the MCU inside the radio stuck in the early stages, this + # hits some models more than others. + # + # To cope with that we introduce a delay on the writes but only if + # we detect this problem, this was found by Michael Wagner who + # proposed a patch for it, well done. + if NEEDS_DELAY: + # 10 msec is proved to be safe, is better to be slow and right + # than fast and some times wrong. (5 msec is tested ok) + sleep(0.010)
# DEBUG if debug is True: + if NEEDS_DELAY: + LOG.debug("This write was delayed") + LOG.debug("==> (%d) bytes:\n\n%s" % (len(data), util.hexprint(data))) + except: raise errors.RadioError("Error sending data to radio")
@@ -435,9 +459,21 @@ # header validation c, a, l = struct.unpack(">BHB", block[1:5]) if a != addr or l != BLOCK_SIZE or c != ord("X"): - LOG.debug("Invalid header for block 0x%04x" % addr) + LOG.error("Invalid header for block 0x%04x" % addr) LOG.debug("CMD: %s ADDR: %04x SIZE: %02x" % (c, a, l)) - raise errors.RadioError("Invalid header for block 0x%04x:" % addr) + + global RETRY_DELAYED + if not RETRY_DELAYED: + # first try with header problems, forcing a write delay + LOG.warn("Failure occured, trying once again with delay") + RETRY_DELAYED = True + global NEEDS_DELAY + NEEDS_DELAY = True + return False + else: + # second try, now we fail. + LOG.debug("This was already a retry") + raise errors.RadioError("Invalid header for block 0x%04x:" % addr)
# return the data return block[5:] @@ -613,6 +649,16 @@ # read d = _recv(radio, addr)
+ if d == False: + # retry to get that block of data. + msg = "Previous block request failed." + msg += " Cleaning buffer and trying again." + LOG.info(msg) + _clean_buffer(radio) + d = _recv(radio, addr) + global RETRY_DELAYED + RETRY_DELAYED = False + # aggregate the data data += d
@@ -821,7 +867,16 @@
def sync_in(self): """Download from radio""" - data = _download(self) + try: + data = _download(self) + except errors.RadioError: + msg = "First download-attempt failed." + msg += " Retrying the whole procedure with delayed writes." + LOG.error(msg) + global NEEDS_DELAY + NEEDS_DELAY = True + data = _download(self) + self._mmap = memmap.MemoryMap(data) self.process_mmap()
@@ -1284,7 +1339,7 @@ if self.MODEL in ("UV-2501", "UV-5001"): vfomren = RadioSetting("settings2.vfomren", "VFO/MR switching", RadioSettingValueBoolean( - _mem.settings2.vfomren)) + not _mem.settings2.vfomren)) advanced.append(vfomren)
reseten = RadioSetting("settings2.reseten", "RESET", @@ -1569,6 +1624,8 @@ if element.has_apply_callback(): LOG.debug("Using apply callback") element.run_apply_callback() + elif setting == "vfomren": + setattr(obj, setting, not int(element.value)) elif element.value.get_mutable(): LOG.debug("Setting %s = %s" % (setting, element.value)) setattr(obj, setting, element.value)
# this var controls the verbosity in the debug and by default it's low (False) # make it True and you will to get a very verbose debug.log -debug = False +debug = True
You don't want to leave this on for all chirp users.
+# this variables controls the forced delay and retry on Linux OS mainly. Added by OE4AMW to workaround Issue 3993 +NEEDS_DELAY = False +RETRY_DELAYED = False
These being module-level means they'll live for the lifetime of the running process.
# Some OS (mainly Linux ones) are two fast on the serial and
# get the MCU inside the radio stuck in the early stages, this
# hits some models more than others.
#
# To cope with that we introduce a delay on the writes but only if
# we detect this problem, this was found by Michael Wagner who
# proposed a patch for it, well done.
if NEEDS_DELAY:
# 10 msec is proved to be safe, is better to be slow and right
# than fast and some times wrong. (5 msec is tested ok)
sleep(0.010)
10ms inter-byte latency is crazy high. I also don't like making this different on different platforms, even if we're doing it in response to a short read.
I would rather we try things like writing eight bytes followed by a delay or something like that.
if NEEDS_DELAY:
LOG.debug("This write was delayed")
This will spam everyone's logs with one line for each byte of the memory map. That's not reasonable, IMHO.
LOG.debug("==> (%d) bytes:\n\n%s" % (len(data), util.hexprint(data)))
Stray whitespace damage here.
@@ -1284,7 +1339,7 @@ if self.MODEL in ("UV-2501", "UV-5001"): vfomren = RadioSetting("settings2.vfomren", "VFO/MR switching", RadioSettingValueBoolean(
_mem.settings2.vfomren))
not _mem.settings2.vfomren)) advanced.append(vfomren) reseten = RadioSetting("settings2.reseten", "RESET",
@@ -1569,6 +1624,8 @@ if element.has_apply_callback(): LOG.debug("Using apply callback") element.run_apply_callback()
elif setting == "vfomren":
setattr(obj, setting, not int(element.value))
These hunks appear unrelated to the rest of the change?
--Dan
10ms inter-byte latency is crazy high. I also don't like making this different on different platforms, even if we're doing it in response to a short read.
I agree that this is not the best solution, but it´s worth to mention:
- as Pavel Milanes found out, the inter-byte delay is exactly what the
OEM-Software is doing ( http://chirp.danplanet.com/issues/3993#note-24 ), so obviously even the manufacturer knows that their radios are not able to handle 9600 baud (and has no better solution to deal with that error)
So then we should do it for all platforms and not do this detection logic.
- we are not doing it different on different platforms (this was the
first approach), but changed to: detect "short read" or "invalid-header"-errors, and slow down only then. This avoids the artificial delay with radios that are fast enough to handle pyserial on linux, and on the other hand fix the error also on windows-versions. (Where it may happen as well, but only very rarely).
We might as well simplify this code and not detect then, IMHO. It looks much more like ~2ms from the logs to me.
- The 10ms are indeed quite long. In a earlier version of this patch
only 5ms were used, and this was tested by Jim Unroe and other users on 5 different radios ( http://chirp.danplanet.com/issues/3993#note-22 ) and reported as OK. However, Pavel argued to use 10ms, to be safe. (The comment in the sources is his ;-) )
If the OEM software uses 10ms, then that's fine, but we should just do it exclusively. Simplicity always wins.
Stray whitespace damage here.
My I ask what exactly do you mean? (did I already mention that I am completely new to python and this project? ;-) )
It means that you touched a line that wasn't related to the change. It's common with whitespace-only lines where someone adds or removes a tab and doesn't realize it.
What is the correct process to add the fixed patch? Should I mail it as a reply to this thread? Or should I create a new patch using mercurial? (which will lead to a new thread in the mailing-list?)
New patch against current tip until something gets applied. New thread is fine, thanks.
--Dan
participants (2)
-
Dan Smith
-
Michael Wagner