Jim, I'd try playing with the latency timer. I think we've shot my previous two theories out of the water, but buffering between the PC and the dongle can alter timing, and it sometimes accounts for behavioral differences between dongles.
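If the dongle is FTDI-based and the host runs Linux (both assumptions; neither is stated here), the ftdi_sio driver exposes the latency timer as a sysfs attribute, which makes it easy to experiment with. FTDI's default is 16 ms. This is a minimal sketch. The `ttyUSB0` path, the helper names, and the 1-255 ms range check are assumptions for illustration.

```python
from pathlib import Path

# Hypothetical device path; assumes an FTDI dongle on Linux using ftdi_sio.
DEFAULT_SYSFS = Path("/sys/bus/usb-serial/devices/ttyUSB0/latency_timer")


def get_latency_timer(path: Path = DEFAULT_SYSFS) -> int:
    """Read the current latency timer, in milliseconds, from sysfs."""
    return int(path.read_text().strip())


def set_latency_timer(ms: int, path: Path = DEFAULT_SYSFS) -> int:
    """Write a new latency timer value and return what the driver reports back."""
    # Assumed valid range; check your dongle's documentation.
    if not 1 <= ms <= 255:
        raise ValueError(f"latency timer must be 1-255 ms, got {ms}")
    path.write_text(f"{ms}\n")  # writing to sysfs usually requires root
    return get_latency_timer(path)
```

For example, calling `set_latency_timer(1)` and rerunning the same test against both dongles would show fairly quickly whether the default setting explains the difference.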
Separately, I like Dan's idea of sending larger amounts of data at once. If nothing else, it should give us more information to work with. As Dan said, there's little guarantee that small sleep values are accurate or efficient. There's even less guarantee that the USB dongle preserves the timing your application uses to submit each byte. The data takes a much more complex path (OS driver, USB host controller, the dongle's own buffering) than it would through a hardware UART, where the timing on the wire should track the application's timing much more closely.
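Here is a rough sketch of both of Dan's points, under some assumptions. The first function measures how far `time.sleep` overshoots a small request on your machine. The other two contrast byte-at-a-time writes with sleeps against a single bulk write. All function names are hypothetical. `write` stands for any callable that takes `bytes`, such as the `write` method of a pyserial `Serial` object.

```python
import time
from typing import Callable, List


def measure_sleep_overshoot(requested_s: float, trials: int = 20) -> List[float]:
    """Return how much longer each time.sleep(requested_s) took than requested."""
    overshoots = []
    for _ in range(trials):
        start = time.perf_counter()
        time.sleep(requested_s)
        overshoots.append(time.perf_counter() - start - requested_s)
    return overshoots


def send_paced(write: Callable[[bytes], object], data: bytes, gap_s: float) -> None:
    """Send one byte per write call with a sleep in between.

    The gaps the application intends are not necessarily what the dongle
    puts on the wire, because the driver and dongle may buffer and regroup bytes.
    """
    for b in data:
        write(bytes([b]))
        time.sleep(gap_s)


def send_bulk(write: Callable[[bytes], object], data: bytes) -> None:
    """Hand the whole payload over in one write call and let the stack batch it."""
    write(data)
```

Printing `max(measure_sleep_overshoot(0.0001))` on a desktop OS will often show an overshoot far larger than the 0.1 ms requested, though the exact figure depends on the OS and timer resolution.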
-Nathan