This started out of the blue.
It looks like he got into the liquor cabinet.
The only additional symptoms I have are:
I had just completed a run of unit tests with a failure for SKILL_DATA. The write of the data failed trying to write 934 byte and only writing 704 bytes. So, there may have been a buffer overrun issue.
The voice command has been dropping out for some reason. A stray sound (like a cough) can turn it back on. But then it drops out again.
The battery was just recharged.
Tell me doc, will he make it? What's the matter?
Can we reassess the patient again?
Here's the state of affairs : https://youtu.be/cIR9SLpXLNw
Unsuccessful correctives:
I've upgraded the firmware using the Desktop app (1.1.7 and 1.1.9)
I've done a factory reset for both versions. The IMU calibration has been erratic, but I did get it to complete.
I need to ...
... get the bot into a stable working state
... get a valid, reliable build of the codebase
I just tested a behavior with 125 frames, which uses up the 2507 bytes allowed by the bufferLen. You need to modify the condition
|| cmdLen >= BUFF_LEN
to
|| cmdLen > BUFF_LEN
or download the latest code on GitHub and upload it via Arduino IDE. The Desktop app won't update very frequently.
The test file can be downloaded at: https://github.com/PetoiCamp/OpenCat/blob/main/SkillLibrary/Bittle/longPushUps_125frames.md
A longer skill may lead to a program crash.
Part 3
More Tests
An additional test is to remove the dependence on the order of precidence of the logical operators by adding parentheses to the expression. This insures the compiler sees the expression as we intend. My assuption is the actual intent was for the expression to be
bool is_overflowex(int8_t skill_token = T_SKILL) { bool result = ( (skill_token == token || T_INDEXED_SIMULTANEOUS_ASC == lowerToken || T_INDEXED_SEQUENTIAL_ASC == lowerToken) && (cmdLen >= spaceAfterStoringData || cmdLen >= BUFF_LEN)); return result; }
This shouldn't produce any new results
TEST_F(utfinput_overflow, on_inputex_SKILL_DATA_overflow_undetected) { token = T_SKILL_DATA; spaceAfterStoringData = BUFF_LEN / 2; cmdLen = spaceAfterStoringData + 1; EXPECT_TRUE(is_overflowex()) << "Overflow undetected for:\n" << setw(4) << token << " : token\n" << setw(4) << cmdLen << " : cmdLen\n" << setw(4) << spaceAfterStoringData << " : spaceAfterStoringData\n" ; } 1>[ RUN ] utfinput_overflow.on_inputex_SKILL_DATA_overflow_undetected 1>G:\AppDev\Robotics\Petoi\test\utOpenCatEsp32\readoverflow.cpp(120): error : Value of: is_overflowex() 1> Actual: false 1>Expected: true 1>Overflow undetected for: 1> K : token 1> 17 : cmdLen 1> 16 : spaceAfterStoringData 1> 1>[ FAILED ] utfinput_overflow.on_inputex_SKILL_DATA_overflow_undetected (1 ms)
The test of T_SKILL should detect the oveflow
TEST_F(utfinput_overflow, is_overflowex_SKILL_overflow_detected) { token = T_SKILL; spaceAfterStoringData = BUFF_LEN / 2; cmdLen = spaceAfterStoringData + 1; EXPECT_TRUE(is_overflowex()) << "Overflow undetected for:\n" << setw(4) << token << " : token\n" << setw(4) << cmdLen << " : cmdLen\n" << setw(4) << spaceAfterStoringData << " : spaceAfterStoringData\n" ; } 1>[ RUN ] utfinput_overflow.is_overflowex_SKILL_overflow_detected 1>[ OK ] utfinput_overflow.is_overflowex_SKILL_overflow_detected (0 ms)
These tests demonstrate adding parentheses doesn't fix the defect. The T_SKILL == token comparison cripples the expression.
As a last test, I removed the exception checks and performed a simple overflow test
ool is_overflow_simple() { bool result = BUFF_LEN <= cmdLen; return result; } TEST_F(utfinput_overflow, is_overflow_simple_SKILL_DATA_overflow_detected) { token = T_SKILL_DATA; spaceAfterStoringData = BUFF_LEN / 2; cmdLen = BUFF_LEN; EXPECT_TRUE(is_overflow_simple()) << "Overflow undetected for:\n" << setw(4) << token << " : token\n" << setw(4) << cmdLen << " : cmdLen\n" << setw(4) << spaceAfterStoringData << " : spaceAfterStoringData\n" ; } TEST_F(utfinput_overflow, is_overflow_simple_SKILL_overflow_undetected) { token = T_SKILL; spaceAfterStoringData = BUFF_LEN / 2; cmdLen = BUFF_LEN; EXPECT_TRUE(is_overflow_simple()) << "Overflow undetected for:\n" << setw(4) << token << " : token\n" << setw(4) << cmdLen << " : cmdLen\n" << setw(4) << spaceAfterStoringData << " : spaceAfterStoringData\n" ; } 1>[ RUN ] utfinput_overflow.is_overflow_simple_SKILL_DATA_overflow_detected 1>[ OK ] utfinput_overflow.is_overflow_simple_SKILL_DATA_overflow_detected (0 ms) 1>[ RUN ] utfinput_overflow.is_overflow_simple_SKILL_overflow_undetected 1>[ OK ] utfinput_overflow.is_overflow_simple_SKILL_overflow_undetected (0 ms)
This implies that the test of spaceAfterStoringData should be performed after reading the command. The read_seriial function only insures the raw command is completely read. I would also make the argument that read_serial should return a boolean result.
Conclusion
This ends the first part of the code analysis. I think this demonstrates the existence of the defect in the code. I said this wasn't definiative because there's an accomplice to this defect. Demonstrating that a buffer overflow might happen isn't the same as finding where it does happen. Luckily we know the name of the accomplice: spaceAfterStoringData. Part 2 will focus on tracking him down and finding where the buffer overrun happens.
In the conclusion I'll propose code changes to resolve the issue.
Part 2
Let's write a test
Before I spend energy and brain cells on code analysis, I want to write a test to veify my analysis. GTest makes this fairly easy to write. The test I want checks if the conditional statement
if ( (token == T_SKILL || lowerToken == T_INDEXED_SIMULTANEOUS_ASC || lowerToken == T_INDEXED_SEQUENTIAL_ASC) && cmdLen >= spaceAfterStoringData || cmdLen >= BUFF_LEN) { //... }
evaluates to true when a buffer oveflow would occur and false otherwise. The function is trivial to write
bool is_overflow() { bool result = ( (T_SKILL == token || T_INDEXED_SIMULTANEOUS_ASC == lowerToken || T_INDEXED_SEQUENTIAL_ASC == lowerToken) && spaceAfterStoringData <= cmdLen || BUFF_LEN <= cmdLen); return result; }
Style: I prefer comparison expressions operators to be left to right, smaller to larger, constants on the left;. logical expressions on separate lines.
The function uses global variables token, lowerToken, spaceAfterStoringData, and cmdLen, so we define and initialize them along with the constants they use.
const uint32_t BUFF_LEN{ 32 }; const int8_t T_SKILL{ 'k' }; const int8_t T_SKILL_DATA{ 'K' }; const int8_t T_INDEXED_SIMULTANEOUS_ASC{ 'i' }; const int8_t T_INDEXED_SEQUENTIAL_ASC{ 'm' }; const int8_t T_QUERY{ '?' }; int8_t token{ T_SKILL_DATA }; int8_t lowerToken{ int8_t(tolower(token)) }; uint32_t cmdLen{}; uint32_t spaceAfterStoringData{ BUFF_LEN };
We wrap the function and variables into a class, called a test fixture, that our test willl use.
class utfinput_overflow : public testing::Test { protected: int8_t token{ T_SKILL_DATA }; int8_t lowerToken{ int8_t(tolower(token)) }; uint32_t cmdLen{}; uint32_t spaceAfterStoringData{ BUFF_LEN }; public: utfinput_overflow() {} ~utfinput_overflow() {} bool is_overflow() { bool result = ( (T_SKILL == token || T_INDEXED_SIMULTANEOUS_ASC == lowerToken || T_INDEXED_SEQUENTIAL_ASC == lowerToken) && spaceAfterStoringData <= cmdLen || BUFF_LEN <= cmdLen); return result; } };
The class is derived from a class that GTest provides to handle the interface to the GTest framework. We aren't concerned about the details for that class here but we need it to connect our test to the fixture. The test is trivial
TEST_F(utfinput_overflow, SKILL_DATA_overflow_undetected) { token = T_SKILL_DATA; lowerToken = int8_t(tolower(token)); spaceAfterStoringData = BUFF_LEN/2; cmdLen = spaceAfterStoringData + 1; EXPECT_TRUE(is_overflow()) << "Overflow undetected for:\n" << setw(4) << token << " : token\n" << setw(4) << cmdLen << " : cmdLen\n" << setw(4) << spaceAfterStoringData << " : spaceAfterStoringData\n" ; }
We build and run the test and get the output
1>[ RUN ] utfinput_overflow.SKILL_DATA_overflow_undetected 1>G:\AppDev\Robotics\Petoi\test\utOpenCatEsp32\readoverflow.cpp(70): error : Value of: is_overflow() 1> Actual: false 1>Expected: true 1>Overflow undetected for: 1> K : token 1> 17 : cmdLen 1> 16 : spaceAfterStoringData 1> 1>[ FAILED ] utfinput_overflow.SKILL_DATA_overflow_undetected (0 ms)
This confirms our analysis that the expression doesn't do it's job. The T_SKILL_DATA command can produce an undetected buffer overflow. It also confirms our analysis of the order of precedence for the logical operators. That's progress
When we test using the expected T_SKILL token, the test passes.
TEST_F(utfinput_overflow, SKILL_overflow_detected) { token = T_SKILL; spaceAfterStoringData = BUFF_LEN / 2; cmdLen = spaceAfterStoringData + 1; EXPECT_TRUE(is_overflow()) << "Overflow undetected for:\n" << setw(4) << token << " : token\n" << setw(4) << cmdLen << " : cmdLen\n" << setw(4) << spaceAfterStoringData << " : spaceAfterStoringData\n" ; } output: 1>[ RUN ] utfinput_overflow.SKILL_overflow_detected 1>[ OK ] utfinput_overflow.SKILL_overflow_detected (0 ms)
However, the T_SKILL caommand uses a named, built-in skill and doesn't have variable data, This test case should never happen. If it does happen, it's detected but is a false-postive and means that spaceAfterStoringData isn't being maintained properly. That observation will be useful later on.
The test is helpful but could be more helpful and flexible if we could control the token value being sought in addition to the input token.This is easily accomplished by adding a default token value to the is_overflow function.
bool is_overflow(int8_t skill_token = T_SKILL) { bool result = ( (skill_token == token || T_INDEXED_SIMULTANEOUS_ASC == lowerToken || T_INDEXED_SEQUENTIAL_ASC == lowerToken) && cmdLen >= spaceAfterStoringData || cmdLen >= BUFF_LEN); return result; }
The current tests are unaffected and should behave as before but we can now test what would appear to be the obvious fix: change the test expression to use T_SKILL_DATA
TEST_F(utfinput_overflow, SKILL_DATA_overflow_detected) { token = T_SKILL_DATA; spaceAfterStoringData = BUFF_LEN / 2; cmdLen = spaceAfterStoringData + 1; EXPECT_TRUE(is_overflow(T_SKILL_DATA)) << "Overflow undetected for:\n" << setw(4) << token << " : token\n" << setw(4) << cmdLen << " : cmdLen\n" << setw(4) << spaceAfterStoringData << " : spaceAfterStoringData\n" ; } output: 1>[ RUN ] utfinput_overflow.SKILL_DATA_overflow_detected 1>[ OK ] utfinput_overflow.SKILL_DATA_overflow_detected (0 ms)
That's great, right? Not exactly. We need to look at how T_SKILL behaves
TEST_F(utfinput_overflow, SKILL_overflow_undetected) { token = T_SKILL; spaceAfterStoringData = BUFF_LEN / 2; cmdLen = spaceAfterStoringData + 1; EXPECT_TRUE(is_overflow(T_SKILL_DATA)) << "Overflow undetected for:\n" << setw(4) << token << " : token\n" << setw(4) << cmdLen << " : cmdLen\n" << setw(4) << spaceAfterStoringData << " : spaceAfterStoringData\n" ; } output: 1>[ RUN ] utfinput_overflow.SKILL_overflow_undetected 1>G:\AppDev\Robotics\Petoi\test\utOpenCatEsp32\readoverflow.cpp(106): error : Value of: is_overflow(T_SKILL_DATA) 1> Actual: false 1>Expected: true 1>Overflow undetected for: 1> k : token 1> 17 : cmdLen 1> 16 : spaceAfterStoringData 1> 1>[ FAILED ] utfinput_overflow.SKILL_overflow_undetected (1 ms)
"Well." sez me, "that's not really a problem cuz T_SKILL doesn't have variable data so there won't be an overflow."
Disturbia
Maybe the programmer didn't make a typing error by using the value T_SKILL. Maybe they did it for a reason that we haven't discovered yet. There may be a case where the T_SKILL command uses the spaceAfterStoringData value.
We don't need to speculate about that because there are other commands that can have variable number of parameters that aren't included in the expression. In addition to T_INDEXED_SIMULTANEOUS_ASC and T_INDEXED_SEQUENTIAL_ASC there are matching binary versions that can also take a variable number of parameters. How many for all of them? It's not 16 but an arbitrary number, i.e., you can repeat the jointindex as long you wish. There's BEEP and BEEP_BIN. The T_JOINTS command takes a list of joints that can be arbitrarily long and have repeats as well, e.g., j 0 8 9 10 0 8 9 10 0 8 9 10 ... Not to mention any new commands added later on.
The point is the fragile nature of "exception coding" like that of the conditional expression.
More Disturbia.
What's the intent of the statement? We're trying to prevent buffer overflow when reading a command. Does it really matter what the command is that overflows the buffer? The T_READ command should only accept one pin number to read from but that doesn't mean you can't send a long list of pins. I don't know what it does with such a list because my test only checked a single pin but I don't expect it to "brick" the bot.
Up until now I've ignored what happens when a buffer overflow condition is detected. What happens is
if ( /* detect overflow */ ) { PTF("OVF"); beep(5, 100, 50, 5); do { serialPort->read(); } while (serialPort->available()); printToAllPorts(token); token = T_SKILL; strcpy(newCmd, "up"); cmdLen = 2; return; }
This doesn't look too bad until you look closer and think about it. Remember, this all happens as the sender is trying to send data and you're trying to read the input.
The PTF macro does I/O to the serial monitor, which is slow and uses the same line that the USB is on that the sender is writing to. I beleive the USB is full-duplex (can send an receive at the same time) but it is the speed that is of more concern. It's the call to beep that boggles me:
beep(5, 100, 50, 5)
says, "Play the note 5 for 100 ms, wait for 50 ms, and do that 5 times." That's 150 ms * 5 = 750 ms! Three quarters of a second! This is 5 times longer than our largest timeout for reads. If we timeout after 150 ms, what would you expect to happen on the sender side if we delay for 750 ms?
It's only at this point do we drain the input buffer; we probably should have done that first.
The command response is sent out by echoing the token to the serial port.
Finally, for added emphasis that something has gone wrong, the skill command for "stand up" is placed into the command buffer for subsequent processing by reaction(). From my POV, out of all the poses to take to indicate a problem, "stand up" isn't the one I'd select. "Rest" I would think might be more indicative. However, I don't like doing anything you weren't told to do. Doing nothing would be a safer course of action.
end Part 2
Preface: This is a 3 part post. You should leave now or settle in.
Part 1
Corrupted memory location
Before analyzing the code, I wanted to get an idea of the kind of memory that would be corrupted by a buffer overflow.This might help guage the severity and behavior symptoms. The definition of the buffer newCmd is
char *newCmd = new char[BUFF_LEN + 1];
In my experience, dynamic memory corruption is the diagnostic "kiss of death". Without a debugger, unless you're an expert on memory layout, et al., it's fruitless to try to trace. As the saying goes, "And it annoys the pig."
Analysis
The expression
(token == T_SKILL || lowerToken == T_INDEXED_SIMULTANEOUS_ASC || lowerToken == T_INDEXED_SEQUENTIAL_ASC) && cmdLen >= spaceAfterStoringData || cmdLen >= BUFF_LEN
needs to be examined in some detail in order to understand the conditions necessary to replicate the defect. First, I want to reorder the expression terms for clarity.
(T_SKILL == token || T_INDEXED_SIMULTANEOUS_ASC == lowerToken || T_INDEXED_SEQUENTIAL_ASC == lowerToken) && spaceAfterStoringData <= cmdLen || BUFF_LEN <= cmdLen
This code style brings "what I'm looking for" to the forefront when reading the code.This helps me keep my mental model of the algorithm being used.
The test case of the expression is: T_SKILL_DATA == token. The questions are:
What is the intent of the code?
How is the expression evaluated?
How do the values of lowerToken, spaceAfterStoringData, and cmdLen affect the expression.
What is the intent of the code?
The intent is to detect a buffer overflow for those commands that accept a variable length argument list.
How is the expression evaluated?
The code depends upon the order of precedence of the logial operators. (Parentesies would have made the intent unambiguous.) The order of precedence for the logical operators is: NOT (!), AND (&&), OR (||).
The expression is then equivalent to
( ( T_SKILL == token || T_INDEXED_SIMULTANEOUS_ASC == lowerToken || T_INDEXED_SEQUENTIAL_ASC == lowerToken ) && (spaceAfterStoringData <= cmdLen) ) || (BUFF_LEN <= cmdLen)
The precondition, T_SKILL_DATA == token ('K') controls the value of lowerToken, i.e., T_SKILL == lowerToken ('k'). Substitution these values into the first term of the AND expression gives,
(T_SKILL == T_SKILL_DATA || T_INDEXED_SIMULTANEOUS_ASC == T_SKILL || T_INDEXED_SEQUENTIAL_ASC == T_SKILL)
which evaluates to false.
This causes the entire expression to collapse to the last term
(BUFF_LEN <= cmdLen)
which is the check for buffer overflow. Isn't that what we want, after all?
Not rxactly. The reason is that the first term of the AND expression should be
(T_SKILL_DATA == T_SKILL_DATA || T_INDEXED_SIMULTANEOUS_ASC == T_SKILL || T_INDEXED_SEQUENTIAL_ASC == T_SKILL)
which evaluates to true. So the AND expression should be
(TRUE) && (spaceAfterStoringData <= cmdLen)
This shows the effect of spaceAfterStoringData on the test for buffer overflow. If cmdLen exceeds spaceAfterStoringData then buffer overflow will occur. Note that the code distinguishes between BUFF_LEN and spaceAfterStoringData. It certainly appears their relationship is: spaceAfterStoringData <= BUFF_LEN. The initialization of spaceAfterStoringData confirms this.
int spaceAfterStoringData = BUFF_LEN;
We want to know how and when spaceAfterStoringData is calculated. We know it doesn't happen as the input is read, so it should be calculated before the call to read_serial.
Disturbia
The data is coming through the protocol so we don't know the length or content until after we've received it. (We'd have to be psychic to do that.) However, we might, at the start of reading a command, reset spaceAfterStoringData back to it's initialization state. (Pssst! We don't.) But that doesn't really help us determine if the buffer will oveflow as we're reading the data. It's only after we've read the command and data, and we're processing it, that we can determine if the buffer will overflow. The use of spaceAfterStoringData during the read operation is suspicious.
We're almost ready to track down spaceAfterStoringData but there's another clue about it that we have. The buffer ovrflow check is restricted by the 3 token types. We can expect that spaceAfterStoringData is affected by or affects the processing of those tokens.
end Part 1
Timeout Tests
The Serial class ctor has a timeout argument that controls the total time of read and write operations before a timeout occurs. The default value I used was 10ms which matches SERIAL_TIMEOUT. Changing it to 150 ms had no effect.
On the bot side, the macros SERIAL_TIMEOUT and SERIAL_TIMEOUT_LONG affect only read operations. The setup function sets the timeout for the Serial object using the SERIAL_TIMEOUT macro and then clears the input buffer.
Serial.setTimeout(SERIAL_TIMEOUT); while (Serial.available() && Serial.read()) ; // empty buffer
The macros are used to set the global variable serialTimeout. For practically all of the commands sent to the bot, the serialTimeout value is SERIAL_TIMEOUT (10 ms). But if the bot receives either a T_SKILL_DATA or T_BEEP command, the value of serialTimeout is set to SERIAL_TIMEOUT_LONG (150 ms). This is because these two commands have variable length data attached to them. The T_BEEP command can specify a list of notes to play that can be long. The T_SKILL_DATA command specifies lists of servo angles for posture, gaits, and behaviors that can have dozens of bytes. When the bot receives a command it checks if it is either of these two and if so, changes serialTimeout to SERIAL_TIMEOUT_LONG to provide more time to read the data.
void read_serial() { Stream* serialPort = NULL; // ... set serialPort to Bluetooth or USB // if they have data ... if (serialPort) { // data available token = serialPort->read(); // read the command lowerToken = tolower(token); newCmdIdx = 2; delay(1); // leave enough time for serial read // capitalized tokens use binary encoding for // long data commands // '~' ASCII code = 126; may introduce bug when // the angle is 126 // so only use angles <= 125 terminator = (token >= 'A' && token <= 'Z') ? '~' : '\n'; serialTimeout = ( token == T_SKILL_DATA || lowerToken == T_BEEP) ? SERIAL_TIMEOUT_LONG : SERIAL_TIMEOUT; //...
The algorithm used to read command data consists of
lastSerialTime = millis(); do { if (serialPort->available()) { do { // overflow check ... // read input newCmd[cmdLen++] = serialPort->read(); } while (serialPort->available()); lastSerialTime = millis(); } } while ( newCmd[cmdLen - 1] != terminator && long(millis() - lastSerialTime) < serialTimeout);
The intent of the code (as I understand the forum post) is to handle Bluetooth input.that can be broken up into chunks before the command terminator is sent. The elapsed time between chunks was greater than serialTimeout and so a timeout occured. The solution was to increase the timeout period to allow Bluetooth more time to send the chunks before timeout. This also handles the case of no terminator ("no line ending") or "newline only".
There's a small "hole" here if there's an embeded command terminator (or multiple commands sent at once). If the inner loop stops on the embedded terminator, the command is partially complete and subsequent iterations may interpret the remaining data as new commands
Of more importance is the buffer overflow check. The condition for the check is
if ( (token == T_SKILL || lowerToken == T_INDEXED_SIMULTANEOUS_ASC || lowerToken == T_INDEXED_SEQUENTIAL_ASC) && cmdLen >= spaceAfterStoringData || cmdLen >= BUFF_LEN) { //... }
T_SKILL is the named skill ('k') but the skill with all the data is T_SKILL_DATA
I have found the very cause of Hamlet’s lunacy
- Hamlet, Act 2, Scene 2
This isn't definitive but goes a long way. I think it happens intermittently, but the effects seem to be persistent. It would be good to know if such an overflow would affect the program.
For completeness I should add this data point.
This is the list of tests I have that have been successfully running for the last 2-3 weeks. Test marked "DISABLED_" are either not fully tested, tested but too energetic for my test stand, or a known to update EEPROM and not suitable repeated unit testing.
ftfBittleXBehavior. angry DISABLED_backflip boxing cheers check come_here dig DISABLED_frontflip high_five good_boy handstand hug hi hand_shake hands_up jump kick leapover moon_walk nod play_dead pee push_ups pushups_one_hand recover roll scratch sniff table test wave_head zero Behavior_Data Behavior_Data_User ftfBittleX. intentional_fail ftfBittleXKata. Head_sweep ForeArm_sweep Arm_sweep Leg_sweep Hip_sweep ftfcmd_def_t. max_cmdid_t default_ctor_nothrow default_ctor_id_match default_ctor_cmd_empty default_ctor_description_empty default_ctor_data_empty ctor_id_match ctor_cmd_match ctor_description_match add_ASC_char_match add_ASC_string_match add_ASC_int_match add_ASC_vector_match add_BIN_char_match add_BIN_string_match add_BIN_int_match ftfBittleXGait. DISABLED_bound_forward backward backward_left backward_right backward_random crawl_forward crawl_Left crawl_right crawl_random gap_forward gap_Left gap_right gap_random halloween DISABLED_jump_forward push_forward push_left push_right push_random trot_forward trot_Left trot_right trot_random step spring_left spring_right spring_random walk_forward walk_left walk_right walk_random Gait_Data DISABLED_Gait_Data_User ftfBittleXPosture. Balance_skill ButtUp_skill Calibrate_skill Dropped Lifted Landing Sit Stretch Standup Zero Posture_Data Posture_Data_User ftfBittleXProtocol. CR_only LF_only CRLF_only invalid QUERY ABORT DISABLED_BEEP_volume DISABLED_BEEP_getmute DISABLED_BEEP_setmute DISABLED_BEEP_tune CALIBRATE REST INDEXED_SIMULTANEOUS_ASC_angle INDEXED_SIMULTANEOUS_ASC_list JOINTS_list JOINTS_index JOINTS_index_sweep INDEXED_SEQUENTIAL_ASC_angle INDEXED_SEQUENTIAL_BIN_angle INDEXED_SEQUENTIAL_ASC_list INDEXED_SEQUENTIAL_ASC_OWL INDEXED_SEQUENTIAL_BIN_list INDEXED_SIMULTANEOUS_BIN_angle INDEXED_SIMULTANEOUS_BIN_list INDEXED_SEQUENTIAL_ASC_list_repeat COLOR VERBOSELY_PRINT_GYRO PRINT_GYRO GYRO_BALANCE LISTED_BIN PAUSE TEMP DISABLED_RANDOM_MIND DIGITAL_READ ANALOG_READ DISABLED_SAVE GYRO_FINENESS DISABLED_MELODY SLOPE TILT MEOW DISABLED_SERVO_PWM DISABLED_ACCELERATE DISABLED_DECELERATE on_command_list DISABLED_example_py utfwin32. ctor_valid
The point here is that the T_SKILL_DATA test was the last unit I was working on when the issue started.
I need to know the answer to this question before I do it.
If I upload a test, say one of the Module test sketches, and run it. Then I run "Upgrade Firmware" from the Petori App
Will the bot be back to where I started, i.e, just as it is now.?
I have to apologize for the garbled posts I made yesterday, it was a bad day and I got impatient. But I also didn't communicate my message very well. I had trouble making the videos, using the forum interface to post my code, etc. The result was an unintelligible mess.
So, I'm going to try to clean up my and break it down into smaller chunks If it can edit my past to reformat the code, that would help. Otherwise, I can repost the important parts.
Again, I apologize for the confusing I caused.
Are you using the USB cable or bluetooth to send the long command? Is it send through Windows or Mac? For Bluetooth, the command will be sliced into small bucket, and it may reach the timeout and think the message has ended. If you have access to Arduino IDE, try to increase this timeout 150 to some larger values, such as 300 (in ms). It is more secure but will take response time longer. The current 150 is tested to work in most cases.
I wanted to note that I think there's an oddity in the anry behvior data.. There are 7 entries for the behvior, each entry consisting of 116 angles and followed by the action spec. On the BittleX, ther are no servos 1-7 yet the behavior specifies angles for those servos. IWhen I issue the joints comman for all joints I have seen values for those servos.
Last try, just the skill data command
[ RUN ] ftfBittleXBehavior.Behavior_Data ftBittleX::ftfBittleX::on_send TX command : kpu ftBittleX::ftfBittleX::on_response Command completed (normal) description : pushups cmd : k id : 9 data : pu 2 ms : RX_latency 4726 ms : RX_elapsed response : pu Progress: 1/10 Progress: 2/10 Progress: 3/10 Progress: 4/10 Progress: 5/10 Progress: 6/10 Progress: 7/10 Progress: 8/10 Progress: 9/10 Loop remaining: 2 Progress: 8/10 Progress: 9/10 Loop remaining: 1 Progress: 8/10 Progress: 9/10 Progress: 10/10 k k response : end ftBittleX::ftfBittleX::on_send TX command : j ftBittleX::ftfBittleX::on_response Command completed (normal) description : JOINTS cmd : j id : 8 10 ms : RX_latency 1 ms : RX_elapsed response : = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0, 0, 0, 0, 0, 0, 0, 0, 30, 30, 30, 30, 30, 30, 30, 30, j response : end ftBittleX::ftfBittleX::on_send TX command : kzero ftBittleX::ftfBittleX::on_response Command completed (normal) description : Zero cmd : k id : 9 data : zero 2 ms : RX_latency 232 ms : RX_elapsed response : zero k k response : end ftBittleX::ftfBittleX::on_send TX command : 4bfffffff6001783000000001e1e1e1e1e1e1e1e8000f00000001e23281532ff29c000f00000001e23281e32ffe100001e00000001b23283c32f142d10000f00000002a23283c1914143cc00000000000302d4b3c1425143cc000fffffff100000003c3c4646ff3c3c10000000000001e1e6e6e3c3c3c3cc1001e000000046465555ffffffceffffffce3c3c10000000000001e1e1e1e1e1e1e1e80007e ftBittleX::ftfBittleX::on_response Command completed (toggle) description : SKILL_DATA cmd : K id : 10 data : 24600178300000000303030303030303080001500000003035402150151541120001500000003035403050151514160003000000002735406050152045160001500000004235406025202060120000000000048457560203720601200024100000006060707015156060160000000000030301101106060606012100300000000707085852062066060160000000000030303030303030308000 346 ms : RX_latency 4453 ms : RX_elapsed response : Progress: 1/10 Progress: 2/10 Progress: 3/10 Progress: 4/10 Progress: 5/10 Progress: 6/10 Progress: 7/10 Progress: 8/10 Progress: 9/10 Loop remaining: 2 Progress: 8/10 Progress: 9/10 Loop remaining: 1 Progress: 8/10 Progress: 9/10 Progress: 10/10 k k response : end ftBittleX::ftfBittleX::on_verify ftBittleX::ftfBittleX::on_send TX command : j ftBittleX::ftfBittleX::on_response Command completed (normal) description : JOINTS cmd : j id : 8 11 ms : RX_latency 1 ms : RX_elapsed response : = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0, 0, 0, 0, 0, 0, 0, 0, 30, 30, 30, 30, 30, 30, 30, 30, j response : end ftBittleX::ftfBittleX::on_send TX command : kzero ftBittleX::ftfBittleX::on_response Command completed (normal) description : Zero cmd : k id : 9 data : zero 2 ms : RX_latency 240 ms : RX_elapsed response : zero k k response : end [ OK ] ftfBittleXBehavior.Behavior_Data (10151 ms) [ RUN ] ftfBittleXBehavior.Behavior_User ftBittleX::ftfBittleX::on_send TX command : kzero ftBittleX::ftfBittleX::on_response Command completed (normal) description : Zero cmd : k id : 9 data : zero 1 ms : RX_latency 40 ms : RX_elapsed response : zero k k response : end ftBittleX::ftfBittleX::on_send TX command : 4bfffffff9001343000000001e1e1e1e1e1e1e1e20000ffffffce02d0fffffffbfffffffb1414ffffffed2f475afffffff0ffffffc9292f10000ffffffce02d0fffffffbfffffffb1414ffffffba3c55134cffffffbc29ffffffe120000ffffffec02d0fffffffbfffffffb1414ffffff934161e44ffffffb31bfffffff530000ffffffac02d0fffffffbfffffffb1414ffffffb44161e5affffffc41bfffffff54040026ffffffb02d0fffffffdfffffffd33464e4616ffffffc9fffffff8ffffffddfffffffd106000ffffffb00000001e1e1e1e1e1e1e1e100007e ftBittleX::ftfBittleX::on_response Command completed (toggle) description : USER_SKILL cmd : K id : 10 data : 2490013430000000030303030303030303200020604502512512020237477190240201414716000206045025125120201866085197618841225320002360450251251202014765971468179272454800017204502512512020180659714901962724564400381764502532533370787022201248221253166000176000000303030303030303016000 356 ms : RX_latency 2868 ms : RX_elapsed response : Progress: 1/7 Progress: 2/7 Progress: 3/7 Progress: 4/7 Progress: 5/7 Loop remaining: 2 Progress: 4/7 Progress: 5/7 Loop remaining: 1 Progress: 4/7 Progress: 5/7 Progress: 6/7 Progress: 7/7 k k response : end ftBittleX::ftfBittleX::on_send TX command : kzero ftBittleX::ftfBittleX::on_response Command completed (normal) description : Zero cmd : k id : 9 data : zero 4 ms : RX_latency 680 ms : RX_elapsed response : zero k k response : end [ OK ] ftfBittleXBehavior.Behavior_User (3970 ms)
Edit: cleaning up my mess. The TX_command represents binary data
Can't attache the log file.
Well, I spent the better part of today writing my analysis of the log file only to run into a limitation of post sie on the forum. So this is going to be drastically abbreviated.
I upgraded the firmware and check if that fixed the problem. Initially, it appears to have worked. But ... only if the bot starts in practically the "rest" pose (Boot Test A). If I set the pose it in the "zero" position and powerup, the behavior repeats. (Boot Test B)
This behavior goes on for quite some time and doesn't seem to end. The bot seems like it's hunting around to get to a known pose and never making it. This makes me suspicious about the IMU. As an experiment, I picked up the robot and lifted it slightly. The result was the movement stopped but the pose wasn't reached.
Boot Test A https://youtube.com/shorts/kdnRSTnuEnI
Boot Test B https://youtube.com/shorts/UAxoWAS_BZo
I have two tests written in Google gtest for the T_SKILL_DATA protocol command.
TEST_F(ftfBittleXBehavior, Behavior_Data) { using std::this_thread::sleep_for; ASSERT_TRUE(on_pushups()); // standard, built-in behavior sleep_for(milliseconds(50)); // ~servo lag time vector <int8_t> expect_joints{}; ASSERT_TRUE(on_joint(expect_joints)); vector <joint_t> pose{ {HEAD, expect_joints[HEAD]} , {LARM, expect_joints[LARM]} , {RARM, expect_joints[RARM]} , {RHIP, expect_joints[RHIP]} , {LHIP, expect_joints[LHIP]} , {LFOREARM, expect_joints[LFOREARM]} , {RFOREARM, expect_joints[RFOREARM]} , {RLEG, expect_joints[RLEG]} , {LLEG, expect_joints[LLEG]} }; ASSERT_TRUE(on_zero()); ASSERT_TRUE(on_behavior_data()); EXPECT_TRUE(on_verify(pose)); } TEST_F(ftfBittleXBehavior, Behavior_Data_User) { const int8_t ang[] = { // anger -7, 0, 0, 1, 3, 4, 3, 0, 0, 0, 0, 0, 0, 0, 0, 30, 30, 30, 30, 30, 30, 30, 30, 32, 0, 0, 0, -50, 0, 45, 0, -5, -5, 20, 20, -19, 47, 71, 90, -16, -55, 41, 47, 16, 0, 0, 0, -50, 0, 45, 0, -5, -5, 20, 20, -70, 60, 85, 19, 76, -68, 41, -31, 32, 0, 0, 0, -20, 0, 45, 0, -5, -5, 20, 20,-109, 65, 97, 14, 68, -77, 27, -11, 48, 0, 0, 0, -84, 0, 45, 0, -5, -5, 20, 20, -76, 65, 97, 14, 90, -60, 27, -11, 64, 4, 0, 0, 38, -80, 45, 0, -3, -3, 3, 3, 70, 78, 70, 22, -55, -8, -35, -3, 16, 6, 0, 0, 0, -80, 0, 0, 0, 0, 0, 0, 30, 30, 30, 30, 30, 30, 30, 30, 16, 0, 0, 0, }; ASSERT_TRUE(on_zero()); EXPECT_TRUE(on_skill_data(ang, sizeof(ang))); // todo: verify pose }
After I build the project, I run the tests 3 times. The first two run are successful but the third fails. The attached file is the log file out for the run.
[ RUN ] ftfBittleXBehavior.Behavior_Data ftBittleX::ftfBittleX::on_send TX command : kpu ftBittleX::ftfBittleX::on_response Command completed (normal) description : pushups ... ftBittleX::ftfBittleX::on_send TX command : j ftBittleX::ftfBittleX::on_response Command completed (normal) description : JOINTS ... ftBittleX::ftfBittleX::on_send TX command : kzero ftBittleX::ftfBittleX::on_response Command completed (normal) description : Zero ... ftBittleX::ftfBittleX::on_send TX command : 4bfffffff6001783000000001e1e1e1e1e1e1e1e8000f00000001e23281532ff29c000f00000001e23281e32ffe100001e00000001b23283c32f142d10000f00000002a23283c1914143cc00000000000302d4b3c1425143cc000fffffff100000003c3c4646ff3c3c10000000000001e1e6e6e3c3c3c3cc1001e000000046465555ffffffceffffffce3c3c10000000000001e1e1e1e1e1e1e1e80007e G:\AppDev\Robotics\Petoi\test\ftBittleX\ftBittleX.cpp(420): error : Value of: result Actual: false Expected: true ftSerial.write FAILED G:\AppDev\Robotics\Petoi\test\ftBittleX\ftBittleX.cpp(638): error : Value of: result Actual: false Expected: true G:\AppDev\Robotics\Petoi\test\ftBittleX\ftbehavior.cpp(160): error : Value of: on_behavior_data() Actual: false Expected: true ftBittleX::ftfBittleX::on_send
The failure occurs when trying to the skill data for the first test. Somehow that turns on the verbose gryo output and causes all the remaining tests to fail.
How this can cause the boot behavior I don't know
(Arrg! I can attch the file!) I'll try in the next post.
Edit: cleaning up my mess
Interesting... Thanks for including the video! So I see you're using a BiBoard but I need more context though, with an important caveat that I have not personally used the token T_SKILL_DATA aka 'K'.
Are you using unmodified OpenCatEsp32 source code? If not, I'll might have questions on what was modified.
Can you share what commands (tokens with any data) you sent in the lead up to this behavior? You mention SKILL_DATA so I gather it includes the 'K' token.
The token T_SKILL_DATA aka 'K' causes reaction() to call copydataFromBufferToI2cEeprom() in I2cEEPROM.h. That function will give a serial readback of "I2C EEPROM overflow! Delete some skills!" if "eeAddress + len >= EEPROM_SIZE" where EEPROM_SIZE = (65535 / 8). Did you see that message?
After writing the skill data to the EEPROM, reaction() then uses the skill data still in the newCmd variable to build/transform that data using code lines skill->buildSkill(); and skill->transformToSkill(skill->nearestFrame()); After that, reaction() uses strcpy(newCmd, "tmp"); to overwrite the contents of newCmd. This is kinda odd IMO since skill->buildSkill(); does the same thing... but the net effect is to "run the last skill", namely what you sent, since reaction() then sets the token to T_SKILL 'k'. No question here. Really just concluding that I would need to see the commands you sent, including that skill data.
I don't see a usb cable so are you sending over Bluetooth? If Bluetooth, what readbacks are you getting (if any) since Bluetooth can be finicky that way? Also, if Bluetooth, have you tried the same command sequence that let to this point but using the USB serial communication?