Fucking Cosmic Rays...

This is why my computer and peripherals are all enclosed in solid lead and cooled by thermaltake for anti-cavitation goodness.



:spineyes:
 
Sexual Intercourse with Cosmic Rays is known to the state of California to cause prostate cancer in lab mice that were tested.
 
Cisco still uses the excuse. I've heard it twice now in a 2-month period. Mainly hear it on random 7609/6513 card resets, not so much on CRS platforms, though.

Redback/Ericsson would use the cosmic ray excuse a lot on SE800s. Their other excuse was microfractures caused by a miscalibrated mechanical arm when moving ASIC wafers in the fab.
 
I had a chunk of C code that was effectively
if (funcA() || funcB())
    do bad stuff
else
    do good stuff


Somehow the first path was taken even though logic and the database records implied that funcA and funcB returned 0 (which is good.)

This code is very weak against random bit corruption: funcA and funcB return 0 for good, so any single bit flipped in a register or memory holding a return value makes it non-zero and pushes the normal response into the error-handling response.
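To put a rough number on that, here is a minimal sketch, assuming a 32-bit return value where 0 means good. The helper name count_flips_taking_bad_path is made up for illustration and is not from the real code. It flips each bit of a good return value in turn and counts how many of the corrupted values would pass the same truthiness test as the if above.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Assumes a 32-bit return value where 0 means "good". */
static int count_flips_taking_bad_path(void)
{
    const uint32_t good = 0;
    int bad_paths = 0;

    for (int bit = 0; bit < 32; bit++) {
        /* Simulate one flipped bit in the register or memory cell */
        uint32_t corrupted = good ^ (UINT32_C(1) << bit);

        /* Same kind of test as if (funcA() || funcB()) */
        if (corrupted)
            bad_paths++;
    }
    return bad_paths;
}
```

Every one of the 32 possible single-bit flips lands on the bad path, because anything non-zero counts as an error.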

I was able to replay transactions on a test database and the correct behavior was taken on the replay.

Very frustrating, because in this case I am not even convinced that buffered RAM would have fixed this, since it is just as likely that the corruption happened on the CPU.

A procedural workaround is to verify actual return codes so the code above would become something like

#define BAD (whatever)

int a = funcA();
int b = funcB();

if (a == BAD || b == BAD)

but then you have to check for exact affirmative replies as well and then toss an exception or whatever if the data makes no sense.
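A rough sketch of what that three-way check could look like. The GOOD and BAD values, the verdict enum and classify() are all assumptions for illustration; the real code used 0 for good and I am not claiming this is what went into production. The two codes here are bitwise complements of each other, so a single flipped bit cannot turn one into the other.

```c
/* Assumed return codes, not from the original program.
   Both are non-zero and they are bitwise complements. */
#define GOOD 0x5AA5C33Cu
#define BAD  0xA55A3CC3u

enum verdict { VERDICT_GOOD, VERDICT_BAD, VERDICT_CORRUPT };

/* Hypothetical checker: a and b are the values returned by funcA and funcB */
static enum verdict classify(unsigned int a, unsigned int b)
{
    int a_good = (a == GOOD), a_bad = (a == BAD);
    int b_good = (b == GOOD), b_bad = (b == BAD);

    /* A value that is neither GOOD nor BAD makes no sense: report it
       instead of silently treating it as an error or a success */
    if ((!a_good && !a_bad) || (!b_good && !b_bad))
        return VERDICT_CORRUPT;

    if (a_bad || b_bad)
        return VERDICT_BAD;

    return VERDICT_GOOD;
}
```

With this, classify(GOOD, GOOD) takes the good path, classify(GOOD, BAD) takes the bad path, and a value with a flipped bit such as classify(GOOD ^ 1u, GOOD) comes back as VERDICT_CORRUPT, which is where you would toss the exception. It only guards the values after they are returned; it does nothing about corruption elsewhere in the CPU.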

Edit: we figured the machine in question had run the above code correctly around 18 million times, and it then passed a 24-hour CPU register and memory test, so it's hard to blame the hardware.
 