Tuesday, September 20, 2022

defensive coding

 

One of the things that happens to people in testing positions is that every now and then we get to say "I told you so", usually around a bug report that was filed, closed as "won't fix/not a bug/not important" and came back to bite us in the rear. While there's always the basic joy of being right (and more importantly, of other people being wrong), over the years I've learned to see those cases as professional failures instead of sources of joy. After all, I saw the problem in advance and knew that it was a problem, so maybe I could have done something differently to actually get it fixed. Maybe I could have presented the problem differently, talked to other people who could advocate for it better than I did, collected more evidence, or perhaps it was only a matter of being more persistent in asking for it to be fixed. In other cases, there was nothing I could do at the time, since the reason for not addressing the problem was rooted in the organizational culture, which I can now start pushing to change. Saying "I told you so" is not the professional thing to do.

Last week I had just such a case - something didn't work for a customer and, upon further inspection, had never worked since deployment. Before we got to the difficult debugging, we went over the short checklist of common problems, roughly the equivalent of your ISP's tech support asking you to reboot your router when you call. In our case, this list consisted of one thing - checking the configuration file on the server. With two relevant entries, it was a rather short glance - the authentication token looked fine and the destination URL pointed to the correct base path. So far, so good.
Then, by sheer luck, something stirred my memory and I noticed that the URL had been typed with scheme+FQDN, you know - the way URLs are usually formatted. I recalled that when I worked on that feature there was something odd about having the scheme in the file. A short trip to our bug tracking system, and indeed my memory was correct - if we provide the scheme (for those less fluent in this specific terminology, that's the https:// part), it won't work, as the client consuming this configuration adds the scheme itself, and https://https://any.domain will fail. To make it that much more fun, there is no way to understand that from the logs. The ticket was logged last December (about 9 months ago!) and the discussion around it included some acceptable reasons for not fixing it. A case could have been made for a fix anyway, but back then it would have been a harder battle to win. It's not that any fix would have been difficult - the team configuring the server could add a regular expression validation to their tool, the server could reject the config or remove the scheme, the client could do the same and log a meaningful error, and all of us could have been monitoring this feature once it was deployed. We could even change the name of the parameter so that instead of "...URL" it would be "...DOMAIN" and reduce the chance of errors. For almost all of the steps that we could have taken but did not, there's a common reason: optimistic coding.
Optimistic coding is a state of mind where we assume that everything is going to be fine - the API is to be used only internally, so everyone will know what they should be doing. And if someone makes a mistake? Well, it's their problem to fix.
What we should be doing (and I intend to use this incident as leverage to push towards such behavior) is to create our software with a slightly more paranoid approach. Software is created to be used and operated by human beings, and human beings make mistakes. We need to assume that people operating, configuring and debugging the system will act very stupidly, not because they are stupid but because they are doing something else, under time pressure and with a lot of distractions around. Most likely, at least some of those people will be our future selves. If we keep that in mind we can adopt a "nothing gets past me" approach - any mistake that can be detected in a given phase should be dealt with right there: fix it if possible, return (or report) an error if a fix is not possible, and almost never let a problem pass from one component to the next.
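
To make "fix it if possible, report an error if not" a bit more concrete, here is a minimal Python sketch of what any one of the components could do with the destination parameter. It is not our actual code: the function names, the logger name, the error class and the regular expressions are all made up for illustration, and it assumes that the value is supposed to be a bare FQDN and that the client always prepends https://.

```python
import logging
import re

log = logging.getLogger("config")  # hypothetical logger name

# One leading scheme such as "https://" (scheme syntax as in RFC 3986).
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")
# Assumed shape of a bare FQDN such as "any.domain":
# dot-separated labels followed by an alphabetic top-level label.
_FQDN = re.compile(
    r"^(?=.{1,253}$)"
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+"
    r"[A-Za-z]{2,63}$"
)


class ConfigError(ValueError):
    """Raised when a config value cannot be repaired (hypothetical)."""


def normalize_domain(raw: str) -> str:
    """Return a bare FQDN, repairing the value when possible."""
    value = raw.strip()
    # Fix it if possible: drop every leading scheme and trailing slashes.
    while _SCHEME.match(value):
        value = _SCHEME.sub("", value, count=1)
    value = value.rstrip("/")
    if value != raw:
        # Leave a trace in the logs so the repair is not silent.
        log.warning("destination %r normalized to %r", raw, value)
    # If it can't be fixed, report it here instead of passing it downstream.
    if not _FQDN.match(value):
        raise ConfigError(
            f"destination must be a bare domain such as 'any.domain', got {raw!r}"
        )
    return value


def build_url(raw: str) -> str:
    """Stand-in for the client, which adds the scheme itself."""
    return f"https://{normalize_domain(raw)}"
```

With something like this in place, `build_url("https://any.domain")` gives `https://any.domain` rather than the doubled scheme, and a value such as `"not a domain"` raises a `ConfigError` with a readable message at the point where the mistake was made.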

Sunday, September 18, 2022

defensive coding

 




One of the things that happens to those who test software for a living is that every now and then an opportunity comes up to say "I told you so", usually involving a bug that was closed as "won't happen/not a problem/won't fix". There is something very satisfying about it, in a childish sort of way, but over the years I have trained myself to look at such events as a professional failure. Instead of a little victory dance and the knowledge that this time I was more right than whoever it was I argued with, I try to understand what I could have done to avoid getting into this situation in the first place. Sometimes it's a matter of phrasing things better, collecting suitable evidence or talking to the right person. Sometimes a more significant cultural change is needed for problems of this kind to be taken seriously. Either way, if something managed to annoy a customer, it is not enough to cover my behind and say "but I told you so"; I need to understand how I could have helped the organization avoid it in the first place.
Last week I had exactly such a case - something didn't work for a customer, and it turned out it hadn't worked at all from the moment they received the product. Before we got to debugging the problem and investigating everything properly, we started with a basic pass over the stupid stuff. A bit like calling tech support and being asked whether you've unplugged the router and plugged it back in. This time, the set of checks was rather short - check that the configuration files on the server are fine; anything beyond that already requires approval from the customer. This specific feature had two relevant parameters: some identifier, and another parameter representing a URL to connect to. Both looked pretty fine overall. And then, in a case of more luck than brains, something in my memory jumped up and shouted "Wait! Wait!" I remembered there was something where, if you provide a URL parameter but also include the scheme (you know, https://), there are problems. Or the other way around, if you don't include it. One thing led to another, we checked our bug tracking system and indeed, I remembered correctly. In a field called something-something URL, the value that needs to be entered is actually the domain name (or, in the jargon, the FQDN). Moreover, in the bug description from back in last December (nine months ago!) I even wrote that because of the variable name someone might get confused and include the scheme without knowing it would cause trouble. Ticket status - not a bug.
Honestly? There were some perfectly reasonable arguments - it's a variable that is edited only by an internal team, we have other fields named URL with the very same limitation, and that was the contract agreed upon with the various teams. The fact that it's fairly simple to fix? So it's simple. Even in hindsight, it's not a discussion in which I could have reached a different outcome at that time, because the approach of "if someone else in the chain makes a mistake, it's their problem" was very prevalent, and even if I had had a way to escalate it (there was no one to escalate to back then, today maybe there is) - it's not necessarily something worth starting a fight over that would hurt my ability to work with those people later on. It's not that there was a shortage of things we could have done - if the team responsible for configuring the server had set up a simple validation in the automated process that runs it, if the team owning that server had stripped the scheme when present, or if the client had done the same, if one of the teams had taken care of a clear log line that we would have bothered to monitor, or even if we had simply renamed the variable so that instead of URL it said DOMAIN - most likely we wouldn't be in a situation today where a customer is angry with us.
The problem was, and still is, over-optimism. Someone made a mistake? The system crashes? That's because of whoever made the mistake, and it's their problem. We don't have an orderly system for passing along all those small details that were obvious to me and that I didn't bother to document? Or for passing the information on even after I noticed there could be a misunderstanding here? So there's no system, let them ask if they're not sure. This is the current culture - stronger in some teams, less severe in others.
At the end of the day, we need to remember that code is written for human beings to use. And human beings make mistakes. We want to build systems that help us cope with this reality - fix human errors if possible, stop the error if not and don't pass it on to the next point in the chain, and surface the error as quickly as possible. The claim "it meets the product requirements" is simply not enough.

Thursday, September 15, 2022

Book review - Radical Candor: Be a kick-ass boss without losing your humanity, by Kim Scott



Radical Candor is a book I got to after years of getting references to it from various places - blogs, other books, people. It means that I had high expectations *and* was pretty sure I wouldn't be surprised by the content. Not an easy place for a book to be. Despite those difficult starting conditions, it manages to live up to the reputation it has, and to pack the information in a useful, coherent way.

I listened to an audiobook of the 2nd edition, and it starts with an attempt to defuse a common backlash against the 1st edition: Radical candor is not a permission to be an arsehole, nor is it an invitation to be cruel. The main reason for being truly candid with someone is that we care. It is this care that drives us to provide feedback even if it's painful, and to make sure the recipient is able to make use of it. The book cover provides a neat summary of the main idea behind this book. Relevant behavior is measured on two axes: Caring personally and challenging directly. Caring personally is being interested in the well-being of the person you are working with (in the book, the people you manage): not offending them, providing them with opportunities to improve, etc. Challenging directly, on the other hand, is about getting things done - pointing out mistakes, being accurate and concise, regardless of how people feel about it.

Those axes create four distinct categories:

  • Manipulative insincerity: Low caring, low challenging. This is where you show no care for the other person or for the mission - you avoid conflict by not giving someone difficult feedback, but have no problem backstabbing them when they are not around. You might want to be on that person's good side, to stroke their ego or simply to get that person off your hands. If you see someone being sweet and enthusiastic with someone, only to sigh and roll their eyes the moment the other person leaves - that's it.
  • Ruinous empathy: High caring, low challenging. The intentions behind this are kind - not making someone feel bad, giving them leeway since they are going through a difficult time, or simply wanting to avoid conflict. The end result, though, is indistinguishable from manipulative insincerity, as in both cases you will keep silent or give undeserved praise. The fact that you mean well doesn't really matter, as the road to hell is paved with good intentions and, in this case, hiding the mess under the carpet will come back to bite both of you in the rear.
  • Obnoxious aggression: Low caring, high challenging. Surprisingly, this is actually the second best place to be in. People getting the business end of this behavior might cry, stress out or feel attacked, but things actually get done and either improve or break completely. If people can grow the necessary thick skin to survive, they will get direct feedback and could build on it to improve. Don't expect employee retention to be high, though, as this assault on the employees' egos and confidence will tire most of them enough to quit.
  • Radical candor: High caring, high challenging. This is the sweet spot between obnoxious aggression and ruinous empathy. You give employees actionable feedback and help them process it and improve from it. You take care not to say "This code is shit", but rather "This code isn't good enough: it should be broken into smaller functions, use better variable names and pay more attention to log levels. You usually do better".

It's important to notice that there isn't a recipe for being radically candid with someone, as what matters is not what was said but rather what was heard and understood. One person will understand "this work is shit, take the time to do it well" as a personal judgement and will be discouraged, while another would see it as an honest evaluation of the work and an appreciation of their skills when not under pressure. The first might be at a loss about what is wrong and would require more direct guidance such as "it works, but will be hell to maintain unless we clean up the wording and build a better structure", while the second might get annoyed with you explaining the obvious and micromanaging them. The difference could be rooted in personal preferences on getting feedback, but it can also be a result of how much the person trusts you, how confident they are in their current situation and skills, or how they woke up this morning. Complicated? It sure is. You will make mistakes, and the only question is how many and how severe they will turn out to be. A strong relationship is a good buffer to absorb such mistakes, so it is worth investing in it from the outset.

After the reader (listener, in my case) has understood both what radical candor is and why it is important, it's time to implement those ideas. The book goes on to discuss strategies for creating a radically candid culture around you, peppered with examples from the author's experience that help in understanding the theory as well as the need for everyone to devise their own strategies. I won't go into all of the details as the book does it way better than I could (plus, the author has put a lot of effort into it, go read her work, not mine) but all in all, it's definitely a book I'll listen to again, and it helped me frame office (and personal) communication in a helpful manner.

At the end of the audiobook there was a preview of another book: "How to Root Out Bias, Prejudice, and Bullying to Build a Kick-ass Culture of Inclusivity". The text there is a little bit less polished and I'm not sure I got to the bottom of what was communicated to me, but even so it was very moving, which is what is to be expected from an introduction section to a book (as I believe this text to be). I found myself wondering several times "What the F?? Which decent human being would behave that way? There's bias, and there's straight-out harassment". So, this is a solid "maybe" with good potential.