אשרי אדם מפחד תמיד Happy is the man who always fears: 2022

Monday, November 21, 2022

לאן מתקדם בודק תוכנה?

כשמתכנת מתחיל את דרכו המקצועית יש עשרים אלף אופציות בהן הקריירה יכולה להתפתח - מערכות משובצות, שרתי-אינטרנט, מערכות הפעלה,מובייל, ועוד, בנוסף, כל תחום כזה יכול לבוא יחד עם התמחות בשפות תכנות ספציפיות כמו ג'אווה, סי, פיית'ון ואחרות. מתכנת מתחיל ימצא את עצמו בדרך כלל בצוות עם עוד מתכנתים שיש להם קצת יותר ניסיון ויתחיל לשמוע על כל הדברים שחסרים לו - כתיבת קוד מאובטח, תבניות עיצוב או היכרות עם ספריה כזו או אחרת. בכל פעם בה מתכנת מסיים קטע עבודה הוא מקבל משוב על טיבה - מישהו אחר בצוות יקרא את הקוד שלו, יעיר הערות ושני הצדדים ילמדו. אחרי זמן לא ארוך במיוחד יהיו למתכנת הזדמנויות לחזור לקוד שהוא כתב בעבר ולראות כמה הוא כבר השתפר ואיך טעויות שהוא עשה אז מפריעות לו עכשיו. בסך הכל - סביבת למידה מדהימה, גם אם אנחנו לא לוקחים בחשבון את מאות הקורסים הזמינים שמלמדים תוכן רלוונטי ברמה המתאימה.

ומה המצב אצל בודקי תוכנה? אפשר לומר שהוא כמעט הפוך. לא מאוד נדיר שבודק תוכנה מתחיל ימצא את המשרה הראשונה שלו באיזה סטארט-אפ בו הוא בודק התוכנה היחיד, או היחיד בצוות אליו הוא גוייס, רוב מי שהוא ייתקל בו, גם בודקי תוכנה מנוסים יותר, לא יודעים הרבה על הצד התיאורטי של בדיקות תוכנה, ובלא מעט מקומות, בודקי תוכנה נמצאים אי שם בחלק התחתון של שרשרת המזון. כמעט כל מי שמדבר על "לאן מתקדמים" מצביע על דרך התקדמות אל מחוץ למקצוע - לפיתוח, דבאופס, הנהלת מוצר או "סתם" לתפקיד ניהולי. אילו כישורים נדרשים מאדם כדי להיות בודק תוכנה מוצלח? יש כאן הרבה פרשנות ומעט מאוד קונצנזוס. אילו קורסים זמינים למי שרוצה להשתפר? יש עשרות קורסים מסביב לתוכן של ISTQB, אבל התוכן הזה פשוט לא טוב, ובטח לא משפר את המקצועיות של מי שטורח ולומד אותו (למה אני אומר את זה? אפשר לקרוא כאן). קורסים טובים? צריך לחפור מתחת לסלע. זה לא שאין בכלל, קיבלתי המלצות על כמה קורסים, אבל זה לא טריוויאלי למצוא אותם. ומבחינת משוב - גם כאן אין כזה באמת. לפעמים משהו מתפוצץ בשטח ואנחנו אומרים "כן, יכולנו לעשות עבודה טובה יותר", אבל מה עם כל הפעמים בהן לא היינו בסדר ושום דבר דרמטי לא קרה? או כשהיינו זהירים מדי ותקענו מקלות בגלגלים של כל תהליך הפיתוח? בדרך כלל המשוב שנקבל יהיה ספורדי, באיחור ודי מעורפל. גם כאן, יש את יחידי הסגולה שמצאו דרך לקבל משוב מהיר מהשטח על איכות הפיתוח - הם בדרך כלל לא באמת צריכים בודקי תוכנה.
בקיצור, מנקודת המבט של בודקי תוכנה מתחילים, כל מסלולי הקידום מובילים אל מחוץ למקצוע הבדיקות.חלק קטן מהבודקים האלה ימצאו במקרה מסלולים בתוך עולם הבדיקות (וחלק מתוך אלה יתקדמו בהם עד לנקודה בה תיאורי תפקיד רשמיים לא משנים מאוד), אבל רוב העובדים שטובים מספיק כדי להתקדם יכוונו את הקריירה שלהם למקום אחר. מקומות עבודה שמחפשים בודקי תוכנה מנוסים יצטרכו להתאמץ מאוד כדי למצוא מישהו שגם פנוי לעבודה, גם מנוסה וגם מסוגל לעשות עבודה טובה.

במקום בו אני עובד התמודדנו עם הקושי הזה על ידי גיוס עובדים מתחילים, תחת ההנחה שקל יותר ללמד אנשים איך להיות בודקי תוכנה טובים אם לא צריך לשבור להם הרגלים רעים. כמובן, ידענו שיש כמה חסרונות לגישה הזו - נצטרך להשקיע יותר בחניכה ואנחנו מהמרים על כל עובד בלי המון יכולת לנחש את סיכויי ההצלחה שלו מראש. הנחנו שנוכל להתמודד עם האתגרים האלה ולבנות צוות שיתאים למה שהחברה צריכה. אחד האתגרים שלא לקחנו בחשבון היה שימור עובדים. זה לא שהתעלמנו לחלוטין מהבעיה הזו, רק הנחנו שהקושי לשמר עובד יהיה דומה לזה שכל החברה חווה. נכון, לבדיקות תוכנה יש מוניטין לא טוב במיוחד, אבל אנחנו בונים צוות חזק שמאפשר לאנשים ללמוד המון ולהשפיע על הסביבה שלהם - הם יראו את זה ויישארו. לא? ובכן, מסתבר שלא מספיק. בודקי תוכנה מתחילים הגיעו למקצוע ממגוון סיבות, חלק גדול יחסית פשוט מנצל את רף הכניסה הנמוך יותר כדי לצבור ניסיון לקראת המשרה אליה הוא באמת מכוון. לא מעט מאלו שעזבו אותנו עשו את זה כי מנקודת המבט שלהם - אין לאן להתקדם. הם ידעו לומר שהם מתכנתים טובים יותר, אבל כל שאר הכישורים שהם צברו היו שקופים להם, וחשוב יותר - הם לא שיפרו את היכולת שלהם למצוא משרדה טובה יותר בהמשך הקריירה. משרות לפיתוח תוכנה יהיו ספציפיות - מומחה מסדי נתונים, מומחה מערכות משובצות, קרנל, או עוד עשרות תחומים. משרות לבודקי תוכנה יהיו דומות להחריד, לא משנה אם הן לבעלי ניסיון של שנתיים או של עשר והיכולת לומר "זה אני, זה הערך שאני מביא איתי ואלה המשרות שמעניינות אותי" דורשת חשיבה מעמיקה שלא זמינה בתחילת הקריירה, בטח לא במקום שעדיין בונה את תהליכי העבודה והקידום שלו.

אני רוצה לבנות מפה, משהו שיתאר מסלולי התקדמות אפשריים בתוך בדיקות תוכנה. משהו שיאפשר לאנשים להסתכל ולומר "אלה תחומי ההתמחות שלי". כלי כזה יוכל לתת לעובדים מתחילים תחושת התקדמות והתפתחות - במקום לומר "קודם בדקתי, היום אני בודק, מחר אבדוק", הם יוכלו לומר "קודם ידעתי את הבסיס, היום אני יודע קצת על בדיקות נגישות, קצת יותר על בדיקות ביצועים ואני טוב יותר בניתוח סיכונים מאשר הייתי לפני שנה". יותר מזה, הם יוכלו לומר "כשאהיה גדול אני רוצה להיות מומחה ניטור, ולכן אני צריך ללמוד ויזואליציה, עבודה עם נתוני עתק ואסטרטגיות אחסון מידע". עם קצת מזל, זה יכול להיות כלי שישמש חברות כשהן בונות מסלולי התקדמות לעובדים או אפילו בתהליך הגיוס שלהן.

כדי לתמוך במטרות האלה, בניתי מודל שמכיל שלוש רמות של הגדרות:

1. בכירות.
חלוקה כללית לדרגות - מי מקבל יותר כסף ויותר אחריות. אם אלה הדברים שמעניינים אותך (ולא, זה לא בהכרח עניין של שטחיות או סתם חמדנות - יש אנשים שנמצאים בתפקיד אחר: משאבי אנוש, כספים, ואפילו המנכ"ל - לא צריך להיות להם אכפת מה בדיוק אומר התפקיד שלך, רק כמה הוא בכיר), אפשר לעצור כאן. אבל, גם לכולנו נוח מדי פעם להכיר את החלוקה לרמות, גם במובן של שכר וגם כדי לדעת מה סוג האחריות שאנחנו צפויים לה.
במודל השתמשתי בשלוש רמות - ג'וניור, סיניור ו-principal (ולא, לא היה לי הגיוני לכתוב את זה בעברית) כשההפרדה היא לפי מידת ההשפעה - מזה שעושה את העבודה שלו ועד לזה שמשפיע על כל הארגון. ככל הנראה, ארגונים קטנים מאוד יצטרכו רק שתי קטגוריות, ובארגוני ענק יהיה צורך ברמות נוספות.

2. תפקידים: אלה כותרות כלליות לסוגי עבודה שונים שאפשר לכלול תחת המטרייה הרחבה של "בדיקות תוכנה". בארגוני ענק יכול להיות שיהיו צוותים שונים שממלאים את הפונקציות האלה, אבל במקומות קטנים יותר (ובארגונים גדולים שבחרו לארגן את העבודה כך), סביר שאדם אחד ימלא מספר תפקידים. למשל - ארגון לא חייב להחזיק מומחה נגישות במשרה מלאה, אבל כדאי שיהיה מישהו שמבין בתחום ויכול לעזור במשימות קשורות כאשר הן צצות.

3. כישורים ויכולות: כדי לבצע תפקיד בהצלחה, נדרשים כישורים מסויימים. יהיה קשה מאוד לתפקד כמומחה ביצועים בלי להכיר כלי או שניים שמייצרים עומסים או בלי לדעת לנתח צריכת משאבים. עבור כל כישור ניתן לבנות מסלולי הכשרה - ביצוע משימות קשורות, קורסים, הרצאות וכדומה, ולהגדיר דרכים למדידת המיומנות של עובד בכישור מסויים.

העבודה שלי, באופן טבעי, רחוקה מלהיות מושלמת. אני משתמש בשפה שמובנת בעיקר לי, ואין ספק שאני מחמיץ תחומים שלמים. לכן, אני מבקש מכם להוסיף הערות - הייתי שמח אם המודל הזה יהיה משותף לקהילה כולה ויוכל לשמש גם את מקומות העבודה השונים בהגדרת התפקידים שהם מחפשים ומסלול ההתקדמות וגם את בודקי התוכנה עצמם כשהם מנסים להגדיר לעצמם את מסלול ההתפתחות שהם רוצים לנסות. המודל שלי נמצא כאן ויש הרשאות הערה לכל מי שניגש אליו. ספרו לי מה עובד בשבילכם היטב, אילו הנחות אני מניח ולא מתאימות לכם, מה חסר. שתפו את המודל הזה עם קולגות וחברים כדי שגם הם ייתנו משוב. זו גם ההזדמנות להודות לPerze Ababa שהשקיע לא מעט מזמנו ועזר לי לשים לב לכמה פערים ולראות גם חלק מנקודת המבט של ארגונים גדולים באמת.

What's the career path of a tester?

Link to the model

When programmers start their first job, there are quite a lot of options they can see and choose to specialize in: embedded systems or highly distributed cloud architectures, specific programming languages such as Java, Python or C, optimization and debugging, and that's even before we consider business domains. They will often find themselves working with more experienced developers who will give them feedback on their work during design & code reviews, and who will talk about concepts they still need to master such as secure code design, efficiency or design philosophies. They will have plenty of opportunities to go back to a piece of code they wrote and experience their mistakes biting them in the behind, just so that they can see how much progress they have made. All in all, that's an amazing learning environment, even before we consider the plethora of easy-to-find courses and online resources that teach relevant skills at the appropriate level.

How is it for software testers? Well, one might say it's the complete opposite. It's not uncommon for testers to find themselves the sole tester in a team, or even in the whole start-up that hired them. Most of the people they meet (including some more experienced testers) won't know a lot about testing, and in many places testers are so far down the pecking order that it can seem that any career progress points outside of testing - to programming, DevOps, product management, or just to being a manager. What skills are required to function well as a software tester? One can find as many opinions as there are people, and most of them won't be actionable. Courses? A simple search will turn up a lot of ISTQB content, but this content is the opposite of helpful as it instills misconceptions and bad thought patterns (more on that here). There is some good content out there (I've heard good recommendations on BBST and RST, but can't attest yet to their content personally), but if you can find it - you have already burrowed deep enough into the rabbit hole that is the very specific testing community where I've been around for a while now. As for feedback on our work - what feedback? With the exception of an occasional blaming finger of "how did this bug escape QA" (which we try to avoid), there's very little feedback on testing done right or wrong. What about the times when we were slacking but by chance no critical blowup happened? Or when we were too careful and slowed down the entire development process? Yes, there are outliers that have managed to build a way to get good, reliable feedback on their product from the real world, where it's deployed. But guess what? They usually don't really need testers.
In short, taking the view of an inexperienced software tester, it looks as if all growth paths are leading outside of testing. A small part of those testers will manage to find their way inside the testing world (and some of those will make progress to the level where formal titles do not limit them and matter a lot less) but most capable employees will find a way out of testing. As a result, places looking for experienced, high-skilled testers will have a hard time finding them, and those testers will have a difficult time finding a workplace that will appreciate their skills.

Where I work, we initially decided to tackle this difficulty by looking for inexperienced employees, assuming that it would be easier to teach people good testing if there weren't any bad habits that needed to be broken. We knew that this plan had some drawbacks: we would have to invest more in training, and we would be taking a gamble on their ability to develop the skills we need - which is more difficult than assessing the existing skills of a person. Those were challenges we assumed we could overcome. Naturally, we did not foresee all of the challenges we would face, one of which was employee attrition. To be fair, it's not that we completely ignored this risk, we just assumed that it would be at the same rate as in other teams in the company. Sure, software testing has a bad reputation, but we're building a strong team and we can show employees that they can grow significantly, so they will stay.
Well... no. Inexperienced software testers apply for their first (or second) positions for many reasons, and the most prominent I've encountered is that software testing has a lower entry bar to "high-tech" jobs, and they assumed (with some degree of truth) that it would provide them with a good stepping stone towards the position they actually aim for. Talking with people who left us revealed a common pattern: they decided that they couldn't build their career in testing, and since they did gain some relevant skills, they were ready to make a transition. They also didn't feel they were improving as testers - they could say they were better coders, but the rest of the skills they improved were invisible to them. More importantly, they couldn't see how those skills would help them find their next, hopefully better, job. This brings me back to the job market: developer jobs are specific (see the list at the top for examples). Testing jobs are almost always "tester/QA/QE/SDET" - nothing that will help differentiate between testers with different experience. It means that the ability to articulate one's added value or interests requires deep thought that is not easy to do at the beginning of one's career, especially for those who work in a place that is still building its career ladders and work procedures.

I want to build a map. Something that will describe possible progress paths inside the role of an individual contributor software tester. It might have exit paths to other roles or to management, but the focus is on software testers and the various kinds of tasks they might perform. Such a map will enable people to look and say "Those are my areas of expertise, this is what I want to do next". For beginners, it can help visualize their progress. They could look and say "A year ago I had some basic fundamental skills, since then I've gained knowledge of accessibility testing, delved a bit deeper into load generation and log filtering and I'm better at risk assessment than I was then". They could then go and say something like "I enjoyed trying to make sense out of heaps of data, so I might want to look into production monitoring and deeper into performance tests, for both I will need skills in data visualization, working with big data and data storage and retention strategies".
With some luck, companies could use it when they define their career ladders, or even in their recruiting processes and ads, which would bring more variety to hiring for "tester" positions.

To support those goals, I built a three layered model:

1. Seniority.
A generic, non-specialized, catch-all division. People who only care about prestige, compensation and responsibility can stop at this level (no, they are not shallow or indifferent, they might be from other professions, such as finance, HR or even your CEO). For the rest of us, it would still be helpful to know the level of any role - both in terms of compensation and in terms of the type of skills that would be relevant to it.
In the model I use 3 such categories - Junior, Senior and Principal - that represent an ever-growing sphere of influence, from just doing one's own job to having a company-wide impact. Smaller companies might only need two such categories, while larger corporations might need more.

2. Roles.
Those are titles for the types of work that people with the title of "tester" often do and that are not considered a distinct role (so, doing part-time customer support does not count). Again, the size of the organization matters. Larger ones can have teams that focus on one such role, while smaller places, or corporations that chose to organize their work differently, will have people wearing multiple hats. For instance, not every place needs or can afford an accessibility expert, but most places can benefit from someone who has some accessibility expertise and can help the rest of the team(s).

3. Skills and capabilities.
In order to be successful in each role, certain skills are needed. Since those are the things one can learn intentionally, breaking down the roles into specific key competencies will be helpful to people who try to make their way through a specific career path. At a later phase, each such skill can be augmented with resources that help in acquiring it, and companies can decide on ways to measure whether someone has achieved the necessary level of a skill for the step they want to make.

My work, quite naturally, is far from being perfect or complete. I'm using language and terminology that make sense to me, and I'm most certainly missing entire fields. This is the point where I'm asking for your help: I would be happy if this could become a model that is developed by the community and will be usable by both companies defining their processes and individuals crafting their career.
My model can be found here and anyone with access can comment. Tell me what works for you, what does not, which assumptions that I make are not suitable for you and what you can see missing there.
This is also the place to thank Perze Ababa, who contributed his time and helped me notice some key gaps I was ignoring, as well as see the perspective of larger corporations for a while.

Friday, October 14, 2022

Testing Magic 101

Every now and then, I get to hear about "X testing", where X might be API, mobile, embedded software or just anything that isn't a website. While I normally scoff at the relevance of such distinctions that usually can be narrowed to "understand your system, understand how to interact with it, the rest of the differences are nothing you haven't seen before", it is true that the good content will help you notice parameters you possibly didn't consider such as battery consumption for mobile, or heat and power fluctuation for embedded systems.

I got to listen to a recent episode of "The testing show" on AI testing which was, sadly, another one of the infomercial-like episodes. It wasn't as terrible as the ones where they bring in people from Qualitest, but definitely not one of their better ones. The topic was "AI/ML testing", and it was mainly an iteration of the common pattern of "it's something completely new, and look at the challenges around here, also ML is really tricky".

This has prompted me to write this post and try to lay the basics for testing stuff related to ML, at least from my perspective.

The first thing you need to know is what you are testing - are you testing an ML engine? A product using (or even based on) an ML solution? The two can be quite different.

For the past decade, I've been testing products that were based on some sort of machine learning - first a Bayesian risk-engine detecting credit-card fraud, and then an endpoint protection solution based on a deep neural net to detect (and then block) malware. The systems are quite different from each other, but they do share one aspect - the ML component was a complete black box, no different from any 3rd party library that was included in the product. True, that particular 3rd party component was developed in-house and was the key differentiator between us and the competition, but even when you have a perfect answer to the relevant question (is this transaction fraudulent? is this file malicious?), there's still a lot of work to do in order to make that into a product. For the endpoint protection product it would be hooking into the filesystem to identify file-writes, identifying the file type, quarantining malicious files and reporting the attack, all of which should be done smoothly while not taking up too many resources from the endpoint itself, not to mention the challenge of supporting the various operating systems and versions deployed in the field. All of these have zero connection to the ML engine that powers everything. If you find yourself in a position similar to this (and for most products, this is exactly the case) - you are not testing ML, you are at most integrating with it, and you can treat the actual ML component as a black box and even replace it with a simulator in most test scenarios.

There are cases, however, where one might find themselves actually testing the ML engine itself, in which case start by admitting that you don't have the necessary qualifications to do so (unless, of course, you do). Following that, we need to distinguish again between two kinds of algorithms - straightforward and opaque.

Straightforward algorithms are not necessarily simple, but a human can understand and predict their outcome given a specific input. For instance, in the first place I worked at, the ML was a Bayesian model with a few dozen parameters. The team testing the risk engine was using a lot of synthetic data - given specific weights for each "bucket" and a given input, verify the output is exactly X. In such cases, each step can be verified by regular functional tests. They might require some math, but if a test fails we can see exactly where the failure happened. In a Bayesian case, calculating a weighted score, normalizing it, recalculating "buckets" and assigning new weights are all separate, understandable steps that can be verified. If your algorithm is a straightforward one, "regular" testing is just what you need. You might need a lot of test data, but in order to verify the engine's correctness, you just need to understand the rules by which it functions.
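To make the "given specific weights and a given input, verify the output is exactly X" idea a bit more concrete, here is a minimal sketch in Python. It is not the actual risk engine: the parameter names, bucket edges, weights and the normalization rule are all made up for illustration. The point is only that each step (bucketing, summing weights, normalizing) is deterministic and can be checked separately.

```python
from bisect import bisect_right

# Hypothetical synthetic model: every parameter has bucket edges and one
# weight per bucket (number of weights == number of edges + 1).
MODEL = {
    "amount": {"edges": [100, 1000], "weights": [0.0, 1.0, 3.0]},
    "hour": {"edges": [6, 22], "weights": [2.0, 0.0, 2.0]},
}


def bucket_index(edges, value):
    """Return the index of the bucket that the value falls into."""
    return bisect_right(edges, value)


def raw_score(model, transaction):
    """Sum the weight of the matching bucket for every parameter."""
    return sum(
        spec["weights"][bucket_index(spec["edges"], transaction[name])]
        for name, spec in model.items()
    )


def normalized_score(model, transaction):
    """Scale the raw score by the highest score the model can produce."""
    max_score = sum(max(spec["weights"]) for spec in model.values())
    return raw_score(model, transaction) / max_score
```

With a synthetic model like this, a test is just a hand-calculated expectation: a transaction of 5000 at 3 AM lands in the heaviest bucket of both parameters, so its normalized score should be exactly 1.0, and if it isn't, checking `bucket_index` and `raw_score` separately shows which step went wrong.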

Opaque ML systems are a different creature. While it is possible to calculate the expected output of the algorithm given the state it's in (unless it also has a random element), there's no use in actually doing so, since it would not help us understand why a specific answer was given. Notoriously, there are the deep neural networks that are nothing short of magic. We can explain the algorithm of transitioning between layers, the exact nature of the back-propagation function we use and the connections between "neurons", but even if we spot a mistake, there isn't much we can do besides feeding it through the back-propagation function and moving on to the next data point. In fact, this is exactly what is being done on a massive scale while "training" the neural net. With opaque systems, testing is basically the way they are created, so accepting them as fault-free is the best we can do.

That being said, ML algorithms are rarely fault-free, and this brings us to the point we mentioned before - most of our product is about integrating ML component(s) into our system, and we should focus on that. The first thing is to see whether we can untangle the ML component from the rest of our system. We could either mock the response we expect or use input that is known to produce a certain result, and see that our system works as expected given a specific result from the ML component.
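As an illustration of untangling the ML component, here is a rough Python sketch of swapping the engine for a stub that returns canned verdicts. Everything in it is hypothetical: `StubClassifier`, `FileGuard`, `is_malicious` and `on_file_written` are names invented for the example, not the API of any real product.

```python
class StubClassifier:
    """Stands in for the real ML engine and returns a canned verdict per file."""

    def __init__(self, verdicts):
        self.verdicts = verdicts

    def is_malicious(self, path):
        return self.verdicts[path]


class FileGuard:
    """Hypothetical product logic wrapped around the ML verdict."""

    def __init__(self, classifier):
        self.classifier = classifier
        self.quarantined = []
        self.reports = []

    def on_file_written(self, path):
        # The product only cares about the verdict, not how it was reached.
        if self.classifier.is_malicious(path):
            self.quarantined.append(path)
            self.reports.append(f"blocked {path}")
            return "quarantined"
        return "allowed"


guard = FileGuard(StubClassifier({"dropper.exe": True, "notes.txt": False}))
results = [guard.on_file_written(p) for p in ("dropper.exe", "notes.txt")]
```

Because the verdicts are dictated by the test, a failure here points at the quarantine and reporting logic and never at the model.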

Clever, just assume that correctness is someone else's problem, and we're all peachy. Right?
Well, not exactly. Even though in most cases there will be a team of data scientists (or is it data engineers now?) who are building and tuning the model, there are cases where we actually need to cover some gaps in figuring out whether it's good enough - Is our model actually as good as we think it is? Maybe we've purchased the ML component and we take the vendor's claims with a grain of salt.

A lot of the potential faults can be spotted when widening the scope from pure functionality to the wider world. Depending on what our ML solution does, there's a plethora of risks - from your chatbot turning Nazi, to describing black people as apes, to being gender-biased in hiring, and that's without considering deliberate attacks that will run your self-driving car into a ditch. To avoid stupid mistakes that make us all look bad in retrospect, I like to go through a list of sanity questions - ones that have probably been addressed by the experts who've built this system, but just in case they got tunnel vision and forgot something. The list is quite short, and I'm probably missing a few key questions, but here's what I have in mind:

Does it even make sense? Some claims (such as "Identifying criminals by facial features") are absurd even before we dive deeper into the data to find the problems that make the results superficially convincing.
The "Tay" question: Does the software continue to learn once in production? If it does - how trustworthy is the data? What kind of effort would be needed to subvert the learning?
The Gorillas question: Where did we get the training data from? Is it representative of what we're expecting to see in production?
Our world sucks question: Is there a real world bias that we might be encoding into our software? Teaching software to learn from human biased decisions will only serve to give this bias a stamp of algorithmic approval.
Pick on the poor question: Will this software create a misleading feedback loop? This idea came from Cathy O'Neil's "Weapons of Math Destruction" - predictive policing algorithms meant that cops were sent to crime-ridden neighborhoods, which is great. But now that there are more cops there, they will find more crimes - from drunk driving to petty crimes or jaywalking. So even if those areas are back to a normal rate of crime, they will still get more attention from the police, which will make the life of the residents there more difficult.
"Shh.. don't tell" question: Is the model using data that is unlawful to use? Is it using proxy measures to infer it? Imagine an alternative ML based credit score calculator. It helps those who don't have a traditional credit score to get better conditions for their credit. Can it factor in their sexual preference? And if they agree to disclose their social profiles for analysis, can we stop the algorithm from inferring their sexual preferences?

After asking those questions (that really should be asked before starting to build the solution, and every now and then afterwards), and understanding our model a bit better, we can come back to try and imagine risks to our system. In security testing (and more specifically, threat modeling) there's a method of identifying risks called "movie plotting" where we assemble a diverse team and ask them to plot attacks as in a movie like "Mission Impossible". This idea could work well to identify risks in incorporating ML components into our business, with the only change being that the movie plot will be inspired by "Terminator" or "The Matrix".

And yet, the problem still remains: Validating ML solutions is difficult and requires a different training than what most software testers have (or need). There are two tricks that I think can be useful.

Find an imperfect oracle: it could be that you have competitors that provide a similar service, or that there's human feedback on the expected outcome (you could even employ the Mechanical Turk). Select new (or recent) data points and compare your system's results with the oracle's. The oracle should be chosen such that a mismatch is not necessarily a problem, but is something worth investigating. Keep track of the percentage of differences. If it changes drastically, something is likely wrong on your side. Investigate a few differences to see if your oracle is good enough. In our case, we try a bunch of files that enough of our competitors claim to be malicious, and if we differ from that consensus, we assume it's a bug until proven otherwise.
Visualize. Sometimes, finding an oracle is not feasible. On the other hand, it could be that people can easily spot problems. Imagine that Google were placing a screen in several offices and projecting crawled images alongside what the ML thinks each one is. It is possible that an employee would have seen it classify people as apes way before real people were offended by it.
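For the imperfect-oracle trick, the bookkeeping can be quite small. The following Python sketch assumes verdicts are kept as dictionaries keyed by sample name. The function names, the sample data and the 5% tolerance are all assumptions made for the example, not numbers we actually use.

```python
def disagreement_rate(ours, oracle):
    """Return the mismatch fraction and the mismatching samples."""
    shared = ours.keys() & oracle.keys()
    if not shared:
        raise ValueError("no samples in common")
    mismatches = sorted(k for k in shared if ours[k] != oracle[k])
    return len(mismatches) / len(shared), mismatches


def drifted(rate, baseline, tolerance=0.05):
    """Flag a run whose mismatch rate moved away from the usual baseline."""
    return abs(rate - baseline) > tolerance


# Hypothetical verdicts: True means "malicious".
ours = {"f1": True, "f2": False, "f3": True, "f4": False}
oracle = {"f1": True, "f2": True, "f3": True, "f4": False}
rate, to_investigate = disagreement_rate(ours, oracle)
```

The list of mismatches is what gets investigated by hand, while `drifted` covers the "if it changes drastically" part: a single mismatch is not treated as a bug, but a jump in the rate compared with previous runs is a reason to look at our own side first.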

With that being said, I want to circle back to where I've started: While I hope you've gained an idea or two, testing is testing is testing. With any sort of system that you test, you should start by understanding just enough about it and how it interacts with your business needs, then figure out the risks that you care about and find the proper ways to address those risks.

מבוא לבדיקות קסם

מדי פעם יוצא לי לשמוע על "איך לעשות בדיקות X", כשאיקס יכול להיות כל דבר - מובייל, API, מערכות משובצות חומרה (אמבדד, בלע"ז) או תכל'ס כל דבר שהוא לא אתר אינטרנט. בדרך כלל אני מגחך לשמע כל ההייפ הזה מסביב למשהו נפוח ומלא בעצמו שבדרך כלל אפשר לצמצם ל"אני צריך להבין את המערכת שאני בודק, להבין איך אני מתממשק מולה, כל השאר זה שום דבר חדש". אם התוכן ממש טוב, אז יהיה בזה קצת ערך מוסף - יכולים להפנות את תשומת הלב שלי לפרמטרים שאני בדרך כלל לא חושב עליהם. כמו צריכת סוללה בבדיקות מובייל, או השפעות של תנודות בטמפרטורה ובזרם במערכות חומרה.

יצא לי להאזין לפרק של The testing show על בדיקות AI. הפרק היה, למרבה הצער, עוד פרק על גבול הפרסומי בו הביאו מישהי אחת לדבר על המוצר שהיא מוכרת. סליחה, על אילו אתגרים בבדיקות יש במוצר הזה. זה לא היה גרוע כמו בפעמים בהן מגיע מומחה מטעם Qualitest, אבל זה לא היה רחוק מזה. בכל מקרה, הנושא היה למידת מכונה, ומעט התוכן שהיה שם ולא היה סתם היכרות או קשקשת ידידותית, היה בעיקר חזרה על "יש פה משהו חדש לגמרי ושונה מכל מה שהכרנו עד היום, ותראו אילו אתגרים יש פה! כבר אמרתי שזה קשה?".

הפרק הזה גרם לי לרצות לכתוב טקסט עם קצת יותר תוכן שיעזור לאנשים שמרגישים קצת אבודים ויפרוש את מה שבעיני הוא הבסיס.

כנראה שהדבר הראשון שצריך לענות עליו הוא "מה אנחנו בודקים?" האם אנחנו בודקים מנוע לומד? האם אנחנו פשוט מחברים חתיכה של קסם למוצר שלנו ואומרים "זה יסדר לנו את החיים"? שני הדברים האלה שונים למדי ודורשים אסטרטגיה שונה.

בעשור האחרון עבדתי עם מוצרים שהיו מבוססים על למידת מכונה. קודם על מוצר שעוטף מודל בייסיאני כדי לאתר הונאות בכרטיסי אשראי, אחר כך מוצר שעוטף רשת נוירונים כדי לחסום רושעות. לא בטוחים מה אחד המונחים האלה אומר? אפשר לקרוא על זה בזמנכם הפנוי, כרגע מספיק אם נתייחס לזה כאל כישוף שחור 1 וכישוף שחור 2. למרות ששני המוצרים שונים מאוד זה מזה בכמעט כל פרמטר, מבחינת שניהם המודל היה סוג של קסם - זורקים עליו את הנתונים בצורה הנכונה ומקבלים בחזרה תשובה שתהיה חלק מעץ ההחלטות. נכון. הקסם הזה פותח באותה חברה, והוא גם היתרון היחסי על המתחרים, אבל גם כשיש תשובה מושלמת לשאלה (האם מישהו משתמש בכרטיס אשראי גנוב? האם הקובץ הזה הוא זדוני?) עדיין יש לא מעט לעשות כדי להפוך את זה למוצר - זה לא מספיק לדעת לזהות כל רושעה, צריך גם להתחבר למערכת ההפעלה כדי לתפוס קבצים שנכתבים, צריך לדעת לבודד אותם ולהתריע על המתקפה. כל זה צריך לעבוד בצורה חלקה בלי לגזול יותר מדי משאבים, לתמוך במגוון מערכות הפעלה ועוד. לכל הדברים ההיקפיים האלה יש בדיוק אפס קשר לבדיקות של למידת מכונה והם יהיו עיקר העבודה של מי שמוצא את עצמו בודק מוצר כזה. אלה לא בדיקות של למידת מכונה, אלה בדיקות אינטגרציה עם למידת מכונה, וזה די סבבה להתייחס לרכיב הזה כאל קופסה שחורה, או אפילו להחליף אותו עם איזה סימולטור ברוב המקרים.

מצד שני, לפעמים צריך לבדוק את המנוע עצמו. כשזה קורה חשוב לזכור שאין לנו את ההכשרה הנדרשת לזה (כלומר, לך אולי יש, אבל מתמטיקה ברמה הזו לא נדרשת לרוב תפקידי פיתוח התוכנה), אחרי שעשינו את זה, נחלק את העולם לשני סוגי אלגוריתמים - הגיוניים וסתומים.

אלגוריתמים הגיוניים הם כאלה שאנשים יכולים להבין בצורה אינטואיטיבית ולדעת בערך איך תיקון מסויים ישפיע עליהם. למשל, אם נסתכל על המודל הבייסיאני שעבדתי מולו בעבודה הראשונה שלי - הוא קיבל כמה עשרות פרמטרים, מיקם כל אחד מהם בתוך "דלי", לכל דלי כזה היה משקל, סכמנו את כל הדליים הרלוונטיים, והופ! יש לנו תשובה. הצוות שבדק את המנוע הזה לא היה צריך להתמודד עם כל המורכבויות של למידת מכונה. פשוט לקחת מודל סינתטי, לזרוק עליו הררים של נתונים מפוברקים, ולעקוב אחרי החישוב הצפוי. לארגן נתונים נוספים, להפעיל את חישוב המשקלות ולראות שהכל עובד היטב. ברמה בה הם עבדו, הכל היה דטרמיניסטי לחלוטין והאתגרים שם היו שונים מאשר אלו שמקובל לדבר עליהם לגבי למידת מכונה. כדי לוודא את נכונות המנוע צריך היה רק לעקוב אחרי החוקים המתמטיים. זה אולי מפרך, אבל לא קשה.

הסוג השני הוא האלגורתמים הסתומים. לא, לא במובן של מטומטמים, אלא במובן של נסתרים (כמו בביטוי "נסתם ממנו", או ההגדרה השנייה כאן). בעוד שטכנית אפשר לחשב באופן ידני את התוצאה אם ידוע לנו המצב הפנימי והקלט בדיוק כמו במודלים ההגיוניים (ואין חלקים אקראיים), כאן אין טעם לעשות את זה כי זה לא יוסיף שום דבר להבנה שלנו. ידועות לשמצה בחוסר המובנות שלהן הן רשתות הנוירונים שגם המומחים הגדולים מסתכלים עליהן ואומרים "עזוב, קסם" (זה לא לגמרי מדוייק, יש מחקר אקדמי שנועד להפוך אותן לנהירות יותר, הוא לא טוב מספיק עדיין). אנחנו אולי יכולים להסביר את ארכיטקטורת הרשת ולמה היא הגיונית באופן אינטואיטיבי, ואנחנו יכולים להסביר ואפילו לחשב את פונקציית ההפצה לאחור (back propagation), אבל גם אם יש לנו טעות - אין לנו מושג איך לתקן את התוצאה חוץ מאשר לאמן את המודל בעוד המון נתונים ולקוות שלא שברנו שום דבר שפעל קודם. עם אלגוריתמים סתומים, בדיקות הן בעצם חלק בלתי נפרד מתהליך הבנייה שלהן (מאמנים את המודל על סט אימון, בוחנים את הביצועים שלו על סט נתונים נפרד. אם הוא לא מספיק טוב, ממשיכים לנסות) ולכן האסטרטגיה הכי הגיונית מבחינתנו היא לקבל אותם כקופסה שחורה נטולת טעויות.

הממ.... להכריז על בדיקת הנכונות כעל בעיה של מישהו אחר וכל הבעיות שלנו נפתרו... נכון שזה מתוחכם?
אז זהו, שלא בדיוק. למרות שבדרך כלל יהיה צוות של מדעני נתונים שבודק את המודל ואת ההתאמה שלו לעסק, יהיו מקרים בהם נצטרך להשלים פערים מסויימים - האם המודל ממשיך לעבוד גם מחוץ למעבדה? אולי בכלל קנינו את הרכיב הזה ממישהו אחר ואנחנו לא סומכים במאה אחוז על הצהרות היצרן?

Many of the possible problems with machine learning are not problems of "incorrect" behavior so much as problems of unexpected behavior, so potential failures become apparent if we widen our view a little and look at the context in which our software operates. There's no shortage of "surprising" failures related to machine learning: a Nazi chatbot, Black people identified as gorillas, gender bias in résumé screening, and that's before counting deliberate attacks that make autonomous cars do silly things. To avoid incidents that would badly embarrass our company, I like to ask a few basic questions that should be asked before anyone even starts thinking about using artificial intelligence to solve a problem. The experts who built the system have probably asked them already, but just in case they missed something (or, again, if we are buying an off-the-shelf product):

How stupid is it? Some claims (such as the ability to identify criminals by their facial features) are unreasonable enough that there's no need to look for the problem in the data to know there is one.
There are loads of Nazis out there: Does the model keep learning while the product is in the field? If so, how reliable is the information there? How much effort would it take to skew the model in a way I wouldn't like?
Man descends from the ape: Where did we get the data on which we trained and evaluated the model? Does it represent the reality our product will meet?
The world is messed up: Is there a real-world problem that our algorithm perpetuates? If humans make a decision in a biased and discriminatory way, teaching software to decide like us will only lend the biased decision authority because "the computer decided".
Let's harass the Arabs: Will our software create a harmful feedback loop? The idea comes from Cathy O'Neil's book "Weapons of Math Destruction", where she talks about products meant to help police forces (in the US) allocate their limited manpower more effectively by sending more officers to crime-ridden neighborhoods. The catch? Crime data comes from what the police have already found. So we started with a bias against Black people, and now we'll keep finding more crime in those neighborhoods, because someone driving drunk in another neighborhood simply won't be caught as often, so we'll keep sending officers to the same neighborhoods and finding more crime there than elsewhere.
Ask no questions, hear no lies: Does our software use forbidden information to make decisions? Going back to résumé screening, we may think the software won't reject candidates just because they left the IDF at the rank of major and do a lot of reserve duty, but with enough résumés it will find correlated signals (for example, studies in the middle of military service or a late entry into the job market) that let it identify who is a senior officer even without that being stated explicitly. If our model is opaque, good luck finding that.

Once we have good enough answers to these questions, it's worth going back for a moment to the task definition: what are we trying to do? How is this going to blow up in our faces? In the software security world there's a risk-discovery technique called "scenario writing": you seat people with diverse backgrounds in a room and tell them "go wild, we're writing a Mission: Impossible style movie about attacking our product". Something similar can be done here as well, only the movie's genre will lean more towards "The Terminator" or "The Matrix".

And still, the problem remains: verifying learning systems is a complex task that requires training different from what most of us have. There are two more tricks that I believe can help.

An imperfect oracle: We may have competitors solving the same problem, or feedback coming back about how suitable the algorithm's answer was. Maybe it's some statistic we measure, maybe it's human feedback (we can even use Amazon's Mechanical Turk if we really go overboard). What matters is that we have a way to say, for a given case, whether the result was right or wrong. If we've found an oracle that isn't too flawed, then every "mistake" is something interesting to investigate. For example, to test our ability to detect malware, we scan files that enough of our competitors claim are malicious. Sometimes it turns out that we're the ones who are right, but in most cases, if we disagree with the consensus, we have something to fix. We played a bit with "how many competitors need to think a file is malicious" so that the default would be "if there's disagreement, it's a bug until proven otherwise".
Visualization: An oracle can't always be found easily (how many competitors does Google have in classifying images into categories?), but in some cases a human can spot problems at a glance. How quickly would Google have found the gorilla problem had they projected images together with their labels on screens in the office? Or had they sent every employee a daily email with three images and a request for approval?
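The imperfect-oracle trick can be sketched in a few lines. This is only an illustration under assumptions of my own: the function name, the data shapes and the `min_votes` threshold are hypothetical, and a real pipeline would pull verdicts from actual scanners rather than from dictionaries.

```python
def disagreements(our_verdicts, vendor_verdicts, min_votes=3):
    """Return the files where our verdict differs from the consensus.

    our_verdicts:    {file_id: bool}, True means we flag the file as malicious
    vendor_verdicts: {file_id: [bool, ...]}, one verdict per other vendor
    min_votes:       how many vendors must flag a file before the
                     consensus treats it as malicious (the tunable knob)
    """
    suspects = []
    for file_id, votes in vendor_verdicts.items():
        consensus = sum(votes) >= min_votes
        # A file we never scanned counts as "not flagged by us".
        if our_verdicts.get(file_id, False) != consensus:
            suspects.append(file_id)
    return sorted(suspects)


ours = {"a": True, "b": False, "c": True}
vendors = {
    "a": [True, True, True, False],
    "b": [True, True, True, True],
    "c": [False, False, True, False],
}
# "b" is a likely miss and "c" a likely false alarm; both are worth a look.
print(disagreements(ours, vendors))  # ['b', 'c']
```

Every item in the returned list is treated as a bug until proven otherwise, and changing `min_votes` is how one plays with how strict the consensus should be.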

To sum up, I want to repeat what I already wrote here at the beginning: while I hope you got an idea or two, the basics remain the same: understand your system, understand how to interact with it, and which risks are relevant to it. Once you understand that, find the best way to look for those risks.

Tuesday, September 20, 2022

defensive coding

עברית

One of the things that happens to people in testing positions is that every now and then we get to say "I told you so", usually around a bug report that was filed and closed as "won't fix\not a bug\not important" and came back to bite us in the rear. While there's always the basic joy of being right (and more importantly, of other people being wrong), over the years I've learned to see those cases as professional failures instead of sources of joy. After all, I saw the problem in advance, knew that it was a problem, and maybe I could have done something differently to actually get it fixed. Maybe I could have presented the problem differently, talked to other people who could advocate for it better than me, collected more evidence, or perhaps it was only a matter of being more persistent in asking for it to be fixed. In other cases, there was nothing I could do at the time, since the reason for not addressing it was rooted in the organizational culture, which I can now start pushing to change. Saying "I told you so" is not the professional thing to do.

Last week I had just such a case - something didn't work for a customer and, upon further inspection, had never worked since deployment. Before we got to difficult debugging, we went over the short checklist of problems, roughly the equivalent of your ISP's tech support asking you to reboot your router when you call. In our case, this list consisted of one thing - checking the configuration file on the server. With two relevant entries, it was a rather short glance - the authentication token looked fine and the destination URL pointed to the correct base path. So far, so good.

Then, by sheer luck, something stirred my memory and I noticed that the URL had been typed with schema+FQDN, you know - the way URLs are usually formatted. I recalled that when I worked on that feature there was something odd about having the schema provided in the file. A short trip to our bug tracking system, and indeed my memory was correct - if we provide the schema (for those less fluent in this specific terminology, that's the https:// part), it won't work, as the client consuming this configuration will add the schema itself, and https://https://any.domain will fail. To make it that much more fun, there won't be any way to understand that from the logs. The ticket was logged last December (about 9 months ago!) and the discussion around it raised some acceptable reasons for not fixing it. A case could have been made for a fix anyway, but back then it would have been a harder battle to win. It's not that any fix would have been difficult - the team configuring the server could add a regular expression validation to their tool, the server could reject the config or remove the schema, the client could do the same and log a meaningful error, and all of us could have been monitoring this feature once it was deployed. We could even change the name of the parameter so that instead of "...URL" it would be "...DOMAIN" and reduce the chance of errors. For almost all the steps that we could have taken but did not, there's a common reason: optimistic coding.
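To show how small such a fix could be, here is a hedged sketch of the "remove the schema or reject the config" option. It is not our actual code: the function name, the regular expressions and the assumption that the value must be a bare FQDN with no path are mine, made up for illustration.

```python
import re

# A leading "something://" prefix (what URL specs call the scheme).
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")
# A deliberately simple FQDN check: dot-separated labels ending in a TLD.
_FQDN = re.compile(
    r"^(?=.{1,253}$)(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}$"
)


def normalize_host(raw):
    """Return a bare domain name, repairing the value when that is safe.

    A single schema prefix and trailing slashes are stripped; anything
    that still isn't a plain FQDN raises ValueError with a readable message.
    """
    value = raw.strip()
    value = _SCHEME.sub("", value, count=1)
    value = value.rstrip("/")
    if not _FQDN.match(value):
        raise ValueError(
            f"expected a domain name such as 'any.domain', got {raw!r}"
        )
    return value


print(normalize_host("https://any.domain"))  # any.domain
```

A doubled prefix such as https://https://any.domain is rejected rather than guessed at, so the mistake surfaces where the configuration is written instead of silently failing in the client.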

Optimistic coding is a state of mind where we assume that everything is going to be fine - the API is to be used only internally, so everyone will know what they should be doing. And if someone makes a mistake? Well, it's their problem to fix.

What we should be doing (and I intend to use this incident as leverage to push towards such behavior) is to create our software with a slightly more paranoid approach. Software is created to be used and operated by human beings, and human beings make mistakes. We need to assume that people operating, configuring and debugging the system will act very stupidly, not because they are stupid but because they are doing something else, under time pressure and with a lot of distractions around. Most likely, at least some of those people will be our future selves. If we keep that in mind we can adopt a "nothing gets past me" approach - any mistake that can be detected in a given phase should be dealt with right there: fix it if possible, return (or report) an error if a fix is not possible, and almost never let a problem pass from one component to the next.

Sunday, September 18, 2022

קידוד מתגונן

English

אחד הדברים שקורים למי שבודק תוכנה למחייתו הוא שמדי פעם עולה הזדמנות לומר "אמרתי לכם", בדרך כלל מעורב בזה באג שנסגר תחת "זה לא יקרה\לא בעיה\לא נתקן". יש בזה משהו מאוד מספק ברמה ילדותית שכזו, אבל לאורך השנים הרגלתי את עצמי להסתכל על אירועים כאלה כעל כישלון מקצועי. במקום ריקוד ניצחון קטן והידיעה שבפעם הזו הייתי צודק יותר ממי שזה לא יהיה שהתווכחתי איתו, אני מנסה להבין מה יכולתי לעשות כדי לא להגיע למצב הזה מלכתחילה. לפעמים זה עניין של להתנסח טוב יותר, לאסוף ראיות מתאימות או לדבר עם האדם הנכון. לפמים נדרש שינוי תרבותי משמעותי יותר כדי שבעיות מהסוג הזה יילקחו ברצינות. כך או אחרת, אם משהו הצליח להרגיז לקוח, זה לא מספיק לכסות את הישבן שלי ולומר "אבל אמרתי לכם", צריך להבין איך יכולתי לעזור לארגון להימנע מזה מלכתחילה.

בשבוע שעבר קרה לי בדיוק מקרה כזה - משהו לא עבד בשביל לקוח, ומסתבר שלא עבד בכלל מהרגע בו הוא קיבל את המוצר. לפני שניגשנו לדבג את הבעיה ולחקור הכל כמו שצריך, התחלנו עם מעבר בסיסי על שטויות מטומטמות. קצת כמו שמתקשרים לתמיכה הטכנית ושואלים אם ניתקנו את הראוטר מהחשמל וחיברנו חזרה. הפעם, סט הבדיקות היה קצר למדי - בודקים שקבצי הקונפיגורציה בשרת תקינים, כל דבר יותר מזה כבר דורש לקבל אישור מהלקוח. לפיצ'ר הספציפי הזה היו שני פרמטרים רלוונטיים, מזהה כלשהו ופרמטר נוסף שמייצג URL אליו צריך להתחבר. שניהם נראים די סבבה בסך הכל. ואז, בתופעה של יותר מזל משכל, משהו בזיכרון שלי קפץ וצעק "רגע! רגע!" זכרתי שהיה משהו בו אם מספקים פרמטר של URL אבל כוללים בפנים גם את הסכמה (יעני, https://) אז יש בעיות. או להיפך, אם לא כוללים אותה. מפה לשם, בדקנו את מערכת תיעוד הבאגים שלנו ואכן, זכרתי נכון. בשדה שנקרא משהו משהו URL, הערך שצריך להכניס הוא בעצם שם הדומיין (ובלעז - FQDN). יותר מזה, בתיאור הבאג אי אז בדצמבר האחרון (לפני תשעה חודשים!) אפילו כתבתי שבגלל שם המשתנה מישהו עלול להתבלבל ולכלול את הסכמה בלי לדעת שזה יעשה צרות. סטטוס הטיקט - לא באג. האמת? היו כמה נימוקים סבירים לגמרי - זה משתנה שנערך רק על ידי צוות פנימי, יש לנו שדות אחרים בשם URL עם אותה מגבלה ממש, וככה היה החוזה שסגרו מול הצוותים השונים. זה שפשוט למדי לתקן את זה? אז פשוט. גם במבט לאחור, זה לא דיון שהייתי יכול להגיע בו לתוצאה אחרת באותו זמן, כי הגישה של "אם מישהו אחר בשרשרת טועה זו בעיה שלו" הייתה רווחת מאוד, וגם אילו הייתה לי דרך להסלים את זה (לא היה אז למי, היום אולי יש) - זה לא בהכרח משהו שכדאי לפתוח סביבו מאבק שיפגע ביכולת שלי לעבוד בהמשך עם האנשים. זה לא שחסרו דברים שיכולנו לעשות - אם הצוות שאחראי על קינפוג השרת היה מסדר ולידציה פשוטה בתהליך האוטומטי שמפעיל את זה, אם הצוות של השרת הזה מוריד את הסכמה אם היא קיימת, או אם הקליינט היה עושה אותו דבר, אם אחד הצוותים היה דואג לשורת לוג ברורה שהיינו טורחים לנטר או אפילו אם היינו פשוט משנים את שם המשתנה כך שבמקום URL תופיע המילה DOMAIN - סביר להניח שלא היינו נמצאים היום במצב בו לקוח כועס עלינו.

הבעיה הייתה, ועודנה, אופטימיות יתר. מישהו טעה? המערכת קורסת? זה בגלל מי שטעה וזו בעיה שלו. אין לנו מערכת מסודרת להעברת כל הפרטים הקטנים האלה שהיו לי ברורים מאליהם ולא טרחתי לתעד? או להעביר את המידע הלאה גם אחרי ששמתי לב שיכולה להיות כאן אי-הבנה? אז אין מערכת, שישאל אם הוא לא בטוח. זו התרבות הנוכחית - חזקה יותר בצוותים מסויימים, פחות חמור בצוותים אחרים.

בסופו של יום, צריך לזכור שקוד נכתב כדי שבני אדם ישתמשו בו. ובני אדם טועים. אנחנו רוצים לבנות מערכות שעוזרות לנו להתמודד עם המציאות הזו - לתקן טעויות אנוש אם אפשר, לעצור את הטעות אם אי אפשר ולא להעביר אותה לנקודה הבאה בשרשרת, ולהציף את השגיאה כמה שיותר מהר. הטענה "זה עומד בדרישות המוצר" פשוט לא מספיקה.

Thursday, September 15, 2022

Book review - Radical Candor: Be a kick-ass boss without losing your humanity, by Kim Scott

Radical Candor is a book I got to after years of getting references to it from various places - blogs, other books, people. It means that I had high expectations *and* was pretty sure I wouldn't be surprised by the content. Not an easy place for a book to be. Despite those difficult starting conditions, it manages to live up to the reputation it has, and to pack the information in a useful, coherent way.

I've listened to an audiobook of the 2nd edition, and it starts with an attempt to defuse a common backlash to the 1st edition: Radical candor is not a permission to be an arsehole, nor is it an invitation to be cruel. The main reason for being truly candid with someone is that we care. It is this care that drives us to provide feedback even if it's painful, and to make sure the recipient is able to make use of it. The book cover provides a neat summary of the main idea behind this book. Relevant behavior is measured on two axes: caring personally and challenging directly. Caring personally is being interested in the well-being of the person you are working with (in the book, the people you manage): not offending them, providing them with opportunities to improve, etc. Challenging directly, on the other hand, is about getting things done - pointing out mistakes, being accurate and concise, regardless of how people feel about it.

Those axes create four distinct categories:

Manipulative insincerity: Low caring, low challenging. This is where you show no care for the other person and for the mission - you avoid conflict by not giving someone difficult feedback, but having no problem backstabbing them when they are not around. You might want to be on that person's good side, to pat their ego or simply to get that person off your hands. If you see someone being sweet and enthusiastic with someone, only to sigh and roll their eyes the moment the other person leaves - that's it.
Ruinous empathy: High caring, low challenging. The intentions behind this are kind - not making someone feel bad, giving them leeway since they are in a difficult time, or simply wanting to avoid conflict. The end result though is indistinguishable from manipulative insincerity, as in both cases you will keep silent or give undeserved praise. The fact that you mean well doesn't really matter, as the road to hell is paved with good intentions and in this case, hiding the mess under the carpet will come back to bite both of you in the rear.
Obnoxious aggression: Low caring, high challenging. Surprisingly, this is actually the second best place to be in. People getting the business end of this behavior might cry, stress out or feel attacked, but things actually get done and either improve or break completely. If people can grow the thick skin necessary to survive, they will get direct feedback and could build on it to improve. Don't expect employee retention to be high, though, as this assault on the employees' egos and confidence will tire most of them enough to quit.
Radical candor: High caring, high challenging. This is the sweet spot between obnoxious aggression and ruinous empathy. You give employees actionable feedback and help them process it and improve from it. You take care not to say "This code is shit", but rather "This code isn't good enough, it should be broken to smaller functions, improve variable names and care more about log levels, you usually do better".

It's important to notice that there isn't a recipe for being radically candid with someone, as what matters is not what was said but rather what was heard and understood. One person will understand "this work is shit, take the time to do it well" as a personal judgement and will be discouraged, while another would see it as an honest evaluation of the work and an appreciation of their skills when not under pressure. The first might be at a loss about what is wrong and would require more direct guidance such as "it works, but will be hell to maintain unless we clean up the wording and build a better structure", while the second might get annoyed with you explaining the obvious and micromanaging them. The difference could be rooted in personal preferences on getting feedback, but it can also be a result of how much the person trusts you, how confident they are in their current situation and skills, or how they woke up this morning. Complicated? It sure is. You will make mistakes, and the only question is how many and how severe they will turn out to be. A strong relationship is a good buffer to absorb such mistakes, so it is worth investing in it from the onset.

After the reader (listener, in my case) has understood both what radical candor is and why it is important, it's time to implement those ideas. The book goes on to discuss strategies for creating a radically candid culture around you, peppered with examples from the author's experience that help in understanding the theory as well as the need for everyone to devise their own strategies. I won't go into all of the details as the book does it way better than I could (plus, the author has put a lot of effort into it, go read her work, not mine), but all in all, it's definitely a book I'll listen to again, and it helped me frame office (and personal) communication in a helpful manner.

At the end of the audiobook there was a preview of another book: "How to Root Out Bias, Prejudice, and Bullying to Build a Kick-ass Culture of Inclusivity". The text there is a little bit less polished and I'm not sure I got to the bottom of what was communicated to me, but even so it was very moving, which is what is to be expected from an introduction section to a book (as I believe this text to be). I found myself wondering several times "What the F?? Which decent human being would behave that way? There's bias, and there's straight out harassment". So, this is a solid "maybe" with good potential.

Saturday, July 16, 2022

How to botch a consulting gig

גרסה עברית

The company I work for is undergoing a process change. More specifically, it's branded as an "Agile transformation", which in my eyes is a rather necessary step, as we are dealing with the pains of growing from a small startup to a larger organization, and the informal channels that were so great before are now creating chaos and making it that much harder to get work done. In order to do this properly, we have brought in some external coaches to help us figure out the right process for us. It started pretty well: the VP of engineering stated our goal in a town-hall meeting, and introduced the consultants who were to start working soon. At least to me, the message was pretty clear, and it was nice to focus our attention on our working process.

Then, at least from my perspective it went downhill. However, before I go on to identify the mistakes I saw I want to clarify my point of view - I am an individual contributor in the company, one of about 70 (If I have my numbers right). I'm not part of the leadership team and did not attend many of the meetings, so I'm assuming I don't see many of the constraints.

With this disclaimer out in the open, the list of botches is short but seemingly fatal.

After the first announcement came more than a month of silence. We saw the consultants wandering around in the corridors or sitting with upper management in meetings, but nothing was transmitted down. We were left in the dark about the prep work they were probably doing. This long period of time meant that momentum was lost after a good pitch, and it also gave people time to speculate, to worry and to build a negative image of the consultants. After all, when you see someone every office day, you expect to have at least a rough idea about what they do. If you can't get that, assuming this person does nothing is the default. This dark period was long enough that I went to ask the PM (who had a strong part in coordinating this transition) when they planned to start working, which I only did after someone on my team asked me whether such conduct is normal.

They were, probably, gathering information and building a strategy for the organization, which leads me to the second botch: they talked almost exclusively to management. I know that they asked for some names of non-managers to talk to, but have no idea whether they followed up with any of them. I know that my name was given to them and that I was not approached. Talking to non-managers in the information-gathering phase is critical, as it enables you to get a more complete picture of the organizational state, as well as to make sure people feel their voice is heard.

It's not very surprising that the next fail point was the solution presentation.

On the face of it, it was done as it should be - they set time with each team to present the new way we plan to work, giving each team time to ask questions, to understand how it should impact their work and to focus on the points most relevant to them. However, the result was adding injury to insult. The meeting was based on a template presentation that looked like it was taken from scrum-101, dwelling on explaining the basic scrum terms and process and ignoring the elephant in the room: the problem all teams are feeling very painfully is the communication disconnect between various teams and groups. Any decent solution would at least acknowledge this problem, even if it was only to say "after completing this first step, here are our current thoughts on what we are hoping to do, and that's why it is not the first step". Another problem in the meeting was that it all felt like a shallow sprinkle of scrum on the structural and procedural problems we have, trying not to rock the boat (I'm guessing) and hoping for a miracle. Under the very wide umbrella of "this is the first step, let's roll with it and improve on what we get" we are keeping the silo-structured teams, putting the team leads into both scrum-master and product owner roles, and there was not even a hint about how the backlog will be populated or prioritized, not to mention how we plan to help teams start working together on the smallest value unit that can arrive to production - which we call "Epic", just to confuse people who have some previous scrum experience.
Add to that a consultant who is not well versed in the literature relevant to the domain (when I mentioned that having the same person act as scrum master and product owner is usually considered problematic and asked for the reason behind this choice, I got a response of "I'm not familiar with such articles", which meant I had to spend a whole 30 seconds on Google to find 3 such examples) and you might understand why different teams came out of this meeting either with a feeling of "ok, so nothing is going to change" or just disappointed, feeling that the whole charade is not addressing any of the important issues.

Whatever credit the consultants might have had before this meeting was now thoroughly obliterated.

Despite that, I'm optimistic about the process we are undergoing. Even if the consultants won't be able to recover from this poor start, and even if they make every other possible mistake, just by bringing them in, the organization has created a space to talk about the process, to introspect and see the pain points that teams are experiencing. Ultimately, we have some very bright people with diverse experience working together, and those two facts alone will lead the organization to a better place.

So, to sum the points you might want to remember in your next consultancy gig:

1. Collect information from all levels of the org. Meeting each of 12 teams for a 2-hour round table will take you roughly a week. You probably need less than that.

2. Transparency isn't enough - be vocal about your work. People, not only those who write you the check, should feel that you are working and have an idea of what you are working on.

3. Understand the problems people feel, and address them even if you believe the problem is something else. In the latter case, share your understanding and test it.

4. Tailor your out-of-the-box content to your audience.

איך להיכשל כיועץ

English version

התחלנו שינוי ארגוני בעבודה. ספציפית, אנחנו עוברים למשהו שדומה יותר לסקראם. האמת? יש לנו לא מעט מה להרוויח מכזה מעבר - זה עוד צעד שאנחנו עושים כחלק מהמעבר מסטארט-אפ קטן שרץ מהר ושובר הכל לחברה בוגרת שמסוגלת לתכנן קדימה בצורה אחראית כלפי הלקוחות שלה. אפשר לומר שערוצי התקשורת הלא רשמיים שעבדו נהדר בחברה קטנה כבר לא מחזיקים מים לנוכח הררי העבודה שאנחנו עושים. יש יותר מוצרים, יותר צוותים, יותר לקוחות וכל זה אומר שיש הרבה יותר מקום להפיל דברים בין הכיסאות, כך ששינוי כלשהו בהחלט נדרש. כדי לבצע את השינוי כהלכה, הבאנו יועצים חיצוניים שיעזרו לנו להימנע מטעויות נפוצות ולמצוא את התהליך שיתאים לצרכים הספציפיים שלנו. אחלה.אפילו התחלנו ממש ברגל ימין - בפגישה של כל המחלקה הציג המנהל את הכיוון החדש שאנחנו רוצים ללכת בו, איך זה הולך לעזור לנו והציג את היועצים שיתחילו ממש או-טו-טו. המסר היה ברור והמיקוד בשיפור התהליך היה מאוד חיובי בעיני.

משם, עד כמה שאני יכול לומר. הכל הידרדר.
לפני שאתחיל לפרט מה בדיוק השתבש, חשוב לי לומר שאני מציין דברים מנקודת המבט שלי - עובד מן המניין שאינו מנהל, כך שלא הייתי חשוף לחלק ניכר מהדיונים או לאילוצים השונים שהם פעלו תחתיהם. בהחלט ייתכן שמה שאני רואה כהחמצה אדירה זו התוצאה האפשרית הטובה ביותר בהינתן המצב.

בכל מקרה, אחרי ההבהרה הזו, אני רוצה לפרט את רשימת הכשלים, שבעוד שהיא קצרה, היא גם קטלנית לתהליכים כאלה.

בראש ובראשונה, היה יותר מחודש של דממה. ראינו את היועצים מסתובבים במשרד, יושבים לפגישות עם ההנהלה, מציגים מצגות, אבל בפועל - הם דיברו עם מעט מאוד אנשים (אני לא יודע על אחד שאינו מנהל שדיברו איתו, אני יודע שהם קיבלו כמה שמות ולכן בהחלט ייתכן שהם עשו זאת, אבל אחד השמות האלה היה שלי, אז הם בוודאות לא דיברו עם כל מי שהם קיבלו את שמו), שום דברים לא קרה ולאף אחד לא היה מושג מה הם עושים כאן. עבודת ההכנה שהם ככל הנראה עשו נשארה במסתרים. תקופת הדממה הזו לא באה בחינם. בכל רגע שעבר הם איבדו עוד קצת מהמומנטום הראשוני ונתנו עוד זמן לאנשים לתהות "אז מה היועצים האלה עושים כאן?" לדאוג מהשינוי המתקרב, להעלות השערות לא מבוססות ולבנות תמונה שלילית של היועצים. אחרי הכל, אנחנו יודעים, בערך, מה העבודה של כמעט כל מי שאיתנו במשרד. אם אין לנו אפילו תמונה כללית של "מה עושה ישראל ישראלי בעבודה" אנחנו מניחים שהוא בטלן שלא עושה כלום. למעשה, אחרי זמן מה ואחרי שאלה מצד אחד מחברי הצוות (שרצה לדעת אם התנהלות כזו היא נורמלית אצל יועצים), הלכתי לprogram manager שלנו (איך מתרגמים את זה לעברית?) שיש לו חלק משמעותי במעבר הזה ושאלתי אותו מתי הם מתחילים לעבוד. במקרה, זמן קצר אחר כך נתקלתי באחד היועצים במטבח ושאלתי אותו את אותה שאלה, בעיקר כדי לשקף לו מה התחושה בקרב העובדים שהוא לא מדבר איתם.

כאמור, מה שהם כנראה עשו בזמן שמתחילת העבודה שלהם ועד אותו רגע היה כנראה לאסוף מידע ולבנות תוכנית, מה שמוביל לכשל השני. הם דיברו בעיקר עם מנהלים. לאסוף מידע מהצוותים ולהבין את נקודת המבט שלהם ואת הלחצים המופעלים על הצוותים השונים היה יכול לתת להם תמונה שלמה יותר של עולם הבעיה, אבל גם לתת לאנשים תחושה שמקשיבים להם ושמנסים לפתור את הבעיות שמציקות להם. שיתוף מוקדם, גם אם הוא מזוייף לגמרי, מונע תחושה של "הנחתה" ומאפשר לנו לבדוק את הרעיונות שלנו לפני שהתחייבנו אליהם. אולי הצגה מוקדמת של הרעיונות הייתה יכולה להציף נקודות שכדאי להתייחס אליהן, אולי שיחה עם הצוותים הייתה מצביעה על הפיל שבחדר. זה גם לא מאוד יקר לאסוף את המידע הזה - אם מקדישים שעתיים לשולחן עגול עם כל אחד משנים עשר הצוותים אנחנו מקבלים עשרים וארבע שעות. זה שבוע של אדם אחד. לגמרי משהו ששלושה אנשים יכולים לעשות ביומיים.

זה, כמובן, מביא אותנו לכשל האחרון (עד כה) - הצגת הפיתרון.

לא הרבה אחרי ששאלתי מתי הם מתחילים לעבוד נקבעה לכל צוות ישיבת התנעה בה יוצג הכיוון שאליו אנחנו רוצים ללכת. באופן שטחי, זה בדיוק הדבר הנכון לעשות: לתת לכל צוות את המרחב שלו לשאול, להבין, להשמיע חששות ולקבל מענה לדאגות הספציפיות שלו. בפועל, התוצאה הסופית הייתה גרועה יותר מאי עשייה. הפגישות היו מצגת "מבוא לסקראם תיאורטי" גנריות והתעכבו על הסבר של כל מיני מונחים ותפקידים, ואיך עובד סקראם בתוך צוות. מה לא היה שם? הפיל שבחדר. שום דבר שרלוונטי להקשר הספציפי שלנו. הבעיה הכי משמעותית שמשפיעה על כל הצוותים. קצרים בתקשורת בין-צוותית ובין קבוצתית. כל פיתרון שנעשה עם מינימום של השקעה היה לכל הפחות מתייחס לזה. גם אם רק כדי לומר "חבר'ה, את הבעיה הזו נצטרך לפתור אחר כך, הנה רעיונות שאולי יתאימו, נראה אחרי השלמת הצעד הראשון. אם אתם רוצים לדעת למה זה לא הדבר הראשון שאנחנו מטפלים בו, נשמח לדבר איתכם".
בעיה נוספת שהייתה במפגישה היא שלא הייתה בה שום בשורה - רוב האנשים יצאו ממנה בתחושה של חוסר משמעות. שאנחנו הולכים להתיז קצת בושם בניחוח סקראם על המבנה הקיים שלנו ולקוות שבדרך נס זה יפתור את הבעיות שאנחנו חווים. זה היה נראה כאילו מטרת הצעד הראשון הייתה "לא לטלטל יותר מדי את הסירה" נשארנו עם מבנה של צוותים מבודדים, אנשי הנהלת המוצר אינם חלק רשמי מהתהליך וכל ראש צוות אמור לתפקד גם בתור ראש צוות, גם כסקראם מאסטר וגם כProdocut owner -ו מי שמכיר סקראם ידע לומר שאלו שלושה תפקידים שבדרך כלל נזהרים מאוד שלא לערבב אותם - גם מפאת הזמן שכל תפקיד לוקח, אבל גם בגלל סתירות פנימיות בין התפקידים (יש לציין, כשהעליתי את הנקודה בפני היועץ ושאלתי למה החלטנו להיות חריגים בנקודה הזו, הוא טען שהוא לא מכיר טענות כאלה. לקח לי שלושים שניות בגוגל למצוא לו דוגמאות, אז אשאיר את זה כתרגיל לקורא). איך כל ראש צוות אמור לתפקד כבעל מוצר כשיש תלויות עם צוותים אחרים? מי מתחזק את הבקלוג? איך מתמודדים עם בלת"מים ועיכובים? זה, כמובן, לא היה בפגישה. מי שנכח בפגישה יצא עם תחושה שהכל הולך להישאר בדיוק אותו דבר. מי שהגיע עם ציפיות לשיפור יצא מאוכזב מכך שכל ההצגה הזו לא מנסה אפילו לטפל בנושאים שכואבים לו.

בקיצור - כל טיפה של קרדיט שאולי הייתה עדיין ליועצים בשלב הזה התאיידה לה אל חלל האוויר.

למרות זאת, אני די אופטימי ביחס לשינוי שאנחנו עושים בעבודה. גם אם היועצים ימשיכו לעשות עבודה חובבנית ולגרום נזקים, וגם אם הם יעשו כל טעות אפשרית, עצם העובדה שהביאו אותם לארגון יצרה מקום בו אפשר לדבר על התהליך, לבחון לעומק ולראות את הנקודות שכואבות לנו כמחלקה. בסופו של יום, יש לנו אנשים חכמים עם ניסיון מגוון וברגע בו אנחנו מנסים לשפר את התהליך שלנו, זה רק עניין של זמן עד שנצליח.

אז, אם אתם מוצאים את עצמכם בתפקידי ייעוץ, הנה כמה טיפים שכדאי לא לפספס:

אספו מידע מכל מי שרלוונטי. או לפחות ממספיק גורמים מגוונים.
שקיפות זה לא מספיק. היו קולניים לגבי העבודה שלכם. אנשים, ולא רק אלו שחותמים על הצ'ק שלכם, צריכים להרגיש שהם יודעים בערך מה אתם עושים.
הבינו מה הבעיות שאנשים חושבים שיש להם. גם אם תגיעו למסקנה שהבעיות האמיתיות נמצאות במקום אחר, תוכלו להתייחס לתחושות האלה. אם אתם חושבים שהבעיה נמצאת במקום אחר - השמיעו קול, תנו לאנשים להגיב ובחנו את התיאוריות שלכם.
בלי תוכן מוכן מראש. לכל הפחות דאגו להתאים את התוכן שלכם למקם שאתם מייעצים לו.

Saturday, February 12, 2022

Troubleshooting

גרסה עברית

Last week I faced a moment of frustration. And not the good "why isn't it working" kind of frustration. That kind I'm used to, and I normally end up learning something. This time I was frustrated with the people I work with. The story is quite simple - we had (yet another) problem in our systems, and they set out to investigate it. I was busy with a super-urgent-clock-is-ticking kind of task, so I wasn't available to help in this investigation. I did try to ask some guiding questions, such as "what error do you see in the log", or "can you reproduce it on your system?" but other than that I was doing my best not to get involved.

After they had been struggling with the problem for a while, my manager asked me to time-box 20 minutes for this, as it was blocking most of the team. After checking that the urgent task could wait that long, I took a look. Then I got upset. Two people had been looking at this, and the best they could come up with was to quote a partial error and the step which was failing. No guess of what could have happened, no narrowing down of the problem, a simple "it fails with this message". Yet, when I took a look, it took less than 30 seconds to figure out what was wrong, then perhaps 15 more minutes to find a workaround that actually worked.

I reflected a bit on this negative feeling - somewhere between disappointment and annoyance - and figured out why I was so upset, and this helped me notice something I didn't see before.

I was upset because I always assume that the people I'm working with are smart and capable people who are doing the best they can, and any contrary example touches a raw nerve for me and makes me wonder why I bother investing so much time and effort trying to collaborate with them instead of working individually. Then, after processing it a bit more and recalling the fundamental attribution error, I could say that it's probably not that the people who failed in a task I found trivial are not smart or that they don't try their best; it's more likely that there are some other factors I'm not aware of that make this behavior reasonable. Both of them had other tasks putting pressure on them, and both are fairly inexperienced - between them they have less than 18 months of experience. In addition, it reminded me that troubleshooting is a skill that needs practice and learning, which prompted this post - I want to share the way I approach troubleshooting, hoping it might help people.

The first thing worth noticing about troubleshooting is that almost anyone related to software development needs to do it quite a lot - programmers, testers, CS, Operations, and whatever you call the team managing your pipelines. The second thing worth noticing is that it looks a lot like bug investigation, so being a better troubleshooter will make you a better tester as well. In fact, the main difference between troubleshooting and bug investigation is the goal we have: troubleshooting is about making a problem go away, or at least finding a way to do our task around it, whereas bug investigation is more about understanding the cause and potential impact of the problem, so if a bug just flickers away we'll hunt it down.

So, how do I troubleshoot? Here's a short guide:

Is it obvious? Sometimes the errors I get or the symptoms I experience are detailed enough that no investigation is actually needed. I can easily tell what has happened and skip directly to fixing or finding a workaround.
Can I reproduce it? Does it happen again? If not - great, problem gone. It might come back later, in which case I might try a bit harder to reproduce it or trace its cause, but usually a first-time problem that isn't easily reproducible doesn't really need shooting. I skip to "done for now" and continue with whatever it is that needs doing.
Create a mental model - what was supposed to happen? Which components are talking with which systems? What relevant environmental factors should be considered? What has recently changed?
Investigation loop:

Gather information. Google the error message or symptom, gain visibility into the relevant flow, ask around whether anyone has seen such a problem, etc.
Hypothesize. Guess what might be causing the problem.
Try to disprove the hypothesis:

Create a minimal reproduction of the problem
Find contrary evidence in log files, side effects, etc.

Tweak. Based on my current guesses, try to work around or mitigate the cause of failure. I suspect a code change? I'll revert to a previous state. Server can't be reached? I'll tweak in order to gain more information. I might check for ping, or DNS resolution.
Check to see if the problem has gone away. If so - update the theory of what happened and finish.
Update and narrow the model. Using the information I gained, zoom in on the relevant part of the model and elaborate on it. For example, a model starting with "I can't install the product" might narrow to "I have remnants from a faulty uninstall that are preventing a critical operation" or to "the installation requires an active internet connection and the computer has been unplugged". It can be more complicated than that.
If I can't narrow down the model, or can't come up with a next theory of what might be wrong, I still have two options:

Hail Mary - I'll change a random thing that I don't expect to help but is related in some way. For instance, I might follow instructions on the internet to find a relevant configuration change, or reboot the system. Who knows? I might be lucky and gain more information, or even make the problem go away for a while.
Ask for help. Find someone who might have more knowledge than me, or just a fresh perspective, and share my model, failed attempts and guesses I couldn't or didn't act upon, and we'll look at the problem with that person's knowledge and tools.

Now that we know what's wrong, or at least we're confident enough that we know, it's time to shoot down the problem. Find a way to reconfigure the tool that was causing problems, change the way we operate, sanitize our environment, or do whatever will work to our satisfaction.
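
To make the flow above a bit more concrete, here is a minimal sketch of the investigation loop in Python. It is only an illustration, assuming each guess can be written down as a small object with a way to look for contrary evidence and a tweak to try. The names `Hypothesis`, `Outcome` and `troubleshoot`, as well as the simulated `env` dictionary, are made up for this example and do not come from any real tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Hypothesis:
    """One guess about the cause (hypothetical structure, for illustration)."""
    description: str
    is_disproved: Callable[[], bool]  # look for contrary evidence
    tweak: Callable[[], None]         # workaround or mitigation to try


@dataclass
class Outcome:
    resolved: bool
    cause: Optional[str] = None
    # What was tried so far: this is what gets shared when asking for help.
    notes: List[str] = field(default_factory=list)


def troubleshoot(problem_present: Callable[[], bool],
                 hypotheses: List[Hypothesis]) -> Outcome:
    notes: List[str] = []
    # "Can I reproduce it?" If not, done for now.
    if not problem_present():
        return Outcome(True, None, ["not reproducible, done for now"])
    for h in hypotheses:
        # Try to disprove the guess before acting on it.
        if h.is_disproved():
            notes.append(f"ruled out: {h.description}")
            continue
        h.tweak()
        # Check whether the problem has gone away.
        if not problem_present():
            notes.append(f"fixed by acting on: {h.description}")
            return Outcome(True, h.description, notes)
        notes.append(f"tweak did not help: {h.description}")
    # Out of ideas: time for a Hail Mary, or for asking for help with the notes.
    return Outcome(False, None, notes)


# Simulated environment (assumption): the failure is caused by a tool version.
env = {"dns_ok": True, "tool_version": "2.0"}

result = troubleshoot(
    problem_present=lambda: env["tool_version"] == "2.0",
    hypotheses=[
        Hypothesis("DNS resolution is broken",
                   is_disproved=lambda: env["dns_ok"],
                   tweak=lambda: None),
        Hypothesis("the new tool version has a bug",
                   is_disproved=lambda: False,
                   tweak=lambda: env.update(tool_version="1.9")),
    ],
)
print(result.resolved, result.cause)
print(result.notes)
```

The part worth keeping from this sketch is the `notes` list: even when the loop fails, it leaves behind the model, the failed attempts and the guesses that were ruled out, which is exactly what a helper needs to hear first.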

That's it. I hope this flow will be helpful, at least to some extent. If you have additional tips on troubleshooting - I'd be happy to hear about them.

Troubleshooting

English version

Last week I had a moment of frustration. Not the good "but why isn't it working??" kind of frustration, but frustration with the people I work with. The story is, when it comes down to it, quite simple - we had (yet another) failure in our execution environment, and two people from the team tried to fix it. I, for my part, was busy with a super-urgent-the-deadline-was-yesterday task, so I wasn't available to help even when they asked for help. I did try to throw in a guiding question or two - things like: Did you look at the log? What does it tell you? Can you reproduce the problem in your own environment? Can you define the problem more precisely than "it doesn't work"? But other than that, I firmly declined their requests for help.

After some time my manager asked me to put the urgent task aside for twenty minutes to see whether I could move them forward a bit, because the problem they were dealing with was blocking the whole team. So I checked that the task could be delayed by an hour (because getting back into focus takes time too), and then I took a look at what was going on there. And then I got angry. It took me thirty seconds to understand what the problem was, and to see in the log a reference to a specific GitHub ticket (filed a week earlier) for one of the tools we use, along with a suggested workaround. Okay, the workaround doesn't work, but once I had identified that, it took me another quarter of an hour to find how to downgrade that tool and solve the problem. And still, the best I got from two people who sat down to investigate the problem was "it fails, here's the error message" (detached from any context). No better understanding of what was happening there, nothing they had already tried. Nothing.

I don't like being angry at coworkers, so this thought stayed with me for a while, and I gave the matter some attention. I understood why it bothered me so much, and got a reminder of something that had slipped my mind. The reason something so trivial (just a quarter of an hour of my life, and during work hours at that) bothered me is that I always operate on the assumption that everyone I work with is both smart and doing the most they can. It's a relatively strong assumption, but it is an important part of what makes me enjoy work. When examples like this undermine it, I wonder why I invest so much effort in trying to work together with them instead of simply moving ahead with tasks on my own and letting them manage. I stewed in my own juices for a bit, and after processing the matter some more and reminding myself of the fundamental attribution error (the Wikipedia explanation is a bit vague, there's a better explanation here), I understood that it is more accurate to say that the two people who failed to cope with a task I classify as trivial didn't slack off, but did their best in the circumstances they found themselves in. There are at least two factors I can think of that contribute to this outcome - both of them have loads of other tasks no less urgent than this obstacle that fell on them, and between the two of them they have only slightly more than a year of experience, so they haven't yet seen four hundred similar things. Besides, it reminded me that troubleshooting is a skill that has to be developed, not something that happens automatically, which brings me to the purpose of this post - I want to share with whoever ends up here (probably by mistake) the way I deal with technical problems, in the hope that it will help someone.

The first thing worth noticing when talking about troubleshooting is that almost everyone who deals with software will need to do it, and quite a lot. Programmers, testers, DevOps people (or whatever you call that team that maintains Jenkins for the whole organization) and of course support people too. Everyone does it - all the time. The second thing is that there are quite a few similarities between troubleshooting and bug investigation, so improving your troubleshooting ability will also make you a better tester. In fact, the main difference between the two activities is the goal: when we troubleshoot, something isn't working for us, it blocks our way to something we need to do, and we want to make it go away, or at least find a way around this annoying obstacle. In bug investigation, on the other hand, we want to understand what causes the failure, what the potential damage is and how important it is. So if a bug simply disappears, we will hunt it down, but if a problem disappears without us knowing why, we'll say thank you very much and get on with our lives.

So, how do I approach a problem? Here's a short guide:

Is it obvious? Sometimes the error message is good enough, or we've already seen something very similar, or the symptoms simply scream "this is the problem". In such cases we can skip the investigation stage and start looking for solutions.
Can I reproduce the problem? Preferably - in my own work environment. No? Excellent, the problem has gone somewhere else and there's no need to deal with it. The problem might come back later, and then I might invest a bit more time in trying to reproduce it - because a problem that returns once a month and burns my time will cost me more than investing two days in it and solving it once and for all.
Building a model.
I start building a picture in my head - which parts are moving in the system? Who talks to whom? What was supposed to happen, and in which stages? Which environment variables affect what's happening? What has changed recently and might be related?
The investigation loop:

Gathering information. You can search for the error message on Google, dive into the logs looking for more information, or try to trace the data (for example, using a proxy to see network traffic, or specific printouts that should occur before the point of failure).
Guess - based on the model we have in hand, what could be causing the problem?
Trying to disprove the guess:

A minimal reproduction that lets me isolate the cause of the failure and see that without it the problem doesn't happen (if it still does - that's not it).
Looking for contradicting evidence in the information I have.

Tweaks - I start playing with all sorts of parameters related to the guess, hoping one of them will help solve the problem.
Did it work? Did the change I made cause the problem to disappear? If so, great. The problem is solved.
Updating the model with the new information gathered in the previous steps.
Sometimes the ideas run out - I've explored every direction I could think of, examined all the evidence I found, and it's going nowhere. I still have two tools in the toolbox that I can use:

Hail Mary: I'll do something that has no real chance of working, or change some random parameter I found on Google that doesn't seem related. Who knows, maybe I'll get lucky and it will give me more information, or even, heaven forbid, solve the problem?
Asking for help - I don't have all the knowledge in the world, and not all ideas come from my head. Maybe someone else will have new ideas that will move me forward? Maybe they know how to use a relevant tool I'm not familiar with? I'll grab someone, briefly present the progress I've made so far (the current model, and a few steps I took to establish it), and we'll sit down to work on the problem together.

That's it, at this stage we should already know what the problem is - even if not the solution to it. Now it's only the part of finding how to bypass the cause of the problem and get what we need to work. It can be something short and simple like "make sure a certain directory exists before running a command" or something a bit heavier like recompiling an open-source library we use because it has a bug. Usually, if you know what the problem is, solving it is not very complicated.

That's it, more or less. I hope this will be useful to someone. And of course - if you have other ways of dealing with problems, I'd be happy to hear about them.

Sunday, January 23, 2022

A world class test team?

Source: XKCD

I've read Simon Prior's blog post Building a world class test team, and I didn't like what I saw.

There's the overuse of my least favorite 4 letter word, but putting that aside, I found that I disagree with the definitions he's using. His idea of a "good test team" is a safety-net role that enables others to disregard their responsibilities and directly contributes to worse software. The "great" test team is the first thing that starts to get close to an almost decent team, as he's added "preventing defects", which I chose to interpret generously as being involved in crafting the requirements and planning the work with the other members of the software engineering group.
Then, the "World class test team" is a complete contradiction. It's described as "all of the above" AND some properties of what I would call pretty good testing - not being the scapegoat, coaching, improving other functions' work - a lot of things that are not very compatible with owning the large part of the testing effort. Sure, I can imagine this working in some places, but it would be like hammering a bolt that doesn't fit well into place - it will work in the end, but you've invested more effort than you should have, and it will be a mess to deal with every time you need to change something.

Instead, I want to suggest another definition - in almost all cases, a world-class testing team is one that has disbanded (or is actively working towards that goal). As much as I like being a tester by role as well as by identification, my main aspiration is to bring the place I work in to a state where they don't need dedicated testers (note - I said "don't need", I didn't mean "just get rid of"). The way to achieve this is to embed the testing mindset and activities as part of every step in the process, from feature definition to deployment. Something very much like Dan Ashby's model of where testing fits in DevOps. It will take a while to help build the infrastructure needed for self-testing and even longer to instill a culture where testing is part of being a professional, but if we want "world class" testing, one that people can learn from, it's this property we are looking for.

In addition to this, I found two points of disagreement with the practical advice on building a good test team. The first one is using "9 to 5 testers" as a derogatory term. It's covered beneath some layers of "there's nothing wrong with it", but still confounds working reasonable hours with box-ticking and lack of development. When building a team, respecting their private time is super important, and accommodating improvement within regular working hours is key to building the team. Want to fend off stagnation and rigidness? Find a term that will not push towards expecting people to give up their free time for the business.

The second issue is the advice to diversify with hiring. It's not a bad idea, but in many cases it's not a viable option - in my experience, finding senior testers is way more difficult than finding experienced coders, and it seems that it is so in other countries as well.

Monday, January 17, 2022

Deep work - Book review

Following a recommendation from Tomer Cohen (He wrote in Hebrew), I finally got to listen to Cal Newport's "Deep Work: Rules for Focused Success in a Distracted World", and my reaction to it is quite ambivalent.

The book starts with a very tempting, yet quite weak proposition of value: Deep work is valuable, and its value increases with the proliferation of knowledge work in today's world. On the other hand, increasing connectivity and interruptions make this particular skill rarer, so it can make you stand out in your profession and earn more. It then moves on to claim that deep work will also make you happier. Basically - it's a panacea for all of the modern world's ailments. And, as the lovely book "Calling Bullshit: The Art of Skepticism in a Data-Driven World" has reminded us - great claims require a great deal of evidence. Not only does this book fail to provide that sort of evidence, it goes on to actually show a counter-example, acknowledging that deep work is not the only way to wealth, and dismissing it with a statement along the lines of "even though there are other ways to wealth, it does not mean that deep work isn't a way to put you ahead of everyone else (perhaps except the CEO that provided the counter-example, but for most of us it's not an option)". Convinced? Neither was I.

But, before I go on rambling about deep work, it might be worth it to define the term. The definition used in the book is: "Professional activities performed in a state of distraction-free concentration that push your cognitive capabilities to their limit. These efforts create new value, improve your skill, and are hard to replicate". Long story short - deep work is hard, valuable mental work.

Anyway, while the evidence falls short of the grand claims, the book actually tells a compelling story, and if we accept some of the hastily made assumptions, its conclusions are inevitable. For instance, one argument for why deep work should make one happier goes as follows: quote Winifred Gallagher's "Rapt" to establish that people's happiness has more to do with what they focus on than with what objectively happens to them, state (assume) that deep work means focusing on more meaningful activity than the shallow trudge of interruptions, and conclude that since one spends more time focused on meaningful things, one will feel a greater sense of meaning in life. Another argument: quote Mihaly Csikszentmihalyi's "Flow", assume that deep work is more likely to lead a person to the sweet spot between challenge and success and so greatly increases one's chances of experiencing flow, and conclude that since being in a state of flow is known to increase happiness, so does deep work.
As you might have noticed, the assumptions are quite plausible, good enough to be motivating. One issue the book sidesteps completely is the question: "Are there paths other than deep work that are as likely to generate comparable levels of happiness or value? How do we identify that we are on such a path, or that we might benefit more from one?"

The rest of the book is about tips for incorporating deep work into our day to day. Most advice here is at least convincing, and the author takes sufficient time to support some of the claims - shutting down interruptions and educating your environment can work in most contexts, focus is a skill that needs constant training and is prone to deterioration, actually resting and not doing any work after the work day is a way to increase productivity, and so on. My unease with this part has more to do with the totality of the approaches described in the book. Commitment to deep work is subjugating one's entire being: The default is being in a state of deep work, pausing only for periods as brief as necessary. A lot of it feels like what you might expect in a time-management book, only that the focus is not on "not wasting time", but rather on constantly training your deep working skill. I got tired just from trying to grok the message.

This fatalistic approach is a bit much, but it helps to emphasize that deep work is demanding. Even if I believe that it can be done in a less extreme way, it is still useful to learn the purist's approach, where it is at its clearest. It also helps to see various examples of building a schedule for deep work. It can be getting up early to spend a couple of hours in full focus before the day starts for the other people who will create interference, or it can be planning work so that we divide our calendar time between weeks of solitary focus and weeks of interruption-heavy busywork. Then there's the part where I felt I was being cheated - after explaining at length how deep work is something that takes time to start (research I think I recall from other places states that it takes about 15 minutes to enter "the zone") and that 4 half-hours of deep work are not as effective as 2 straight hours of it, the author goes on to describe the "journalistic" approach, which is based on the assumption that people trained in deep work can skip this time and just dive into deep work immediately whenever they have a spare 10 minutes or more. It really felt like a way to say "I'm doing deep work" while ditching all of the principles mentioned earlier in the book. It was also an unpleasant surprise to hear that this is the author's main mode of work - preaching deep work and claiming to be quite adept at it, then redefining it as "I do deep work when I decide to do it and have free time from my other distractions" sure does feel like hypocrisy.

That being said, I still took some insights with me.

First of all, I'm still debating with myself about the actual value of deep work in my context - much of my day is about helping others, jumping in for a short period and dropping a question that might set things on a better path, mentoring others to do the actual work themselves. Sure, some of the work is done by me, and I find great joy in still being able to contribute directly, but my added value there is not always significant - I might do things faster or a bit better than the less experienced people in my team, but in the time it takes me to do one unit of work myself, I can probably help three teammates do better work and complete 3 units of work that will be 80% as good as if I had done them myself. So, where do I see more value? If I try to frame this in deep-work lingo, I'd borrow from Kahneman's "Thinking, Fast and Slow" and claim that by practicing, I managed to move some of the skills that require deep work and concentration to the more automatic parts of my mind, and now I provide value by doing deep work in a shallow fashion - using those automatic skills to improve my environment. This means that my skill is based on deep work, and even if I believe that most of the value I provide is collaborative and interruption-heavy (collaboration can, in rare cases, be deep work in itself, but the book usually treats deep work as solitary), I should still invest some time in deep work to expand the base I'm building upon and to make sure I'm still connected to the work.

It also serves as a reminder - not everything I do is equally important, and there are some things such as e-mail or instant messaging that can be pushed aside instead of sapping my attention.
Another thing that intrigues me is the claim that we are addicted to interruptions - I'm definitely going to try some of the tactics to train my mind to be more focused, such as defining breaks from concentration (perhaps using pomodoro) and learning to stick to them - no breaks until the buzzer.

I think that the most fitting analogy to this book would be an Olympic runner sharing tips on how to become a better runner. This athlete will surely recognize that the amount of effort needed to get to their level is ridiculous, but even when they try to dial down the effort, they cannot unsee what they know - what you eat, how you sleep, how balanced your core muscles are, all of this has an impact on your running. Getting advice from someone like that will create the impression that there is nothing more important in life, which is true only in very specific cases.
This is why I probably won't invest the effort to become a professional deep-worker who's managed by the hunt for deep work. It might be relevant for highly competitive fields such as academia or writing, but I suspect the reward is much smaller in software development, where good employees are rather hard to find. A focused deep worker might be a lot better than me, but given the difficulty of measuring productivity and the fact that most places require the sort of work that my skills are more than enough for, it would be like purchasing a Ferrari to stand in traffic. It *can* get to silly speeds, but you would do just the same with a car a tenth of that price. I need and want to be good at my work, but trying to be the best of the best is not worth the effort. Instead, I'll treat it the same way I treat my bicycle. It's fun, it builds some sort of muscle, and I gain a lot from this as a hobby.