Friday, July 17, 2020

Being right about the wrong things


So, following a link I encountered on Josh Grant's blog, I came across a post named "against testing" where the author (someone named Ted, I think) makes some very strong claims about tests and why most people shouldn't write them. On the face of it, most of those arguments are sound and represent well-thought-out observations. It is true that most automated tests, especially unit tests, rarely find bugs, and that they are tightly coupled to the existing implementation in a way that means that whenever you come to refactor a module you'll find yourself refactoring the tests as well. It is also true that there are a lot of projects out there where testing something in isolation is simply not easy to do (there's a reason why Michael Feathers defines legacy code as "code without tests"), and investigating a test failure requires putting a lot of time into understanding code that won't help you when you come to develop the next feature. Furthermore, having test code does introduce the risk of delivering it to production, where all sorts of nasty things might happen. No test code? That risk does not exist. 
All of those claims follow a sound chain of logic to a single conclusion - don't write tests. Not surprisingly, I disagree.
The easiest way to dismiss most of those claims is to respond with "most of those problems indicate that your tests are not good", and while I do believe that a well-written test should be clear to understand, at just about the right level of abstraction to minimize refactoring pain, and targeting the tested unit's commitments rather than its implementation, I also know that such tests are hard to write, and that most people writing test code are not doing such a pristine job most of the time. In fact, for my response I'm going to assume that people write mediocre to trivial tests, simply because that's the most likely scenario. Most people indeed don't learn to write proper tests, and don't practice it. They get "write tests" as a task they must do to complete their "real" work, and thus do the bare minimum. 

From my perspective, the post goes wrong right at the beginning, stating that "In order to be effective, a test needs to exist for some condition not handled by the code" - that is, that a test is meant to "find bugs". For me, that's a secondary goal at best. I see tests as scaffolding - making the promises of a piece of code explicit, and there to help people refactoring or using that piece of code. If someone is working in a TDDish manner (no need to be strict about it), they can use this scaffolding to figure out earlier what their code should look like - when internal logic that made perfect sense while implementing turns out to be too cumbersome to use, or when some extra dependencies are needed. It is also a nice way to put things I don't want to forget in a place where I'll be reminded of them. 
But that assumes TDD, and not enough people use this method for it alone to justify writing tests, or to justify not deleting them once I'm done. Which brings me to two of the most common tasks a developer faces: refactoring code and investigating bugs. Starting with the fun part - refactoring. When refactoring a piece of code, there's one single worry - did I break something? Testing alone does not answer this question, but it does help reduce that worry, especially in a language without a strict compiler. Imagine a simple Python project, where there is a utility module that is being called extensively. I go and change the type of one of the parameters from string-duck to object-duck (let's say I now assume a .foo() method is available). That is already the case in 99% of the project, but not necessarily everywhere. If I wasn't using proper type hinting (as is sadly way too common), the only way I'll find the mismatch is by actually running that specific piece of code with the faulty caller. 100% line coverage increases my chances of catching it. Too far-fetched? OK. What about a piece of code that is not straightforward? One that has what the linters like to call "high complexity"? Just keeping those intricate conditions in mind is heavy lifting, so why not put them in a place where I'll get feedback if my refactor broke something?
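To make that concrete, here's a minimal, hypothetical sketch (the function names are mine, not taken from any real project): a utility refactored to expect an object exposing .foo(), with one caller that was never migrated. Even a shallow test that only exercises the call site flags the break.

```python
# A minimal, single-file sketch; all names here are hypothetical.

def describe(duck):
    # Refactored: now assumes an object exposing a .foo() method
    # (it used to accept a plain string).
    return f"result: {duck.foo()}"

def build_report():
    # The one caller that was never migrated - still passes a plain string.
    return describe("legacy value")

def test_build_report_runs():
    # Even this shallow "just run it" test fails with an AttributeError
    # after the refactor, pointing straight at the missed caller.
    assert build_report().startswith("result:")
```

Nothing clever about the test itself - its only job is to make sure the code path actually runs before the faulty caller reaches production.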
Such intricate functions are also a nightmare to debug or fix, and here I want to share an experience I had. In my previous workplace we had a table that aggregated purchases - if they matched on certain fields, we called them equal and would merge them. Choosing what to display involved a rather complex decision tree, shaped by our business rules. I was tasked with fixing something minor in it (I don't recall exactly what; I think it was a wrong value in a single column or something like that). Frankly? The code was complicated. Complicated enough that I wasn't sure I could find the right place to change. So I added a condition to an existing test. It wasn't a good test to begin with - in fact, when I first approached it, it couldn't fail, because it asserted that "expected" was equal to "expected" (it was a bit difficult to see that, though). Once I added the real expected result to the test, I could just run it, fix the problematic scenario, and move on to the next one. The existing tests also reminded me of a flow I had completely forgotten about (did I mention it was a very complicated decision tree?). 
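For illustration, the broken test looked, in spirit, something like this much-simplified and entirely hypothetical sketch - the "expected" value came from the very code under test, so the assertion compared the output with itself; pinning a hand-written expectation is what turned it back into a test:

```python
# A hypothetical, much-simplified stand-in for the real merge logic.
def merge_purchases(purchases):
    # "Equal" purchases share a store and an item; keep the most recent one.
    merged = {}
    for p in purchases:
        key = (p["store"], p["item"])
        if key not in merged or p["date"] > merged[key]["date"]:
            merged[key] = p
    return list(merged.values())

SAMPLE = [
    {"store": "A", "item": "x", "date": 1, "price": 10},
    {"store": "A", "item": "x", "date": 2, "price": 12},
]

# Before: "expected" comes from the code under test, so this can never fail.
def test_merge_tautology():
    assert merge_purchases(SAMPLE) == merge_purchases(SAMPLE)

# After: a hand-written expectation that pins the behaviour I actually care about.
def test_merge_keeps_latest_price():
    (merged,) = merge_purchases(SAMPLE)
    assert merged["price"] == 12
```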

Another useful way to use tests is as an investigative tool. In my current workplace we are using Python (short advice - don't do that without a good reason). Moreover, we are using Python 3.6. We do a lot of work with JSON messages, and as such it's nice to be able to deserialize a message into a proper object, as can be done with Jackson or Gson in Java. However, since Python has "native" support for JSON, I didn't manage to find such a tool (not saying one doesn't exist), so in order to avoid string literals everywhere, we defined a new type of class that takes a dictionary and translates it into an object. Combined with type hints, we get an easy-to-use, auto-complete-friendly object (Python 3.7 introduced data classes, which might do what we need, but that's less relevant here). To do that we overrode some of the "magic" methods (__getattr__, for instance), which means we didn't really know what we had done and what side effects there might be. What we did know were our intended uses - we wanted to serialize and deserialize some objects, with nested objects of various types. So, after the first bug manifested, we added some tests. We found out that our solution could cause an endless call loop, and that we don't really need to support deserializing a tuple, since a JSON value can only be a simple value, a list or a dictionary (not something we thought about when we started implementing the serialization part, so we saved some time by not writing that useless piece of code). Whenever we were unsure about how our code would behave, we added a test for it. Also, since we didn't really understand what we were doing, we managed to fix one thing while breaking another several times. Each time, our existing tests showed us we had a problem. 
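To give a feel for what I mean, here's a rough, hypothetical sketch of such a wrapper and the kind of tests we leaned on (the class and field names are illustrative; this is not our actual implementation):

```python
import json
import pytest

class JsonObject:
    # Illustrative wrapper exposing a JSON dictionary's keys as attributes.
    def __init__(self, data: dict):
        # Stored via __dict__ so that __getattr__ can find it through normal
        # lookup; reaching for a not-yet-set attribute inside __getattr__ is
        # one way an endless call loop can sneak in.
        self.__dict__["_data"] = data

    def __getattr__(self, name):
        # Invoked only when regular attribute lookup fails.
        try:
            value = self._data[name]
        except KeyError as exc:
            raise AttributeError(name) from exc
        # Wrap nested dictionaries so access stays attribute-style.
        return JsonObject(value) if isinstance(value, dict) else value


def test_nested_access():
    msg = JsonObject(json.loads('{"user": {"name": "dana", "age": 7}}'))
    assert msg.user.name == "dana"

def test_missing_key_raises_attribute_error_not_recursion():
    # The kind of "how does this actually behave?" test that surfaced our bugs.
    msg = JsonObject({})
    with pytest.raises(AttributeError):
        msg.no_such_field
```

Each time we were unsure about a behaviour, a test of this shape went in; each time we broke something while fixing something else, one of them failed.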

There is, however, one point on which I agree with the author - writing unit tests does change the way your code is written. It might add some abstraction (though that's not necessarily the case with the existing mocking tools), and it does push towards a specific type of design. In fact, when I wrote tests for a side project I'm working on, I broke a web controller into five different classes just because I noticed I had to instantiate a lot of uninteresting dependencies for each test. I'm happy with that change, since I see it as something that showed me my single class was actually doing five different things, albeit quite similar ones. As a result of this change, I can be more confident about the possible impact of a specific change - it won't affect all five classes, but only one of them, and that class has a very specific task, so I know how to reach it and which flows it is involved in. Changing existing code this way does introduce risk, so everyone needs to decide whether the rewards are worth taking those risks. If all you expect from your tests is to find bugs or defend against regression, the reward is indeed small. I believe that if you consider the other benefits I mentioned - having an investigative tool that will work for others touching the code, being explicit about what a piece of code promises, and in the long run having smaller components with defined (supported) interactions between them - it starts to make much more sense. 


So, to sum up what I wrote above in a short paragraph - the author is completely right in claiming that if tests are meant to find bugs and defend against regression, they don't do a very good job. But treating tests in such a way is like claiming a hammer is not very effective because it does a poor job of driving screws into a wall. Tests can sometimes find bugs, and they can defend against some of the bugs introduced by refactoring, but they don't do these tasks very well. What tests do best is mostly about people. They communicate (and enforce) explicit commitments, they help investigate and remember tasks, and they save a ton of time wasted on stupid mistakes, leaving more time to deal with the real logical difficulties the code presents. I think that by looking at these properties, the value of having tests is represented better, and it also becomes easier to write better tests. 

Saturday, July 4, 2020

A "testing" competition

 

So, last week I participated in a "testing" contest. You know, the kind of event where you're given an app you've never seen before and asked to "find bugs" for the next so-and-so hours. A lot has been written before me on why there's basically no connection between such events and proper software testing, so I won't bore you with another such piece of text. Instead, I want to write about the things I did learn by participating in this event. 

First, since every such competition is different, I want to lay out the opening conditions for this event: 
To be short and blunt, it was an epic fail on the organizers' side. In fact, I believe it was the level of amateurism involved in the event that drove me to write this post in the first place, just so that I could rant a bit. Furthermore, I'll make an exception and name and shame this event - the Israeli Software Testing Cup.  
Anyway, what bothered me? First thing - while the competition does have a site, it's quite difficult to find and it contains just about zero useful information. No rules of engagement, no mention of what we're allowed to do and what we're not - is it OK to look for security vulnerabilities in the app? Do they care about usability issues? Also, no information on the basis for scoring, nothing whatsoever. On the day of the event itself we heard for the first time "and make sure to send your report to this address". A report? What should be in it? Who's the (real or imagined) client who'll be reading it? Should it be a business report? A detailed technical report? A worded evaluation of the product with impressions and general advice? Even asking directly did not provide an answer, since "it's part of the competition and you should know for yourself what sort of report to send". Even when the results were announced, the teams were ranked, but which categories were used to score them? No mention of it. They might have been shuffled at random for all we know. 
Next we get to the application under test - which was a nice idea, but the app simply didn't work. It might have been a shoddy server spun up for the competition, or the product itself being in a pre-alpha stage, but the fact is that many teams had trouble just getting past the login/registration screen. In short - this should have been better.

Despite all of that, I managed to learn a few things: 
First of all, events such as this are a way to teach oneself new things and catch up with changes to familiar fields that are now out of focus. As I was preparing for the competition, I tried to capture my phone's traffic with a network proxy. Well, it seems that on devices running Android 7 or higher, users can't install their own certificates without root access. You can still do it if you have a rooted device, but last time I checked (I did have an older phone, and it was a few years back) those restrictions were not yet in place, so now I know there's another thing to take care of whenever I approach mobile testing in the future.
The second thing I learned was about the importance of experience - that which I had, and that which I did not. I could leverage my past experience for faster orientation in the product, and for knowing what I wanted to do even when I didn't know how to do it. One example is asking "where can I read the logs?" - a good chance for knowledge transfer, since my partner did know how to read application logs using logcat, so he could catch me up on that. The flip side is all the things I didn't know. Perhaps with enough time I would have examined things such as app permissions or power consumption, but those didn't even cross my mind during the competition, since I lacked practice and didn't know the tooling around them, so the time cost was just too big to even consider. 
Next thing - prep, prep, prep. When interacting with an application, we are chasing electrical currents at various levels of abstraction - bits, text, various communication protocols, and their effect on the screen. Whenever we want to inspect a piece of a program, it is useful to peel off one layer of abstraction just to see how things look under the hood - move from the nice GUI to the HTTP (or network) traffic, check out memory access, and so on. But unless you routinely work on a similar piece of technology, you probably don't have the necessary tools installed, and you might not even know what those tools are. A few hours of preparation can significantly reduce this gap. I spent a few hours getting my environment up - downloaded some emulators, opened up ADB, and while doing that learned how to set my phone to developer mode (it's hidden in a very annoying way; I can understand why it was done, but really - seven taps on an unrelated field?).
Next is a reminder that no plan survives contact with reality. We had a nice first 30 minutes planned - orientation, some smoke checks and so on - but once we encountered the broken application, we scrapped the plan and winged it altogether. Perhaps with some practice we could learn to work with a plan and adjust it on the fly, but when working in a time-boxed situation, I learned it's really important to keep track of where you are and what's important. 
The last thing I was reminded of is the importance of modeling, and how unavoidable it is. As the competition went on I noticed myself creating multiple models - what is the business goal of the application (so that I'll know which severity to assign to issues), how things might be implemented (so that I'll know whether a problem I saw is something new or connected to other things I saw). Everything we do is based on a model, and once you start seeing them around you, you can practice creating them - faster, more tailored to your needs, focused this way or the other. 

So, this is what I've learned from this competition. Can I take something back to my professional life? Not directly, I believe. But, since everything I experience can help me gain new perspective or knowledge on what we do when testing, I can draw some lessons out of it as well. There are some similarities between this competition and a "bug bash", so I can take the mistakes I've seen made here and make sure to prepare for them if I get involved in organising such an event, and I also gained first-hand knowledge of why we might want to do such a costly thing (mainly, I believe it would be helpful in directing the spotlight to some of the problems we have in our product and helping people outside of the testing team experience them, so that we'll make fewer of those errors in the future). 
One thing that surprised me when I noticed it was the difference between this circus show and real testing work, and through this difference I can better define what I'm doing and what's important. The main difference was that in this situation there's a complete disconnect between my actions (wandering around the application, reporting various things) and the rest of the company. There's no feedback coming from the product team: no access to the product manager who can say "this is very important to me, is there anything else like that?" or "that's very cool from a technical perspective, but it has little business impact", no access to the developers to find out what pains them, no view of the development process, and nothing that can be done to actually improve things. All in all, it's a bug-finding charade. It was a great reminder that unlike "coding", testing is an activity best defined by the context in which it exists, rather than as a distinct set of activities. 

That being said, I do recommend participating in such an event if you come across one (don't go to great lengths for it, though) - not because of any educational or professional value it brings, but rather because it can be fun, especially if the one you happen to find is better organised than the one I participated in. 

A "testing" competition



Last Friday I participated in a "testing" competition. You know, the kind of thing where you're given a random application with very little context and told to "go find bugs". Plenty of people have already written about why the connection between such events and software testing is coincidental at best, so I won't bore you with that part. Instead, I want to talk about the things I learned while taking part in the event. 
Let's start with the opening conditions, to make things easier to follow - the organisation of the competition had several failures that were very significant in my eyes, significant enough for me to break my habit and mention the event by name: the Israeli testing competition, hosted as every year within John Bryce's DevGeekWeek. Why do I bother naming them? So that they'll be ashamed of themselves. So, there was an application to test, and there was a system for reporting bugs. Fine, they took care of that much. Anything else - not really. 
What am I talking about? First of all, I couldn't find the competition rules anywhere. No basic guidance on what is more interesting and what is less, what is allowed and what is not (for example, am I allowed to dig into the application's source code to look for security issues? Unclear). There was significant vagueness around the criteria by which the teams were measured: number of bugs found, relevance of the bugs, quality of the reports, complexity of the test scenarios, the ability to count to seven in Mandarin - nothing. Second, at the start of the competition we were told we needed to send a "summary report". Why? Who are its clients? What content is expected to be in it? Does "I had fun" count as a summary report? Do they want to see a table with the number of bugs reported? Just a printout? When I tried to ask, I couldn't get any answer beyond "that's part of what you're scored on - your understanding of what a summary report is". In short, all of the competition rules are classified and secret. Moreover, even after the results were announced, there was no breakdown of what the teams were scored on. Honestly, it looked as if team names had been drawn at random. 
Second, problems with the application under test. First of all, it turns out that one of the most important parameters for the application is location, and the fact that, due to COVID restrictions, each of us worked from home limited us. Did anyone think to mention that? Second, the application reached the participants in a shaky state - whether because of a server that couldn't handle the load, an application at too early a stage of development, or plain bad luck, every third button press caused the application to freeze, after which nothing could be done with it. In such a state, any bug you observe is uninteresting, and a lot of teams' time was lost because they couldn't even get past the login screen. Not serious. 

Still, I learned a few things. 
First of all, events of this kind are one way to close technology gaps - while preparing for the competition I spent some time setting up a working environment for mobile devices, and learned a variety of things I didn't know before or that had changed since the last time I looked. First of all, it turns out that you can't install certificates on Android devices from version 7 onwards - that is, unless the device is rooted, and even then there are a few hoops to jump through.
The second thing is the ability to leverage existing skills to function in unfamiliar environments - until now, I have never done mobile testing in any serious way (I did pick up a few small skills along the way), but despite that, the experience I've gathered so far proved relevant - I knew what I wanted to look for, and I could look for a way to do it. The other side of the coin is the capability gap compared with someone who works with the technology under test on a daily basis: there is a variety of things that are relatively easy to do if you know how, or that you should even be doing at all. For example, given a bit more time I might have looked at things like battery consumption or the application's permissions, but that wasn't on my mind during the short time we played with the application.
The third thing I learned is the importance of preparing for such an event in advance. In our profession we chase electrical currents running between different systems, through many layers of abstraction. When testing a product you need to peel off a layer or two to find out what's under the hood. There's no time during the competition to install all the tools, and unless you work daily with a similar technology, you don't necessarily have any idea which tools could help you. That's a gap that can be reduced significantly with a few hours of preparation. In our case - emulators, connecting to ADB, and switching the phone to developer mode (it's ridiculous how complicated this has become in recent versions of Android; I understand the reason, but come on).
Planning is nice, but it goes down the drain very quickly. Before the session we made a short schedule to maximise effectiveness. Step one - a tour of the application and mapping its functionality. What happened? The application didn't really work. We got a bit lost and didn't remember our own plan. I assume this improves the more you practise such situations. 
One last thing, and the most important - models. Even when facing the broken application, models of the wide variety of things we were dealing with kept running through our heads - a model of "what is the application's purpose" to determine what matters, a model of "where things might be broken" to focus our search, a model of "what's behind the failure we're seeing" to know whether to keep digging in the same place for more failures or to look elsewhere instead, and whether the failure we're seeing now is related to a previous one we saw. 

So, those are the things I learned about competitions of this kind. Is there anything I can carry over to my professional life? Not directly. Participating in this competition certainly teaches me a thing or two about possible mistakes if we ever decide to organise a bug hunt on our own products, and about the reasons we might want to do so (mainly, I think one advantage could be changing the mindset of people outside the testing team and directing the spotlight at the different kinds of problems we have, so that fewer of them get created). Beyond that, I think it sharpened in my mind the difference between software testing and this competition - the main thing separating the farce that takes place in such competitions from serious work is the constant feedback from the people building the product: there's no access to a product manager who will say "this is important, find more like it" or "this is cool technically, but I don't care", no access to the various programmers to understand what's hard and what's easy for them, no attention to the development process and to improving it. What you have here is a circus of "I found bugs in the product". Well, good for you.
At the end of the day, it was an excellent reminder that, more than anything else, software testing is defined by the context in which it takes place, and not by one set of activities or another. 

Despite everything, I'm glad I took part in this competition - not because it had any professional value, but because it was a nice way to spend a few hours. If you happen to come across such a competition (hopefully one better organised than the farce I participated in), I warmly recommend it.