אשרי אדם מפחד תמיד Happy is the man who always fears: 2024

Monday, October 7, 2024

Pay to stay?

So, I stumbled upon (or rather, saw it in Alan Page's five for friday) this blog post explaining why companies should "pay their people to stay" that is - offer the people who are already working and having good impact on the company a competitive salary. I read the argument there, and it's convincing at first glance, at least up until the point you start thinking about it. For those who don't want to read the entire post, I'll sum it up - People currently working for the company are becoming more valuable as they stay, since they gain domain knowledge and become more effective in the specific thing the company does. Therefore, the argument goes, it's worth paying those individuals above market value for those skills.

So, what's wrong in this argument? Oh so many things.

First of all, the assumption that if you pay well people won't leave is not realistic. People leave for a lot of different reasons - they might look for a position that is not available, they might move to live elsewhere, or even just feel that they want to change domain or learn a new area of technology. That's assuming they don't run away from a bad boss. So, let's just put this on the table. You might be paying well, only to retain 10% of the people you want (or it might be 90%, but I doubt it).

Second, let's assume this strategy works - if we're doing it, other will too. In this case, let's look on the options our star employee with 5 years of tenure. Before we implemented the suggested change, their salary would get a nice 20% bump if they would be hired for their current level. Now, we've implemented a tenure-based salary increase that compensates for that, and their salary is 10% above market. They interviewed with foo-corp, which finds them a perfect match for what they are looking for. They can keep on looking, knowing that many other suitable candidates might have the same conditions, or they can offer above market rates, resulting in the market value of the job rising and our formula will need to be updated. Since companies do have some limits on their budget, this race to the impossible salary will stop somewhere, and on the way might have other side effects (such as "no raises for the first three years" or just underpaying juniors and deepening the imbalance between higher and lower paid workers). Another possible side effect if such practices are common, people might be hesitant to interview people who have been in their current workplace for over 5 years, pushing people who care about their career to start looking sooner just to avoid being labeled this way if they do need to change jobs in the future.

So, the financial model is not really convincing. But let's assume for the sake of the argument that it does, that we've found a magic formula that actually helps retaining people in significant numbers in a financially reasonable way. Do we actually want this? The post assumes that tenured people are more valuable, but that's far from being universally true. There are several costs to having people, even good ones, stay. Here are some disadvantages:

Reducing our bus factor. If we nurture those super-valuable people and they stay forever, they become more and more central to our organization, despite honest attempts they will gather more knowledge and expertise, and more people will rely on them, when they finally leave, or retire, it will be ten times more painful than if they left after only five years.
They prevent change. The people who are being retained are central to the current way of working. The learn the organization and grow accustomed to it. When they say "we've always done it this way", or "we've tried it and it didn't work" it will carry even more gravitas.
Less mobility means less ideas travel in the industry - yes, we might meet in conventions and meetups, but nothing can compare to working in a different setting and seeing your previous ideas about what's right just burn to the ground in a new context.
Slower growth to the others: When people who are central to the organization leave, there's some sort of vacuum left behind them in all the things they were just the go-to person. When they leave, this vacuum is filled by one or more people, who now need to fill boots bigger than they wore before, and most of the time they grow to do this well, and give it their own spin. Thus the industry gains more skilled people, the company gets to have the same tasks benefit from multiple viewpoints and the people themselves are gaining skills and sometimes a promotion. Yes, there are growing pains, and the right person leaving in the wrong time might mean those pains can topple a company, but by having multiple, smaller, holes to fill, we avoid the giant crater.

One final point - I'm not advocating to push your senior people out (I keep this recommendation to cases where you're starting to think "Uh, oh - if this person leaves we're all doomed"). If you're able to create a workplace that has a good balance between compensation, culture and professional interest people will stay and you will benefit from that extra amount of domain knowledge they learn, knowing that other companies will have a hard time matching you on such a fine balance. But don't fret if they leave. The graveyards, after all, are filled with people who couldn't be replaced.

Thursday, September 26, 2024

Wiring the winning organization - Book review

Book cover

Or, in its full name: Wring the Winning Organization: Liberating Our Collective greatness through Slowification, Simplification and Amplification. By Gene Kim and Steven J. Spear.

It's a bit of a mouthful, but this matches perfectly the ambition of the book's authors: To devise a theory of everything. Or, at least, of business operational success - any business. Naturally, in order to do that, they are abstracting quite a bit, but despite everything they manage to create something that sounds convincing and powerful. It's a great book, but it suffers from a major drawback for me - it's a managers oriented book, which fits their theory (short version - it's all about management) but is not as actionable for me as I would like it to be. I can still use the insights from it to reason with my managers, and to set some goals I will want to advocate for, but it puts one heck of a nail in the coffin of "driving a change bottom-up" idea.

So, to the theory of everything.

The authors define 3 layers of "stuff" that happens in just about any business:

Layer one: "Technical Object". Or, in other words, the doing. This is what the skilled people are doing - be it coding, chiseling a piece of rock or selling timeshares to elderly people.

Layer two: The tools. Those are the tools used to do the things in layer one, the hammers and jigsaws, the build systems and expensive electronic microscope.

Layer three: The social circuitry. This is how your organization is structured, how data flows, how dependencies are managed and how decisions are being made.

The radical claim the authors make is that while there are a lot of differences between organizations in their layers 1&2 needs, in layer 3 it's pretty much all the same. Therefore, it stands to reason that in this layer, good practices can be transferred between industries, which we can see happening in some places (the one I remember is the Toyota manufacturing approach being an inspiration to devops movement in software development). The book's main message is that in order to increase productivity & quality while decreasing costs and making employees happy and motivated, you "only" need to do three things. They name them "slowification, simplification & amplification". Do those, and you're on the right path. Note - there's a distinction between "simple" and "easy". Each of these words is packing a whole lot of work within it.

The first term is a word they made up to carry the meaning they were looking for of "slowing down to make things better" (they say that in German there is "Verbesserung", which my limited knowledge just says means "improve", but I might miss some context). The idea is that since when under pressure we revert to our habits and instincts instead of thinking (in Daniel Kahnman's terms that would be "system 1(fast) thinking"), we need to slow down to engage our brains (system 2, or "slow" thinking). This can happen at many levels - some phases, such as planning\design are slow by nature, and might only need some small nudges to be more conscious, we can use a testing environment that is low-risk and we can control to a greater degree, and we can find ways to slow even the performance environment (that's the name they use for "production", as it's more broadly applicable). Let's take for example a basketball game - before a game, the coach might go over some tactics with the team, draw some sketches on the board, so that players won't have to communicate the plan while they are running after the ball. This way they have time to suggest adjustments and deliberate alternatives. Then, when in practice, they might stop everything to perfect the way they pass the ball around, and during the game, a player can take a couple of seconds dribbling to re-orient and plan the next move. Slowing down is not always possible - a player jumping for the rebound ball can't pause to think, so in this instance they will have to rely on the instincts they've built during practice.

Simplification, thankfully, retains its dictionary meaning. in order to succeed, we want to solve easier problems. They mention three techniques to achieve this: Modularization is the way we partition a large problem to smaller yet coherent problems. Incrementalization is their way of saying "baby steps", where we establish some solid known basis (through standards, automation, documentation and so on), and invest efforts in learning the next new thing and linearization is creating a sequence where actions are depending at one another.

Most of the book is alternating between explaining the three concepts (or is it principles?) and analyzing success and failure examples according to them. They show, for instance, how the Wright brothers were able to get an airplane flying in a fraction of the cost of the (failing) competition because instead of trying to build one complete plane after another, they did a lot of experiments on various parts such as testing only the wing design elevation power, without attaching it to an actual plane, or using a kite in order to test the ability to steer by curving the wings. Those examples make a pretty good case for the main theory, but the one thing that was probably the reason for the book being that convincing was the opening "vignette" which is available in google books for free (link, I hope), they do a much better job of describing it than I could, but the main idea is describing how poor management and social circuitry are making a mess of something relatively simple such as managing moving furniture out of hotel rooms and painting them.
In between all of those, there are a few insights that are worth mentioning, such as the realization that the social circuitry should match the way the technical layers are organized (which, if we think about it, is a take on Conway's law that reverses the causal direction. It's not that "since our org structure is X, all of our systems will be X compatible", but rather "since we want to produce X, we need to structure our org in a manner that will support it"), or a motivational explanation of why the framework presented in the book works (in short: simpler, more isolated problems enables more minds to engage with the problems at hand). The most depressing insight they share, which I have already mentioned, is that those things don't come bottom-up, to really have an impact, one must have executive power. I'm guessing that as an individual, I can use some of these ideas to structure the work around me to some limited extent, but the truth is - simplifying most of the problems I deal with will require change in multiple other teams, so I can nudge, convince, and haggle - but I can't really control what's going over there, so my effect will be limited.

So, all in all - great book, especially if you're somewhere in the C-suite. I wonder if it should be taught in management courses.

Wednesday, July 3, 2024

I have released an open source library

עברית

In my past two jobs we always talked about collecting some tests metadata to analyze later, but never had time to collect it. Well, I had a few moments between jobs to create an easy to extend (I hope) library for pytest. It might or might not help me in future workplace, but I'm hoping it will help someone.

Taking the perspective of building a library for unknown users was an interesting experience - I obviously needed to put more emphasis on documentation, but more importantly, I needed to think "how can a user customize this for their needs?" One result is that I chose to omit functionality - for instance, I'm not providing a database storage utility, but leaving that for the user who knows which DB and schema are they using. I still consider creating an example project using this library, but we'll see if I have time and attention for that.

Anyway, its free to use, modify and so on. If you need help, ping me (or better - open an issue)

PyPi: https://pypi.org/project/pytest-stats/

Github: https://github.com/amitwer/pytest-stats

יצרתי ספריית קוד פתוח

English

בשני מקומות העבודה האחרונים שלי נורא רצינו לאסוף נתונים על המבדקים שאנחנו מריצים כדי שנוכל לעשות אנליזה רוחבית אחר כך, אבל אף פעם לא היה לנו את הזמן לבנות את התשתית לזה. ובכן, היה לי קצת זמן בין העבודות, אז בניתי תשתית איסוף נתונים עבור pytest. האם זה יעזור לי בעתיד? אלוהים גדול. אני מקווה שזה יעזור למישהו.

לבנות ספרייה עבור משתמשים שאין לי מושג מי הם זו חוייה מעניינת. מעבר לצורך הברור בתיעוד טוב יותר, פתאום מצאתי את עצמי חושב על "איך המשתמש יכול להתאים את זה לצרכיו?" אחת התוצאות היא שזרקתי החוצה חלק מהפונקציונליות. למשל - לא כתבתי מימוש מחדלי לשום מסד נתונים, כי אין לי מושג איך המשתמש ירצה שדברים ייראו אצלו, באיזה סוג מסד נתונים הוא משתמש או אילו נתונים הוא ירצה לשמור. תופעת לוואי של זה הייתה פרוייקט פשוט יותר למימוש - אני דואג לאסוף את הנתונים, והכתיבה היא בעיה של מישהו אחר. אולי בעתיד אצור פרוייקט לדוגמה שישמור נתונים איפשהו רק כדי להדגים איך עושים את זה, אבל נראה אם יהיה לי זמן או קשב כדי לעשות את זה.

בכל מקרה, מוזמנים להשתמש בעצמכם. אם אתם צריכים עזרה, צרו קשר (או, עדיף, פתחו באג\שאלה בגיטהאב).

PyPi: https://pypi.org/project/pytest-stats/

Github: https://github.com/amitwer/pytest-stats

Tuesday, July 2, 2024

closing 5 years, retrospect

After just a bit more than five years, today was my last day at Deep Instinct, and that's a great time for some reflection. I've seen some good, I've seen some bad, and I managed to learn from both. It's a bit daunting to try and pack five whole years into a single post, so I'll just paint an image in wide brush strokes. It will be inaccurate, and I'll miss a lot, but it is what it is.

Also, it will be long, bear with me.

Year 1

This was a year of growing, and of adjusting expectations. it was my first time working in a start-up, with only one previous workplace, and I was up for a surprise. There were no working procedures, minimal infrastructure, and the very strange part - it seemed like everyone were ok with that. In this year I learned how does it look to have a project that can make or break the company, I learned that building communication channels takes a lot longer than I anticipated, and that what I believed were industry standards were not necessarily common.

Our main focus on this year has been to build a team and to create the necessary infrastructure for system testing our products. So we spent some time hiring (note for readers - if you aim above the standard skill-set in your area, you have a long and arduous process ahead of you), and I found out that with more than 5 junior people to shuffle around (I wouldn't really call this mentoring) I was overburdened. I came with an idea on how should our framework look like, and was lucky enough to have someone in the team come up with a better suited idea. There was a lot of work done on the technical side, and not a lot done on integrating with the other teams or the product work. We simply were not yet ready. by the end of this period we had a working system test framework, a team of ~12 people, and we've proven our value to the organization.

Years 2 & 3

Those are not actually full calendar years, but it's a nice title for the period after we've made enough catching up and focused on integrating with our environment. Some of this effort was being done during the first period, but this was more in the way of laying some foundations for the future. Now we've started turning up the notch on talking with other groups. This is where we've started feeling the missing procedures and culture around us - getting invited to a design review is quite easy, but getting invited to a design review that does not happen is a different thing altogether. We've hacked around and compromised a lot - starting with just asking for someone to give us some sort of a handover in lieu of the design review would be one such example. In other places, where we did join the table, we made sure to provide value so that we will be invited again, and we volunteered to help other groups use our infrastructure for their end, partially to gain a reputation boost (but mostly because it was needed for the company). We had a great opportunity when a new product was starting from scratch (ish) and we provided them with a dedicated person. We then used this team as a model to say "this is how we think we should work", thus initiating the third phase of our 4 steps plan (worded a bit differently, I wrote about it here. Roughly labeling, the steps are stabilize, grow, disperse and disband) - we didn't actually complete the growing part: we didn't have a good enough infrastructure to share, and most of the people we had were not skilled enough in testing yet, but reality is messy, you don't get to complete things cleanly before moving on.

We also struggled with our own growth - when our team size neared 20, and the number of products grew, we started feeling the pain of stepping on each other's toes and trying to focus on too many things. So we split into two teams, which in turn required us to split our code-base to match - a realization we got to a few months after the team split, since we now had people focusing on one task and we didn't want us distracted by noises from the other side of the split. Another aspect of this split was that I found out I'm a difficult employee - as part of the split my then manager became a group lead, and a team lead was recruited. At the point where she has arrived, all of the team were people I either recruited or welcomed to the team, and I was a nexus of knowledge about most of the things happening around our team. That is to say - while the authority was hers, I had way more social power. It took us a while to sync, and there was this one time where I used my excessive power and clashed with her in front of the entire team - I knew I was wrong a few minutes after doing so, but the damage was done. I got a good lesson on the difficulty of apologizing publicly, and later we've both synced and balanced our power in a more suitable way, so I'm guessing it all worked out in the end, but I hope I will remember this lesson for the future.
We have also faced another problem - A lot of people were leaving us to pursue other opportunities. There have been a lot of factors to this - we have hired mostly juniors out of university, we were a small start-up with a lot of changes, but there were two factors that bothered me. First, there was the feeling that in order to progress in one's career, one had to get out of testing. The second was that in our company's culture, testers are seen as a second class employees - I'm pretty sure no one will admit to this even to themselves, but it can be seen in all of the tiny decisions - who gets invited to early discussions, how often do people feel comfortable telling you how to do your job, who gets credit for work done, and so on. It ties back nicely to the first problem, but it was noticeable enough to deserve its own place. It took me a while, but the first problem has led me down the path that led to building an in-testing career growth model. The second part of the problem was one I did not address at all, and when I look back on, there were probably some actions I could have attempted. The way I see it, there are 3 main reasons for testers to be on the bottom of the food chain: First is the reputation of the profession in the outside world, which is beyond me to change. Second is the fact that in most places, testers who write code are less capable than "proper developers" (which leads to a vicious cycle of hiring with low standards and reinforcing the conception of testers having inferior skills) and the third is that testing have no explicit contribution: outcomes such as risk reduction, placing safeguards, and maintaining test code are part of other roles as well, especially when actively pushing towards full integration of testing into the teams. I'm still bashing my head occasionally against this question, with recent thoughts influenced by "Wiring the winning organization" (A book review will come soon, I hope).

All in all, there were a lot of challenges during this period, but it was a good time.

Year 4

Year 4 has started on January 2nd, 2022. those with a keen eye and access to our HR system might notice that there are 2 months missing between my actual 4th anniversary on April 2nd and this date, but as I mentioned, the previous years have been so only in the most general way. In that date, we had a new VP engineering (it's formally named VP R&D, but we do have a separate research department) join us and replace the person who was filling this role at the time.

I learned one thing during this time - While it's true that Rome wasn't built in a day, it didn't take that long to burn it down. At first, I thought it's just a style of communication and trying to prove that he's the boss, but soon enough there were so many examples of bad leadership that I couldn't push down the fact that I was dealing with a bully. A few red flags for the future, and perhaps for the readers as well:

Everyone who were here before the bully are stupid, as is every decision made.
People that are still here need a firm sheriff to teach them how to do their work.
All decisions should go through the bully, if there was a decision made without him, it will be reversed.
Listening is for other people.

I could see the damage really fast - the number of crises that required people to stay up late and to work on weekends skyrocketed, communication between teams was reduced, and everyone tried to make sure that when something goes wrong, they will have someone to point at and say "I did my part, so it must be those people over there". Add to that a replacement of 2 of the 3 group leaders within 3 months (one quit, the other was fired in the most insensitive manner), and it's no surprise that things were a mess. About a month after the VP has joined, he announced on two major projects - a complete rewrite of our SaaS product, and a shift to scrum. Both failed spectacularly.
Let's start with the technical project - As it is with many companies, there are more urgent tasks than there is time or people, so shifting people to work on the project would mean slowing down on the roadmap, which can feel fatal for a startup. Also, do recall point #2 - people other than the VP are incompetent by definition. We also have no real documentation of the system we are aiming to rebuild and no experience is creating good requirements or specification even for smaller projects. The solution? To hire an external company to build part of our core business components. Yes, it is stupid as it sounds. Yes, it was said in some circles at the time. Saying it to the VP would have been hopeless, and an invitation to be bulldozed.

So, while this project is happening, the other project can't happen without the people in the company, right? Moving to Scrum is a people and processes task. Well, this has botched as well, and I wrote about it in some length already here. Since that time, the project has concluded and I got to talk with one of the consultants and asked him why some very basic things didn't happen, or at least mentioned as goals and his answer was very respectful and general, as he was masking the fact that the VP who hired them also blocked most of their initiatives (I did ask if I understood correctly, and got the "I can't answer this directly" kind of answer).

So, why did I stay? At first, I was mostly shielded by my skip-level manager, the one remaining group leader. In July I approached him to tell that I have a job offer, but if he can promise me that he won't break in 6 months, I'm staying. His response was that while everything is dynamic and he can't make promises, at the moment he has no thoughts of leaving and is not looking around for alternatives. A month later he called me to tell that he got a job offer and is accepting it. Well, that's life, and I wished him well. Then I took my time - things were not terrible for me personally, so I thought to choose a place I'd be happy to. In January 10th 2023 I got an offer from a place that I thought could be good enough, but I took some time to deliberate. The 4th year has ended at January 15th 2023, when we were told that the VP has chosen to pursue other opportunities and will be leaving effective immediately. The next day I told the place that offered me a position that I'm staying.

Naturally, there was a lot more going this year. For instance, I got to do some close mentoring with some people, which was very interesting - I worked formally with two people, one who joined us in order to stretch his abilities, and was very receptive to my attempts in teaching. I found out that I need to learn how to map the skills and gaps of my colleagues and then find proper way to teach those. For instance, I did manage to see that some of my mentees were struggling with a top-down code design, and then found that I'm not sure how to teach that (tried to force TDD, I think it worked to some extent), other skills such as modeling or communicating confidence - I don't think I managed to teach as much.

It was also the year where I could put my knowledge in testing theory to use, even if a bit violently. The story was as follows: After my skip-level manager has skedaddled, another was brought in his place, though without either his skills, his care or his work ethics (though he does hold the record of being the fastest person to be fired that I've seen). Shortly after that, there was a crisis (did I mention there were a lot of those?) a product that has been developed for an entire version without any sort of testing or attention of testers (all of the testers were busy on the previous crisis) and without mitigating this problem by telling the people working "ok, you don't have testers, do your best", was now blocked by all of the tickets in a "ready for testing" state. The new group lead, eager to show... something, has stated (or, perhaps, iterated the bully) "no compromises on testing, I want full test plans", which, given the deadlines we were facing was simply not possible, so I did some rough estimation and got to somewhere between 13 and 40 days just to write the test plans for the 42 features, let alone executing them, to which I got the lovely response "we need to be more agile". I'm still quite proud of not responding snarkily "agile doesn't mean cutting corners and doing a shoddy work", instead I suggested that we'd use SBTM, which won't solve our problems of not having enough time, but will mean that we'll start working and finding problems faster. I might have exaggerated its benefits a bit, but I can now say that I got to win an argument only because I was more educated on testing than the other side and could slap some important looking documents at them (I have James Bach to thank for creating this document) , so yay me. Oh, also there was the part where we didn't waste a ton of time.

Year 5

So, the bully has gone. There were some rumors around the cause of him getting fired, and sadly, "someone noticed he's menacing the entire organization" was not one of the options (in fact, another rumor had it that they actually had another VP coaching him on the bullying part, which, if true, saddens me greatly because it means his behavior was known). Now we had a chance to start a healing process and make up for the year of moving backwards (my feeling was that our culture went back to a state which is roughly equivalent to what was a year before the bully, but that's just a guess). We had several challenges to overcome - we had a beaten down department, we had lost a huge number of people (while I can't tell exactly the numbers, today, about half of our engineering department were hired in the year and a half after he was fired), the group leaders we had were hired specifically to match his management style - to relay his decisions and to have as little agency as possible and make no decisions themselves. Personally they were nice, but didn't have the skills to lead a healing process, and finally is the person who has replaced the bully, that I think is a good manager, but took on an impossible task - managing two departments (he was, and still is, our VP of research) with over 100 people at the time, and having the style of a good manager, which is to delegate a lot to his direct reports, which would have been great had we the right people and the right structure in place. Add to that the market pressure as funds of the last round were slowly but decisively dwindling, and the result is that we had an absent VP. On top of all that, since last October we got a war in Israel, with many people, the new VP included, were called for months of reserve duty, risking their very lives defending the citizens here.

So, not the ideal conditions for healing. But, there were opportunities as well - before the bully left, he declared that we're moving to (finally) disband the QA department, sure - as always, it was done for the wrong reasons and seemed like he was going to do that the wrong way, but once he left we talked to the group leaders and understood he did not share with them more details than he had shared with us (not a lot, in case you've wondered, only a general statement), so we started discussing how to do this transition safely, and where should the responsibilities that were currently held by the test teams be. This task proved too much for our group leaders, so they decided to leave the status quo more or less as is, with some cosmetic changes perhaps, this left my manager to start and dedicate more of our team to specific products, which is a smaller step then what I aimed for, but at least it was in the right direction. What we both did not know at the time, was that the bully, after clashing with my manager who was defending the team quite admirably, has started the process of firing her, which while halted due to him being let go, did paint her as a troublemaker in front of HR and upper management. The difficult discussions around the re-org were the last straw, and despite the team rallying up behind her, she was let go.

this was the point where I had to assume another role - the team's anchor. Several people approached me with concerns and wondered whether they should start looking for another place. I spent some time talking with them and trying to help them figure out what they need and how to get that here. I was feeling very ambiguous about this - on one hand, I wanted to join their rant and tell them I agree, on the other, I was the face of the system, and had a clear incentive of convincing them to stay. I was very careful not to lie, but there were some questions I had to dance around. There were some nicer aspects to this, though. Once I realized this is part of my role, I looked on the 2nd test team - we had a major problem there, a few months before my skip-level manager left, their team lead made a lateral move to another department, and the team was left without a manager. As long as my skip level was there, he was able to provide some guidance while looking for another team lead, but this effort stopped when he left, and the team of mostly juniors was left without a leader, and the few seniors there left or detached shortly after. So, now we have an orphaned team for over a year. People were leaving. I did a quick assessment of the people still there, and found that while most of the people there are ok, there is one that I really didn't want leaving, so I approached her and asked how she was feeling and what she was missing, which has led to us meeting weekly (ish) and me trying to be the professional authority she could learn from or consult with. That ended up being really fun.

On the work side, with a new VP, we finally started to work on the shadow project declared by the bully, only to find that, quite unsurprisingly, the contracting company who were now getting paid for over a year has managed to produce some lengthy architecture documents that were not really suited to what we needed, and that the bulk of work still needs doing. Surprised? I wasn't. So, teams started scrambling in order to get up to speed, and the added difficulty of coordinating with a contracting team made it that much more difficult. I was brought in to assess the "testing" they have done, just to see some demo robot-framework tests done against a mockup, with no thought on the SDLC, and no one on the contracting team who even understood my basic questions. So it was up to me to devise a testing strategy for this new product, and for the modifications of the other parts that needed to change in order to support this rewrite.

At any rate, reality was not waiting on us to sort our mess, with post-covid funding being harder to get and some major customers backing out of deals in the last moment, our company figured out that in order to stay in business, it is better to update the income forecast to be more on the pessimistic side, which meant a reduction in force - one of my mentees, whom I just told a week earlier that probably had a month to improve before termination was an option was the first to be named, and another, more appreciated employee was laid off as well. Again, I found myself as the go-to person for people disturbed by the change, and this time I could only tell them "the company claims that after this reduction we should be clear for at least a year, so if the only reason is job stability, keep an eye, but this need not be the fatal sign. As part of this, the company has pivoted and changed its market focus, so the entire rewrite project was called off, a year and a half too late. We returned to the previous initiative of a gradual upgrade of the current system, because, well, it made more sense. so new strategy and cool buzzword technologies? not anymore. At least this time when I did damage control I could be honest and say that my only problem with aborting this project was that it happened too late.

Another thing that happened is that management declared that "we have a quality problem", and while they also said "quality is everyone's responsibility", the way they behaved was more in the way of "we have a QA group, so they should patch quality in the end". We found out about this when we were told a "QA expert" consultant was hired and will be joining us. Yes, it was a slap in our face and ego, but we decided to let it pass and try to use this opportunity to further push our goals. This consultant, unlike the agile transition ones, was quite good - She did her research, did some coalition building, and started to look for ways to push for a change. The main difficulty? her hands were tied by the limited scope - fixing only the test team and not the surrounding problems we have could make a minor improvement at best. To make the problem worse, resource and time allocation were not really made (there were promises for time, but they were not scheduled or considered by the other projects) and we had a new test team lead (for the orphan team) that, at the very least, wasn't aligned with the rest of the company - but we still needed to push through his incessant resistance.

I've learned a lot from the consultant. I learned how to take a vague goal and make it concrete and presentable to management, I learned from our failure on the importance of setting clear milestones, and of saying "I don't need 10% of the department time as a constant, I need 5% now, in the form of one specific person, and in 30 days I'll need 20% of all hands on deck", not doing so meant that we had people bored while we were preparing, and schedules tight when we actually needed the people. One thing we tried to do but didn't get to an agreement was on our definition of "quality" (which is a term I really hate for its vagueness) and following that, on the goals we could theoretically achieve.

However, not all slaps in the face were as successful as this one, there were a few others, and two that had a significant impact, both were a case of a problematic hire. The first was a team leader for the orphaned team - it just so happened that the hiring manager, a developer in his past, had no idea how to assess testing skills, and thus, for most candidates in this position, he asked the other test team lead we had to interview, and due to the nature of the market - there were a lot of candidates that fit on paper, but had no relevant skills, and they were failing the simple skill checks. Then, there was this one candidate who did not get interviewed by said team lead - the benign explanation is that the team lead was under a time crunch and needed some space, and the hiring manager decided to skip this phase despite the obvious gaps in his filtering. The candidate that was hired was a very wrong fit - he didn't have the skills to help his team improve (they needed a strong coder) and he wasn't able to find someone else to provide this sort of service. He tried to bring a solution that worked for him in a completely different place without understanding the situation and without talking to his team, and he had the wrong mindset for what we were trying to achieve. All of this is because someone assumed they can hire a testing professional without at least having a conversation with the local testing experts (there were two of us, at the time, so I know it didn't happen) to understand what should they be looking for, and how to check for that. Failing to do that, the first person who knows to parrot some stuff that sound like testing will have an easy was to slip in.

The second slap is pretty recent - a new QA group lead was brought along. This is a double slap in our face - first, because bringing an outsider without letting someone from within to apply for the position is showing a lack of confidence in the people you have (this was also the case with the team lead position). Second, because bringing a group lead was the opposite of what we were doing here for the past few years, and finally, because no one talked to us, again. Now, it's complicated to have one interview their future manager, so it makes a lot of sense to leave us out of the hiring loop, but after so many bad choices, why not ask for our input? maybe we'll be able to convince you you don't need this function, maybe we won't and you'll explain to us what is it that you're trying to achieve, maybe we'll agree that in our current state we need to find a team lead that will have the explicit goal of disbanding this group within one year and then build a community of practice instead - but this didn't happen, and I found myself facing another manager that I don't appreciate professionally. In this specific case, it took the manager less than a month to make some unforgivable mistakes that were a clear signal for me that on top of what might be just professional differences between us and not sheer incompetence on his side, there are also a severe issue with his management style and skills that I can't accept. To be fair - this particular problem is something that is difficult to interview for, and I'm not sure it could have been avoided. It is still not a pleasant experience.

The one last thing that sticks in my mind is the most recent project I was involved in - a new, AWS native product that is supposed to be a major revenue boost for the company. It started while I was busy putting off some fires, or as we call it here "releasing a version". When I tried to get involved in the discussions with AWS, I got a pushback, and since I was busy with the fires, didn't insist. It turned out to be quite a mistake, since by the time I was ready to join, the initial architecture was already decided, and aside from being quite complex for reasons it took me a while to learn, there was zero thought on how to manage this coding project or how to test it, and I had to spend perhaps a month just wrapping my head around all of the new things I had to learn just to come up with a strategy, then break it down to a plan, and do this while everyone is asking "how soon can you start writing system tests?" and answer that we need about a month to change our infrastructure, and that it is more difficult to do because the product has no deployment scheme of the latest code, so each step is just taking five times the time and effort it should. Dealing with the constraints I had was tough, and I wrote about it in my last blog post. I'm quite happy with the result, but sadly I won't see how well it works, or be there to help push for the strategy to be fulfilled - there are a lot of ideas that are new to the company (sadly, one of them is unit tests), and there is surely some tinkering required with the new mechanisms (such as contract tests and managing the various contracts), but that's someone else's task now.

I'm off to find new challenges, and will be starting tomorrow in a new and exciting place.

Tuesday, June 18, 2024

Testing a cloud-native product

עברית

In the past several months the company I work for has been building its first cloud native product, and I was tasked with figuring out how to test this, given the limitations developing in the cloud poses for our kind of application. This posed a few challenges that made everything that much more complicated, and while there's a lot of materials online about tools that can help with testing stuff on the cloud, and specifically on AWS (which was my focus) I couldn't find a holistic overview of how to approach testing such a project. I hope this post will help remediating this gap, and hope even more that I'll annoy enough people who will point me to articles that already exist and I missed just to prove me wrong.

I had noted the following difficulties, that might, or might not be relevant to your project as well:

Our product deals with AWS accounts as its basic unit of operation - we deploy a service that protects the account's data. As such, each environment needs to have its own account, so deploying multiple environments for testing (and development) is not really feasible - it's expensive, long, and deleted accounts stay live for 90 days or so.
The product is designed as several services communicating with each other, maintained by several different teams, and using several repositories.

So, how can we bake feedback into our SDLC? It didn't help that we didn't yet define said SDLC, but we're not here to help, are we? Besides, this is also an opportunity to define said behavior in a way that will enable feedback instead of being an obstacle.

The solution that made sense in my mind was the one I found in Dave Farley's book "Modern Software Engineering" (I wrote about it here) that basically said testing the entire product before it is deployed to production is something you don't get to do in a services architecture. Really cool concept - having multiple, individually deployable components and have each of them reach production in its own time. There was only one caveat to this approach - there's no way in hell I'll manage to convince the organization its the right thing to do, and even less of a chance to get the necessary changes in our processes and thinking in place. However, I might be able to use the constraints we do have to help me push at least partially in that direction.

The plan, if I'm honest, is assuming the old and familiar test pyramid, only this time presented as "well, we can't do it in the ice-cream cone we're used to, we don't have enough environments."

So, the vision I'm trying to "sell" to the organization is as follows:

Yes, you can have your nightly "end-to-end" system wide regression test suite. However, unlike other products where the bulk of testing actually happens on this level, the nightly will be focused on answering the question "is it really true that all of the parts really can work together?" For this purpose, we'd want a small number of happy-flow tests to run through the various features we have.
The bulk of testing should be done in the unit test level. While our organization still has a lot to learn on how to use unit tests properly (starting with "let's have them as a standard"), at least for the small services it's really easy to cover all of the edge cases we can think of in unit tests. It also happens that for AWS there's an awesome mocking library called "moto" which provides a pretty decent and super easy to use mock for a lot of AWS functionality without needing to change the already existing code. In this layer we'll verify our logic, error handling, and anything else we can think of and can test at this level.
Still, not everything can be unit tested, and on top of our logic, we want to check that things work on the actual cloud before merging. Therefore, we are attempting to build a suite of component tests that will deploy parts of our system and trigger their sub-flows. For example, we have a component that scans a file and sends the verdict (malicious or benign) to another component to execute the relevant action - so we can deploy only the action executing lambda alongside a bucket and start sending it instructions to see that it works.One benefit we get, apart from being able to tell that our lambda code can actually run on the cloud, and is not relying on dependencies it doesn't have, is that we must keep our deployment scripts modular as well (I did mention some small steps, right?)
We have a lot of async communication, and a lot of modules are depending one on another to send data in specific format otherwise they break. To expose those kind of problems early, we are introducing Pact for contract testing - the python version of the tool is a bit limited at the moment, but it has most of what we need.

So, to make it short - during a pull request, we will run unit, component and contract tests, and once a night we'll run our system tests. I have yet to tackle some other kinds of testing needs such as performance, usability, or security, but we already have a worst-case solution: spin a dedicated environment for that kind of test and get the feedback just slightly slower than the rest of tests. I'm hoping that we'll be able to move at least some of those into the smaller parts of testing (such as having a performance test for each component separately), but only time will tell.

I expect that we'll leave some gaps in the products that already exist and are used in this new projects, but hoping this gap will be managed in a good enough manner.

One question that popped up in my head as I was writing - is this approach what I would choose for similar applications that don't have the major constraint on environments? From where I'm standing, the answer is yes - even if it's easy and cheap to spin a lot of test environments, testing each component in isolation and focusing on pushing tests downwards simply provides a lot of power and enables kinds of tests that would be really difficult on a system-wide context.

So, those are my thoughts on testing cloud native, service oriented applications. What do you think?

בדיקות מוצר מבוסס ענן

English

בחודשים האחרונים אנחנו בונים מוצר שהוא, כפי שמקובל לומר, קלאוד-נייטיב, יעני, בנוי בארכיטקטורה שמנצלת את הכלים השונים בענן מראש. אני קיבלתי את המשימה "בוא תבין איך אנחנו הולכים לבדוק את המוצר הזה, עם כל המגבלות שפיתוח מוצר ענן מכיל עבור מוצר כמו שלנו". האמת? היו פה כמה עניינים מורכבים למדי, ובעוד שיש לא מעט חומר בחוץ על כלים שיכולים לעזור עם בעיות ספציפיות, לא הצלחתי למצוא בשום מקום הצעות לגישה הוליסטית לבדיקת מוצרי ענן, משהו שאפשר לבנות בעזרתו אסטרטגיה. אני מקווה שהפוסט הזה יהיה הצעה אחת כזו. אני אפילו יותר מקווה שאצליח להרגיז מספיק מהקוראים כדי שחלקם יכוונו אותי למאמרים קיימים שפספסתי רק כדי להראות לי שאני טועה.

הקשיים בהם נתקלתי, שאולי יהיו רלוונטיים גם לפרוייקט שלכם היו:

המוצר שלנו מתייחס ל"חשבון AWS" בתור יחידת העבודה הבסיסית שלו, כי המוצר שלנו מגן על הנתונים שבתוך חשבון מסויים. כלומר, כל פריסה עצמאית של המוצר צריכה שיהיה לה חשבון AWS משלה, מה שגם לוקח לא מעט זמן לפרוש, וגם משאיר "שאריות" (חשבונות מחוקים נשארים לאיזה 90 יום), כלומר, היכולת להרים סביבות בדיקה מלאות על ימין ועל שמאל לא ממש ריאלית.
המוצר הוא מוצר ענן - יש בו כמה רכיבים שמתקשרים אחד עם השני דרך הודעות טקסטואליות, הרכיבים האלה יכולים להיות מפותחים על ידי צוותים שונים ונמצאים בכמה מאגרי גיט נפרדים (איך לתרגם repository?), מה שעלול להוביל לכך שעבודה על רכיב אחד תשבור דברים בהמשך הדרך.

אז, איך אפשר לדאוג למשוב אפקטיבי בתוך תהליך הפיתוח? זה כמובן לא מאוד עוזר שעדיין לא הגדרנו את תהליך הפיתוח, אבל אנחנו לא פה כדי לעשות לעצמנו חיים קלים, נכון? מעבר לזה, זו גם הזדמנות להגדיר את תהליך הפיתוח בצורה שתאפשר את המשוב האפקטיבי.

הגישה שהכי מצאה חן בעיני היא זו שמצאתי בספר של דייב פארלי (כתבתי עליו כאן) שאומר ש"אנחנו לא מקבלים לבדוק את המוצר כולו לפני שהוא נפרש לשטח". המערכת שלנו מחולקת לרכיבים ניתנים לפרישה עצמאית (כלומר, אפשר לשחרר גרסה של כל אחד מהם בלי להתחשב ברכיבים האחרים) וכך אפשר לעבוד מהר ובצורה בטוחה.

יש לגישה הזו רק חיסרון אחד קטן - אין שום סיכוי בעולם שאני אצליח לשכנע את הארגון שזו הדרך הנכונה, ויש אפילו פחות סיכוי שאצליח ליצור את השינויים התשתיתיים, המחשבתיים והתרבותיים כדי שיהיה אפשר להשתמש בגישה כזו. אבל, אולי אני יכול להשתמש במגבלות הקיימות בפרוייקט כדי לעשות צעד מסויים בכיוון הזה.

אחרי כל ההקדמה הזו, איך אני חושב שנכון לגשת לבדוק מוצר כזה?

אם אני הגון, אני חייב להודות שאין שום דבר חדשני בגישה שלי - אלה בסך הכל כמה דגשים על פירמידת הבדיקות, עם העמדת הפנים "זה לא שאני לא רוצה לעבוד במודל הגלידה שאנחנו מפעילים בכל המקומות האחרים, זה שאין לי מספיק סביבות בדיקה בשביל זה".

בקיצור, קו ההגנה שאני מנסה למכור לקולגות שלי הוא זה:

אל תדאגו, תהיה ריצה של כל המערכת יחד. פעם בלילה, והיא תכסה רק תרחישים נטולי שגיאות, או כאלה שאי אפשר יהיה לבצע ברמות הנמוכות יותר. מטרת הריצה היא לענות על השאלה "אנחנו חושב שכל הרכיבים שלנו מנגנים יחד, האם יש משהו פטאלי שהחמצנו"?
קו ההגנה הראשון שלנו הן בדיקות היחידה. כשעובדים עם AWS יש ספריית mock אמינה יחסית בשם moto שמאפשרת לנו להריץ בדיקות יחידה מול "ענן" שרץ בזיכרון. הספרייה לא מספקת את מלוא היכולות, אבל היא עושה עבודה די מרשימה. כאן אנחנו נוודא שהלוגיקה שלנו נכונה, שיש טיפול הולם בשגיאות, וכל מה שרק נצליח לחשוב עליו.
מעל השכבה הזו נריץ בדיקות רכיבים. אנחנו אולי לא יכולים לפרוש את כל המערכת בכל פעם, אבל אנחנו לגמרי יכולים לפרוש כל מיני חלקים ממנה במקביל. כך נוכל לוודא שכל רכיב מצליח לתקשר עם תשתית הענן עצמה ולוודא שהתחליפים שהשתמשנו בהם בבדיקות היחידה אכן נאמנים למה שקורה בענן עצמו. נוכל גם לבדוק חלקים שלא לגמרי ניתנים לבדיקה ברמת היחידה, כמו למשל לוגיקה שנמצאת באופן בו אנחנו מגדירים את המערכת על AWS ולא ממש בקוד שלנו. כך גם נוכל לראות אם הקוד שאנחנו מנסים להריץ באיזושהי Lambda מנסה למשוך תלויות שאין לו. יתרון נוסף שאנחנו מקבלים הוא שזה מכריח אותנו לכתוב את המוצר באופן מודולרי, כך שכל רכיב יהיה ניתן להתקנה בפני עצמו (אמרתי צעדים קטנים בכיוון, לא?)
החלק האחרון בפאזל הוא בדיקות חוזה. כאמור, יש לנו כל מיני רכיבים שמתקשרים זה עם זה באופן אסינכרוני דרך הודעות טקסט - במקום לחכות עד שהם יהיו פרושים יחד כדי לראות מה שברנו בשינוי הקוד האחרון שלנו, אנחנו יכולים להריץ בדיקות חוזה בעזרת כלי בשם Pact. הגרסה הפייתונית של הכלי קצת מוגבלת כרגע, אבל כל מיני אנשים עובדים על זה במרץ, וממילא הכלי מכיל את רוב היכולות שאנחנו צריכים.

או, במבט ממעוף הציפור - בזמן pull request אנחנו רוצים להריץ בדיקות יחידה, בדיקות רכיבים ובדיקות חוזה. אחת ליום נריץ את בדיקות המערכת כדי לקבל אישור שכל החלקים אכן מתחברים אלה לאלה. יש עוד כל מיני פערים שצריך להתייחס אליהם כמו למשל מה המקום הנכון לבדוק בו את ביצועי המערכת או אספקטים שונים של אבטחת מידע. במקרה הגרוע - תהיה לנו סביבה ייעודית עבור הדברים האלה. אולי לאורך הדרך נמצא פתרונות טובים יותר, אבל לא חייבים לפתור את כל הבעיות מראש.

מחשבה אחת שצצה לה כשכתבתי את הפוסט הזה הייתה "מה הייתי אומר אילו היה קל, מהיר וזול ליצור סביבות בדיקה מלאות? האם הגישה שלי הייתה שונה באופן מהותי? אמנם קשה לענות בכנות על שאלות היפותטיות, אבל אני רוצה לחשוב שהגישה שאני מציג כאן נכונה גם במקרה כזה. נוסף על המהירות היחסית של הבדיקות הקטנות ביחס לבדיקות המערכת המלאות, בבדיקות הקטנות יותר יש לנו גישה ושליטה שקשה מאוד להשיג באופן אחר. חשבו למשל על מקרים של race condition - בבדיקות יחידה אני יכול פשוט לדמות את התחרות בכל אחת מהתוצאות, במקרים אחרים יהיה לי קל מאוד ליצור מצב בו אחד הרכיבים שולח הודעה לא צפויה אל הרכיב הבא בשרשרת (נניח, אם אנחנו מבקשים לסרוק קובץ, אבל עד שמגיעים לטפל בהודעה הקובץ כבר נמחק).

בכל אופן, אלו המחשבות שלי לגבי הדרך הנכונה לבדוק מוצרים מבוססי ענן, מה דעתכם?

Wednesday, May 22, 2024

Best intentions

One of the more difficult challenges in coding is to capture intent within your code. A code, naturally, is great at "what", but it takes some real effort to have also some of the "why" in it. There are a lot of tricks that can help - good variable names, breaking stuff into methods, modules and packages, domain driven design, and when all else fails - we might add a comment. Still, it's well too common to read a piece of code written by someone else (or by you of the past) and wonder "why on earth is the code like this?" When writing tests, and especially system tests, it's even more important.

I got a reminder for that recently. When testing some specific feature of malware detection, some of the files started to fail, and after some investigation we've found out that most of them were blocked because they contained a file that was malicious in a different way than the one expected by the test, and another file is now no longer considered malicious after some global configuration has changed.

The easiest way to fix this is to just remove those offending files and forget about it, but then come the questions -

Why are those files there?
Are there different aspects to the different files that are relevant to the tested feature?
why are we using multiple files to test what looks like a simple feature? are there complexities we're unaware of?
If I want to replace the test files - what properties should they have?
How to prevent this from happening in the future for the rest of the files?

Naturally, the exact time in which this has all happened was in the most inconvenient time - there was a pressing deadline blocked by this test, the person who wrote the test was in vacation, and everyone were wondering what is going on.

Looking back on the situation, I can see some mistakes we've made when writing this test, some of them we've talked about during the review and mistakenly dismissed, others we missed altogether:

The test was actually testing more than one thing. Due to careless choice of test data, we chose files that participated in some flows different than the one we intended to test, when the configuration around those flows changed, our test was sending us false failure signals
We failed to control our environment - we have a limitation (which we are aware of, and for the time being - accept) about some global configuration that can be updated outside of our team's control. We ignored the impact it might have on our test.
We didn't do a deep enough analysis: we had some files that each exposed a different kind of bug during development, but instead of understanding the root cause and what was actually different in those files, we just lumped everything together.
We were not intentional in our testing - instead of understanding the feature and crafting input data to challenge the different parts of our model, we just took some "real" data and threw it on our system. In addition to now not knowing which files would be a suitable replacement, we also have no idea how complete or incomplete our testing is.
Our files are not labeled in a way that conveys intent - they are just called "file 1", "file 2" and so on (the actual name is also mentioning the feature's name, but that's about all the extra data there)
Finally, our assertion messages proved to be less helpful than they should have - in some cases, not even mentioning the name of the file used (for time reasons, we decided to run multiple files on the same test, which we normally avoid)

So, we have some cleaning up to do now, but it's a good reminder to put more care about showing our intention in code.

עניין של כוונות

English

אחד הדברים הקשים יותר בפיתוח תוכנה הוא הבעת כוונה בעזרת הקוד. באופן טבעי, קוד מספר לנו "מה" נעשה (ואם כתבנו אותו היטב, הוא עושה זאת באופן ברור), אבל נדרש לא מעט מאמץ כדי להכניס לתוכו את התשובה ל"למה". יש כל מיני כלים שיכולים לעזור - שמות טובים, חלוקה לפונקציות ומודולים, פיתוח מונחה תחום (Domain Driven Design), ואם שום דבר אחר לא עובד - מוסיפים הערה בתוך הקוד. ועדיין, להכניס כוונה לתוך קוד שחייב להכיל הוראות ביצוע זה לא פשוט ואולי אפילו לא תמיד אפשרי. אחת החוויות הנפוצות היא לקרוא פיסת קוד של מישהו אחר (או של עצמי מלפני כמה חודשים) ולתהות "רגע, למה הקוד עושה את זה? זה בכוונה?". היכולת הזו של לספר על כוונה בתוך הקוד אפילו חשובה יותר כאשר כותבים בדיקות. למה? כי אתם יודעים שהפעם הבאה בה תסתכלו על הקוד הזה יהיה כאשר משהו ישתנה והבדיקה תישבר - זה כי הכנסנו באג או כי הדרישה השתנתה?

לאחרונה, קיבלתי תזכורת לעניין הזה. בזמן הבדיקה של זיהוי סוג מסויים של נוזקה - הבדיקה רצה על כמה וכמה קבצים, ורק חלק מהם התחילו להתנהג לא יפה - חלק נחסמו אבל מסיבה שונה לחלוטין, וחלק אחר פשוט הוכרזו כקבצים חפים מפשע. אחרי חיטוט לא מאוד קשה, גילינו את הסיבה - חלק מהקבצים הכילו קבצים אחרים שנחסמו מסיבה אחרת לחלוטין, וחלק אחר נחשבו לתמימים בעקבות שינוי קונפיגורציה (בחלק שלא נשלט על ידי מערכת הבדיקות, בינתיים). בסך הכל, חדשות טובות - זה לא באג במוצר, רק שימוש בנתונים לא מתאימים. עכשיו, איך לתקן את הבדיקה בחזרה?

לא משנה באיזה פיתרון נבחר, ברור שצריך להעיף את הקבצים הבעייתיים. אבל אז עולות כמה שאלות -

למה אנחנו מריצים כאן יותר מאשר קובץ אחד? האם כל קובץ מייצג מחלקת שקילות שונה, או שפשוט בחרנו כמה דוגמאות אקראיות שתמיד צפויות להתנהג באותו אופן?
האם יש לקבצים שנכשלו תכונות מיוחדות שאין לקבצים האחרים בבדיקה הזו?
האם אנחנו צריכים להחליף את הקבצים שהסרנו? אם כן, במה?
איך למנוע מהתקלה הזו לחזור על עצמה שוב?

כמובן, תקלות כאלה לא מתרחשות בזמנים רגועים. זה קרה בדיוק כשהיינו צריכים לשחרר גרסה (דחוף! עכשיו! אנחנו כבר באיחור!) ומי שכתב את הבדיקה הספציפית הזו היה בחופש של שבוע, כך שלאף אחד לא היה מושג איך לענות על השאלה הזו. הבעת כוונות, כבר אמרתי? כדאי שנשים לב - זה מעלה עוד קושי שלא הזכרנו קודם: כשאני כותב קוד, אילו כוונות אני צריך להביע? על אילו שאלות ארצה שהקוד יענה? למשל, הקוד הנוכחי ענה מצויין על השאלה "איזה פיצ'ר אנחנו בודקים", וברור לגמרי שהוא לא צריך להביע את כל מה שנכתב בתיאור הפיצ'ר. למרות זאת, בדיעבד, היו כמה דברים שיכולנו לעשות אחרת:

קודם כל, המבדק שלנו בדק יותר מאשר דבר אחד. בגלל שבחרנו לעבוד עם מידע "אמיתי" שלא הבנו עד הסוף, הקבצים שבחרנו עברו דרך כמה מסלולים נוספים שיכלו להפריע לנו. כשהשתנתה הקונפיגורציה מסביב לאלה - הבדיקה שלנו נשברה.
לא שלטנו מספיק טוב בסביבה שלנו, ולא הכרנו במגבלות האלה. יש לנו מגבלה (שאנחנו מכירים) סביב שליטה בכמה קונפיגורציות גלובליות, אבל התעלמנו ממה שזה עלול לעשות למבדק שלנו.
לא ניתחנו את הפיצ'ר טוב מספיק - היו לנו כמה קבצים שונים שחשפו באגים שונים במערכת בזמן הפיתוח אז השתמשנו בהם, במקום להבין את הסיבה להבדלים.
כתבנו את הבדיקות שלנו בצורה חסרת כוונה - הגישה הנכונה לסוג כזה של מבדק היא לנתח את הפיצ'ר, להבין אותו, ולתפור דאטה סינתטי (או לבחור קובץ אמיתי שמתאים בדיוק) שמכוון לכיסוי החלקים השונים במודל שלנו. במקום זה, לקחנו דאטה "אמיתי, מהשטח" וזרקנו אותו על המערכת, מה שגם אומר שאין לנו מושג כמה שלם הכיסוי שלני.
הקבצים שלנו לא היו מתוייגים בצורה שתבדיל ביניהם - בגדול, התיוג היה שקול ל"קובץ1","קובץ2".
ולבסוף, משהו קצת פחות קשור - הודעות השגיאה שהמבדק שלנו זרק היו פחות מועילות מאשר הן היו יכולות להיות. אפילו לא תמיד הזכירו את הקובץ הבעייתי (משיקולי זמן ריצה בחרנו לחבר כמה קבצים ביחד, אולי זו הייתה טעות נוספת)

בקיצור, יש לנו עבודת ניקיון לעשות, אבל זו גם תזכורת טובה על החשיבות של הטמעת כוונות בתוך הקוד שלנו.

Friday, May 3, 2024

Book review: Modern software engineering

This is a great book. It starts by presenting a thought-changing idea and then proceeds to a more familiar ground, painting it new (and better) in light of the first idea.

Let's start by ruining the surprise. Farley suggests that we've been using the term "Software engineering" all wrong, and this has been wreaking havoc on the way we think about creating software and on the way we work. He's claiming that contrary to a popular approach amongst professionals, software should not be treated as a craft, but rather as a field of engineering. No, you can put your automatic objections aside, not *this kind* of engineering. His twist, and what creates an "aha" moment is his observation that there are two types of engineering: production engineering, and design engineering. The first, which is the mold we've been trying to cram software creation into, is dealing with the following problem: Once we've designed a product, "how can we create it with high precision and economical efficiency?" The second type, as the name hints, treats the question of "how do we design a product in the first place?". When we think of engineering, we usually think of the former, which is quite different from the latter. Design engineering is a discovery process, and should be optimized for learning and maximizing the effective feedback we get from our efforts.

That's it. That's the one new idea I found in this book. Sure, it's more coherent and detailed than the short summary here, but this idea is quite powerful when we grok it. It also aligns quite well with "real" engineering - once we make the separation between design and production, it becomes evident that aiming for predictability and repeatability is just irrelevant and even harmful. The author even points to some of the physical engineering endeavors such as SpaceX choosing the material for their rockets' body, where after doing all of the calculations and computer simulations, they still went and blew some rockets to see if they got it right (or, as it is more properly stated - they experimented to test their hypotheses and to gather some more data).

Once the reader is convinced that it's appropriate to think of software creation as proper engineering field (or abandoning the book), the rest is advice that we've heard a million times about good practices such as CI\CD, TDD, and some design principles such as separation of concerns and cohesion. The only thing new is that now we have the language to explain not only why it works in the examples we can share, but also why is this the right approach initially. If we are participating in a design engineering effort, it follows that our main obstacle is complexity. The more we have to keep in our heads and the more our system does, the harder it is to understand, change and monitor. To deal with this complexity we have two main strategies that work well in tandem: Speed up our feedback and encapsulate complexity behind a bunch of smaller modules.

As can be expected for such book, it refers a lot to "quality", which is a term I gave up on as being unhelpful. Following "Accelerate", the author has a nice way circumventing the problem of defining quality. More specifically, he refers to two measurable properties - speed and stability - as a "yardstick" which is not the ultimate definition of quality, but rather the "best we currently understand". I like this approach, because it provides a measurement that is actionable and has real business impact, and because it helps counteracting the gut-instincts we have when we use poorly defined terms taken from other fields (or, as I might say after reading this book, from production engineering).

There are some points mentioned in the book that I believe are worth keeping in mind, even if you don't get to read the book:

The main challenge in software is to manage the complexity of our product. Both the complexity of our business, and that which is created by the environment our software operates in.
In order to have great feedback from evaluating our system, we need precise measuring points, and high control of the variables in place.
Dependency and coupling is causing pain. It doesn't matter if it's a piece of code that depends on another, or two teams that needs to synchronize their work. While it can't be reasonably avoided, it should be managed and minimized.
You don't get to test your entire product of dozens of microservices before production. Deal with it, plan for it. Trying to do otherwise will make your inter-dependency problem worse.

One thing I found a bit odd was the claim that unit tests are experiments. For me, this is the place where the analogy breaks a bit. An experiment is something meant to increase your knowledge and usually in order to (hopefully fail to) disprove a theory. The theory "I did not break anything else when writing my code" is not the kind of theories I would consider interesting. If old tests are related to experimenting (and I can probably accept the claim that new tests are sort of an experiment), they are more like the measurements taken when manufacturing something, after the design is done we still run tests as part of quality control, and still measure that we've put everything exactly in place. Calling old unit tests an "experiment" sounds a bit pompous to me. But then again - it's OK if the analogy is imperfect - software engineering is a new kind of engineering, and just like chemical engineering is different than aerospace engineering, not everything can fall exactly into place. This analogy does tell a compelling story, and that can be more valuable than accuracy.

All in all, I highly recommend anyone dealing with software to read this book.

Saturday, March 9, 2024

You're stupid!

עברית

you are stupid, and the sooner you'll accept it, the better.
But don't worry, your colleague is stupid as well, and so am I. And while it's true that some people seem to be stupider than most others, I want to focus on the regular level of stupidity.

Recently, I've been mentoring some teammates that suffer from the same problem - it takes them forever to complete tasks, especially those that involves coding. They can do it, but getting started takes forever, and code review frequently requires a large portion of rework. When we dove into it, we found that the reason was similar as well - they were over analyzing (and therefore not sure how to start), and were getting lost several times during coding. Its just that there are so many details to take care of.

To deal with it, we're starting working on what is probably the most fundamental programming skill - top-down design (which should not be conflated with TDD, even though one might argue that test-driven design is a specific form of top-down design). I asked them to practice consciously and force themselves to implement their code in a pure top-down approach, and commit every time a level was "done". So they did, and when we got to review it, I saw the problem - they were making shortcuts, adding something they "knew would be needed" and making a mental mess in their head.

The power of top-down approach is that you first decide on a way something will be used, and only afterwards understand what needs to be done in order to actually make it happen. It's not a panacea, but for this kind of problems, its just what the doctor ordered. There are a lot of tiny details that needs to be taken care of, and working within an already existing framework complicates manners even more as you need to adopt the mindset suitable to it as well.

When we stop to think about it, this approach benefits are not limited to coding. It can help with all sort of analysis and planning problems as well. Just like with recursion - there's some sort of magic in that we solve our problems by pretending they are smaller. It almost doesn't matter how complicated is the task we are trying to perform, it can probably be broken down to no more than five high level steps, and those steps can be broken down as well, until we get to a problem that is already solved.

as I was writing it, I started wondering whether this sort of thinking is required as a precondition to TDD, as the mindset seems pretty much similar. Perhaps it's the other way and TDD can be a way of adopting top-down thinking, but whatever the case, I believe it's an important tool to have in your toolbox. After all, we are too stupid to keep the entire problem we're working on in our head.

את טיפשה! וגם אתה!

English

אתם טיפשים, וכמה שתקבלו את העובדה הזו מהר יותר, ככה עדיף לכולם.
אבל זה לא נורא. גם הקולגות שלכם טיפשים, וגם אני. וכן, יש אנשים שהם טיפשים יותר מהאחרים, אבל בפוסט הזה אני רוצה לדבר על סתם טיפשים רגילים.

לאחרונה יצא לי לחנוך כמה קולגות שסובלים מאותה בעיה - לוקח להם נצח לסיים משימות, במיוחד כאלה שמערבות קוד. וזה לא שהם לא יודעים לכתוב קוד, זה פשוט שלוקח להם נצח להתחיל, ואיכשהו, עד שהם מסיימים את המשימה אז בזמן סקירת הקוד יש מיליון הערות ששולחות אותם לעשות חצי מהעבודה מחדש. כשצללנו לעומק, גילינו שהסיבה דומה - הם חשבו יותר מדי ונתקעו בניתוח אינסופי של הבעיה עד שהם כבר לא ידעו היכן להתחיל. וגם אם התחילו, תוך כדי קידוד הם היו נתקעים בניסיון להבין איך נכון לכתוב את הכל כך שיתחבר לכל המארג השלם. המון פרטים שצריך לדאוג להם כל הזמן, פלא שהם הולכים לאיבוד?
כדי להתמודד עם הבעיה, אנחנו מתרגלים את אחד מכישורי התכנות הכי בסיסיים - Top down design (ולא, זה לא מה שמתייחסים אליו כשכותבים TDD, למרות שהעקרונות דומים באופן מפתיע). ביקשתי מהם לבצע את משימת הקידוד הבאה שלהם באופן מודע - לנצל את גיט ובכל פעם בה הם מסיימים לכתוב "רמה" של הקוד, להכניס את השינוי. ככה יכולנו לעבור על רשימת הcommits ולראות איך הם ניגשו לבעיה. זה הפך את תמונת המצב לבהירה יותר - למרות הסיוע שהם קיבלו ממשימה מוגדרת שמכריחה אותם לחשוב "מלמעלה", הם כל הזמן עשו קיצורי דרך, כי "הם יצטרכו את זה בהמשך". כמובן, כשהם הגיעו להשתמש במה שהם כתבו, צורת השימוש הייתה עקומה להחריד וכפתה עליהם לכתוב קוד לא נוח כדי להתמודד עם מה שהם כתבו קודם. אבל, הנזק המשמעותי יותר היה שהם כל הזמן עבדו עם מודל מנטלי של כל המשימה ועשו לעצמם סלט בראש.
היתרון המשמעותי של כתיבת קוד מלמעלה הוא שקודם כל מחליטים מה צריך לעשות ואיך משתמשים בזה, ורק אחר כך צריך להבין איך נכון לגרום לזה לפעול. זה לא קסם, וזה לא יפתור את כל בעיות העולם, אבל זה בדיוק הפיתרון הנכון לסוג הזה של בעיות - על ידי מיקוד בצעד הנוכחי בלבד אנחנו יכולים להתרכז במשימה שלנו, ולא לתת לאלפי הפרטים הקטנים שבאים יחד עם תשתית הבדיקות שלנו להפריע. ככה אנחנו גם יודעים שהפיתרון שלנו שלם ולא שכחנו משהו בתוך כל הבלאגן. בכונוס אנחנו גם מקבלים קוד שנראה יפה יותר, עם חלוקה טובה יותר של תחומי אחריות, אבל זה רק בונוס.

כשעוצרים לחשוב על זה, צורת המחשבה הזו רלוונטית לא רק לבעיות קוד, היא יכולה לעזור לנו עם מגוון רחב של בעיות תכנון ואנליזה. כמו ברקורסיה, יש כאן קסם - כל מה שאנחנו עושים הוא לחשוב על "איך להפוך את הבעיה שלי לבעיות קטנות יותר?" ואז חוזרים על התהליך שוב ושוב ושוב עד שאיכשהו, הבעיה פתורה.

תוך כדי כתיבה, התחלתי לתהות אם סוג החשיבה הזה הוא תנאי מקדים לעבודה בTDD, כי, כמו שהזכרתי כבר, העקרונות דומים למדי. ואז, רק כדי לבלבל, עלתה האפשרות שאולי אחת הסיבות בגללן TDD היא אפקטיבית כל כך היא שהיא כופה תכנון מלמעלה ובעצם מאפשרת לנו להתאמן גם על הכישור הזה. בכל מקרה, בין אם כך ובין אם להיפך - זה בהחלט כלי שכדאי שיהיה בארגז הכלים שלנו. אחרי הכל, אנחנו בהחלט טיפשים מכדי להחזיק בראש את הבעיה שאנחנו מנסים לפתור במלואה.

Monday, January 1, 2024

The tracability fraud

עברית

When you talk about testing, it's just a matter of time until someone will mention the "gold standard", the one thing testers should be striving towards - full traceability. Perhaps by coincidence, the likelihood of someone mentioning it is negatively correlated with the amount of testing experience they have. The story is a compelling one - by some testing magic, we can map each test to the original requirement it came from, and when a new feature is introduced, all we have to do is to go to that requirements tree and find all the tests that are affected or that have relevance to that feature. Nifty, right? What is rarely mentioned by said people is the cost of creating and maintaining such a model, compared to the expected benefit, which, in the current, speedy development cycles, is even lower than before.

The first difficulty rises when we think about structuring our requirements - It probably have some sort of hierarchy in order for us to be able to find anything, and choosing the "right" kind of hierarchy is quite difficult. For instance, let's imagine a simple blogging platform, such as the one hosting this very post. It seems reasonable to assume that we would split the functionality between the blog management and visitors, as they provide a very different experience. Perhaps something like this:

Now, where, in our model would we place the requirements for comments? They can be seen and written by visitors reading each post, so it makes sense that it would reside there, perhaps even under the "view post" section. However, they also can be read and approved by the blog admin, so why not put it there? Should we have two "comments" categories? one for each branch of the tree? perhaps we should have a third category of "cross responsibility features", but that would only confuse us. We can also restructure the requirement tree to have the various components as root elements, like this:

Now we have solved the question of "where to put the comments section, but does it mean that we'll have to duplicate for each of the public facing components the "this is how the admin interacts with it, this is the user's view"? It's not that one view is more "correct" than the other, but rather that one will be easier for us and will match the way we think of our product.

Please note, this was the easy example. How would we represent here global requirements such as "always keep the logged in user session until they log out"? or "all pages must conform to WCAG level 2 (AA)" ?

So, it's a challenge to build this, but that's ok - we don't have to get it perfect, just good enough, and we'll roll with the punches. Let's consider the next challenge: Do we version this tree? how do we compare two versions? Do we just keep the latest version? do we freeze it every release and save a copy? can we compare two versions in any meaningful way?

And, since we are talking about managing the requirements tree - who is maintaining it? Is the person adding requirements to the product doing so? Each time I've heard this idea it was a meant to be a testers only effort, and requirements for new features are not written using this sort of structure. In my previous work place we've tried to maintain such a tree, and each time we'd start a new feature one of the testers had to go and translate the requirements one by one to the proper place on the tree, sometimes breaking a single requirement to multiple entries in the tree. This tree, of course, was unofficial.

The next part, tracing tests to requirements, is rather easy - IF you are using a heavy weight test management tool such as MicroFocus's QualityCenter of SmartBear's QAComplete that allows you to link requirements to specific tests (There are probably other products as well, I don't keep track). If you do use such a product, you probably already feel the cost of using it. You'll still have to wonder how to represent "we cover the requirements to support mobile devices and 30 languages", but let's assume it's not that terrible to do. If you do the sensible thing and don't bother with maintaining a repository of test-cases that go stale even before they are written - congratulations, if you want this traceability to happen you now have the extra burden of manually maintaining that link as well over feature changes, refactoring, and what's not.

I hope that by this point I managed to convince you that it's expensive to keep this tree up to date, but even so, it might still be worth the price.

So, let's look at the promised return on investment, shall we?

Let's assume we have all of the product requirements laid out in front of us in a neat tree without too many compromises. We now add a new feature, say, we enable a\b experiments for posts. It touches the post editing, post viewing, analytics, content caching, users and comments, and it might have some performance impact. We've learned all of this in a single glance on our requirements tree and so we also know which tests should be updated and which requirements (new or old) are not covered. We invest some time and craft the tests that are covering our gaps, not wasting time re-writing existing tests simply because we didn't notice they are there.

How much time have we saved? Let's be generous and assume that once we added the new requirements to our tree, we know exactly which requirements are affected, and which are not covered at all. Now we go, think of appropriate tests, add them to our tree, update our automation suites at the appropriate levels, run the attended tests we want and call this feature done. Compare this to working without our marvelous tree: We spend some time analyzing the feature and requirements, then dig around our test repository to see what we have and is relevant, then we do exactly the same steps of planning, updating and executing tests. We saved ourselves the need to dig around and see what we have and is relevant, which could amount to some significant amount of time over multiple features, right? Well, if we replace testers each feature - that's the case. But assuming we are working on the same product for a while, we get to know it over time. We don't need to dig through hundreds of lines, we can ask the person who's been here for a couple of years and be done with it in 10 minutes. Those 10 minutes we "lose" each time are small change compared to the increased analysis time (since we're updating a model, effectively duplicating the requirements as part of the analysis). So, even in theory when all goes well we don't really save time using this method.

Perhaps the gain is in completeness and not in time? There is a lot of value in knowing that we are investing our effort in the right places, and it is far less likely that we'll miss a requirement using this method. In addition, we'll sure be able to provide an exact coverage report at any time. We could even refer to this coverage report in our quarterly go\no-go meeting when we decide to delay the upcoming version. What's that now? We don't do those things anymore? Fast cycles and short sprints? Small changes and fast feedback from customers? The world we live in today is one where most of our safety nets are automatic and running often, so we can see where they break, and since we're doing small chunks of work that are easier to analyze we can be quite confident in how well we did our job. What we miss is caught by monitoring in production and fixed quickly. So, perhaps this value, too, is less important.

And what if we don't work in a snazzy agile shop and we do release a version once in a never? In this case we are probably trying to get to faster release cadence which, in turn, means that adopting a method that drives the wrong behavior is not something we desire and we want to minimize the effort we put into temporary solutions.

There is, however, one case where such traceability might be worth pouring our resources into - in a heavily regulated life critical products. From avionics to medical devices. In such cases we might actually be required to present such traceability matrix to an auditor, and even if we won't, failing and fixing later is a bad option for us (or, as the saying goes: if at first you don't succeed - skydiving is not for you). Such heavy regulatory requirements will affect the way the entire organization is operating, and this will remove some of the obstacles to maintaining such a tree. Whether we'll use it for purposes other than compliance? I don't know. I also don't know whether there are alternatives to fulfilling the regulatory needs with a lighter process. At any rate, when we endeavor to perform such a herculean task, it is important that we know why we are doing it, what we are trying to achieve and whether the cost is justifiable.