Wednesday, July 3, 2024

I have released an open source library


In my past two jobs we always talked about collecting some test metadata to analyze later, but never had the time to collect it. Well, I had a few moments between jobs to create an easy-to-extend (I hope) library for pytest. It might or might not help me in a future workplace, but I'm hoping it will help someone.

Taking the perspective of building a library for unknown users was an interesting experience - I obviously needed to put more emphasis on documentation, but more importantly, I needed to think "how can a user customize this for their needs?" One result is that I chose to omit functionality - for instance, I'm not providing a database storage utility, but leaving that to the user, who knows which DB and schema they are using. I'm still considering creating an example project using this library, but we'll see if I have the time and attention for that.
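To illustrate what such a user-provided storage layer might look like, here is a minimal sketch using plain pytest hooks and SQLite. Note that this is not the pytest-stats API - the hooks are standard pytest ones, and the table schema is made up for the example; adapt it to whatever extension point the library actually exposes.

    # conftest.py - a minimal sketch of a user-supplied storage layer
    import sqlite3

    _db = None

    def pytest_configure(config):
        # Open (or create) a local SQLite file once per test session.
        global _db
        _db = sqlite3.connect("test_stats.db")
        _db.execute(
            "CREATE TABLE IF NOT EXISTS test_results (nodeid TEXT, outcome TEXT, duration REAL)"
        )

    def pytest_runtest_logreport(report):
        # Record only the test body ("call" phase), skipping setup/teardown.
        if report.when == "call" and _db is not None:
            _db.execute(
                "INSERT INTO test_results VALUES (?, ?, ?)",
                (report.nodeid, report.outcome, report.duration),
            )
            _db.commit()

    def pytest_unconfigure(config):
        if _db is not None:
            _db.close()

The same shape works for any other store - swap the sqlite3 calls for your own DB client and schema.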

Anyway, it's free to use, modify and so on. If you need help, ping me (or better - open an issue).

PyPI: https://pypi.org/project/pytest-stats/

GitHub: https://github.com/amitwer/pytest-stats

I created an open-source library



In my last two workplaces we really wanted to collect data about the tests we run so that we could do some cross-cutting analysis later, but we never had the time to build the infrastructure for it. Well, I had a bit of time between jobs, so I built a data-collection infrastructure for pytest. Will it help me in the future? God knows. I hope it will help someone.

Building a library for users I have no idea about is an interesting experience. Beyond the obvious need for better documentation, I suddenly found myself thinking about "how can the user adapt this to their needs?" One of the results is that I threw some functionality out. For example, I didn't write a default implementation for any database, because I have no idea how the user wants things to look on their side, what kind of database they use, or what data they want to keep. A side effect of this was a simpler project to implement - I take care of collecting the data, and writing it is someone else's problem. Maybe in the future I'll create an example project that stores the data somewhere just to demonstrate how it's done, but we'll see if I have the time or attention for that.

In any case, you're welcome to use it yourselves. If you need help, get in touch (or, better yet, open a bug/question on GitHub).

PyPI: https://pypi.org/project/pytest-stats/

GitHub: https://github.com/amitwer/pytest-stats


Tuesday, July 2, 2024

Closing 5 years, a retrospective



After just a bit more than five years, today was my last day at Deep Instinct, and that's a great time for some reflection. I've seen some good, I've seen some bad, and I managed to learn from both. It's a bit daunting to try and pack five whole years into a single post, so I'll just paint an image in broad brush strokes. It will be inaccurate, and I'll miss a lot, but it is what it is.

Also, it will be long - bear with me.

Year 1

This was a year of growing, and of adjusting expectations. It was my first time working in a start-up, with only one previous workplace, and I was in for a surprise. There were no working procedures, minimal infrastructure, and - the very strange part - it seemed like everyone was OK with that. In this year I learned what it looks like to have a project that can make or break the company, I learned that building communication channels takes a lot longer than I anticipated, and that what I believed were industry standards were not necessarily common.

Our main focus during this year was to build a team and to create the necessary infrastructure for system testing our products. So we spent some time hiring (note for readers - if you aim above the standard skill-set in your area, you have a long and arduous process ahead of you), and I found out that with more than 5 junior people to shuffle around (I wouldn't really call it mentoring) I was overburdened. I came with an idea of how our framework should look, and was lucky enough to have someone on the team come up with a better-suited idea. There was a lot of work done on the technical side, and not a lot done on integrating with the other teams or on the product work. We simply were not ready yet. By the end of this period we had a working system test framework, a team of ~12 people, and we had proven our value to the organization.

Years 2 & 3

Those are not actually full calendar years, but it's a nice title for the period after we had caught up enough and focused on integrating with our environment. Some of this effort was already happening during the first period, but that was more in the way of laying foundations for the future. Now we started turning up the dial on talking with other groups. This is where we started feeling the missing procedures and culture around us - getting invited to a design review is quite easy, but getting invited to a design review that does not happen is a different thing altogether. We hacked around and compromised a lot - just asking for someone to give us some sort of handover in lieu of the design review would be one such example. In other places, where we did join the table, we made sure to provide value so that we would be invited again, and we volunteered to help other groups use our infrastructure for their own ends, partially to gain a reputation boost (but mostly because it was needed for the company). We had a great opportunity when a new product was starting from scratch (ish) and we provided them with a dedicated person. We then used this team as a model to say "this is how we think we should work", thus initiating the third phase of our 4-step plan (worded a bit differently, I wrote about it here; roughly labeling, the steps are stabilize, grow, disperse and disband) - we didn't actually complete the growing part: we didn't have a good enough infrastructure to share, and most of the people we had were not skilled enough in testing yet, but reality is messy, and you don't get to complete things cleanly before moving on.

We also struggled with our own growth - when our team size neared 20, and the number of products grew, we started feeling the pain of stepping on each other's toes and trying to focus on too many things. So we split into two teams, which in turn required us to split our code-base to match - a realization we reached a few months after the team split, since we now had people focusing on one task and we didn't want them distracted by noise from the other side of the split. Another aspect of this split was that I found out I'm a difficult employee - as part of the split my then manager became a group lead, and a team lead was recruited. By the time she arrived, everyone on the team was someone I had either recruited or welcomed to the team, and I was a nexus of knowledge about most of the things happening around our team. That is to say - while the authority was hers, I had way more social power. It took us a while to sync, and there was this one time where I used my excessive power and clashed with her in front of the entire team - I knew I was wrong a few minutes after doing so, but the damage was done. I got a good lesson in the difficulty of apologizing publicly, and later we both synced and balanced our power in a more suitable way, so I'm guessing it all worked out in the end, but I hope I will remember this lesson for the future.
We also faced another problem - a lot of people were leaving us to pursue other opportunities. There were a lot of factors behind this - we had hired mostly juniors out of university, we were a small start-up with a lot of changes - but there were two factors that bothered me. First, there was the feeling that in order to progress in one's career, one had to get out of testing. The second was that in our company's culture, testers are seen as second-class employees - I'm pretty sure no one will admit to this even to themselves, but it can be seen in all of the tiny decisions - who gets invited to early discussions, how often people feel comfortable telling you how to do your job, who gets credit for work done, and so on. It ties back nicely to the first problem, but it was noticeable enough to deserve its own place. It took me a while, but the first problem led me down the path to building an in-testing career growth model. The second problem was one I did not address at all, and when I look back on it, there were probably some actions I could have attempted. The way I see it, there are 3 main reasons for testers being at the bottom of the food chain: first is the reputation of the profession in the outside world, which is beyond me to change. Second is the fact that in most places, testers who write code are less capable than "proper developers" (which leads to a vicious cycle of hiring with low standards and reinforcing the conception of testers having inferior skills), and third is that testing has no exclusive contribution: outcomes such as risk reduction, placing safeguards, and maintaining test code are part of other roles as well, especially when actively pushing towards full integration of testing into the teams. I'm still bashing my head occasionally against this question, with recent thoughts influenced by "Wiring the Winning Organization" (a book review will come soon, I hope).

All in all, there were a lot of challenges during this period, but it was a good time. 

Year 4

Year 4 started on January 2nd, 2022. Those with a keen eye and access to our HR system might notice that there are 2 months missing between my actual 4th anniversary on April 2nd and this date, but as I mentioned, the previous years were years only in the most general way. On that date, a new VP of Engineering (formally the title is VP R&D, but we do have a separate research department) joined us and replaced the person who had been filling this role at the time.

I learned one thing during this time - while it's true that Rome wasn't built in a day, it didn't take that long to burn it down. At first, I thought it was just a style of communication and an attempt to prove that he's the boss, but soon enough there were so many examples of bad leadership that I couldn't push down the fact that I was dealing with a bully. A few red flags for the future, and perhaps for the readers as well:

  1. Everyone who was here before the bully is stupid, as is every decision they made.
  2. People who are still here need a firm sheriff to teach them how to do their work.
  3. All decisions should go through the bully; if a decision was made without him, it will be reversed.
  4. Listening is for other people.

I could see the damage really fast - the number of crises that required people to stay up late and work on weekends skyrocketed, communication between teams was reduced, and everyone tried to make sure that when something went wrong, they would have someone to point at and say "I did my part, so it must be those people over there". Add to that the replacement of 2 of the 3 group leaders within 3 months (one quit, the other was fired in the most insensitive manner), and it's no surprise that things were a mess. About a month after the VP joined, he announced two major projects - a complete rewrite of our SaaS product, and a shift to Scrum. Both failed spectacularly.
Let's start with the technical project - as it is with many companies, there were more urgent tasks than there was time or people, so shifting people to work on the project would mean slowing down on the roadmap, which can feel fatal for a startup. Also, do recall point #2 - people other than the VP are incompetent by definition. We also had no real documentation of the system we were aiming to rebuild and no experience in creating good requirements or specifications, even for smaller projects. The solution? To hire an external company to build part of our core business components. Yes, it is as stupid as it sounds. Yes, it was said in some circles at the time. Saying it to the VP would have been hopeless, and an invitation to be bulldozed.

So, while this project is happening, the other project can't happen without the people in the company, right? Moving to Scrum is a people-and-processes task. Well, this was botched as well, and I already wrote about it at some length here. Since that time, the project has concluded and I got to talk with one of the consultants and ask him why some very basic things didn't happen, or were at least not mentioned as goals. His answer was very respectful and general, as he was masking the fact that the VP who hired them also blocked most of their initiatives (I did ask if I understood correctly, and got the "I can't answer this directly" kind of answer).

So, why did I stay? At first, I was mostly shielded by my skip-level manager, the one remaining group leader. In July I approached him to tell him I had a job offer, but that if he could promise me he wouldn't break within 6 months, I was staying. His response was that while everything is dynamic and he couldn't make promises, at the moment he had no thoughts of leaving and wasn't looking around for alternatives. A month later he called to tell me he had gotten a job offer and was accepting it. Well, that's life, and I wished him well. Then I took my time - things were not terrible for me personally, so I figured I'd choose a place I'd be happy at. On January 10th, 2023 I got an offer from a place I thought could be good enough, but I took some time to deliberate. The 4th year ended on January 15th, 2023, when we were told that the VP had chosen to pursue other opportunities and would be leaving effective immediately. The next day I told the place that had offered me a position that I was staying.

Naturally, there was a lot more going on this year. For instance, I got to do some close mentoring, which was very interesting - I worked formally with two people, one of whom joined us in order to stretch his abilities and was very receptive to my attempts at teaching. I found out that I need to learn how to map the skills and gaps of my colleagues and then find proper ways to teach them. For instance, I did manage to see that some of my mentees were struggling with top-down code design, and then found that I'm not sure how to teach that (I tried to force TDD; I think it worked to some extent). Other skills, such as modeling or communicating confidence, I don't think I managed to teach as much.

It was also the year where I could put my knowledge of testing theory to use, even if a bit violently. The story was as follows: after my skip-level manager had skedaddled, another was brought in his place, though without his skills, his care or his work ethic (though he does hold the record for the fastest person to be fired that I've seen). Shortly after that, there was a crisis (did I mention there were a lot of those?): a product that had been developed for an entire version without any sort of testing or attention from testers (all of the testers were busy with the previous crisis), and without mitigating this problem by telling the people working on it "OK, you don't have testers, do your best", was now blocked by all of the tickets sitting in a "ready for testing" state. The new group lead, eager to show... something, stated (or, perhaps, parroted the bully) "no compromises on testing, I want full test plans", which, given the deadlines we were facing, was simply not possible. So I did some rough estimation and got to somewhere between 13 and 40 days just to write the test plans for the 42 features, let alone execute them, to which I got the lovely response "we need to be more agile". I'm still quite proud of not responding snarkily with "agile doesn't mean cutting corners and doing shoddy work"; instead I suggested that we use SBTM (session-based test management), which wouldn't solve our problem of not having enough time, but would mean we'd start working and finding problems faster. I might have exaggerated its benefits a bit, but I can now say that I got to win an argument only because I was more educated on testing than the other side and could slap some important-looking documents at them (I have James Bach to thank for creating this document), so yay me. Oh, there was also the part where we didn't waste a ton of time.

Year 5

So, the bully was gone. There were some rumors around the cause of him getting fired, and sadly, "someone noticed he's menacing the entire organization" was not one of the options (in fact, another rumor had it that another VP had actually been coaching him on the bullying part, which, if true, saddens me greatly because it means his behavior was known). Now we had a chance to start a healing process and make up for the year of moving backwards (my feeling was that our culture went back to a state roughly equivalent to where it was a year before the bully arrived, but that's just a guess). We had several challenges to overcome: we had a beaten-down department; we had lost a huge number of people (while I can't give exact numbers, today about half of our engineering department was hired in the year and a half after he was fired); the group leaders we had were hired specifically to match his management style - to relay his decisions, have as little agency as possible, and make no decisions themselves. Personally they were nice, but they didn't have the skills to lead a healing process. And finally there was the person who replaced the bully, who I think is a good manager, but who took on an impossible task - managing two departments (he was, and still is, our VP of Research) with over 100 people at the time - while having the style of a good manager, which is to delegate a lot to his direct reports, and which would have been great had we had the right people and the right structure in place. Add to that the market pressure, as funds from the last round were slowly but decisively dwindling, and the result is that we had an absent VP. On top of all that, since last October there has been a war in Israel, and many people, the new VP included, were called up for months of reserve duty, risking their very lives defending the citizens here.

So, not the ideal conditions for healing. But there were opportunities as well - before the bully left, he declared that we were (finally) moving to disband the QA department. Sure, as always, it was done for the wrong reasons and it seemed like he was going to do it the wrong way, but once he left we talked to the group leaders and understood he had not shared any more details with them than he had shared with us (not a lot, in case you wondered - only a general statement), so we started discussing how to do this transition safely, and where the responsibilities currently held by the test teams should sit. This task proved too much for our group leaders, so they decided to leave the status quo more or less as is, with perhaps some cosmetic changes. This left my manager to start dedicating more of our team to specific products, which is a smaller step than what I was aiming for, but at least it was in the right direction. What neither of us knew at the time was that the bully, after clashing with my manager, who had been defending the team quite admirably, had started the process of firing her. While that process was halted when he was let go, it did paint her as a troublemaker in front of HR and upper management. The difficult discussions around the re-org were the last straw, and despite the team rallying behind her, she was let go.

This was the point where I had to assume another role - the team's anchor. Several people approached me with concerns and wondered whether they should start looking for another place. I spent some time talking with them and trying to help them figure out what they need and how to get that here. I felt very ambivalent about this - on one hand, I wanted to join their rant and tell them I agreed; on the other, I was the face of the system, and had a clear incentive to convince them to stay. I was very careful not to lie, but there were some questions I had to dance around. There were some nicer aspects to this, though. Once I realized this was part of my role, I looked at the 2nd test team - we had a major problem there: a few months before my skip-level manager left, their team lead had made a lateral move to another department, and the team was left without a manager. As long as my skip-level was there, he was able to provide some guidance while looking for another team lead, but this effort stopped when he left, the team of mostly juniors was left without a leader, and the few seniors there left or detached shortly after. So now we had a team that had been orphaned for over a year. People were leaving. I did a quick assessment of the people still there, and found that while most of them were OK, there was one I really didn't want to lose, so I approached her and asked how she was feeling and what she was missing, which led to us meeting weekly (ish), with me trying to be the professional authority she could learn from or consult with. That ended up being really fun.

On the work side, with a new VP, we finally started to work on the shadow project declared by the bully, only to find that, quite unsurprisingly, the contracting company, which had by now been getting paid for over a year, had managed to produce some lengthy architecture documents that were not really suited to what we needed, and that the bulk of the work still needed doing. Surprised? I wasn't. So teams started scrambling to get up to speed, and the added difficulty of coordinating with a contracting team made it that much harder. I was brought in to assess the "testing" they had done, only to see some demo Robot Framework tests written against a mockup, with no thought given to the SDLC, and no one on the contracting team who even understood my basic questions. So it was up to me to devise a testing strategy for this new product, and for the modifications to the other parts that needed to change in order to support this rewrite.

At any rate, reality was not waiting for us to sort out our mess. With post-COVID funding being harder to get and some major customers backing out of deals at the last moment, our company figured out that in order to stay in business it was better to update the income forecast to be more on the pessimistic side, which meant a reduction in force - one of my mentees, whom I had told just a week earlier that he probably had a month to improve before termination became an option, was the first to be named, and another, more appreciated employee was laid off as well. Again, I found myself as the go-to person for people disturbed by the change, and this time I could only tell them "the company claims that after this reduction we should be clear for at least a year, so if your only concern is job stability, keep an eye out, but this need not be the fatal sign". As part of this, the company pivoted and changed its market focus, so the entire rewrite project was called off, a year and a half too late. We returned to the previous initiative of gradually upgrading the current system, because, well, it made more sense. So, new strategy and cool buzzword technologies? Not anymore. At least this time when I did damage control I could be honest and say that my only problem with aborting this project was that it happened too late.

Another thing that happened is that management declared that "we have a quality problem", and while they also said "quality is everyone's responsibility", the way they behaved was more in the spirit of "we have a QA group, so they should patch quality in at the end". We found out about this when we were told a "QA expert" consultant had been hired and would be joining us. Yes, it was a slap in the face and in the ego, but we decided to let it pass and try to use this opportunity to further push our goals. This consultant, unlike the agile transition ones, was quite good - she did her research, did some coalition building, and started looking for ways to push for change. The main difficulty? Her hands were tied by the limited scope - fixing only the test team and not the surrounding problems we have could make a minor improvement at best. To make the problem worse, resource and time allocations were not really made (there were promises of time, but they were not scheduled or considered by the other projects), and we had a new test team lead (for the orphaned team) who, at the very least, wasn't aligned with the rest of the company - but we still needed to push through his incessant resistance.

I learned a lot from the consultant. I learned how to take a vague goal and make it concrete and presentable to management, and I learned from our failure about the importance of setting clear milestones, and of saying "I don't need 10% of the department's time as a constant; I need 5% now, in the form of one specific person, and in 30 days I'll need 20% of all hands on deck". Not doing so meant that we had people bored while we were preparing, and schedules tight when we actually needed the people. One thing we tried to do but never reached an agreement on was our definition of "quality" (a term I really hate for its vagueness) and, following that, the goals we could theoretically achieve.

However, not all slaps in the face were as successful as this one. There were a few others, and two had a significant impact; both were cases of a problematic hire. The first was a team leader for the orphaned team - it just so happened that the hiring manager, a developer in his past, had no idea how to assess testing skills, and thus, for most candidates for this position, he asked the other test team lead we had to interview them. Due to the nature of the market, there were a lot of candidates who fit on paper but had no relevant skills, and they kept failing the simple skill checks. Then there was this one candidate who did not get interviewed by said team lead - the benign explanation is that the team lead was under a time crunch and needed some space, and the hiring manager decided to skip this phase despite the obvious gaps in his own filtering. The candidate who was hired was a very wrong fit - he didn't have the skills to help his team improve (they needed a strong coder) and he wasn't able to find someone else to provide this sort of service. He tried to bring a solution that had worked for him in a completely different place, without understanding the situation and without talking to his team, and he had the wrong mindset for what we were trying to achieve. All of this because someone assumed they could hire a testing professional without at least having a conversation with the local testing experts (there were two of us at the time, so I know it didn't happen) to understand what they should be looking for and how to check for it. Failing to do that, the first person who knows how to parrot some stuff that sounds like testing will have an easy way to slip in.

The second slap is pretty recent - a new QA group lead was brought in. This is a double slap in the face - first, because bringing in an outsider without letting someone from within apply for the position shows a lack of confidence in the people you have (this was also the case with the team lead position). Second, because bringing in a group lead was the opposite of what we had been doing here for the past few years. And finally, because no one talked to us, again. Now, it's complicated to have someone interview their future manager, so it makes a lot of sense to leave us out of the hiring loop, but after so many bad choices, why not ask for our input? Maybe we'd be able to convince you that you don't need this function; maybe we wouldn't, and you'd explain to us what it is you're trying to achieve; maybe we'd agree that in our current state we need a team lead with the explicit goal of disbanding this group within one year and then building a community of practice instead. But this didn't happen, and I found myself facing another manager I don't appreciate professionally. In this specific case, it took the manager less than a month to make some unforgivable mistakes that were a clear signal for me that, on top of what might be just professional differences between us and not sheer incompetence on his side, there is also a severe issue with his management style and skills that I can't accept. To be fair, this particular problem is difficult to interview for, and I'm not sure it could have been avoided. It is still not a pleasant experience.

The one last thing that sticks in my mind is the most recent project I was involved in - a new, AWS-native product that is supposed to be a major revenue boost for the company. It started while I was busy putting out some fires, or as we call it here, "releasing a version". When I tried to get involved in the discussions with AWS, I got pushback, and since I was busy with the fires, I didn't insist. That turned out to be quite a mistake, since by the time I was ready to join, the initial architecture had already been decided, and aside from being quite complex for reasons it took me a while to learn, there was zero thought on how to manage this coding project or how to test it. I had to spend perhaps a month just wrapping my head around all of the new things I had to learn in order to come up with a strategy, then break it down into a plan, and do all of this while everyone was asking "how soon can you start writing system tests?" and I was answering that we needed about a month to change our infrastructure, and that it was made harder by the product having no deployment scheme for the latest code, so each step took five times the time and effort it should have. Dealing with the constraints I had was tough, and I wrote about it in my last blog post. I'm quite happy with the result, but sadly I won't see how well it works, or be there to help push for the strategy to be fulfilled - there are a lot of ideas that are new to the company (sadly, one of them is unit tests), and there is surely some tinkering required with the new mechanisms (such as contract tests and managing the various contracts), but that's someone else's task now.


I'm off to find new challenges, and will be starting tomorrow in a new and exciting place.

Tuesday, June 18, 2024

Testing a cloud-native product



In the past several months the company I work for has been building its first cloud-native product, and I was tasked with figuring out how to test it, given the limitations that developing in the cloud poses for our kind of application. This raised a few challenges that made everything that much more complicated, and while there's a lot of material online about tools that can help with testing things in the cloud, and specifically on AWS (which was my focus), I couldn't find a holistic overview of how to approach testing such a project. I hope this post will help remediate this gap, and hope even more that I'll annoy enough people who will point me to articles that already exist and that I missed, just to prove me wrong.

I noted the following difficulties, which may or may not be relevant to your project as well:

  1. Our product deals with AWS accounts as its basic unit of operation - we deploy a service that protects the account's data. As such, each environment needs its own account, so deploying multiple environments for testing (and development) is not really feasible - it's expensive, slow, and deleted accounts stay live for 90 days or so.
  2. The product is designed as several services communicating with each other, maintained by several different teams, and using several repositories.

So, how can we bake feedback into our SDLC? It didn't help that we hadn't yet defined said SDLC, but we're not here to make our lives easy, are we? Besides, this is also an opportunity to define that process in a way that enables feedback instead of being an obstacle.

The approach that made sense in my mind was the one I found in Dave Farley's book "Modern Software Engineering" (I wrote about it here), which basically says that testing the entire product before it is deployed to production is something you don't get to do in a services architecture. Really cool concept - having multiple, individually deployable components and letting each of them reach production in its own time. There was only one caveat to this approach - there's no way in hell I'd manage to convince the organization it's the right thing to do, and even less of a chance to get the necessary changes in our processes and thinking in place. However, I might be able to use the constraints we do have to help me push at least partially in that direction.

The plan, if I'm honest, assumes the old and familiar test pyramid, only this time presented as "well, we can't do the ice-cream cone we're used to - we don't have enough environments".

So, the vision I'm trying to "sell" to the organization is as follows:

  1. Yes, you can have your nightly "end-to-end", system-wide regression test suite. However, unlike other products, where the bulk of testing actually happens at this level, the nightly run will be focused on answering the question "is it really true that all of the parts can work together?" For this purpose, we'd want a small number of happy-flow tests that run through the various features we have.
  2. The bulk of testing should be done at the unit test level. While our organization still has a lot to learn about how to use unit tests properly (starting with "let's have them as a standard"), at least for the small services it's really easy to cover all of the edge cases we can think of in unit tests. It also happens that for AWS there's an awesome mocking library called "moto", which provides a pretty decent and super easy to use mock for a lot of AWS functionality without needing to change the already existing code (see the first sketch after this list). In this layer we'll verify our logic, error handling, and anything else we can think of and can test at this level.
  3. Still, not everything can be unit tested, and on top of our logic, we want to check that things work on the actual cloud before merging. Therefore, we are attempting to build a suite of component tests that will deploy parts of our system and trigger their sub-flows. For example, we have a component that scans a file and sends the verdict (malicious or benign) to another component that executes the relevant action - so we can deploy only the action-executing lambda alongside a bucket and start sending it instructions to see that it works (see the second sketch after this list). One benefit we get, apart from being able to tell that our lambda code can actually run on the cloud and is not relying on dependencies it doesn't have, is that we must keep our deployment scripts modular as well (I did mention some small steps, right?)
  4. We have a lot of async communication, and a lot of modules depend on one another to send data in a specific format, otherwise they break. To expose those kinds of problems early, we are introducing Pact for contract testing - the Python version of the tool is a bit limited at the moment, but it has most of what we need.
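
Here is a rough sketch of what a unit test at level 2 could look like with moto. The scan_object function and the bucket name are hypothetical stand-ins for our real code, and the example assumes moto >= 5 (which exposes a single mock_aws decorator); it's meant only to show the shape of such a test, not our actual suite.

    import boto3
    from moto import mock_aws  # moto >= 5; older versions expose mock_s3 and friends

    # Hypothetical unit under test: fetch an object and decide on a verdict.
    def scan_object(bucket: str, key: str) -> dict:
        s3 = boto3.client("s3", region_name="us-east-1")
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        verdict = "malicious" if b"EVIL" in body else "benign"
        return {"key": key, "verdict": verdict}

    @mock_aws
    def test_scan_flags_file_containing_marker():
        # The mocked "cloud" lives in memory, so the test needs no AWS account.
        s3 = boto3.client("s3", region_name="us-east-1")
        s3.create_bucket(Bucket="scans")
        s3.put_object(Bucket="scans", Key="sample.bin", Body=b"some EVIL content")

        assert scan_object("scans", "sample.bin") == {"key": "sample.bin", "verdict": "malicious"}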
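
And a sketch of a level-3 component test, assuming the action-executing lambda and a test bucket were already deployed into a test account by our (modular) deployment scripts. The function name, bucket name, message format and quarantine behavior are all made up for illustration.

    import json
    import boto3

    def test_action_lambda_quarantines_flagged_file():
        s3 = boto3.client("s3", region_name="us-east-1")
        lambda_client = boto3.client("lambda", region_name="us-east-1")

        # Seed the bucket with a file, as the scanner component would have left it.
        s3.put_object(Bucket="component-test-bucket", Key="incoming/sample.bin", Body=b"dummy payload")

        # Send the same kind of "verdict" message the scanner would normally send.
        response = lambda_client.invoke(
            FunctionName="action-executor",  # hypothetical lambda name
            Payload=json.dumps({"key": "incoming/sample.bin", "verdict": "malicious"}).encode(),
        )
        assert response["StatusCode"] == 200

        # The flagged file should have been moved under a quarantine prefix.
        quarantined = s3.list_objects_v2(Bucket="component-test-bucket", Prefix="quarantine/")
        assert quarantined.get("KeyCount", 0) == 1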

So, to make it short - during a pull request we will run unit, component and contract tests, and once a night we'll run our system tests. I have yet to tackle some other kinds of testing needs, such as performance, usability or security, but we already have a worst-case solution: spin up a dedicated environment for that kind of test and get the feedback just slightly slower than the rest of the tests. I'm hoping that we'll be able to move at least some of those into the smaller layers of testing (such as having a performance test for each component separately), but only time will tell.

I expect that we'll leave some gaps in the products that already exist and are used in this new project, but I'm hoping those gaps will be managed in a good enough manner.

One question that popped into my head as I was writing - is this the approach I would choose for similar applications that don't have the major constraint on environments? From where I'm standing, the answer is yes - even if it's easy and cheap to spin up a lot of test environments, testing each component in isolation and focusing on pushing tests downwards simply provides a lot of power and enables kinds of tests that would be really difficult in a system-wide context.

So, those are my thoughts on testing cloud native, service oriented applications. What do you think?

Testing a cloud-based product

 
In recent months we've been building a product that is, as people like to say, cloud-native, i.e., built from the start with an architecture that makes use of the various tools the cloud offers. I got the task of "go figure out how we're going to test this product, with all the limitations that developing a cloud product imposes on a product like ours". Honestly? There were quite a few complicated issues here, and while there's a fair amount of material out there about tools that can help with specific problems, I couldn't find anywhere a proposal for a holistic approach to testing cloud products, something you could build a strategy with. I hope this post will be one such proposal. I hope even more that I'll manage to annoy enough readers that some of them will point me to existing articles I missed, just to show me I'm wrong.
The difficulties I ran into, which might be relevant to your project as well, were:
  1. Our product treats an AWS account as its basic unit of work, because it protects the data inside a specific account. That means every independent deployment of the product needs its own AWS account, which takes quite a while to provision and also leaves leftovers (deleted accounts linger for some 90 days). In other words, the ability to spin up full test environments left and right is not really realistic.
  2. It's a cloud product - it is made of several components that communicate with each other via textual messages. These components can be developed by different teams and live in several separate git repositories, which can lead to work on one component breaking things further down the line.
So, how can we provide effective feedback within the development process? It obviously doesn't help that we haven't yet defined that development process, but we're not here to make life easy for ourselves, are we? Beyond that, it's also an opportunity to define the development process in a way that enables effective feedback.
The approach I liked best is the one I found in Dave Farley's book (I wrote about it here), which says that "we don't get to test the whole product before it's deployed to the field". Our system is split into independently deployable components (that is, we can release a version of each one without taking the other components into account), and that way we can work fast and safely.
This approach has only one small drawback - there's no chance in the world I'll manage to convince the organization that this is the right way, and even less of a chance I'll manage to create the infrastructural, mental and cultural changes needed to use such an approach. But maybe I can use the constraints that already exist in the project to take a certain step in that direction.
 
After all of that introduction, how do I think we should approach testing such a product?
If I'm being honest, I have to admit there's nothing innovative in my approach - it's just a few points of emphasis on the test pyramid, with the pretense of "it's not that I don't want to work in the ice-cream-cone model we use everywhere else, it's that I don't have enough test environments for it".
In short, the line of defense I'm trying to sell to my colleagues is this:
  1. Don't worry, there will be a run of the whole system together. Once a night, and it will only cover happy-path scenarios, or ones that can't be executed at the lower levels. The goal of that run is to answer the question "we think all of our components play well together - is there something fatal we've missed?"
  2. Our first line of defense is unit tests. When working with AWS there's a relatively reliable mocking library called moto that lets us run unit tests against a "cloud" that runs in memory. The library doesn't provide the full set of capabilities, but it does a pretty impressive job. Here we'll verify that our logic is correct, that errors are handled properly, and anything else we manage to think of.
  3. Above that layer we'll run component tests. We may not be able to deploy the whole system every time, but we can absolutely deploy various parts of it in parallel. That way we can verify that each component manages to communicate with the cloud infrastructure itself, and that the stand-ins we used in the unit tests are faithful to what happens in the actual cloud. We can also test parts that can't quite be covered at the unit level, such as logic that lives in the way we configure the system on AWS rather than in our code. It also lets us see whether the code we're trying to run in some Lambda is pulling dependencies it doesn't have. An additional benefit is that it forces us to write the product in a modular way, so that each component can be installed on its own (I did say small steps in that direction, didn't I?)
  4. The last piece of the puzzle is contract tests. As mentioned, we have various components that communicate with each other asynchronously via text messages - instead of waiting until they are deployed together to see what we broke with our latest code change, we can run contract tests using a tool called Pact. The Python version of the tool is a bit limited at the moment, but various people are working hard on it, and in any case it has most of the capabilities we need.

Or, from a bird's-eye view - during a pull request we want to run unit tests, component tests and contract tests. Once a day we'll run the system tests to get confirmation that all the pieces really do connect to one another. There are still various gaps we need to address, such as what the right place is to test the system's performance or various aspects of security. In the worst case, we'll have a dedicated environment for these things. Maybe along the way we'll find better solutions, but we don't have to solve all the problems up front.

One thought that popped up while I was writing this post was "what would I say if it were easy, fast and cheap to create full test environments? Would my approach be fundamentally different?" It's hard to answer hypothetical questions honestly, but I'd like to think the approach I'm presenting here is right in that case too. Beyond the relative speed of the small tests compared to the full system tests, the smaller tests give us access and control that are very hard to achieve any other way. Think, for example, about race conditions - in unit tests I can simply simulate the race with each of its possible outcomes, and in other cases it would be very easy for me to create a situation where one component sends an unexpected message to the next component in the chain (say, we ask to scan a file, but by the time the message is handled the file has already been deleted).

In any case, those are my thoughts about the right way to test cloud-based, service-oriented products. What do you think?

Wednesday, May 22, 2024

Best intentions

 

 

One of the more difficult challenges in coding is capturing intent within your code. Code, naturally, is great at the "what", but it takes some real effort to also carry some of the "why". There are a lot of tricks that can help - good variable names, breaking things into methods, modules and packages, domain-driven design, and when all else fails - we might add a comment. Still, it's all too common to read a piece of code written by someone else (or by a past you) and wonder "why on earth is the code like this?" When writing tests, and especially system tests, it's even more important.

I got a reminder of that recently. When testing a specific malware-detection feature, some of the files started to fail, and after some investigation we found out that most of them were blocked because they contained a file that was malicious in a different way than the one expected by the test, and another file was no longer considered malicious after some global configuration had changed.

The easiest way to fix this is to just remove the offending files and forget about it, but then come the questions -

  • Why are those files there?
  • Are there different aspects to the different files that are relevant to the tested feature?
  • Why are we using multiple files to test what looks like a simple feature? Are there complexities we're unaware of?
  • If I want to replace the test files - what properties should they have?  
  • How do we prevent this from happening in the future for the rest of the files?

Naturally, all of this happened at the most inconvenient time - there was a pressing deadline blocked by this test, the person who wrote the test was on vacation, and everyone was wondering what was going on.

Looking back on the situation, I can see some mistakes we made when writing this test - some we talked about during the review and mistakenly dismissed, others we missed altogether:

  • The test was actually testing more than one thing. Due to a careless choice of test data, we chose files that participated in flows other than the one we intended to test; when the configuration around those flows changed, our test sent us false failure signals.
  • We failed to control our environment - we have a limitation (which we are aware of, and for the time being - accept) about some global configuration that can be updated outside of our team's control. We ignored the impact it might have on our test.
  • We didn't do a deep enough analysis: we had some files that each exposed a different kind of bug during development, but instead of understanding the root cause and what was actually different in those files, we just lumped everything together. 
  • We were not intentional in our testing - instead of understanding the feature and crafting input data to challenge the different parts of our model, we just took some "real" data and threw it on our system. In addition to now not knowing which files would be a suitable replacement, we also have no idea how complete or incomplete our testing is. 
  • Our files are not labeled in a way that conveys intent - they are just called "file 1", "file 2" and so on (the actual name also mentions the feature's name, but that's about all the extra data there is).
  • Finally, our assertion messages proved to be less helpful than they should have been - in some cases not even mentioning the name of the file used (for runtime reasons, we decided to run multiple files in the same test, which we normally avoid). The sketch below shows how descriptive test IDs and assertion messages could carry more of that intent.
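
For illustration, here is a minimal pytest sketch of what more intent-revealing test data labels and assertion messages might look like - the file names, the expected verdicts and the scan_file helper are all hypothetical, not our actual suite:

    import pytest

    # Hypothetical stand-in for the product's scan API, just so the sketch runs.
    def scan_file(path: str) -> str:
        return "malicious" if "malicious" in path else "benign"

    @pytest.mark.parametrize(
        "sample, expected_verdict",
        [
            pytest.param("archive_with_known_malicious_member.zip", "malicious",
                         id="archive-containing-known-malware"),
            pytest.param("document_with_benign_macros.docm", "benign",
                         id="macro-document-below-detection-threshold"),
        ],
    )
    def test_feature_x_verdicts(sample, expected_verdict):
        verdict = scan_file(sample)
        # A failing assertion names the file and the behavior we expected of it.
        assert verdict == expected_verdict, (
            f"{sample}: expected {expected_verdict}, got {verdict}"
        )

Each parameter gets an ID that explains why that file is in the test, so a failure report already hints at which equivalence class broke.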

So, we have some cleaning up to do now, but it's a good reminder to take more care in showing our intention in code.



A matter of intentions

 

 

One of the harder things in software development is expressing intent through the code. Naturally, code tells us "what" is being done (and if we wrote it well, it does so clearly), but it takes quite a bit of effort to put the answer to "why" into it. There are all sorts of tools that can help - good names, splitting into functions and modules, Domain Driven Design, and if nothing else works - adding a comment inside the code. Still, putting intent into code that has to contain execution instructions is not simple, and maybe not even always possible. One of the common experiences is reading a piece of code written by someone else (or by myself from a few months ago) and wondering "wait, why does the code do this? Is this on purpose?" This ability to convey intent within the code is even more important when writing tests. Why? Because you know the next time you'll look at this code will be when something changes and the test breaks - is it because we introduced a bug, or because the requirement changed?
Recently I got a reminder of this. While testing the detection of a certain type of malware - the test runs over quite a few files - only some of them started misbehaving: some were blocked, but for a completely different reason, and others were simply declared innocent. After some not-very-difficult digging, we found the cause - some of the files contained other files that were blocked for an entirely different reason, and others were now considered benign following a configuration change (in a part not controlled by the test system, for now). All in all, good news - it's not a bug in the product, just the use of unsuitable data. Now, how do we fix the test?
Whatever fix we choose, it's clear we need to throw out the problematic files. But then a few questions come up -
  • Why are we running more than one file here? Does each file represent a different equivalence class, or did we just pick a few random examples that are always expected to behave the same way?
  • Do the files that failed have special properties that the other files in this test don't have?
  • Do we need to replace the files we removed? If so, with what?
  • How do we prevent this failure from happening again?
Of course, failures like this don't happen at calm times. It happened exactly when we needed to release a version (urgent! now! we're already late!) and the person who wrote this particular test was on a week's vacation, so no one had any idea how to answer these questions. Expressing intent, did I already say? It's worth noting that this raises another difficulty we haven't mentioned yet: when I write code, which intentions do I need to express? Which questions do I want the code to answer? For instance, the current code answered the question "which feature are we testing" perfectly well, and it clearly doesn't need to express everything written in the feature's description. Still, in hindsight, there were a few things we could have done differently:
  • First of all, our test was testing more than one thing. Because we chose to work with "real" data we didn't fully understand, the files we picked went through several additional flows that could interfere with us. When the configuration around those changed - our test broke.
  • We didn't control our environment well enough, and didn't acknowledge these limitations. We have a (known) limitation around controlling some global configurations, but we ignored what that could do to our test.
  • We didn't analyze the feature well enough - we had several different files that exposed different bugs in the system during development, so we used them, instead of understanding the reason for the differences.
  • We wrote our tests without intent - the right approach for this kind of test is to analyze the feature, understand it, and tailor synthetic data (or pick a real file that fits exactly) aimed at covering the different parts of our model. Instead, we took "real, from-the-field" data and threw it at the system, which also means we have no idea how complete our coverage is.
  • Our files were not labeled in a way that distinguishes between them - roughly speaking, the labeling was equivalent to "file1", "file2".
  • And finally, something a bit less related - the error messages our test threw were less helpful than they could have been, and didn't even always mention the problematic file (for run-time reasons we chose to bundle several files together; maybe that was another mistake).
In short, we have some cleanup work to do, but it's also a good reminder of the importance of embedding intent in our code.