
Learning about AI at Agile Testing Days 2023

Agile Testing Days with unicorn logo

ATD 2023 – I don’t have enough words to describe the joy, the human connections, the learning. I think I do have words to share some of my biggest takeaways. For this first part, I’m going to share some things I learned about AI in workshops and open space sessions. Note that there were many more sessions on the topic of AI and machine learning; I simply couldn’t get to them all!

“Bart and James learn about testing AIs”

You’ll see that I pretty much just went to sessions facilitated by Bart Knaack and James Lyndsay! James and Bart gave us the opportunity to train a model using https://teachablemachine.withgoogle.com/. They provided us with a project that included Blue Things, Red Things, Green Things, and Undecided Things, and explained the basics of how the training works: the terminology, the parameters and metrics, and how to see how the training system judges its own accuracy and confidence levels.

Image of Teachable Machine with a model trained on colors

I paired up with a fellow participant (Sami from Finland) for an exercise involving Teachable Machine’s parameters and metrics. We experimented with different parameters to see if the model became more accurate. Using items of various colors that we found around us, including a Coke Zero bottle label, the blue streak in my hair, and a banana, we tested the model to see if it recognized the right color. Uploading additional images with quite obvious colors for training helped the accuracy. Training “blue things” with images of yellow things on purpose confused the model, as expected.

My takeaway: training a model looks like a combination of art and science to me. We had to keep in mind that the machine decides which images to train on and which to hold back for testing. There’s a lot “under the hood” that isn’t visible, and that’s true of “real-life” machine learning models too. Other groups in the workshop found it’s easy to introduce bias into a model. Trying the training process out for yourself is the best way to learn how machine learning models work!
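We did all of this through Teachable Machine’s browser UI, but if you’re curious what that hidden train/test split looks like in code, here’s a tiny sketch using scikit-learn. The color data, the model choice, and the split parameters are all my own invention for illustration – this is not what Teachable Machine runs under the hood.

```python
# Illustrative only: a tiny supervised color classifier with a held-out test set.
# Teachable Machine does something conceptually similar behind its UI, but with
# images and a neural network; this data and model are invented for the sketch.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy training data: RGB triples labeled with a color class.
X = [
    [255, 10, 20], [230, 40, 30], [250, 5, 60],   # reds
    [20, 200, 30], [40, 230, 60], [10, 250, 90],  # greens
    [30, 20, 240], [60, 40, 220], [10, 80, 250],  # blues
]
y = ["red", "red", "red", "green", "green", "green", "blue", "blue", "blue"]

# The tool, not the user, typically decides which samples to train on and
# which to hold back for judging its own accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("held-out accuracy:", model.score(X_test, y_test))
# A banana-ish yellow confuses a red/green/blue model, just like in the workshop.
print("yellow predicted as:", model.predict([[240, 220, 40]])[0])
```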

Bart and James open space: “Ethical aspects of testing AIs”

All the AI-related sessions attracted lots of people! Or – maybe it’s that James and Bart’s sessions are always popular! This open space session did not disappoint. One takeaway was their interesting way of setting the agenda. They asked us to write our topic ideas on pieces of A4 paper and drop them on the floor, then walk around the room and stand by whichever idea interested us the most. Out of this, we got three groups. My group’s topic was “How to integrate planning for ethics into a team’s software development process?”

Ideas for incorporating ethics into software development planning (contact Lisa if you need a text description in addition to what is in the blog post)

Our group started by brainstorming individually, writing ideas onto sticky notes. Then we put the notes on a flip chart and grouped similar ones. The ideas generated insightful conversations. We decided the first step for a team or organization would be to make the topic of ethics visible. For example, a team could make compliance with ethical policies an item on the Definition of Done.

Ethics training, learning about large language models, and gaining legal awareness are perhaps the next steps. The organization also needs to set policies for things like AI tool use and data usage. One person in our group noted that where he works, they are not allowed to use GitHub Copilot with the organization’s code. However, they are free to use it with their personal repositories to research coding questions and then apply what they learn to the company’s code.

Having legal experts collaborate with the development team would be helpful. Teams also need AI subject matter experts. We saw a big benefit to pairing up and ensembling with people who have this specialized knowledge.

As far as testing goes, we saw a need for risk storming and analysis activities to include AI-related ethical risks. How would teams test models for bias? Teams also need to check that the organization’s ethical guidelines are being met. The team’s continuous integration could include a process to control AI training parameters.
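We didn’t get as far as sketching what such a CI step might look like, but here’s one hypothetical shape for it: a check that fails the build if training parameters drift outside values the team has reviewed and approved. The file name, parameter names, and limits are all invented for illustration.

```python
# Hypothetical CI guardrail: fail the build if training parameters drift from
# values the team has reviewed. The file name, parameters, and limits are
# invented for illustration.
import json
import sys

APPROVED_LIMITS = {
    "epochs": (1, 100),             # (min, max) allowed values
    "learning_rate": (1e-5, 1e-2),
    "batch_size": (8, 128),
}

def check_training_config(path="training_config.json"):
    with open(path) as f:
        config = json.load(f)
    problems = []
    for name, (low, high) in APPROVED_LIMITS.items():
        value = config.get(name)
        if value is None:
            problems.append(f"missing parameter: {name}")
        elif not low <= value <= high:
            problems.append(f"{name}={value} is outside the approved range [{low}, {high}]")
    return problems

if __name__ == "__main__":
    problems = check_training_config()
    for problem in problems:
        print("FAIL:", problem)
    sys.exit(1 if problems else 0)
```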

In my experience, organizations often fail to articulate their ethical guidelines and “test” to see that they are met. As AI becomes part of our daily work, we need to make sure that we don’t harm customers. And make sure that AI tools don’t harm us or our products.

Open Space: “AI Avatar Black Box Oddness with James and Bart”

This session was about judging the output of a generative AI, and trying different ways to improve the outputs. I only watched the first half of this, as I’d done a similar activity in James’ summer “Testing and AI Series” online workshops.

James and Bart use DreamBooth, which in turn uses Stable Diffusion, to generate pictures based on text prompts. Starting with a picture of a particular person, you can ask the generator to produce various pictures. For example, starting with a picture of Bart, you could ask it to create a cowboy astronaut Bart. Asking for a large number of pictures helps you start to see patterns.
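If you want to experiment with this kind of generation yourself, here’s a rough sketch using Hugging Face’s diffusers library, which wraps Stable Diffusion. Note that this skips the DreamBooth fine-tuning step that teaches the model a particular person’s face, and the model name and parameters are my assumptions, not what James and Bart used.

```python
# Rough sketch: generate a batch of images from one text prompt so you can look
# for patterns. Assumes a CUDA GPU and the diffusers and torch packages; the
# model name and parameters are assumptions. The DreamBooth fine-tuning step
# that teaches the model a particular person's face is not shown here.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Asking for several images at once makes patterns (and oddities) easier to spot.
result = pipe("a cowboy astronaut, photorealistic", num_images_per_prompt=4)
for i, image in enumerate(result.images):
    image.save(f"cowboy_astronaut_{i}.png")
```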

Looking at the output from the prompts leads to interesting questions: What constitutes a “good” picture? How do we test whether the output matches what was requested in the prompt? Was the training data adequate to get good output? Were the prompts worded correctly? For me, this is just dipping my toe in the water of testing generative AI. These are amazing tools – if they are trained correctly, and if you know how to use them correctly.

Leveraging the power of AI for testing – Lisa and Rachel

Rachel Kibler and I had a full house for our workshop on the last afternoon of the conference.

Rachel Kibler and me at our ATD 2023 workshop!

I was thrilled to get to do this workshop with Rachel. She’s using these generative AI tools in her real-life work, and she’s found so many ways to apply them. Since we decided last year to do this workshop, I’ve been highly motivated to learn what I can. And I probably learned as much as any of the workshop participants during our session!

Learning about Machine Learning

For our first exercise, we had table groups look at unlabeled pictures of various animals and classify them in whatever way made sense to them. Then we gave them the same pictures, but with labels on them specifying one of two classifications, and an additional unlabeled picture. We asked them to classify the unlabeled picture.

This gave everyone a taste of unsupervised versus supervised machine learning. Only one group out of ten came up with the intended classification for the unlabeled pictures, and some groups couldn’t correctly classify the new picture even after studying the labeled ones. It’s easy to see how machine learning can go wrong.
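For anyone who wants to see that same contrast in code, here’s a minimal sketch with scikit-learn: the clustering algorithm gets no labels and invents its own groupings, while the classifier learns from labeled examples and then classifies a new, unlabeled point. The animal “measurements” are made up for illustration.

```python
# Minimal sketch of unsupervised vs. supervised learning on invented 2-D data.
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Pretend each point is two measurements of an animal (e.g. weight, leg length).
animals = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 4.0], [5.2, 4.1], [4.8, 3.9]]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]
mystery_animal = [[4.9, 4.2]]

# Unsupervised: no labels, so the algorithm invents its own groupings –
# like our first exercise with unlabeled pictures.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(animals)
print("cluster assignments:", clusters)

# Supervised: labeled examples tell the model what the groups mean,
# so it can classify the new, unlabeled point.
classifier = KNeighborsClassifier(n_neighbors=3).fit(animals, labels)
print("mystery animal classified as:", classifier.predict(mystery_animal)[0])
```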

Practicing prompt engineering

Participants practiced prompt engineering in our second exercise. We asked the table groups to pair or group around one laptop and ask ChatGPT how to test a basic web app’s login screen for functionality, security, and accessibility. Our discussions brought out a lot of good tips, especially from Rachel. For example, ChatGPT will give you a huge list of suggestions. If you change your prompt to say something like “I have only one hour to test this, what are the most important things to test?”, it will give you a shorter, prioritized list.

Giving it context also helps a lot – “How do I test this as a security tester?” One group prompted ChatGPT to help with hacking the login page, which it politely refused to do. When given the prompt “I’m a white hat hacker”, ChatGPT was then happy to oblige. You can also challenge ChatGPT with constraints, limitations, incomplete information, or contradictory statements to get it to analyze and weigh its responses.
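The same techniques carry over if you move from the ChatGPT web UI to the API. Here’s a rough sketch using the openai Python package – the model name and prompt wording are placeholders, not exactly what participants typed.

```python
# Rough sketch of the same prompt techniques via the openai Python package.
# The model name and prompt wording are placeholders; in the workshop we used
# the ChatGPT web UI. Assumes the OPENAI_API_KEY environment variable is set.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # Context: tell the model what role to take.
        {"role": "system", "content": "You are a security tester reviewing a basic web app."},
        # Constraint: a time box pushes it toward a short, prioritized list.
        {
            "role": "user",
            "content": "I have only one hour to test the login screen. "
            "What are the most important security tests to run, in priority order?",
        },
    ],
)
print(response.choices[0].message.content)
```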

“Explain like I’m five”

Another group exercise was to copy code in any language from Emily Bache’s Gilded Rose Kata repository and paste it into ChatGPT with a prompt asking it to explain the code. One pair used the COBOL code first, then asked ChatGPT to translate it to Python. They then ran the Python code, which worked!

ChatGPT, and IDE code assistant plugins such as GitHub Copilot, are generally helpful at explaining code. Do be careful, though, because these tools can also make things up. I’ve seen Cody, a code assistant for VSCode, report information about files that aren’t even in the code base.

Perils and pitfalls of using generative AI

Rachel led participants through a quick exercise to illustrate anchoring bias. AI can help break our anchors, but it can also anchor us. She recommends starting with your own ideas. Read through everything the AI tool says. Ask multiple times in multiple ways to see if you get consistent results. Don’t trust everything you read!

All the participants joined a great discussion about the risks of using generative AI tools such as ChatGPT. We had already seen during the exercises that ChatGPT will just make things up sometimes. There’s huge potential for bias. Lack of visibility is a big issue – we don’t know what data was used to train the model. Code generated by ChatGPT may be poorly designed, and it may not meet your organization’s standards. If you can’t spot these issues, you can end up with defects and technical debt. Many large corporations have had to ban or restrict use of generative AI tools to prevent employees from pasting in confidential information.

Many uses of AI

Over the course of the workshop, Rachel shared many ways her team takes advantage of generative AI. For example, they made a template for bug reports: feeding information about a potential defect into ChatGPT with the right prompt gets them a correctly formatted bug report. Rachel also showed how they can paste testing notes into ChatGPT and get it to generate a list of FAQs. Let your imagination run wild and see how many ways generative AI can help you.
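Rachel’s actual template belongs to her team, but here’s a hypothetical sketch of the general shape: a fixed prompt template that wraps rough defect notes and asks for a consistently formatted report. The section names and wording are invented.

```python
# Hypothetical bug-report prompt template. Rachel's team's actual template is
# their own; the section names and wording here are invented for illustration.
BUG_REPORT_PROMPT = """Format the following rough notes as a bug report with
these sections: Title, Steps to Reproduce, Expected Result, Actual Result,
Severity. Do not invent details that are not in the notes.

Notes:
{notes}
"""

def build_bug_report_prompt(notes: str) -> str:
    return BUG_REPORT_PROMPT.format(notes=notes)

if __name__ == "__main__":
    # The resulting prompt string gets pasted into ChatGPT (or sent via the API).
    print(build_bug_report_prompt(
        "login button does nothing in Firefox when the password field is empty"
    ))
```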

We tried to convey that AI isn’t taking our jobs – but we will need to know how to use these new tools in order to keep our jobs. We still need all the skills and knowledge that we’ve always needed. You can’t blindly take answers or code generated by AI and use them; you have to know whether the tools are giving you correct information and well-designed code. Using them intelligently frees up some of our time for important work that requires our human skills and experience.

These sessions left me so excited to learn more about generative AI and what we can do with it. If you’re new to all this, like I was, and want to learn, here are a couple of resources that helped me:

  • How To Test a Time Machine, a practical guide to test architecture and automation, by Noemí Ferrera, includes information on core concepts of AI, using AI to help with testing, and how to test AI apps.
  • Prompt Engineering for Everyone, by David Bernstein and ChatGPT.

Check back here for more of my ATD 2023 learning experiences!
