Over the last few weeks, I’ve become more and more interested in the LinkedIn conversations that revolve around grappling with the role of “Agile Coach”. This interest was prompted by some provocative hashtags that had me feeling ambivalent about whether I wanted to dive deeper or not. One of the big barriers was that a lot of the grappling had happened in recorded group conversations that were then published on YouTube. I didn’t feel like sitting through a total of 8 hours of conversations to catch up.
At the same time, I’ve wanted to dive into using AI for something, so I started experimenting with building myself a system that would help me make sense of something like this. The results of my first attempt will remain shared only with the people who participated in the conversations directly. I received the feedback I had hoped for out of that first attempt, and that spurred me on to make another attempt.
Ironically, I ended up spending way more than 8 hours on this journey.
What have I learned?
- You need to start with really good transcripts – Sonix.ai became my go-to tool for this, and I spent $80 there, after using various tools to download the videos and extract the audio from them.
- You need to clean up the transcripts too, making sure you get the speakers matched up right. I needed help from the YouTube videos for this, which made it such that I spent more than 8 hours on the effort.
- You can’t necessarily take a whole session transcript, feed it to AI, and get something out that the conversations participants will recognize or agree with. You may need to break the transcript into smaller pieces (that fit into the AI’s context window), and keep a running summary to feed back in with each new chunk.
- Once you have the transcript, you can “have a conversation with the conversation“, as represented by AI. So you can ask questions of the conversation “from a distance”, and use the responses to inspire new questions.
What tools did I use?
- For post-processing the YouTube source material, I used:
- ffmpeg
- to extract the audio from the YouTube videos
- ffmpeg
- In my first attempt I used:
- SumTube.ai
- to get summaries of the conversations in “one shot”
- this did not result in satisfying outcomes, partly because what SumTube delivers is an image-based PDF format, so you then have to use other tools to convert the images to text
- the summaries work off of YouTube’s own transcripts, which do not maintain speaker attribution
- to get summaries of the conversations in “one shot”
- whisper (run locally on my laptop)
- to transcribe the audio files in an attempt to get better versions than what YouTube hands you – the problem is that whisper (the way I used it at least) didn’t maintain speaker attribution either – everything just became one big text file
- ollama
- to experiment with “offline” AI for processing the transcripts
- this did not result in satisfying outcomes, due to the limitations inherent in the models I was able to download and run on my limited hardware (gemma2:2b, llama31 and llava)
- to experiment with “offline” AI for processing the transcripts
- fabric
- to use a set of “patterns” (elaborate, well polished prompts) for analyzing the transcripts, utilizing either the local ollama models or OpenAI’s ChatGPT
- as with ollama above, the limitations of the local models gave less than satisfying results
- when pointed to OpenAI’s ChatGPT, the results were more satisfying, but due to the transcripts not having speaker attribution, they were not as good as they could be
- to use a set of “patterns” (elaborate, well polished prompts) for analyzing the transcripts, utilizing either the local ollama models or OpenAI’s ChatGPT
- SumTube.ai
- In my second attempt I used:
- Sonix.ai
- to get high-quality transcripts that helped maintain speaker attribution
- to review the transcripts by hand, with a highly capable website UI to make some necessary changes around acronyms, people names, concepts mentioned, etc.
- PowerShell scripts
- to arrange the transcripts sentence by sentence, one per “line”
- to split the overall transcripts into smaller chunks that better fit AI “context windows”
- to “clean” the chunked transcripts (spoken language idiosyncrasies, repeated words, starts-and-stops, etc.), producing what I will call “clean” transcripts
- to summarize each cleaned up chunk
- to reassemble the summarized “clean” chunks into one file(incidentally, ChatGPT was able to help with the creation of many of these scripts, something that impressed me quite a bit)
- ChatGPT with the 4o model (yup, I ended up spending another $20 for a subscription…)
- to summarize the “clean” transcripts
- to have “conversation with the conversations”
- fabric, pointing to ChatGPT’s API (and the 4o model)
- to use “patterns” for analyzing the “clean” transcripts, as in my first attempt
- Sonix.ai
Interestingly, the most expensive piece (dollar-wise) in all of this were the the transcripts. My own usage of OpenAI’s APIs via scripting came to a total of about $3 – and I feel like I had a lot of API calls going on.
I’m not sure if you’re interested in the results of the whole effort, but I can share with you the prompts I used in the various pieces.
For cleaning up the chunked transcripts (with gpt-4o-mini):
“Please clean up this transcript by removing only the most distracting linguistic or spoken word idiosyncrasies, while keeping as much of the original spoken word and conversational tone as possible. Do not retain any previous results once they have been processed. Make sure nothing is skipped or left out. Do not summarize or interpret. But maintain everything that seems like a back-and-forth ‘verbal volley’, or brief expressions of affirmation or dissent.”
prompt: “Please summarize the content of the following part of a longer conversation transcript. You will be given a previous summary for context. Make no mention that this is a new part to the conversation. Just treat it as a continuation.”content : “Here is the previous summary for context: <previousSummary> Here is the new text to summarize: <partContent>”
Please give me a detailed list of the points made by the participants in this transcript of a dialogue. Group the points by person.
What concrete evidence do the people provide for their claims?
What’s the real issue in the conversation?
Do you see any contradictions in the conversation?
Please summarize this transcript according to themes, problems and solutions discussed. Then, focus on each participant’s position and summarize what they brought to the conversation.
Now, please give me a list of all the points each participant made in the conversation, grouped by person.