You have a bunch of hidden assumptions here:
<we can scrape the post content from the page html>
I strongly recommend against that: it's awkward, unreliable, and prone to breaking if the site admins tweak the page styles even a little bit.
The best representation of a post to go on is its actual BBCode. The best way to read that would be to have access to the forum database, but that's not likely to happen. In theory the admins could set up some API access to post data, but you'd have to give a bloody good reason for them to go to all that hassle.
You could, I suppose, use the forum's quote function itself to see the BBCode of somebody's post. However, any large-scale scraping of all threads on the forum is going to create unnecessary load on the servers, so don't be surprised if you get told off.
<each post quotes zero or one other posts>
Not true: I often quote several people at once if I only have a short response to each of them. I find it tidier than posting multiple replies.
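To make that concrete, here's a rough sketch of pulling every top-level quote out of one post's BBCode. It assumes the usual [quote="name"] / [quote=name] / [quote] tag shapes; `top_level_quotes` and the sample post are made up for illustration, not anything the forum provides. The regex only recognises the tags themselves, and nesting is tracked with a depth counter.

```python
import re

# Matches [quote], [quote="name"], [quote=name ...] and [/quote], case-insensitively.
QUOTE_TAG = re.compile(
    r'\[(/?)quote(?:=(?:"([^"]*)"|([^\]\s]*))[^\]]*)?\]', re.IGNORECASE
)

def top_level_quotes(bbcode):
    """Return a list of (author, text) pairs, one per top-level quote block.

    author is None for a bare [quote] tag. Nested quotes stay inside the
    text of their enclosing quote rather than being listed separately.
    """
    quotes = []
    depth = 0
    start = None
    author = None
    for m in QUOTE_TAG.finditer(bbcode):
        is_closing = m.group(1) == "/"
        if not is_closing:
            if depth == 0:
                # Remember who is being quoted and where the quoted text begins.
                author = m.group(2) if m.group(2) is not None else m.group(3)
                start = m.end()
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                quotes.append((author, bbcode[start:m.start()].strip()))
    return quotes

# Hypothetical post that answers two people at once.
sample_post = (
    '[quote="alice"]First point[/quote]\nAgreed.\n'
    '[quote="bob"]Second point[/quote]\nNot so sure.'
)
print(top_level_quotes(sample_post))
```

So the data model needs a list of quotes per post, which means the result is a graph where one post can hang off several parents, not a simple tree.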
<quotes can be traced back to the original post>
As I'm demonstrating right now, quotes can be paraphrased, or even invented out of whole cloth.
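As a sketch of why this bites: the obvious way to trace a quote is to look for its text in earlier posts, and that only works for verbatim quotes. The `find_source` helper and the `posts` dict (post id to post text) below are hypothetical names for illustration.

```python
import re

def normalize(text):
    """Lower-case and collapse whitespace so trivial reformatting still matches."""
    return re.sub(r"\s+", " ", text).strip().lower()

def find_source(quote_text, posts):
    """Return the id of the first post containing the quote verbatim, else None.

    posts is a hypothetical dict mapping post id -> post text.
    """
    needle = normalize(quote_text)
    if not needle:
        return None
    for post_id, body in posts.items():
        if needle in normalize(body):
            return post_id
    return None

posts = {1: "Import this information into a mind mapping tool, and we have Thought Bubbles!"}
print(find_source("import this   information", posts))        # verbatim apart from case and spacing
print(find_source("we can scrape the post content", posts))   # paraphrase: no match
```

Anything paraphrased or invented comes back as None, so you'd need a fallback (such as the quoted username) and would still end up with some untraceable quotes.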
<Import this information into a mind mapping tool, and we have Thought Bubbles!>
Okay, but... now what? You've essentially created a threaded-discussion view of a forum topic (instead of phpBB's usual linear view), but it's not clear what this is useful for.
<The tool I found to grab the pages grabs everything within that directory and its subdirectories, so it would be a server hog and pull lots of unnecessary data.>
Running large numbers of wget -r queries against the forum server is probably a good way to piss off the admins, yeah.
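If you do get the go-ahead from the admins, the least you can do is fetch only the pages you need and leave a gap between requests. A minimal sketch, assuming you supply your own `fetch` function; `fetch_politely` and the 5-second default are made up for illustration, not a limit anybody has agreed to.

```python
import time

def fetch_politely(urls, fetch, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch(url) for each URL in order, pausing between requests.

    fetch is whatever function actually downloads a page (hypothetical here);
    sleep is injectable so the pause can be stubbed out when testing.
    """
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # no pause before the very first request
        results[url] = fetch(url)
    return results

# Stand-in fetch and sleep so the example runs without touching any server.
pauses = []
pages = fetch_politely(["a", "b", "c"], lambda u: u.upper(),
                       delay_seconds=2, sleep=pauses.append)
print(pages, pauses)
```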
<parsing HTML with regular expressions>
You can't do that.
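If you end up reading the page HTML despite all of the above, at least use a real parser. Here's a sketch with Python's standard html.parser, assuming quotes are rendered as blockquote elements (that's a guess about the markup, so check the actual page source). `BlockquoteCollector` and `extract_blockquotes` are names I made up.

```python
from html.parser import HTMLParser

class BlockquoteCollector(HTMLParser):
    """Collect the text of each top-level <blockquote>, nested ones included."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many blockquotes we are currently inside
        self.quotes = []    # finished top-level quotes
        self._buffer = []   # text pieces of the quote being read

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the pieces and collapse runs of whitespace.
                self.quotes.append(" ".join("".join(self._buffer).split()))
                self._buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self._buffer.append(data)

def extract_blockquotes(html):
    parser = BlockquoteCollector()
    parser.feed(html)
    parser.close()
    return parser.quotes

# Hypothetical markup, not copied from any real forum page.
sample_html = (
    "<div><blockquote><cite>alice wrote:</cite> Hello <b>there</b> and "
    "<blockquote>inner</blockquote></blockquote><p>reply</p></div>"
)
print(extract_blockquotes(sample_html))
```

The depth counter is what copes with nested quotes, which is exactly the part a single regular expression can't get right.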