There was a question on the forum about bots, which reminded me of a little experiment I did some years ago. At that time I was preparing for a project at work where I would mine data from scientific publications and analyse it. As a first experiment I decided to do the same on this forum, so I wrote a bot that mined the posts from the Aviation forum and did some simple analysis on them. You can find that thread here: The forum's most discussed single engined fighter of ww2 is....
So after reading the post about bots, I decided to dust off that old script and see if it would still work with the new forum layout. Of course it didn't. So I cleaned up the code, migrated it from Python 2 to Python 3 (the programming language, not the snake) and improved how the script works.
The old script was very simple. It collected all the posts, filtered out the quotes and stored everything as one long text file. I have now changed this so that it also stores the author, the date of posting and the post number within the thread. The mined data is then put into a small relational database so I can analyse it more easily. I did not include the word-cloud script this time, but I can still add that later.
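To give an idea of what storing posts in a small relational database can look like, here is a minimal sketch using SQLite from Python's standard library. This is not the code from the repository: the table name, the column names, the `thread_id` column and the helper functions `open_db` and `store_post` are all made up for illustration, and the sample posts are invented.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for illustration,
# not taken from the actual crawler.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    thread_id   INTEGER NOT NULL,
    post_number INTEGER NOT NULL,  -- position of the post within its thread
    author      TEXT    NOT NULL,
    posted      TEXT    NOT NULL,  -- ISO date string, e.g. '2019-03-14'
    body        TEXT    NOT NULL,  -- post text with quoted parts already removed
    PRIMARY KEY (thread_id, post_number)
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_post(conn, thread_id, post_number, author, posted, body):
    """Insert one post; a re-crawled post overwrites the earlier copy."""
    conn.execute(
        "INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?, ?)",
        (thread_id, post_number, author, posted, body),
    )


# Invented sample data, stored in an in-memory database.
conn = open_db()
store_post(conn, 1, 1, "alice", "2019-03-14", "The Spitfire was lovely.")
store_post(conn, 1, 2, "bob", "2019-03-15", "I prefer the Hurricane.")
# Storing the same thread/post number again replaces the row instead of duplicating it.
store_post(conn, 1, 2, "bob", "2019-03-15", "I prefer the Hurricane, really.")
conn.commit()
```

Using the thread and post number together as the primary key means the crawler can be run repeatedly without the counts being inflated by duplicate rows.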
For the programmers among us, the code can be found on GitHub: GitHub - mke21/ww2aircraftcrawler: little script to find some simple statistics on the ww2aircraft forum.
If you didn't follow all that technical stuff, the summary is this: I made a script to count how often aircraft names occur on the forum. So I can count how many posts contain the word 'spitfire' or the name of another aircraft. I can also do that per user or per year. I thought it would be a nice experiment to think up some analyses that I could do. These are the same kind of techniques that Google, Microsoft and Facebook unleash on you during your daily browsing. Of course their setup is much more sophisticated.
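For those who like to see it in code, the counting idea can be sketched roughly like this. It is only an illustration under my own assumptions, not the script from the repository: the sample posts are invented, and the names `mentions` and `count_posts` are hypothetical. A post is counted once per aircraft, no matter how often the name appears in it.

```python
import re
from collections import Counter

# Invented sample posts; in practice these would come from the mined data.
posts = [
    {"author": "alice", "posted": "2018-05-01",
     "body": "The Spitfire was a joy. Spitfire forever!"},
    {"author": "bob", "posted": "2018-06-02",
     "body": "P-51 or Spitfire? Hard to say."},
    {"author": "alice", "posted": "2019-01-10",
     "body": "The Bf 109 had its flaws."},
]

aircraft = ["Spitfire", "P-51", "Bf 109"]


def mentions(name, text):
    """True if the name occurs in the text, ignoring case, and not as
    part of a longer word or number."""
    pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def count_posts(posts, names, key=None):
    """Count posts mentioning each name, optionally grouped by key(post)."""
    counts = Counter()
    for post in posts:
        for name in names:
            if mentions(name, post["body"]):
                label = (name, key(post)) if key else name
                counts[label] += 1
    return counts


total = count_posts(posts, aircraft)
per_year = count_posts(posts, aircraft, key=lambda p: p["posted"][:4])
per_user = count_posts(posts, aircraft, key=lambda p: p["author"])
```

Here `total["Spitfire"]` is the number of posts that mention the Spitfire at all, while `per_year` and `per_user` split the same counts by the year of posting and by author.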
For now I have a list of 90 single-engine fighters. Maybe you guys can come up with ideas for analyses that we could try.
I will post the current list tomorrow. I hope you will join me.