Hi CubeCraft Community!
It's been a long time since I've been active around Minecraft. But before I get into my life story, I wanna jump right to the point.
You have probably seen the hype around AI; you can't deny it. Well, as the son of an AI researcher, I jumped right into learning it.
Since we've seen how you can cheat in exams now, I wanna focus on a totally different part of AI: defense and prevention.
This is not an official name, but basically these AIs focus on catching hackers and other bad actors on networks, filtering through your Google Drive for illegal content, etc.
So, one question: if the technology is there, why aren't we using AI in game moderation? Well, this is an open question at the moment. You see, detecting movement cheats etc. is not an easy task even with AI. So what did I do, you may ask?
Well, ever since the beginning, I had the idea of automating one part of moderation that has always been manual: skins. So I jumped right into it.
You see, I've created an AI capable of categorizing skins and capes (from all cape providers, such as Mojang, OptiFine, Labymod, etc.).
Now it's very primitive as I don't have that big of a dataset, but I'll jump to the results later.
So how did I do it?
Well you need a lot of examples to train your AI with, so I started with collecting many many skins in each category.
As the category names and the skins are actually inappropriate, I'll only show you the allowed directory this time :). (I also won't show you actual inappropriate data in action, as you might look it up, and it would get me in trouble on the forum.)
After that you need to use some sort of AI library, because you don't want to write that on your own. Since I'll be implementing this as an API, and since I have experience with Java, I decided to go with DL4J.
With it, I basically created a Convolutional Neural Network, whose job is to first extract details (features) from an image, then pass them to an actual neural network, which learns which features correspond to which category. Now, please note I'm oversimplifying here, as it gets really technical.
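To make the "extract details about an image" part a bit more concrete, here is a toy sketch of what a single convolution filter does, in plain Java without DL4J. This is not the actual network: the class name, the 4x4 image and the hand-picked kernel are all made up for illustration, and a real CNN learns its kernel values during training instead of having them written by hand. (Like most deep learning libraries, it slides the kernel without flipping it.)

```java
import java.util.Arrays;

/** Toy sketch of one convolution filter; hypothetical, not the real network. */
final class ConvolutionSketch {

    /** Slides the kernel over the image (stride 1, no padding) and returns the feature map. */
    static double[][] convolve(double[][] image, double[][] kernel) {
        int outRows = image.length - kernel.length + 1;
        int outCols = image[0].length - kernel[0].length + 1;
        double[][] out = new double[outRows][outCols];
        for (int r = 0; r < outRows; r++) {
            for (int c = 0; c < outCols; c++) {
                double sum = 0.0;
                // Multiply the kernel with the patch underneath it and add everything up.
                for (int kr = 0; kr < kernel.length; kr++) {
                    for (int kc = 0; kc < kernel[0].length; kc++) {
                        sum += image[r + kr][c + kc] * kernel[kr][kc];
                    }
                }
                out[r][c] = sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up 4x4 grayscale patch: dark left half, bright right half.
        double[][] image = {
            {0, 0, 1, 1},
            {0, 0, 1, 1},
            {0, 0, 1, 1},
            {0, 0, 1, 1}
        };
        // Hand-picked kernel that reacts to dark-to-bright vertical edges.
        double[][] kernel = {
            {-1, 1},
            {-1, 1}
        };
        // Prints [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
        System.out.println(Arrays.deepToString(convolve(image, kernel)));
    }
}
```

The feature map only "lights up" in the middle column, exactly where the edge is. The later layers then learn which combinations of such features belong to which category.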
Ok so it runs on my CPU, but OMG it runs sooo slow, it takes minutes to train! So I went through 2 hours of suffering getting CUDA to work, so that the network trains on my GPU in parallel! With it I can train it in under 1 minute on avg.
Then after that you have the job of painstakingly adjusting little parameters until you get the model to perform decently.
So ok, batchSize is 20, what is my accuracy? 75%. Ok, then what will 21 do? (Just random numbers for an example, not actual behaviour!) You get the point.
This is what I spent hours modifying:
What is an epoch?
One complete pass through the training data. (Because in life, if you see e.g. a formula only once, you won't learn it.)
Now with this there is a probability it could overfit, meaning it gets to know the training data sooo much that it will only work with that. So I have it set up so that after every epoch, a function evaluates the network and stops the training if the epoch made it worse compared to the last one.
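The stop-when-it-gets-worse idea can be sketched in a few lines of plain Java. This is only an illustration of the rule described above, not the real training code: the class, the `trainAndEvaluate` callback and the scores are hypothetical, and the callback stands in for "train one epoch, then score the network on data it was not trained on".

```java
import java.util.function.IntToDoubleFunction;

/** Sketch of "stop as soon as an epoch scores worse than the previous one". */
final class EarlyStoppingSketch {

    /**
     * Runs at most maxEpochs epochs. trainAndEvaluate is a hypothetical callback
     * that trains one epoch and returns the evaluation score (higher is better).
     * Returns how many epochs were actually run.
     */
    static int train(int maxEpochs, IntToDoubleFunction trainAndEvaluate) {
        double previous = Double.NEGATIVE_INFINITY;
        for (int epoch = 1; epoch <= maxEpochs; epoch++) {
            double score = trainAndEvaluate.applyAsDouble(epoch);
            if (score < previous) {
                return epoch; // this epoch made things worse, so stop here
            }
            previous = score;
        }
        return maxEpochs;
    }

    public static void main(String[] args) {
        // Made-up scores per epoch; the 4th epoch is worse than the 3rd.
        double[] scores = {0.70, 0.78, 0.83, 0.81, 0.90};
        int epochsRun = train(scores.length, epoch -> scores[epoch - 1]);
        System.out.println("Stopped after epoch " + epochsRun); // Stopped after epoch 4
    }
}
```

Note that in this sketch the fifth epoch never runs, even though it would have scored higher. In practice you would also want to keep the weights from the best epoch rather than from the one that got worse.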
What is the batchSize?
The batch size defines the number of samples that will be propagated through the network at once. So, put very simply, sticking with the real-world example, it basically means it will see x examples in one training step, before it adjusts itself.
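As a rough illustration of how batchSize relates to an epoch, the sketch below cuts a list of training samples into batches in plain Java. The class name, file names and numbers are made up (they are not my real settings); the point is only that 50 samples with a batch size of 20 give 3 batches per epoch, with a smaller last one.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: splitting one epoch's worth of samples into mini-batches. */
final class BatchSketch {

    /** Cuts the sample list into consecutive batches of at most batchSize items. */
    static <T> List<List<T>> toBatches(List<T> samples, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < samples.size(); start += batchSize) {
            int end = Math.min(start + batchSize, samples.size());
            batches.add(new ArrayList<>(samples.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // 50 made-up file names standing in for training images.
        List<String> samples = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            samples.add("skin_" + i + ".png");
        }
        List<List<String>> batches = toBatches(samples, 20);
        System.out.println(batches.size() + " batches per epoch"); // 3 batches per epoch
        for (List<String> batch : batches) {
            System.out.println(batch.size()); // 20, then 20, then 10
        }
    }
}
```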
What is numOfExamples?
You see, for a network to perform well you need a lot of data, and I mean ideally multiple thousands if not millions. Now you can imagine I don't have that many skins and capes on my computer. So I'm basically oversampling them, meaning the AI will see each one multiple times, not just once.
And a side note here: every category must have around the same number of pictures, otherwise one will be stronger. So behind the scenes the code is actually making sure the training data is balanced.
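Oversampling and balancing can be sketched together like this. It is a simplified stand-in, not the real code: the class, the category names, the file names and the `perCategory` parameter are all hypothetical, and it simply repeats examples in order until every category has the same count (a real pipeline would probably pick them randomly).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: oversample every category to the same number of examples. */
final class BalanceSketch {

    /** Returns perCategory examples for each category, repeating files when there are too few. */
    static Map<String, List<String>> oversample(Map<String, List<String>> byCategory, int perCategory) {
        Map<String, List<String>> balanced = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : byCategory.entrySet()) {
            List<String> source = entry.getValue();
            if (source.isEmpty()) {
                throw new IllegalArgumentException("empty category: " + entry.getKey());
            }
            List<String> picked = new ArrayList<>(perCategory);
            for (int i = 0; i < perCategory; i++) {
                // Wrapping around the list is what makes the network see an example more than once.
                picked.add(source.get(i % source.size()));
            }
            balanced.put(entry.getKey(), picked);
        }
        return balanced;
    }

    public static void main(String[] args) {
        // Made-up, unbalanced dataset: 3 examples in one category, 1 in the other.
        Map<String, List<String>> data = new LinkedHashMap<>();
        data.put("allowed", List.of("a1.png", "a2.png", "a3.png"));
        data.put("flagged", List.of("f1.png"));
        // Prints {allowed=[a1.png, a2.png, a3.png, a1.png], flagged=[f1.png, f1.png, f1.png, f1.png]}
        System.out.println(oversample(data, 4));
    }
}
```

If `perCategory` is smaller than a category's size, this sketch just takes the first `perCategory` files, which cuts the category down instead of growing it.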
So does it work?
Yes!
However, at the moment the network is decent, but not good enough for my liking, mainly because of my lacking dataset. Since I don't have much time for downloading thousands of skins, and since there aren't actually that many of them anyway, I won't reach the recommended amount of training data quickly.
My actual nemesis is capes, since they come in all sizes: 10x16 from Mojang, 20x32 from OptiFine. As you can see, here's a nice false positive, courtesy of rubik_cube_man.
So yeah, there's room for improvement, but note that false positives will always occur. It's basically impossible to get 100% accuracy.
However, I may decide to implement the final product in a way where it also learns during production use, based on feedback from the moderators, but that's just an idea.
So how does the model perform overall?
Well, on my primitive test data, the cape classifier classified the capes with 92% accuracy. (Broken down, it means that if I gave it 100 players, it would categorize 92 of them properly and the other 8 falsely.) The skin classifier performs around the same.
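For anyone wondering how that number comes about, accuracy is just correct predictions divided by total predictions. Here is a minimal sketch in plain Java; the class and label names are hypothetical and the 100 test cases are fabricated purely to reproduce the "92 out of 100" breakdown, they are not my real test data.

```java
import java.util.Arrays;

/** Sketch: accuracy = correctly classified / total. */
final class AccuracySketch {

    /** Compares predicted labels with the true labels and returns the fraction that match. */
    static double accuracy(String[] predicted, String[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i].equals(actual[i])) {
                correct++;
            }
        }
        return (double) correct / predicted.length;
    }

    public static void main(String[] args) {
        // 100 made-up test capes that are all really "allowed".
        String[] actual = new String[100];
        Arrays.fill(actual, "allowed");
        // Pretend the classifier gets the first 92 right and wrongly flags the last 8.
        String[] predicted = new String[100];
        Arrays.fill(predicted, 0, 92, "allowed");
        Arrays.fill(predicted, 92, 100, "flagged");
        System.out.println(accuracy(predicted, actual)); // 0.92
    }
}
```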
Can this be better? Of course, but I'll need much more data to run it on.
Can I use it?
Currently this project is closed source, as it's not close to final at all, and I won't even share the actual network configuration with you, but if we finish the project it will become open source on GitHub. Until then, please don't bombard me with requests and questions, as I won't really answer them. We also plan on introducing some NLP models, which will give us the power to categorize usernames as well! We also want to look into porting this to 100% Bedrock compatibility.
Disclaimer: I'm nowhere near an AI expert, so take these words with a grain of salt.
Hope you liked this thread, and that it gave you some insight into the world of AI classification models. I may do a follow-up when a notification system is in place, etc. But before I do so, I wanna make sure the model is as good as it gets.