Three scales.
Huge
The huge version would index all the subtitles on YouTube and make them searchable. This would be a massive undertaking. Videos that do not have subtitles would need speech recognition software to generate them.
Middle (current project)
The middle solution is what I’m currently aiming at. It consists of the subtitles of a limited number of communities, selected through a combination of data science and curation by hand. Only the videos that have subtitles would be indexed.
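To make the middle scale a bit more concrete, here is a minimal sketch of how subtitles could be indexed for search. It is an illustration under assumptions, not the project's actual code: the video records (dicts with hypothetical `id` and `subtitles` keys) and the function names `build_index` and `search` are made up for this example, and a simple in-memory inverted index stands in for whatever search backend ends up being used.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: lowercase words, digits and apostrophes only.
TOKEN = re.compile(r"[a-z0-9']+")


def build_index(videos):
    """Map each word to the set of video ids whose subtitles contain it.

    Videos without subtitles are skipped, as in the middle scale.
    """
    index = defaultdict(set)
    for video in videos:
        subtitles = video.get("subtitles")
        if not subtitles:
            continue  # no subtitles, so this video is not indexed
        for word in TOKEN.findall(subtitles.lower()):
            index[word].add(video["id"])
    return index


def search(index, query):
    """Return the ids of videos whose subtitles contain every query word."""
    words = TOKEN.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up sample data, only for illustration.
videos = [
    {"id": "a", "subtitles": "Hello world"},
    {"id": "b", "subtitles": None},
    {"id": "c", "subtitles": "hello there"},
]
index = build_index(videos)
```

With this sample data, `search(index, "hello")` matches videos `a` and `c`, while video `b` never appears in the index because it has no subtitles.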
Small
The small version would focus on one particular community. This would probably be a community that has already been mapped by external parties (such as the alt-right).