The incentive lies in data monetisation.
Many owners of language data are unaware that their data, most of which was created for a different purpose, has value beyond that original purpose. When they offer it for sale so that data consumers can use it to train AI models, the data gains a secondary use and creates added value.