Getting access to OSCAR
You can request access by sending us an email!
Please follow these instructions carefully: incorrect submissions can significantly delay your access.
Do not create an account yourself, as this could delay your access by weeks! We will create an account for you.
Send an email to contact at oscar-project.org with "OSCAR Access Request" as the subject and the following template, filled in, as the body:
If possible, send your email from your institutional or academic address. Otherwise, your access might be delayed or refused.
Access requests can take several days to be answered, sometimes longer.
We post updates about exceptional delays on our Discord server, and you can always contact us there to ask about the status of your request.
After some time, you should get an email back from us with access instructions!
The following steps assume that you have already installed the Python datasets library.
- Create an account on HuggingFace.
- Create a user access token.
- Open the OSCAR Team page.
- Open your corpus of choice. Instructions should be on the corpus page.
After all of this, you should be able to easily use OSCAR data with the datasets library:
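As a minimal sketch, assuming datasets 2.14 or later (where `load_dataset` accepts a `token` argument) and that your user access token has been saved with `huggingface-cli login`, streaming a few records could look like the following. The helper names `build_load_kwargs` and `stream_oscar` are hypothetical, and the repository id and any language or config argument are assumptions: copy the exact values from the corpus page.

```python
import itertools


def build_load_kwargs(**loader_kwargs):
    """Hypothetical helper: keyword arguments for datasets.load_dataset.

    Extra arguments (for example a language or config name) are passed through
    unchanged and override the defaults; copy the exact ones from the corpus page.
    """
    return {
        "split": "train",
        "streaming": True,  # iterate over records instead of downloading the whole corpus first
        "token": True,      # reuse the token saved by `huggingface-cli login` (datasets >= 2.14)
        **loader_kwargs,
    }


def stream_oscar(dataset_id, limit=3, **loader_kwargs):
    """Hypothetical helper: return the first `limit` records of a streamed OSCAR corpus."""
    # Imported here so build_load_kwargs can be used without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(dataset_id, **build_load_kwargs(**loader_kwargs))
    # A streaming dataset is a plain iterable, so islice stops after `limit` records.
    return list(itertools.islice(dataset, limit))
```

A call would then look like `stream_oscar("oscar-corpus/OSCAR-2201", language="eu")`, where both the repository id and the `language` argument are placeholders to replace with the values given on the corpus page. Streaming keeps the example cheap: only the requested records are fetched rather than the whole corpus.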
Using Git LFS
You can also get the raw data from HuggingFace using Git LFS.
The following steps assume that you have git and git-lfs installed and are on a Unix-like system. The procedure should be roughly the same on Windows, but this has not been tested.
This will download the Basque corpus from OSCAR 2109.
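One way to run such a download is sketched below in Python with `subprocess`, under several assumptions. The helper names `git_lfs_steps` and `run_steps` are hypothetical, and the include pattern `packaged/eu/*` is only a guess at where the Basque files live, so check the repository's file listing for the real layout. Cloning with the `GIT_LFS_SKIP_SMUDGE=1` environment variable fetches only the small LFS pointer files; `git lfs pull --include` then downloads the content of the matching files only. Since access to the corpus is restricted, git may prompt for your HuggingFace username and, as password, your user access token.

```python
import os
import subprocess


def git_lfs_steps(repo_url, include_pattern):
    """Hypothetical helper: (command, working directory, extra env) for each step."""
    repo_dir = repo_url.rstrip("/").rsplit("/", 1)[-1]  # directory created by `git clone`
    return [
        # Make sure the Git LFS filters are set up for this user.
        (["git", "lfs", "install"], None, {}),
        # Clone only the small LFS pointer files, not the corpus content.
        (["git", "clone", repo_url], None, {"GIT_LFS_SKIP_SMUDGE": "1"}),
        # Download the real content of the matching files only.
        (["git", "lfs", "pull", "--include", include_pattern], repo_dir, {}),
    ]


def run_steps(steps):
    """Run each step in order, stopping at the first failing command."""
    for cmd, cwd, extra_env in steps:
        subprocess.run(cmd, cwd=cwd, env={**os.environ, **extra_env}, check=True)
```

For example, `run_steps(git_lfs_steps("https://huggingface.co/datasets/oscar-corpus/OSCAR-2109", "packaged/eu/*"))` would run the three commands in sequence. Keeping the command list separate from its execution makes it easy to print and inspect the commands before running them.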