arXiv insanity - Part 0 - Setup all the things!
Setup!
Introduction
TBD
Set up the Compute Engine instance
When setting up the virtual machine, I choose Ubuntu 22.04.2 LTS as the operating system.
First, I install the packages that pyenv requires for building Python:
sudo apt update
sudo apt --assume-yes install build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev curl \
libncursesw5-dev xz-utils tk-dev libxml2-dev \
libxmlsec1-dev libffi-dev liblzma-dev
Install pyenv with the pyenv installer:
curl https://pyenv.run | bash
Then, add the following lines to ~/.bashrc:
# See https://github.com/pyenv/pyenv/issues/2417#issuecomment-1257017013
[[ -d $HOME/.local/bin && :$PATH: != *":$HOME/.local/bin:"* ]] && export PATH="$HOME/.local/bin:$PATH"
# pyenv
export PYENV_ROOT="$HOME/.pyenv"
command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
Finally, restart your shell for the changes to take effect.
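After restarting, one way to confirm that the new lines took effect is to check that the pyenv shims directory ended up on PATH. The snippet below is a minimal sketch, assuming the default PYENV_ROOT of ~/.pyenv used in the configuration above; the helper name shims_on_path is hypothetical and not part of pyenv.

```python
import os
from pathlib import Path


def shims_on_path(path_value: str, pyenv_root: str) -> bool:
    """Return True if <pyenv_root>/shims is one of the entries in path_value."""
    shims = Path(pyenv_root) / "shims"
    # Split PATH on the platform separator and ignore empty entries.
    entries = [Path(entry) for entry in path_value.split(os.pathsep) if entry]
    return shims in entries


if __name__ == "__main__":
    # Assumption: PYENV_ROOT falls back to ~/.pyenv when it is not set.
    root = os.environ.get("PYENV_ROOT", str(Path.home() / ".pyenv"))
    print("pyenv shims on PATH:", shims_on_path(os.environ.get("PATH", ""), root))
```

If this prints False, the lines in ~/.bashrc were probably not sourced by the current shell.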
For this project, I use pyenv to install a recent version of Python 3.11:
pyenv install 3.11.4
pyenv global 3.11.4
On a small virtual machine, installing a new Python version can take up to an hour, because pyenv compiles Python from source.
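When one of the build packages is missing, the compilation can still succeed but leave out some standard-library extension modules. The following is a small sketch for checking the freshly built interpreter; the list of module names is my assumption about which extensions depend on the apt packages above, and the helper name missing_modules is hypothetical.

```python
import importlib.util
import sys

# C extension modules that rely on system libraries at build time.
# Assumption: this list is illustrative, not an exhaustive mapping.
EXTENSION_MODULES = [
    "_ssl", "_sqlite3", "_bz2", "_lzma", "_ctypes", "_tkinter", "readline", "zlib",
]


def missing_modules(names):
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    missing = missing_modules(EXTENSION_MODULES)
    print("Missing extension modules:", missing or "none")
```

Run it with the interpreter installed by pyenv. If a module is reported as missing, installing the corresponding development package and rebuilding Python is the usual remedy.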
Clone repository
# git clone git@github.com:filippo82/arxiv-insanity.git
git clone https://github.com/filippo82/arxiv-insanity.git
Tip
Make sure you are authorised to clone a repo from the current environment. You might need to follow these instructions.
Python environment
TBD
cd arxiv-insanity
pyenv virtualenv arxiv
pyenv local arxiv
pip install -U -r requirements.txt
pip install -U -r requirements-dev.txt
pre-commit install
Warning
You might need to add the --no-cache-dir flag when installing torch.
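Once the installation finishes, it can be useful to confirm that the interpreter is really running inside the new virtual environment and that the key packages were installed. This is a minimal sketch: the package names passed at the bottom are assumptions about what the requirements files contain, and the helper names are hypothetical.

```python
import sys
from importlib import metadata


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


def installed_version(package: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    # Assumption: these distributions are listed in the requirements files.
    for name in ("torch", "prefect", "pre-commit"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```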
ADD BOX
Add link to blog post with my opinionated Python setup.
Credentials
- Google Cloud
- Kaggle
- Prefect Cloud
Google Cloud
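First, install the Google Cloud CLI by following the official instructions; the commands below are the ones for Debian and Ubuntu. If the CLI is already installed, you can update its components to the latest version with gcloud components update (on installations managed by apt, the component manager is typically disabled, so updates come through apt instead):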
sudo apt-get install apt-transport-https ca-certificates gnupg curl sudo
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
sudo apt-get update && sudo apt-get --assume-yes install google-cloud-cli
gcloud components update
Finally, you can authenticate yourself with gcloud:
gcloud auth login
gcloud projects list
gcloud config set project algebraic-fin-232107
gcloud auth application-default login
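The last command stores Application Default Credentials in a JSON file that client libraries pick up automatically. The sketch below looks for that file using only the standard library. It assumes the default Linux location under ~/.config/gcloud and the GOOGLE_APPLICATION_CREDENTIALS override, and does not handle a custom gcloud configuration directory; the helper name find_adc_file is hypothetical.

```python
import os
from pathlib import Path


def find_adc_file(environ, home):
    """Return the path of the credentials file, or None if it is not found.

    An explicit GOOGLE_APPLICATION_CREDENTIALS entry takes precedence over
    the default location written by `gcloud auth application-default login`.
    """
    explicit = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit)
        return path if path.is_file() else None
    default = Path(home) / ".config" / "gcloud" / "application_default_credentials.json"
    return default if default.is_file() else None


if __name__ == "__main__":
    found = find_adc_file(os.environ, Path.home())
    print("Application Default Credentials:", found or "not found")
```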
Kaggle
To use the Kaggle CLI, follow these instructions:
- go to your account on Kaggle;
- generate a new API token;
- move the downloaded kaggle.json file to ~/.kaggle/kaggle.json;
- run chmod 600 ~/.kaggle/kaggle.json to ensure that kaggle.json has the right permissions.
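The steps above can be double-checked with a short script. This is only a sketch: it verifies that the token file exists, that it is not readable by other users, and that it contains the username and key fields found in a Kaggle API token. The helper name check_kaggle_token is hypothetical.

```python
import json
import stat
from pathlib import Path


def check_kaggle_token(path):
    """Return a list of problems found with the token file (empty if none)."""
    path = Path(path)
    if not path.is_file():
        return [f"{path} does not exist"]
    problems = []
    # Any permission bit for group or others means the token is too open.
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & 0o077:
        problems.append(f"permissions are {oct(mode)}, expected 0o600")
    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError:
        problems.append("file is not valid JSON")
        return problems
    for field in ("username", "key"):
        if not isinstance(data, dict) or field not in data:
            problems.append(f"missing field: {field}")
    return problems


if __name__ == "__main__":
    issues = check_kaggle_token(Path.home() / ".kaggle" / "kaggle.json")
    print("Kaggle token looks fine" if not issues else "\n".join(issues))
```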
Prefect Cloud
TBD
prefect cloud login --key xxx_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX --workspace "your_handle/your_workspace"
ADD BOX
If you have not created a workspace yet, you will be asked to create one when logging in for the first time with prefect cloud login.
In your local environment, where you executed the login command above, create a file named basic_flow.py with the following contents:
from prefect import flow, get_run_logger


@flow(name="Testing")
def basic_flow():
    # The run logger sends messages to the flow run's logs.
    logger = get_run_logger()
    logger.warning("The fun is about to begin")


if __name__ == "__main__":
    basic_flow()
Now run python basic_flow.py.
Go to the dashboard for your workspace in Prefect Cloud.
You’ll see the flow run results in the Flow Runs panel.