
Tools Setup

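This guide will help you set up your development environment for the 5-Hour Data Engineering Boot Camp.

Optionally, you can use uv, Astral's fast Python package manager, instead of `python -m venv` and `pip` in the steps below. On macOS/Linux, install it and put it on your PATH:

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Add uv to PATH for the current shell. Recent installers use ~/.local/bin
# (older ones used ~/.cargo/bin); the installer prints the directory it used.
export PATH="$HOME/.local/bin:$PATH"
```

Then, inside the project directory and once requirements.txt exists:

```bash
# Create a virtual environment in .venv
uv venv
# Activate the environment (on Windows: .venv\Scripts\activate)
source .venv/bin/activate
# Install dependencies
uv pip install -r requirements.txt
```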

To install Python:

  1. Download Python 3.8 or higher from python.org
  2. On Windows, make sure to check “Add Python to PATH” during installation
  3. Verify installation:
```bash
# On macOS/Linux, use python3 and pip3 if python is not found
python --version
pip --version
```
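You can also check the version from inside Python. This minimal sketch compares the running interpreter with the 3.8 minimum above; `check_python_version` is a hypothetical helper name, not part of any library.

```python
import sys


def check_python_version(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    # sys.version_info[:2] is a plain tuple such as (3, 11),
    # so it compares element by element against (3, 8)
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    version = ".".join(str(part) for part in sys.version_info[:3])
    if check_python_version():
        print(f"Python {version} meets the 3.8+ requirement.")
    else:
        print(f"Python {version} is too old; please install 3.8 or newer.")
```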

We recommend Visual Studio Code:

  1. Download from code.visualstudio.com
  2. Install recommended extensions:
    • Python
    • Pylance
    • GitLens
    • SQLTools
Install Git for version control:

  1. Download Git from git-scm.com
  2. Verify installation:
```bash
git --version
```
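To confirm from a script that Git and Python are reachable on your PATH, the standard-library `shutil.which` returns a command's full path, or `None` if it is not found. A small sketch; `find_tools` is a hypothetical helper.

```python
import shutil


def find_tools(names):
    """Map each command name to its full path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # On macOS/Linux the interpreter may only be available as python3
    for name, path in find_tools(["git", "python", "python3"]).items():
        print(f"{name}: {path if path else 'NOT FOUND on PATH'}")
```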
Next, create a project directory and a virtual environment:

```bash
# Create a new directory for your project
mkdir data-engineering-bootcamp
cd data-engineering-bootcamp
# Create a virtual environment
python -m venv venv
# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```
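A common mistake is installing packages into the system Python because the environment was never activated. This sketch uses the standard `sys.prefix` versus `sys.base_prefix` comparison to check; the helper name `in_virtualenv` is our own. Run it with `python` after activating.

```python
import sys


def in_virtualenv():
    """Return True when this interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the Python installation it came from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run the activate command first.")
```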

Create a requirements.txt file with the following content:

```text
pandas>=1.5.0
numpy>=1.21.0
sqlalchemy>=1.4.0
pytest>=7.0.0
apache-airflow>=2.5.0
pydantic>=2.0.0
requests>=2.28.0
```

Install the packages:

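```bash
pip install -r requirements.txt
```

Note that Apache Airflow pins many of its dependencies, and its documentation recommends installing it with a version-specific constraints file if plain `pip install` fails to resolve. Airflow also does not run natively on Windows; Windows users should use WSL2 or Docker for the Airflow exercises.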
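After installing, you can verify that each package meets its minimum version using the standard-library `importlib.metadata`. This is a sketch under simplifying assumptions: the helper names are hypothetical, and the comparison only looks at leading numeric parts (so `2.0.0rc1` counts as `2.0.0`). The `packaging` library implements the full PEP 440 rules if you need them.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from requirements.txt
REQUIREMENTS = {
    "pandas": "1.5.0",
    "numpy": "1.21.0",
    "sqlalchemy": "1.4.0",
    "pytest": "7.0.0",
    "apache-airflow": "2.5.0",
    "pydantic": "2.0.0",
    "requests": "2.28.0",
}


def version_tuple(text):
    """Keep only the leading dotted integers: '2.0.0rc1' -> (2, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def meets_minimum(installed, minimum):
    """Compare two versions numerically, padding so '2.0' equals '2.0.0'."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(requirements):
    """Return a list of problems; an empty list means every minimum is met."""
    problems = []
    for name, minimum in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name}: {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    issues = check_requirements(REQUIREMENTS)
    print("\n".join(issues) if issues else "All requirements satisfied.")
```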
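Initialize a Git repository and ignore files that should not be committed:

```bash
# Initialize Git repository
git init
# Create .gitignore file (venv/ for python -m venv, .venv/ for uv)
echo "venv/
.venv/
__pycache__/
*.pyc
.env
.DS_Store" > .gitignore
# Make initial commit
git add .
git commit -m "Initial project setup"
```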
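The multi-line `echo` above works in bash and zsh but not in Windows `cmd.exe`. A cross-platform alternative is a short Python script; `write_gitignore` is a hypothetical helper, and the pattern list (which also covers uv's `.venv/`) is an assumption you can edit. It keeps any entries already in `.gitignore` and is safe to re-run.

```python
from pathlib import Path

# Patterns to ignore; adjust to taste
IGNORE_PATTERNS = ["venv/", ".venv/", "__pycache__/", "*.pyc", ".env", ".DS_Store"]


def write_gitignore(project_dir=".", patterns=IGNORE_PATTERNS):
    """Create or update .gitignore, adding only patterns not already listed."""
    path = Path(project_dir) / ".gitignore"
    lines = path.read_text().splitlines() if path.exists() else []
    # Keep existing entries and append missing patterns in order
    for pattern in patterns:
        if pattern not in lines:
            lines.append(pattern)
    path.write_text("\n".join(lines) + "\n")
    return path


if __name__ == "__main__":
    print(f"Wrote {write_gitignore()}")
```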

Create the following directory structure:

```text
data-engineering-bootcamp/
├── src/
│   ├── extractors/
│   ├── transformers/
│   ├── loaders/
│   └── quality/
├── tests/
├── config/
└── data/
    ├── raw/
    └── processed/
```
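The layout above can be created in one step with `pathlib`, which also works on Windows where `mkdir -p` is unavailable in `cmd.exe`. A sketch; `create_project_structure` is a hypothetical helper, meant to be run from the project root.

```python
from pathlib import Path

# Sub-directories from the layout above, relative to the project root
PROJECT_DIRS = [
    "src/extractors",
    "src/transformers",
    "src/loaders",
    "src/quality",
    "tests",
    "config",
    "data/raw",
    "data/processed",
]


def create_project_structure(root="."):
    """Create every directory in PROJECT_DIRS under `root`; safe to re-run."""
    created = []
    for rel in PROJECT_DIRS:
        path = Path(root) / rel
        # parents=True also creates src/ and data/;
        # exist_ok=True avoids errors when the directory already exists
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for path in create_project_structure():
        print(f"ok  {path}")
```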

For the boot camp, we’ll use SQLite for simplicity: it needs no database server, and each database is a single file stored under data/processed.

```bash
# Create the data directories
mkdir -p data/raw data/processed
```
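Because a SQLite database is just a file, Python's built-in `sqlite3` module can create and query it directly. A minimal sketch; `open_database` is a hypothetical helper, and its default path matches the `DATABASE_URL` used in this guide.

```python
import sqlite3
from pathlib import Path


def open_database(db_path="data/processed/database.db"):
    """Open a SQLite database file, creating it if it does not exist."""
    path = Path(db_path)
    # sqlite3 creates the file itself, but not missing parent directories
    path.parent.mkdir(parents=True, exist_ok=True)
    return sqlite3.connect(path)


if __name__ == "__main__":
    conn = open_database()
    (sqlite_version,) = conn.execute("SELECT sqlite_version()").fetchone()
    print(f"Connected to SQLite {sqlite_version}")
    conn.close()
```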

Create a .env file for configuration:

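```text
DATABASE_URL=sqlite:///data/processed/database.db
API_KEY=your_api_key_here
```

Replace your_api_key_here with your own key. Because .env is listed in .gitignore, it will not be committed.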
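The requirements above do not include a `.env` loader (the third-party `python-dotenv` package is a common choice). To avoid another dependency, here is a minimal standard-library sketch; `load_env_file` and `sqlite_path_from_url` are hypothetical helpers that handle only simple `KEY=value` lines, without quoting or variable expansion. In a SQLAlchemy SQLite URL, three slashes (`sqlite:///data/...`) mean a path relative to the current directory, and four slashes mean an absolute path.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read simple KEY=value lines into os.environ; existing variables win."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without '='
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        values[key] = value
        # Do not overwrite variables already set in the shell
        os.environ.setdefault(key, value)
    return values


def sqlite_path_from_url(url):
    """Return the file path from a sqlite:/// URL (relative or absolute)."""
    prefix = "sqlite:///"
    if not url.startswith(prefix):
        raise ValueError(f"Not a SQLite file URL: {url}")
    return url[len(prefix):]
```

With the .env above, calling `load_env_file()` and then `sqlite_path_from_url(os.environ["DATABASE_URL"])` gives `data/processed/database.db`, provided `DATABASE_URL` was not already set in your shell.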

Create a simple test script test_setup.py:

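```python
import os

import numpy as np
import pandas as pd
from sqlalchemy import create_engine


def test_environment():
    # Test pandas: build a small DataFrame
    df = pd.DataFrame({'test': [1, 2, 3]})
    assert len(df) == 3

    # Test numpy: basic array arithmetic
    arr = np.array([1, 2, 3])
    assert arr.sum() == 6

    # Test SQLAlchemy: SQLite cannot create missing parent directories,
    # so make sure data/processed exists first
    os.makedirs('data/processed', exist_ok=True)
    engine = create_engine('sqlite:///data/processed/test.db')
    df.to_sql('test', engine, if_exists='replace', index=False)

    # Read the table back to confirm the round trip worked
    result = pd.read_sql('SELECT * FROM test', engine)
    assert result['test'].tolist() == [1, 2, 3]

    print("All tests passed! Your environment is ready.")


if __name__ == "__main__":
    test_environment()
```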

Run the test:

```bash
python test_setup.py
```
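To confirm that the script actually wrote to SQLite, you can list the tables in `data/processed/test.db` with the standard-library `sqlite3` module. A sketch; `list_tables` is a hypothetical helper that opens the file read-only, so a mistyped path raises an error instead of silently creating an empty database. After a successful run of test_setup.py, the list should contain `'test'`.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path="data/processed/test.db"):
    """Return table names in an existing SQLite file, opened read-only."""
    # mode=ro makes SQLite fail if the file does not exist
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    db = Path("data/processed/test.db")
    if db.exists():
        print(list_tables(db))
    else:
        print(f"{db} not found; run test_setup.py first")
```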

Now that your environment is set up, you can:

  1. Review the Prerequisites if needed
  2. Start with Data Engineering Fundamentals
  3. Check out Additional Resources for more learning materials