seo/guides/PROJECT_GUIDE.md
Kevin Bataille 8c7cd24685 Refactor SEO automation into unified CLI application
Major refactoring to create a clean, integrated CLI application:

### New Features:
- Unified CLI executable (./seo) with simple command structure
- All commands accept optional CSV file arguments
- Auto-detection of latest files when no arguments provided
- Simplified output directory structure (output/ instead of output/reports/)
- Cleaner export filename format (all_posts_YYYY-MM-DD.csv)

### Commands:
- export: Export all posts from WordPress sites
- analyze [csv]: Analyze posts with AI (optional CSV input)
- recategorize [csv]: Recategorize posts with AI
- seo_check: Check SEO quality
- categories: Manage categories across sites
- approve [files]: Review and approve recommendations
- full_pipeline: Run complete workflow
- analytics, gaps, opportunities, report, status

### Changes:
- Moved all scripts to scripts/ directory
- Created config.yaml for configuration
- Updated all scripts to use output/ directory
- Deprecated old seo-cli.py in favor of new ./seo
- Added AGENTS.md and CHANGELOG.md documentation
- Consolidated README.md with updated usage

### Technical:
- Added PyYAML dependency
- Removed hardcoded configuration values
- All scripts now properly integrated
- Better error handling and user feedback

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-02-16 14:24:44 +01:00


SEO Analysis & Improvement System - Project Guide

📋 Overview

A complete 4-phase SEO analysis pipeline that:

  1. Integrates Google Analytics, Search Console, and WordPress data
  2. Identifies high-potential keywords for optimization (positions 11-30)
  3. Discovers new content opportunities using AI
  4. Generates a comprehensive report with 90-day action plan

📂 Project Structure

seo/
├── input/                              # SOURCE DATA (your exports)
│   ├── new-propositions.csv           # WordPress posts
│   ├── README.md                      # How to export data
│   └── analytics/
│       ├── ga4_export.csv             # Google Analytics
│       └── gsc/
│           ├── Pages.csv              # GSC pages (required)
│           ├── Requêtes.csv           # GSC queries (optional)
│           └── ...
│
├── output/                             # RESULTS (auto-generated)
│   ├── results/
│   │   ├── seo_optimization_report.md  # 📍 PRIMARY OUTPUT
│   │   ├── posts_with_analytics.csv
│   │   ├── posts_prioritized.csv
│   │   ├── keyword_opportunities.csv
│   │   └── content_gaps.csv
│   │
│   ├── logs/
│   │   ├── import_log.txt
│   │   ├── opportunity_analysis_log.txt
│   │   └── content_gap_analysis_log.txt
│   │
│   └── README.md                      # Output guide
│
├── 🚀 run_analysis.sh                 # Run entire pipeline
├── analytics_importer.py              # Phase 1: Merge data
├── opportunity_analyzer.py            # Phase 2: Find wins
├── content_gap_analyzer.py            # Phase 3: Find gaps
├── report_generator.py                # Phase 4: Generate report
├── config.py
├── requirements.txt
├── .env.example
└── .gitignore

🚀 Getting Started

Step 1: Prepare Input Data

Place WordPress posts CSV:

input/new-propositions.csv

Export Google Analytics 4:

  1. Go to: Analytics > Reports > Engagement > Pages and Screens
  2. Set date range: Last 90 days
  3. Download CSV → Save as: input/analytics/ga4_export.csv

Export Google Search Console (Pages):

  1. Go to: Performance
  2. Set date range: Last 90 days
  3. Export CSV → Save as: input/analytics/gsc/Pages.csv

Step 2: Run Analysis

# Run entire pipeline
./run_analysis.sh

# OR run steps individually
./venv/bin/python analytics_importer.py
./venv/bin/python opportunity_analyzer.py
./venv/bin/python content_gap_analyzer.py
./venv/bin/python report_generator.py

Step 3: Review Report

Open: output/results/seo_optimization_report.md

Contains:

  • Executive summary with current metrics
  • Top 20 posts ranked by opportunity (with AI recommendations)
  • Keyword opportunities breakdown
  • Content gap analysis
  • 90-day phased action plan

📊 What Each Script Does

analytics_importer.py (Phase 1)

Purpose: Merge analytics data with WordPress posts

Input:

  • input/new-propositions.csv (WordPress posts)
  • input/analytics/ga4_export.csv (Google Analytics)
  • input/analytics/gsc/Pages.csv (Search Console)

Output:

  • output/results/posts_with_analytics.csv (enriched dataset)
  • output/logs/import_log.txt (matching report)

Handles: French and English column names, URL normalization, multi-source merging
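As a rough illustration of the column-name and URL handling described above, here is a minimal sketch. The alias table and the function names `normalize_column` and `normalize_url` are hypothetical; the guide does not show the real mapping used by analytics_importer.py, so treat this as one plausible approach rather than the script's actual logic.

```python
from urllib.parse import urlsplit

# Hypothetical alias table: the real script's mapping is not shown in this guide.
COLUMN_ALIASES = {
    "page": "url",
    "top pages": "url",
    "pages les plus populaires": "url",
    "clics": "clicks",
    "clicks": "clicks",
    "impressions": "impressions",
    "position": "position",
}


def normalize_column(name: str) -> str:
    """Map a French or English export header to one canonical column name."""
    key = name.strip().lower()
    return COLUMN_ALIASES.get(key, key)


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path so rows from different exports can be joined."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop the trailing slash, the query string, and the fragment.
    path = parts.path.rstrip("/") or "/"
    return f"{host}{path}"


print(normalize_column(" Clics "))
print(normalize_url("https://www.Example.com/blog/post/?utm_source=x"))
```

Joining on a normalized key like this avoids missed matches caused only by a trailing slash, a `www.` prefix, or tracking parameters.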

opportunity_analyzer.py (Phase 2)

Purpose: Identify high-potential optimization opportunities

Input:

  • output/results/posts_with_analytics.csv

Output:

  • output/results/keyword_opportunities.csv (26 opportunities in the analyzed dataset; the count varies with your input)
  • output/logs/opportunity_analysis_log.txt

Features:

  • Filters posts at positions 11-30 (page 2-3)
  • Calculates opportunity scores (0-100)
  • Generates AI recommendations for top 20 posts
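The position filter can be pictured with the sketch below. The field names (`position`, `impressions`), the function name `find_opportunities`, and the choice to sort by impressions are assumptions for illustration; the default thresholds mirror the values shown in the Configuration section.

```python
def find_opportunities(posts, min_position=11, max_position=30, min_impressions=50):
    """Keep posts ranking on pages 2-3 that already get enough impressions."""
    selected = [
        post
        for post in posts
        if min_position <= post["position"] <= max_position
        and post["impressions"] >= min_impressions
    ]
    # Assumed ordering: most visible posts first.
    return sorted(selected, key=lambda post: post["impressions"], reverse=True)


sample_posts = [
    {"title": "a", "position": 12.0, "impressions": 500},
    {"title": "b", "position": 5.0, "impressions": 900},   # already on page 1
    {"title": "c", "position": 25.0, "impressions": 40},   # too few impressions
    {"title": "d", "position": 30.0, "impressions": 800},
]
print([post["title"] for post in find_opportunities(sample_posts)])
```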

content_gap_analyzer.py (Phase 3)

Purpose: Discover new content opportunities

Input:

  • output/results/posts_with_analytics.csv
  • input/analytics/gsc/Requêtes.csv (optional)

Output:

  • output/results/content_gaps.csv
  • output/logs/content_gap_analysis_log.txt

Features:

  • Topic cluster extraction
  • Gap identification
  • AI-powered content suggestions

report_generator.py (Phase 4)

Purpose: Create comprehensive report with action plan

Input:

  • All analysis results from phases 1-3

Output:

  • output/results/seo_optimization_report.md (PRIMARY DELIVERABLE)
  • output/results/posts_prioritized.csv

Features:

  • Comprehensive markdown report
  • All posts ranked (262 in the analyzed dataset)
  • 90-day action plan with estimated gains

📈 Understanding Your Report

Key Metrics (Executive Summary)

  • Total Posts: All posts analyzed
  • Monthly Traffic: Current organic traffic
  • Total Impressions: Search visibility (90 days)
  • Average Position: Current ranking position
  • Opportunities: Posts ready to optimize

Top 20 Posts to Optimize

Each post shows:

  • Title (the post name)
  • Current Position (search ranking)
  • Impressions (search visibility)
  • Traffic (organic visits)
  • Priority Score (0-100 opportunity rating)
  • Status (page 1 vs page 2-3)
  • Recommendations (how to improve)

Priority Scoring (0-100)

Higher scores = more opportunity for gain with less effort

Calculated from:

  • Position (35%) - How close to page 1
  • Traffic Potential (30%) - Search impressions
  • CTR Gap (20%) - Improvement opportunity
  • Content Quality (15%) - Existing engagement
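To make the weighting concrete, here is a minimal sketch of a weighted score using the four percentages listed above. It assumes each component has already been normalized to a 0-1 value; how the real scripts derive those component values is not documented in this guide, and the names `WEIGHTS` and `priority_score` are hypothetical.

```python
# Weights taken from the guide; component names are illustrative.
WEIGHTS = {
    "position": 0.35,
    "traffic_potential": 0.30,
    "ctr_gap": 0.20,
    "content_quality": 0.15,
}


def priority_score(components: dict) -> float:
    """Combine 0-1 component values into a 0-100 priority score."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        # Missing components count as 0; out-of-range values are clamped.
        value = min(max(components.get(name, 0.0), 0.0), 1.0)
        total += weight * value
    return round(100 * total, 1)


print(priority_score({
    "position": 0.8,
    "traffic_potential": 0.5,
    "ctr_gap": 0.5,
    "content_quality": 0.4,
}))
```

Because the weights sum to 1, a post that maxes out every component scores 100, and a post that only maxes out position scores 35.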

🎯 Action Plan

Week 1-2: Quick Wins (+100 visits/month)

  • Focus on posts at positions 11-15
  • Update SEO titles and meta descriptions
  • 30-60 minutes per post

Week 3-4: Core Optimization (+150 visits/month)

  • Posts 6-15 in priority list
  • Add content sections
  • Improve structure with headers
  • 2-3 hours per post

Week 5-8: New Content (+300 visits/month)

  • Create 3-5 new posts from gap analysis
  • Target high-search-demand topics
  • 4-6 hours per post

Week 9-12: Refinement (+100 visits/month)

  • Monitor ranking improvements
  • Refine underperforming optimizations
  • Prepare next round of analysis

Total: +650 visits/month potential gain

🔧 Configuration

Edit .env to customize analysis:

# Position range for opportunities
ANALYSIS_MIN_POSITION=11
ANALYSIS_MAX_POSITION=30

# Minimum impressions to consider
ANALYSIS_MIN_IMPRESSIONS=50

# Posts for AI recommendations
ANALYSIS_TOP_N_POSTS=20
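A minimal sketch of how these settings could be read in Python, assuming the variables are already present in the process environment (for example, loaded from .env by a dotenv-style loader). The function name `load_analysis_config` is hypothetical, and using the values above as fallbacks is an assumption; config.py may behave differently.

```python
import os


def load_analysis_config(env=os.environ) -> dict:
    """Read the analysis thresholds, falling back to the values shown in this guide."""
    return {
        "min_position": int(env.get("ANALYSIS_MIN_POSITION", "11")),
        "max_position": int(env.get("ANALYSIS_MAX_POSITION", "30")),
        "min_impressions": int(env.get("ANALYSIS_MIN_IMPRESSIONS", "50")),
        "top_n_posts": int(env.get("ANALYSIS_TOP_N_POSTS", "20")),
    }


# Passing a plain dict makes the function easy to try without touching the real environment.
print(load_analysis_config({"ANALYSIS_MIN_IMPRESSIONS": "10"}))
```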

🐛 Troubleshooting

Missing Input Files

❌ Error: File not found: input/...

→ Check that all files are in the correct locations
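A quick pre-flight check along these lines can list which files are missing before the pipeline runs. This is a sketch, not part of the shipped scripts: the helper name `missing_inputs` is hypothetical, and treating all three files as required is an assumption based on the Phase 1 inputs listed earlier.

```python
from pathlib import Path

# Paths from the project structure; treating all three as required is an assumption.
REQUIRED_INPUTS = [
    "input/new-propositions.csv",
    "input/analytics/ga4_export.csv",
    "input/analytics/gsc/Pages.csv",
]


def missing_inputs(project_root=".") -> list:
    """Return the required input files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in REQUIRED_INPUTS if not (root / rel).is_file()]


for rel_path in missing_inputs("."):
    print(f"Missing: {rel_path}")
```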

Empty Report Titles

✓ FIXED - Now correctly loads post titles from multiple column names

No Opportunities Found

⚠️  No opportunities found in specified range

→ Try lowering ANALYSIS_MIN_IMPRESSIONS in .env

API Errors

❌ AI generation failed: ...

→ Check OPENROUTER_API_KEY in .env and account balance

📚 Additional Resources

  • input/README.md - How to export analytics data
  • output/README.md - Output files guide
  • QUICKSTART_ANALYSIS.md - Step-by-step tutorial
  • ANALYSIS_SYSTEM.md - Technical documentation

Success Checklist

  • All input files placed in input/ directory
  • .env file configured with API key
  • Ran ./run_analysis.sh successfully
  • Reviewed output/results/seo_optimization_report.md
  • Identified 5-10 quick wins to start with
  • Created action plan for first week

🎓 Key Learnings

Why Positions 11-30 Matter

  • Page 1 posts are hard to move
  • Page 2-3 posts are easy wins (small improvements move them up)
  • Quick gains: moving up 1-2 positions can raise CTR by 20-30%

CTR Expectations by Position

  • Position 1: ~30% CTR
  • Position 5-10: 4-7% CTR
  • Position 11-15: 1-2% CTR (quick wins)
  • Position 16-20: 0.8-1% CTR
  • Position 21-30: ~0.5% CTR
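The CTR bands above can be turned into a rough click estimate, as in the sketch below. The table only encodes the figures listed in this guide (positions 2-4 are not given, so the lookup returns `None` there), and the function names are hypothetical. The numbers are rules of thumb, not measured values for your site.

```python
# (first_position, last_position, low_ctr, high_ctr), copied from the list above.
CTR_BANDS = [
    (1, 1, 0.30, 0.30),
    (5, 10, 0.04, 0.07),
    (11, 15, 0.01, 0.02),
    (16, 20, 0.008, 0.01),
    (21, 30, 0.005, 0.005),
]


def expected_ctr_range(position: float):
    """Return (low, high) expected CTR for a position, or None if the guide gives no figure."""
    rounded = round(position)
    for first, last, low, high in CTR_BANDS:
        if first <= rounded <= last:
            return (low, high)
    return None


def estimated_clicks(impressions: int, position: float):
    """Estimate the click range for a given impression count at a given position."""
    ctr = expected_ctr_range(position)
    if ctr is None:
        return None
    low, high = ctr
    return (impressions * low, impressions * high)


print(expected_ctr_range(12.4))
print(estimated_clicks(1000, 12))
```

Comparing the estimate for a post's current position with the estimate for a target position gives a rough sense of what an optimization is worth.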

Content Quality Signals

  • A high bounce rate suggests the content is less relevant to the query
  • Low traffic points to a poor CTR or a weak ranking position
  • Low impressions suggest the page is under-optimized for its target queries

📞 Support

Check Logs First

output/logs/import_log.txt
output/logs/opportunity_analysis_log.txt
output/logs/content_gap_analysis_log.txt

Common Issues

  1. Empty titles → Fixed with flexible column name mapping
  2. File not found → Check file locations match structure
  3. API errors → Verify API key and account balance
  4. No opportunities → Lower minimum impressions threshold

🚀 Ready to Optimize?

  1. Prepare your input data
  2. Run ./run_analysis.sh
  3. Open the report
  4. Start with quick wins
  5. Track improvements in 4 weeks

Good luck boosting your SEO! 📈


Last Updated: February 2026
System Status: Production Ready