Refactor SEO automation into unified CLI application

Major refactoring to create a clean, integrated CLI application:

### New Features:
- Unified CLI executable (./seo) with simple command structure
- All commands accept optional CSV file arguments
- Auto-detection of latest files when no arguments provided
- Simplified output directory structure (output/ instead of output/reports/)
- Cleaner export filename format (all_posts_YYYY-MM-DD.csv)

### Commands:
- export: Export all posts from WordPress sites
- analyze [csv]: Analyze posts with AI (optional CSV input)
- recategorize [csv]: Recategorize posts with AI
- seo_check: Check SEO quality
- categories: Manage categories across sites
- approve [files]: Review and approve recommendations
- full_pipeline: Run complete workflow
- analytics, gaps, opportunities, report, status

### Changes:
- Moved all scripts to scripts/ directory
- Created config.yaml for configuration
- Updated all scripts to use output/ directory
- Deprecated old seo-cli.py in favor of new ./seo
- Added AGENTS.md and CHANGELOG.md documentation
- Consolidated README.md with updated usage

### Technical:
- Added PyYAML dependency
- Removed hardcoded configuration values
- All scripts now properly integrated
- Better error handling and user feedback

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
310
guides/PROJECT_GUIDE.md
Normal file
@@ -0,0 +1,310 @@
# SEO Analysis & Improvement System - Project Guide

## 📋 Overview

A complete 4-phase SEO analysis pipeline that:
1. **Integrates** Google Analytics, Search Console, and WordPress data
2. **Identifies** high-potential keywords for optimization (positions 11-30)
3. **Discovers** new content opportunities using AI
4. **Generates** a comprehensive report with a 90-day action plan

## 📂 Project Structure

```
seo/
├── input/                              # SOURCE DATA (your exports)
│   ├── new-propositions.csv            # WordPress posts
│   ├── README.md                       # How to export data
│   └── analytics/
│       ├── ga4_export.csv              # Google Analytics
│       └── gsc/
│           ├── Pages.csv               # GSC pages (required)
│           ├── Requêtes.csv            # GSC queries (optional)
│           └── ...
│
├── output/                             # RESULTS (auto-generated)
│   ├── results/
│   │   ├── seo_optimization_report.md  # 📍 PRIMARY OUTPUT
│   │   ├── posts_with_analytics.csv
│   │   ├── posts_prioritized.csv
│   │   ├── keyword_opportunities.csv
│   │   └── content_gaps.csv
│   │
│   ├── logs/
│   │   ├── import_log.txt
│   │   ├── opportunity_analysis_log.txt
│   │   └── content_gap_analysis_log.txt
│   │
│   └── README.md                       # Output guide
│
├── 🚀 run_analysis.sh                  # Run entire pipeline
├── analytics_importer.py               # Phase 1: Merge data
├── opportunity_analyzer.py             # Phase 2: Find wins
├── content_gap_analyzer.py             # Phase 3: Find gaps
├── report_generator.py                 # Phase 4: Generate report
├── config.py
├── requirements.txt
├── .env.example
└── .gitignore
```

## 🚀 Getting Started

### Step 1: Prepare Input Data

**Place WordPress posts CSV:**
```
input/new-propositions.csv
```

**Export Google Analytics 4:**
1. Go to: Analytics > Reports > Engagement > Pages and Screens
2. Set date range: Last 90 days
3. Download CSV → Save as: `input/analytics/ga4_export.csv`

**Export Google Search Console (Pages):**
1. Go to: Performance
2. Set date range: Last 90 days
3. Export CSV → Save as: `input/analytics/gsc/Pages.csv`

### Step 2: Run Analysis

```bash
# Run entire pipeline
./run_analysis.sh

# OR run steps individually
./venv/bin/python analytics_importer.py
./venv/bin/python opportunity_analyzer.py
./venv/bin/python content_gap_analyzer.py
./venv/bin/python report_generator.py
```

### Step 3: Review Report

Open: **`output/results/seo_optimization_report.md`**

Contains:
- Executive summary with current metrics
- Top 20 posts ranked by opportunity (with AI recommendations)
- Keyword opportunities breakdown
- Content gap analysis
- 90-day phased action plan

## 📊 What Each Script Does

### `analytics_importer.py` (Phase 1)
**Purpose:** Merge analytics data with WordPress posts

**Input:**
- `input/new-propositions.csv` (WordPress posts)
- `input/analytics/ga4_export.csv` (Google Analytics)
- `input/analytics/gsc/Pages.csv` (Search Console)

**Output:**
- `output/results/posts_with_analytics.csv` (enriched dataset)
- `output/logs/import_log.txt` (matching report)

**Handles:** French and English column names, URL normalization, multi-source merging
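
To make the column-name and URL handling concrete, here is a minimal sketch of how such matching could work. It is not taken from `analytics_importer.py`: the alias table, the function names, and the exact normalization rules (lowercase host, drop `www.`, drop query string and trailing slash) are all assumptions chosen for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical alias table: canonical name -> header variants (lowercased).
# The real importer may recognize a different set of headers.
COLUMN_ALIASES = {
    "url": ["url", "page", "top pages", "pages les plus populaires"],
    "clicks": ["clicks", "clics"],
    "impressions": ["impressions"],
    "position": ["position"],
}


def canonical_column(header):
    """Map a French or English CSV header to a canonical name, or None."""
    name = header.strip().lower()
    for canonical, variants in COLUMN_ALIASES.items():
        if name in variants:
            return canonical
    return None


def normalize_url(url):
    """Reduce a URL to a host + path key so rows from different sources match."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop the trailing slash; the query string and fragment are ignored.
    path = parts.path.rstrip("/") or "/"
    return host + path


print(canonical_column(" Clics "))
print(normalize_url("https://www.Example.com/blog/post/?utm_source=x"))
```

With a shared key like this, rows from the WordPress, GA4, and Search Console exports can be joined even when one source writes `https://www.example.com/blog/post/` and another writes `example.com/blog/post`.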

### `opportunity_analyzer.py` (Phase 2)
**Purpose:** Identify high-potential optimization opportunities

**Input:**
- `output/results/posts_with_analytics.csv`

**Output:**
- `output/results/keyword_opportunities.csv` (26 opportunities)
- `output/logs/opportunity_analysis_log.txt`

**Features:**
- Filters posts at positions 11-30 (pages 2-3)
- Calculates opportunity scores (0-100)
- Generates AI recommendations for top 20 posts
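
The position filter can be sketched in a few lines. This is an illustrative version only, assuming each post is a dict with numeric `position` and `impressions` keys (hypothetical field names). The default thresholds mirror the example values in the Configuration section, and the function name is made up for this example.

```python
def find_opportunities(posts, min_position=11, max_position=30, min_impressions=50):
    """Keep posts ranking on pages 2-3 that have enough search impressions."""
    selected = [
        post
        for post in posts
        if min_position <= post["position"] <= max_position
        and post["impressions"] >= min_impressions
    ]
    # Best-ranked posts first: they are the closest to page 1.
    return sorted(selected, key=lambda post: post["position"])


# Sample rows invented for the example.
sample_posts = [
    {"title": "A", "position": 4.2, "impressions": 900},   # already on page 1
    {"title": "B", "position": 12.5, "impressions": 300},  # kept
    {"title": "C", "position": 18.0, "impressions": 20},   # too few impressions
    {"title": "D", "position": 11.0, "impressions": 75},   # kept
]

for post in find_opportunities(sample_posts):
    print(post["title"], post["position"])
```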

### `content_gap_analyzer.py` (Phase 3)
**Purpose:** Discover new content opportunities

**Input:**
- `output/results/posts_with_analytics.csv`
- `input/analytics/gsc/Requêtes.csv` (optional)

**Output:**
- `output/results/content_gaps.csv`
- `output/logs/content_gap_analysis_log.txt`

**Features:**
- Topic cluster extraction
- Gap identification
- AI-powered content suggestions

### `report_generator.py` (Phase 4)
**Purpose:** Create comprehensive report with action plan

**Input:**
- All analysis results from phases 1-3

**Output:**
- `output/results/seo_optimization_report.md` ← **PRIMARY DELIVERABLE**
- `output/results/posts_prioritized.csv`

**Features:**
- Comprehensive markdown report
- All 262 posts ranked
- 90-day action plan with estimated gains

## 📈 Understanding Your Report

### Key Metrics (Executive Summary)
- **Total Posts:** All posts analyzed
- **Monthly Traffic:** Current organic traffic
- **Total Impressions:** Search visibility (90 days)
- **Average Position:** Current ranking position
- **Opportunities:** Posts ready to optimize

### Top 20 Posts to Optimize
Each post shows:
- **Title** (the post name)
- **Current Position** (search ranking)
- **Impressions** (search visibility)
- **Traffic** (organic visits)
- **Priority Score** (0-100 opportunity rating)
- **Status** (page 1 vs pages 2-3)
- **Recommendations** (how to improve)

### Priority Scoring (0-100)
A higher score means more potential gain for less effort.

Calculated from:
- **Position (35%)** - How close to page 1
- **Traffic Potential (30%)** - Search impressions
- **CTR Gap (20%)** - Improvement opportunity
- **Content Quality (15%)** - Existing engagement
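
As a rough illustration of how the four weights combine, the sketch below computes a weighted sum. Only the weights come from this guide. How each component is normalized is not documented here, so the code simply assumes every input has already been scaled to the range 0.0-1.0, and the function name is hypothetical.

```python
# Weights as listed above; they sum to 1.0.
WEIGHTS = {
    "position": 0.35,
    "traffic_potential": 0.30,
    "ctr_gap": 0.20,
    "content_quality": 0.15,
}


def priority_score(position, traffic_potential, ctr_gap, content_quality):
    """Combine four components (each assumed pre-scaled to 0.0-1.0) into 0-100."""
    components = {
        "position": position,
        "traffic_potential": traffic_potential,
        "ctr_gap": ctr_gap,
        "content_quality": content_quality,
    }
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    weighted = sum(WEIGHTS[name] * value for name, value in components.items())
    return round(100 * weighted, 1)


# A post close to page 1 with moderate impressions and engagement.
print(priority_score(0.8, 0.5, 0.5, 0.4))  # 59.0
```

Because position carries the largest weight, a post that is already near page 1 outranks a post with similar traffic potential that sits further back.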

## 🎯 Action Plan

### Week 1-2: Quick Wins (+100 visits/month)
- Focus on posts at positions 11-15
- Update SEO titles and meta descriptions
- 30-60 minutes per post

### Week 3-4: Core Optimization (+150 visits/month)
- Focus on posts 6-15 in the priority list
- Add content sections
- Improve structure with headers
- 2-3 hours per post

### Week 5-8: New Content (+300 visits/month)
- Create 3-5 new posts from gap analysis
- Target high-search-demand topics
- 4-6 hours per post

### Week 9-12: Refinement (+100 visits/month)
- Monitor ranking improvements
- Refine underperforming optimizations
- Prepare next round of analysis

**Total: +650 visits/month potential gain**
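
The total is the sum of the four phase estimates. A short check, using only the figures from the headings above (the dictionary name is just for this example):

```python
# Estimated monthly visit gains per phase, copied from the plan above.
PHASE_GAINS = {
    "Week 1-2: Quick Wins": 100,
    "Week 3-4: Core Optimization": 150,
    "Week 5-8: New Content": 300,
    "Week 9-12: Refinement": 100,
}

total_gain = sum(PHASE_GAINS.values())
print(f"Total: +{total_gain} visits/month")
```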

## 🔧 Configuration

Edit `.env` to customize analysis:
```bash
# Position range for opportunities
ANALYSIS_MIN_POSITION=11
ANALYSIS_MAX_POSITION=30

# Minimum impressions to consider
ANALYSIS_MIN_IMPRESSIONS=50

# Posts for AI recommendations
ANALYSIS_TOP_N_POSTS=20
```
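
For reference, a script could read these settings roughly as follows. This is a sketch, not the contents of `config.py`: it assumes the variables are already present in the process environment (how `.env` gets loaded is not shown), it treats the example values above as fallback defaults, and `load_analysis_settings` is a hypothetical name.

```python
import os

# Fallback values; assumed here to match the example .env above.
DEFAULTS = {
    "ANALYSIS_MIN_POSITION": 11,
    "ANALYSIS_MAX_POSITION": 30,
    "ANALYSIS_MIN_IMPRESSIONS": 50,
    "ANALYSIS_TOP_N_POSTS": 20,
}


def load_analysis_settings(env=None):
    """Read integer settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    settings = {}
    for key, default in DEFAULTS.items():
        raw = env.get(key)
        if raw is None or raw.strip() == "":
            settings[key] = default
            continue
        try:
            settings[key] = int(raw)
        except ValueError:
            raise ValueError(f"{key} must be an integer, got {raw!r}") from None
    if settings["ANALYSIS_MIN_POSITION"] > settings["ANALYSIS_MAX_POSITION"]:
        raise ValueError("ANALYSIS_MIN_POSITION must not exceed ANALYSIS_MAX_POSITION")
    return settings


# Override one value and keep the other defaults.
print(load_analysis_settings({"ANALYSIS_MIN_IMPRESSIONS": "25"}))
```

Passing the mapping as a parameter keeps the function easy to test without touching the real environment.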

## 🐛 Troubleshooting

### Missing Input Files
```
❌ Error: File not found: input/...
```
→ Check that all files are in the correct locations
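
A quick pre-flight check can list which inputs are missing before the pipeline starts. The sketch below is not part of the project: `missing_inputs` is a hypothetical helper, and treating the three Phase 1 inputs as required is an assumption based on the project structure above (`Requêtes.csv` is optional and therefore not checked).

```python
from pathlib import Path

# Inputs read by Phase 1, relative to the project root (assumed required).
REQUIRED_INPUTS = [
    "input/new-propositions.csv",
    "input/analytics/ga4_export.csv",
    "input/analytics/gsc/Pages.csv",
]


def missing_inputs(root="."):
    """Return the required input paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in REQUIRED_INPUTS if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        for rel in missing:
            print(f"Missing: {rel}")
    else:
        print("All required input files found.")
```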

### Empty Report Titles
✓ FIXED - Now correctly loads post titles from multiple column names

### No Opportunities Found
```
⚠️ No opportunities found in specified range
```
→ Try lowering `ANALYSIS_MIN_IMPRESSIONS` in `.env`

### API Errors
```
❌ AI generation failed: ...
```
→ Check `OPENROUTER_API_KEY` in `.env` and account balance

## 📚 Additional Resources

- **`input/README.md`** - How to export analytics data
- **`output/README.md`** - Output files guide
- **`QUICKSTART_ANALYSIS.md`** - Step-by-step tutorial
- **`ANALYSIS_SYSTEM.md`** - Technical documentation

## ✅ Success Checklist

- [ ] All input files placed in `input/` directory
- [ ] `.env` file configured with API key
- [ ] Ran `./run_analysis.sh` successfully
- [ ] Reviewed `output/results/seo_optimization_report.md`
- [ ] Identified 5-10 quick wins to start with
- [ ] Created action plan for first week

## 🎓 Key Learnings

### Why Positions 11-30 Matter
- **Page 1** posts are hard to move
- **Pages 2-3** posts are easy wins (small improvements move them up)
- **Quick gains:** a 1-2 position improvement can increase CTR by 20-30%

### CTR Expectations by Position
- Position 1: ~30% CTR
- Position 5-10: 4-7% CTR
- Position 11-15: 1-2% CTR (quick wins)
- Position 16-20: 0.8-1% CTR
- Position 21-30: ~0.5% CTR
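
These bands can be turned into a rough click estimate for a given number of impressions. The sketch below encodes only the ranges listed above. Positions the list does not cover (2-4, or fractional averages that fall between bands) return `None` rather than a guess, and both function names are hypothetical.

```python
# (first position, last position, low CTR, high CTR), from the list above.
CTR_BANDS = [
    (1, 1, 0.30, 0.30),
    (5, 10, 0.04, 0.07),
    (11, 15, 0.01, 0.02),
    (16, 20, 0.008, 0.01),
    (21, 30, 0.005, 0.005),
]


def expected_ctr_range(position):
    """Return (low, high) CTR for a position, or None if no band covers it."""
    for first, last, low, high in CTR_BANDS:
        if first <= position <= last:
            return (low, high)
    return None


def estimated_clicks(impressions, position):
    """Estimate a (low, high) click range, or None for uncovered positions."""
    band = expected_ctr_range(position)
    if band is None:
        return None
    low, high = band
    return (round(impressions * low), round(impressions * high))


# 1,000 impressions at position 12 falls in the 1-2% band.
print(estimated_clicks(1000, 12))  # (10, 20)
```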

### Content Quality Signals
- A higher bounce rate suggests less relevant content
- Low traffic points to a poor CTR or position
- Low impressions suggest insufficient optimization

## 📞 Support

### Check Logs First
```
output/logs/import_log.txt
output/logs/opportunity_analysis_log.txt
output/logs/content_gap_analysis_log.txt
```

### Common Issues
1. **Empty titles** → Fixed with flexible column name mapping
2. **File not found** → Check file locations match structure
3. **API errors** → Verify API key and account balance
4. **No opportunities** → Lower minimum impressions threshold

## 🚀 Ready to Optimize?

1. Prepare your input data
2. Run `./run_analysis.sh`
3. Open the report
4. Start with quick wins
5. Track improvements in 4 weeks

Good luck boosting your SEO! 📈

---

**Last Updated:** February 2026
**System Status:** Production Ready ✅