Add post migration and author filter features

- Add migrate command to transfer posts between websites
- Support CSV-based and filtered migration modes
- Preserve original post dates (with --ignore-original-date option)
- Auto-create categories and tags on destination site
- Add author filtering to export (--author and --author-id flags)
- Include author_name column in exported CSV
- Add comprehensive documentation (MIGRATION_GUIDE.md, AUTHOR_FILTER_GUIDE.md)

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
226
AUTHOR_FILTER_GUIDE.md
Normal file
@@ -0,0 +1,226 @@
# Author Filter Guide

Export posts from specific authors using the enhanced export functionality.

## Overview

The export command now supports filtering posts by author name or author ID, making it easy to:
- Export posts from a specific author across all sites
- Combine author filtering with site filtering
- Export posts from multiple authors at once

## Usage

### Filter by Author Name

Export posts from a specific author (case-insensitive, partial match):

```bash
# Export posts by "John Doe"
./seo export --author "John Doe"

# Export posts by "admin" (partial match)
./seo export --author admin

# Export posts from multiple authors
./seo export --author "John Doe" "Jane Smith"
```

### Filter by Author ID

Export posts from specific author IDs:

```bash
# Export posts by author ID 1
./seo export --author-id 1

# Export posts from multiple author IDs
./seo export --author-id 1 2 3
```

### Combine with Site Filter

Export posts from a specific author on a specific site:

```bash
# Export John's posts from mistergeek.net only
./seo export --author "John Doe" --site mistergeek.net

# Export posts by author ID 1 from webscroll.fr
./seo export --author-id 1 -s webscroll.fr
```

### Dry Run Mode

Preview what would be exported:

```bash
./seo export --author "John Doe" --dry-run
```

## How It Works

1. **Author Name Matching**
   - Case-insensitive matching
   - Partial matches work (e.g., "john" matches "John Doe")
   - Matches against author's display name and slug

2. **Author ID Matching**
   - Exact match on WordPress user ID
   - More reliable than name matching
   - Useful when authors have similar names

3. **Author Information**
   - The exporter fetches all authors from each site
   - Author names are included in the exported CSV
   - Posts are filtered before export
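
The matching rules above can be sketched in a few lines of Python. This is an illustration only, not the exporter's actual code: the function name `author_matches` is hypothetical, and treating the name filter and the ID filter as alternatives (a post passes if either one matches) is an assumption the guide does not confirm.

```python
from typing import Dict, List, Optional


def author_matches(author: Dict, name_filters: Optional[List[str]],
                   id_filters: Optional[List[int]]) -> bool:
    """Return True if the author passes the name filter or the ID filter."""
    if not name_filters and not id_filters:
        return True  # no filter given: keep every author
    if id_filters and author.get("id") in id_filters:
        return True  # exact match on the WordPress user ID
    if name_filters:
        # Compare against both the display name and the slug, ignoring case.
        haystacks = [author.get("name", "").lower(), author.get("slug", "").lower()]
        for wanted in name_filters:
            needle = wanted.lower()
            if any(needle in haystack for haystack in haystacks):
                return True  # partial (substring) match
    return False


# Hypothetical author record with the fields the guide mentions.
john = {"id": 1, "name": "John Doe", "slug": "jdoe"}
print(author_matches(john, ["john"], None))
print(author_matches(john, ["marie"], [2]))
```

Substring matching is convenient but broad: a filter such as `"john"` also selects "Johnny", which is why the ID filter is the safer choice when names are similar.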

## Export Output

The exported CSV includes author information:

```csv
site,post_id,status,title,slug,url,author_id,author_name,date_published,...
mistergeek.net,123,publish,"VPN Guide",vpn-guide,https://...,1,John Doe,2024-01-15,...
```

### New Column: `author_name`

The export now includes the author's display name in addition to the author ID.
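
As a rough illustration of how the new column can be consumed, the sketch below counts posts per author with the standard `csv` module. The sample rows, the shortened column list, and the helper name `posts_per_author` are assumptions made for the example; a real export contains more columns.

```python
import csv
import io
from collections import Counter

# Hypothetical sample using a subset of the exported columns.
SAMPLE_CSV = (
    "site,post_id,status,title,slug,url,author_id,author_name,date_published\n"
    "mistergeek.net,123,publish,VPN Guide,vpn-guide,https://example.com/vpn-guide,1,John Doe,2024-01-15\n"
    "mistergeek.net,124,publish,Torrent Tutorial,torrent-tutorial,https://example.com/torrent-tutorial,1,John Doe,2024-02-01\n"
    "webscroll.fr,9,draft,Untitled,untitled,https://example.com/untitled,7,,2024-03-10\n"
)


def posts_per_author(handle) -> Counter:
    """Count rows per author, falling back to the ID when the name is empty."""
    counts = Counter()
    for row in csv.DictReader(handle):
        label = row.get("author_name") or f"id:{row['author_id']}"
        counts[label] += 1
    return counts


for label, total in posts_per_author(io.StringIO(SAMPLE_CSV)).most_common():
    print(f"{label}: {total}")
```

The fallback to `author_id` mirrors the guide's note that the ID is still exported when the name lookup fails.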

## Examples

### Example 1: Export All Posts by Admin

```bash
./seo export --author admin
```

Output: `output/all_posts_YYYY-MM-DD.csv`

### Example 2: Export Specific Author from Specific Site

```bash
./seo export --author "Marie" --site webscroll.fr
```

### Example 3: Export Multiple Authors

```bash
./seo export --author "John" "Marie" "Admin"
```

### Example 4: Export by Author ID

```bash
./seo export --author-id 5
```

### Example 5: Combine Author and Site Filters

```bash
./seo export --author "John" --site mistergeek.net --verbose
```

## Finding Author IDs

If you don't know the author ID, you can:

1. **Export all posts and check the CSV:**
   ```bash
   ./seo export
   # Then open the CSV and check the author_id column
   ```

2. **Use WordPress Admin:**
   - Go to Users → All Users
   - Hover over a user name
   - The URL shows the user ID (e.g., `user_id=5`)

3. **Use WordPress REST API directly:**
   ```bash
   curl -u username:password https://yoursite.com/wp-json/wp/v2/users
   ```
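
The JSON returned by the users endpoint can be turned into an ID lookup table with a few lines of Python. This is a sketch under assumptions: the payload below is made up, and only the `id`, `name`, and `slug` fields (the ones the exporter reads) are used. The helper name `index_users` is hypothetical.

```python
import json
from typing import Dict, List

# Hypothetical response body from /wp-json/wp/v2/users.
SAMPLE_RESPONSE = (
    '[{"id": 1, "name": "John Doe", "slug": "jdoe"},'
    ' {"id": 5, "name": "", "slug": "marie"}]'
)


def index_users(payload: str) -> Dict[int, str]:
    """Map each user ID to its display name, or to the slug if the name is empty."""
    users: List[dict] = json.loads(payload)
    return {user["id"]: user.get("name") or user.get("slug", "") for user in users}


for user_id, label in sorted(index_users(SAMPLE_RESPONSE).items()):
    print(f"{user_id}\t{label}")
```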

## Tips

1. **Use quotes for names with spaces:**
   ```bash
   ./seo export --author "John Doe"  # ✓ Correct
   ./seo export --author John Doe    # ✗ Wrong (treated as 2 authors)
   ```

2. **Partial matching is your friend:**
   ```bash
   ./seo export --author "john"  # Matches "John Doe", "Johnny", etc.
   ```

3. **Combine with migration:**
   ```bash
   # Export author's posts, then migrate to another site
   ./seo export --author "John Doe" --site webscroll.fr
   ./seo migrate output/all_posts_*.csv --destination mistergeek.net
   ```

4. **Verbose mode for debugging:**
   ```bash
   ./seo export --author "John" --verbose
   ```

## Troubleshooting

### No posts exported

**Possible causes:**
- Author name doesn't match (try different spelling)
- Author has no posts
- Author doesn't exist on that site

**Solutions:**
- Use `--verbose` to see what's happening
- Try author ID instead of name
- Check if author exists on the site

### Author names not showing in CSV

**Possible causes:**
- WordPress REST API doesn't allow user enumeration
- Authentication issue

**Solutions:**
- Check WordPress user permissions
- Verify credentials in config
- Author ID will still be present even if name lookup fails

## API Usage

Use author filtering programmatically:

```python
from seo.app import SEOApp

app = SEOApp()

# Export by author name
csv_file = app.export(author_filter=["John Doe"])

# Export by author ID
csv_file = app.export(author_ids=[1, 2])

# Export by author and site
csv_file = app.export(
    author_filter=["John"],
    site_filter="mistergeek.net"
)
```

## Related Commands

- `seo migrate` - Migrate exported posts to another site
- `seo analyze` - Analyze exported posts with AI
- `seo export --help` - Show all export options

## See Also

- [MIGRATION_GUIDE.md](MIGRATION_GUIDE.md) - Post migration guide
- [README.md](README.md) - Main documentation
269
MIGRATION_GUIDE.md
Normal file
@@ -0,0 +1,269 @@
# Post Migration Guide

This guide explains how to migrate posts between WordPress sites using the SEO automation tool.

## Overview

The migration feature allows you to move posts from one WordPress site to another while preserving:
- Post content (title, body, excerpt)
- Categories (automatically created if they don't exist)
- Tags (automatically created if they don't exist)
- SEO metadata (RankMath, Yoast SEO)
- Post slug

## Migration Modes

There are two ways to migrate posts:

### 1. CSV-Based Migration

Migrate specific posts listed in a CSV file.

**Requirements:**
- CSV file with at least two columns: `site` and `post_id`

**Usage:**
```bash
# Basic migration (posts deleted from source after migration)
./seo migrate posts_to_migrate.csv --destination mistergeek.net

# Keep posts on source site
./seo migrate posts_to_migrate.csv --destination mistergeek.net --keep-source

# Publish immediately instead of draft
./seo migrate posts_to_migrate.csv --destination mistergeek.net --post-status publish

# Custom output file for migration report
./seo migrate posts_to_migrate.csv --destination mistergeek.net --output custom_report.csv
```
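
Because a CSV-based migration deletes posts from the source by default, it can be worth checking the input file before running it. The sketch below verifies that the two required columns, `site` and `post_id`, are present. The helper name `missing_columns` is hypothetical and the check is not part of the tool itself.

```python
import csv
import io
from typing import Set

REQUIRED_COLUMNS = {"site", "post_id"}


def missing_columns(handle) -> Set[str]:
    """Return the required column names that the CSV header lacks."""
    reader = csv.DictReader(handle)
    present = {name.strip() for name in (reader.fieldnames or [])}
    return REQUIRED_COLUMNS - present


good = io.StringIO("site,post_id,title\nwebscroll.fr,123,VPN Guide\n")
bad = io.StringIO("post_id,title\n123,VPN Guide\n")
print(missing_columns(good))
print(missing_columns(bad))
```

For a real file, pass `open("posts_to_migrate.csv", newline="", encoding="utf-8")` instead of the in-memory samples.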

### 2. Filtered Migration

Migrate posts based on filters (category, date, etc.).

**Usage:**
```bash
# Migrate all posts from source to destination
./seo migrate --source webscroll.fr --destination mistergeek.net

# Migrate posts from specific categories
./seo migrate --source webscroll.fr --destination mistergeek.net --category-filter VPN "Torrent Clients"

# Migrate posts with specific tags
./seo migrate --source webscroll.fr --destination mistergeek.net --tag-filter "guide" "tutorial"

# Migrate posts by date range
./seo migrate --source webscroll.fr --destination mistergeek.net --date-after 2024-01-01 --date-before 2024-12-31

# Limit number of posts
./seo migrate --source webscroll.fr --destination mistergeek.net --limit 10

# Combine filters
./seo migrate --source webscroll.fr --destination mistergeek.net \
  --category-filter VPN \
  --date-after 2024-01-01 \
  --limit 5 \
  --keep-source
```
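
The date filters take `YYYY-MM-DD` values. A minimal sketch of such a range check follows. The function name `in_date_range` is hypothetical, and treating both bounds as exclusive (strictly after, strictly before) is an assumption: the guide does not say whether posts published exactly on a boundary date are included.

```python
from datetime import date
from typing import Optional


def in_date_range(published: str, date_after: Optional[str] = None,
                  date_before: Optional[str] = None) -> bool:
    """Check a post date against optional YYYY-MM-DD bounds (both exclusive)."""
    # Keep only the date part so timestamps such as 2024-06-01T10:00:00 also work.
    day = date.fromisoformat(published[:10])
    if date_after and day <= date.fromisoformat(date_after):
        return False
    if date_before and day >= date.fromisoformat(date_before):
        return False
    return True


print(in_date_range("2024-06-15", date_after="2024-01-01", date_before="2024-12-31"))
print(in_date_range("2023-12-31T23:59:59", date_after="2024-01-01"))
```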

## Command Options

### Required Options

- `--destination`, `--to`: Destination site (mistergeek.net, webscroll.fr, hellogeek.net)
- `--source`, `--from`: Source site (for filtered migration only)
- CSV file: Path to CSV with posts (for CSV-based migration)

### Optional Options

| Option | Description | Default |
|--------|-------------|---------|
| `--keep-source` | Keep posts on source site after migration | Delete after migration |
| `--post-status` | Status for migrated posts (draft, publish, pending) | draft |
| `--no-categories` | Don't create categories automatically | Create categories |
| `--no-tags` | Don't create tags automatically | Create tags |
| `--category-filter` | Filter by category names (filtered migration) | All categories |
| `--tag-filter` | Filter by tag names (filtered migration) | All tags |
| `--date-after` | Migrate posts after this date (YYYY-MM-DD) | No limit |
| `--date-before` | Migrate posts before this date (YYYY-MM-DD) | No limit |
| `--limit` | Maximum number of posts to migrate | No limit |
| `--output`, `-o` | Custom output file for migration report | Auto-generated |
| `--dry-run` | Preview what would be done without doing it | Execute |
| `--verbose`, `-v` | Enable verbose logging | Normal logging |

## Migration Process

### What Gets Migrated

1. **Post Content**
   - Title
   - Body content (HTML preserved)
   - Excerpt
   - Slug

2. **Categories**
   - Mapped from source to destination
   - Created automatically if they don't exist on destination
   - Hierarchical structure preserved (parent-child relationships)

3. **Tags**
   - Mapped from source to destination
   - Created automatically if they don't exist on destination

4. **SEO Metadata**
   - RankMath title and description
   - Yoast SEO title and description
   - Focus keywords

### What Doesn't Get Migrated

- Featured images (must be re-uploaded manually)
- Post author (uses destination site's default)
- Comments (not transferred)
- Custom fields (except SEO metadata)
- Post revisions

## Migration Report

After migration, a CSV report is generated in `output/` with the following information:

```csv
source_site,source_post_id,destination_site,destination_post_id,title,status,categories_migrated,tags_migrated,deleted_from_source
webscroll.fr,123,mistergeek.net,456,"VPN Guide",draft,3,5,True
```
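
A report in this layout is easy to total up after a batch. The sketch below sums the columns shown above. The second sample row and the helper name `summarize_report` are made up for illustration, and reading `deleted_from_source` as the text `True`/`False` is an assumption based on the single sample row.

```python
import csv
import io
from typing import Dict

# First data row is taken from the guide; the second is a hypothetical addition.
SAMPLE_REPORT = (
    "source_site,source_post_id,destination_site,destination_post_id,title,status,"
    "categories_migrated,tags_migrated,deleted_from_source\n"
    'webscroll.fr,123,mistergeek.net,456,"VPN Guide",draft,3,5,True\n'
    'webscroll.fr,124,mistergeek.net,457,"Torrent Tutorial",draft,1,0,False\n'
)


def summarize_report(handle) -> Dict[str, int]:
    """Total the posts, categories, tags, and source deletions in a report."""
    summary = {"posts": 0, "categories": 0, "tags": 0, "deleted_from_source": 0}
    for row in csv.DictReader(handle):
        summary["posts"] += 1
        summary["categories"] += int(row["categories_migrated"])
        summary["tags"] += int(row["tags_migrated"])
        if row["deleted_from_source"].strip().lower() == "true":
            summary["deleted_from_source"] += 1
    return summary


print(summarize_report(io.StringIO(SAMPLE_REPORT)))
```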

## Examples

### Example 1: Migrate Specific Posts from CSV

1. Create a CSV file with posts to migrate:
   ```csv
   site,post_id,title
   webscroll.fr,123,VPN Guide
   webscroll.fr,456,Torrent Tutorial
   ```

2. Run migration:
   ```bash
   ./seo migrate my_posts.csv --destination mistergeek.net
   ```

### Example 2: Migrate All VPN Content

```bash
./seo migrate --source webscroll.fr --destination mistergeek.net \
  --category-filter VPN "VPN Reviews" \
  --post-status draft \
  --keep-source
```

### Example 3: Migrate Recent Content

```bash
./seo migrate --source webscroll.fr --destination mistergeek.net \
  --date-after 2024-06-01 \
  --limit 20
```

### Example 4: Preview Migration

```bash
./seo migrate --source webscroll.fr --destination mistergeek.net \
  --category-filter VPN \
  --dry-run
```

## Best Practices

### Before Migration

1. **Back up both sites** - Always back up before bulk operations
2. **Test with a few posts** - Migrate 1-2 posts first to verify
3. **Check category structure** - Review destination site's categories
4. **Plan URL redirects** - If deleting from source, set up redirects

### During Migration

1. **Use dry-run first** - Preview what will be migrated
2. **Start with drafts** - Review before publishing
3. **Monitor logs** - Watch for errors or warnings
4. **Limit batch size** - Migrate in batches of 10-20 posts

### After Migration

1. **Review migrated posts** - Check formatting and categories
2. **Add featured images** - Manually upload if needed
3. **Set up redirects** - From old URLs to new URLs
4. **Update internal links** - Fix cross-site links
5. **Monitor SEO** - Track rankings after migration

## Troubleshooting

### Common Issues

**1. "Site not found" error**
- Check site name is correct (mistergeek.net, webscroll.fr, hellogeek.net)
- Verify credentials in config.yaml or .env

**2. "Category already exists" warning**
- This is normal - the migrator found a matching category
- The existing category will be used

**3. "Failed to create post" error**
- Check WordPress REST API is enabled
- Verify user has post creation permissions
- Check authentication credentials

**4. Posts missing featured images**
- Featured images are not migrated automatically
- Upload images manually to destination site
- Update featured image on migrated posts

**5. Categories not matching**
- Categories are matched by name (case-insensitive)
- "VPN" and "vpn" will match
- "VPN Guide" and "VPN" will NOT match - new category created
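
The category matching rule in item 5 (whole name, ignoring case) can be illustrated as follows. This is a sketch, not the migrator's code: `find_category` and the sample name-to-ID mapping are hypothetical.

```python
from typing import Dict, Optional


def find_category(name: str, existing: Dict[str, int]) -> Optional[int]:
    """Return the ID of the destination category whose full name matches, ignoring case."""
    wanted = name.lower()
    for existing_name, category_id in existing.items():
        if existing_name.lower() == wanted:
            return category_id
    return None  # no match: the migrator would create a new category


# Hypothetical categories already present on the destination site.
destination_categories = {"VPN": 12, "Software": 30}
print(find_category("vpn", destination_categories))
print(find_category("VPN Guide", destination_categories))
```

Because the comparison is on the full name, near-duplicates such as "VPN" and "VPN Guide" end up as separate categories, so reviewing the destination's category list before a large migration helps avoid clutter.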

## API Usage

You can also use the migration feature programmatically:

```python
from seo.app import SEOApp

app = SEOApp()

# CSV-based migration
app.migrate(
    csv_file='output/posts_to_migrate.csv',
    destination_site='mistergeek.net',
    create_categories=True,
    create_tags=True,
    delete_after=False,
    status='draft'
)

# Filtered migration
app.migrate_by_filter(
    source_site='webscroll.fr',
    destination_site='mistergeek.net',
    category_filter=['VPN', 'Software'],
    date_after='2024-01-01',
    limit=10,
    create_categories=True,
    delete_after=False,
    status='draft'
)
```

## Related Commands

- `seo export` - Export posts from all sites
- `seo editorial_strategy` - Analyze and get migration recommendations
- `seo category_propose` - Get AI category recommendations

## See Also

- [README.md](README.md) - Main documentation
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
- [CATEGORY_MANAGEMENT_GUIDE.md](CATEGORY_MANAGEMENT_GUIDE.md) - Category management
108
src/seo/app.py
@@ -12,6 +12,7 @@ from .analyzer import EnhancedPostAnalyzer
 from .category_proposer import CategoryProposer
 from .category_manager import WordPressCategoryManager, CategoryAssignmentProcessor
 from .editorial_strategy import EditorialStrategyAnalyzer
+from .post_migrator import WordPressPostMigrator

 logger = logging.getLogger(__name__)

@@ -34,11 +35,23 @@ class SEOApp:
         else:
             logging.basicConfig(level=logging.INFO)

-    def export(self) -> str:
-        """Export all posts from WordPress sites."""
+    def export(self, author_filter: Optional[List[str]] = None,
+               author_ids: Optional[List[int]] = None,
+               site_filter: Optional[str] = None) -> str:
+        """
+        Export all posts from WordPress sites.
+
+        Args:
+            author_filter: List of author names to filter by
+            author_ids: List of author IDs to filter by
+            site_filter: Export from specific site only
+
+        Returns:
+            Path to exported CSV file
+        """
         logger.info("📦 Exporting all posts from WordPress sites...")
-        exporter = PostExporter()
-        return exporter.run()
+        exporter = PostExporter(author_filter=author_filter, author_ids=author_ids)
+        return exporter.run(site_filter=site_filter)

     def analyze(self, csv_file: Optional[str] = None, fields: Optional[List[str]] = None,
                 update: bool = False, output: Optional[str] = None) -> str:
@@ -164,6 +177,93 @@ class SEOApp:
         analyzer = EditorialStrategyAnalyzer()
         return analyzer.run(csv_file)

+    def migrate(self, csv_file: str, destination_site: str,
+                create_categories: bool = True, create_tags: bool = True,
+                delete_after: bool = False, status: str = 'draft',
+                output_file: Optional[str] = None,
+                ignore_original_date: bool = False) -> str:
+        """
+        Migrate posts from CSV file to destination site.
+
+        Args:
+            csv_file: Path to CSV file with posts to migrate (must have 'site' and 'post_id' columns)
+            destination_site: Destination site name (mistergeek.net, webscroll.fr, hellogeek.net)
+            create_categories: If True, create categories if they don't exist
+            create_tags: If True, create tags if they don't exist
+            delete_after: If True, delete posts from source after migration
+            status: Status for new posts ('draft', 'publish', 'pending')
+            output_file: Custom output file path for migration report
+            ignore_original_date: If True, use current date instead of original post date
+
+        Returns:
+            Path to migration report CSV
+        """
+        logger.info(f"🚀 Migrating posts to {destination_site}...")
+
+        migrator = WordPressPostMigrator()
+        return migrator.migrate_posts_from_csv(
+            csv_file=csv_file,
+            destination_site=destination_site,
+            create_categories=create_categories,
+            create_tags=create_tags,
+            delete_after=delete_after,
+            status=status,
+            output_file=output_file,
+            ignore_original_date=ignore_original_date
+        )
+
+    def migrate_by_filter(self, source_site: str, destination_site: str,
+                          category_filter: Optional[List[str]] = None,
+                          tag_filter: Optional[List[str]] = None,
+                          date_after: Optional[str] = None,
+                          date_before: Optional[str] = None,
+                          status_filter: Optional[List[str]] = None,
+                          create_categories: bool = True,
+                          create_tags: bool = True,
+                          delete_after: bool = False,
+                          status: str = 'draft',
+                          limit: Optional[int] = None,
+                          ignore_original_date: bool = False) -> str:
+        """
+        Migrate posts based on filters.
+
+        Args:
+            source_site: Source site name
+            destination_site: Destination site name
+            category_filter: List of category names to filter by
+            tag_filter: List of tag names to filter by
+            date_after: Only migrate posts after this date (YYYY-MM-DD)
+            date_before: Only migrate posts before this date (YYYY-MM-DD)
+            status_filter: List of statuses to filter by (e.g., ['publish', 'draft'])
+            create_categories: If True, create categories if they don't exist
+            create_tags: If True, create tags if they don't exist
+            delete_after: If True, delete posts from source after migration
+            status: Status for new posts
+            limit: Maximum number of posts to migrate
+            ignore_original_date: If True, use current date instead of original post date
+
+        Returns:
+            Path to migration report CSV
+        """
+        logger.info(f"🚀 Migrating posts from {source_site} to {destination_site}...")
+
+        migrator = WordPressPostMigrator()
+        return migrator.migrate_posts_by_filter(
+            source_site=source_site,
+            destination_site=destination_site,
+            category_filter=category_filter,
+            tag_filter=tag_filter,
+            date_after=date_after,
+            date_before=date_before,
+            status_filter=status_filter,
+            create_categories=create_categories,
+            create_tags=create_tags,
+            delete_after=delete_after,
+            status=status,
+            limit=limit,
+            ignore_original_date=ignore_original_date
+        )
+
     def status(self) -> dict:
         """Get status of output files."""
         files = list(self.output_dir.glob('*.csv'))
156
src/seo/cli.py
@@ -49,6 +49,27 @@ Examples:
     parser.add_argument('--description', '-d', help='Category description')
     parser.add_argument('--strict', action='store_true', help='Strict confidence matching (exact match only)')

+    # Export arguments
+    parser.add_argument('--author', nargs='+', help='Filter by author name(s) for export')
+    parser.add_argument('--author-id', type=int, nargs='+', help='Filter by author ID(s) for export')
+
+    # Migration arguments
+    parser.add_argument('--destination', '--to', choices=['mistergeek.net', 'webscroll.fr', 'hellogeek.net'],
+                        help='Destination site for migration')
+    parser.add_argument('--source', '--from', choices=['mistergeek.net', 'webscroll.fr', 'hellogeek.net'],
+                        help='Source site for filtered migration')
+    parser.add_argument('--keep-source', action='store_true', help='Keep posts on source site (default: delete after migration)')
+    parser.add_argument('--post-status', choices=['draft', 'publish', 'pending'], default='draft',
+                        help='Status for migrated posts (default: draft)')
+    parser.add_argument('--no-categories', action='store_true', help='Do not create categories automatically')
+    parser.add_argument('--no-tags', action='store_true', help='Do not create tags automatically')
+    parser.add_argument('--category-filter', nargs='+', help='Filter by category names (for filtered migration)')
+    parser.add_argument('--tag-filter', nargs='+', help='Filter by tag names (for filtered migration)')
+    parser.add_argument('--date-after', help='Migrate posts after this date (YYYY-MM-DD)')
+    parser.add_argument('--date-before', help='Migrate posts before this date (YYYY-MM-DD)')
+    parser.add_argument('--limit', type=int, help='Limit number of posts to migrate')
+    parser.add_argument('--ignore-original-date', action='store_true', help='Use current date instead of original post date')
+
     args = parser.parse_args()

     if not args.command:
@@ -73,6 +94,7 @@ Examples:
         'category_apply': cmd_category_apply,
         'category_create': cmd_category_create,
         'editorial_strategy': cmd_editorial_strategy,
+        'migrate': cmd_migrate,
         'status': cmd_status,
         'help': cmd_help,
     }
@@ -104,8 +126,19 @@ def cmd_export(app, args):
     """Export all posts."""
     if args.dry_run:
         print("Would export all posts from WordPress sites")
+        if args.author:
+            print(f" Author filter: {args.author}")
+        if args.author_id:
+            print(f" Author ID filter: {args.author_id}")
         return 0
-    app.export()
+
+    result = app.export(
+        author_filter=args.author,
+        author_ids=args.author_id,
+        site_filter=args.site
+    )
+    if result:
+        print(f"✅ Export completed! Output: {result}")
     return 0


@@ -259,6 +292,94 @@ def cmd_editorial_strategy(app, args):
     return 0


+def cmd_migrate(app, args):
+    """Migrate posts between websites."""
+    if args.dry_run:
+        print("Would migrate posts between websites")
+        if args.destination:
+            print(f" Destination: {args.destination}")
+        if args.source:
+            print(f" Source: {args.source}")
+        return 0
+
+    # Validate required arguments
+    if not args.destination:
+        print("❌ Destination site required. Use --destination mistergeek.net|webscroll.fr|hellogeek.net")
+        return 1
+
+    delete_after = not args.keep_source
+    create_categories = not args.no_categories
+    create_tags = not args.no_tags
+
+    # Check if using filtered migration or CSV-based migration
+    if args.source:
+        # Filtered migration
+        print(f"Migrating posts from {args.source} to {args.destination}")
+        print(f"Post status: {args.post_status}")
+        print(f"Delete after migration: {delete_after}")
+        if args.category_filter:
+            print(f"Category filter: {args.category_filter}")
+        if args.tag_filter:
+            print(f"Tag filter: {args.tag_filter}")
+        if args.date_after:
+            print(f"Date after: {args.date_after}")
+        if args.date_before:
+            print(f"Date before: {args.date_before}")
+        if args.limit:
+            print(f"Limit: {args.limit}")
+
+        result = app.migrate_by_filter(
+            source_site=args.source,
+            destination_site=args.destination,
+            category_filter=args.category_filter,
+            tag_filter=args.tag_filter,
+            date_after=args.date_after,
+            date_before=args.date_before,
+            status_filter=None,
+            create_categories=create_categories,
+            create_tags=create_tags,
+            delete_after=delete_after,
+            status=args.post_status,
+            limit=args.limit,
+            ignore_original_date=args.ignore_original_date
+        )
+
+        if result:
+            print(f"\n✅ Migration completed!")
+            print(f" Report: {result}")
+    else:
+        # CSV-based migration
+        csv_file = args.args[0] if args.args else None
+
+        if not csv_file:
+            print("❌ CSV file required. Provide path to CSV with 'site' and 'post_id' columns")
+            print(" Usage: seo migrate <csv_file> --destination <site>")
+            print(" Or use filtered migration: seo migrate --source <site> --destination <site>")
+            return 1
+
+        print(f"Migrating posts from CSV: {csv_file}")
+        print(f"Destination: {args.destination}")
+        print(f"Post status: {args.post_status}")
+        print(f"Delete after migration: {delete_after}")
+
+        result = app.migrate(
+            csv_file=csv_file,
+            destination_site=args.destination,
+            create_categories=create_categories,
+            create_tags=create_tags,
+            delete_after=delete_after,
+            status=args.post_status,
+            output_file=args.output,
+            ignore_original_date=args.ignore_original_date
+        )
+
+        if result:
+            print(f"\n✅ Migration completed!")
+            print(f" Report: {result}")
+
+    return 0
+
+
 def cmd_status(app, args):
     """Show status."""
     if args.dry_run:
@@ -285,6 +406,9 @@ SEO Automation CLI - Available Commands

 Export & Analysis:
   export                        Export all posts from WordPress sites
+  export --author "John Doe"    Export posts by specific author
+  export --author-id 1 2        Export posts by author IDs
+  export -s mistergeek.net      Export from specific site only
   analyze [csv_file]            Analyze posts with AI
   analyze -f title              Analyze specific fields (title, meta_description, categories, site)
   analyze -u                    Update input CSV with new columns (creates backup)
@@ -299,11 +423,35 @@ Category Management:
 Strategy & Migration:
   editorial_strategy [csv]      Analyze editorial lines and recommend migrations
   editorial_strategy            Get migration recommendations between sites
+  migrate <csv> --destination <site>    Migrate posts from CSV to destination site
+  migrate --source <site> --destination <site>    Migrate posts with filters
+  migrate --source A --to B --category-filter "VPN"    Migrate specific categories
+  migrate --source A --to B --date-after 2024-01-01 --limit 10

 Utility:
   status                        Show output files status
   help                          Show this help message

+Export Options:
+  --author                Filter by author name(s) (case-insensitive, partial match)
+  --author-id             Filter by author ID(s)
+  --site, -s              Export from specific site only
+
+Migration Options:
+  --destination, --to     Destination site: mistergeek.net, webscroll.fr, hellogeek.net
+  --source, --from        Source site for filtered migration
+  --keep-source           Keep posts on source site (default: delete after migration)
+  --post-status           Status for migrated posts: draft, publish, pending (default: draft)
+  --no-categories         Do not create categories automatically
+  --no-tags               Do not create tags automatically
+  --category-filter       Filter by category names (for filtered migration)
+  --tag-filter            Filter by tag names (for filtered migration)
+  --date-after            Migrate posts after this date (YYYY-MM-DD)
+  --date-before           Migrate posts before this date (YYYY-MM-DD)
+  --limit                 Limit number of posts to migrate
+  --ignore-original-date  Use current date instead of original post date
+  --output, -o            Custom output file path for migration report
+
 Options:
   --verbose, -v           Enable verbose logging
   --dry-run               Show what would be done without doing it
@@ -317,11 +465,17 @@ Options:

 Examples:
   seo export
+  seo export --author "John Doe"
+  seo export --author-id 1 2
+  seo export -s mistergeek.net --author "admin"
   seo analyze -f title categories
   seo category_propose
   seo category_apply -s mistergeek.net -c Medium
   seo category_create -s webscroll.fr "Torrent Clients"
   seo editorial_strategy
+  seo migrate posts_to_migrate.csv --destination mistergeek.net
+  seo migrate --source webscroll.fr --destination mistergeek.net --category-filter VPN
+  seo migrate --source A --to B --date-after 2024-01-01 --limit 10 --keep-source
   seo status
 """)
     return 0
@@ -20,11 +20,21 @@ logger = logging.getLogger(__name__)
 class PostExporter:
     """Export posts from WordPress sites to CSV."""

-    def __init__(self):
-        """Initialize the exporter."""
+    def __init__(self, author_filter: Optional[List[str]] = None,
+                 author_ids: Optional[List[int]] = None):
+        """
+        Initialize the exporter.
+
+        Args:
+            author_filter: List of author names to filter by (case-insensitive)
+            author_ids: List of author IDs to filter by
+        """
         self.sites = Config.WORDPRESS_SITES
         self.all_posts = []
         self.category_cache = {}
+        self.author_filter = author_filter
+        self.author_ids = author_ids
+        self.author_cache = {}  # Cache author info by site

     def fetch_category_names(self, site_name: str, site_config: Dict) -> Dict[int, Dict]:
         """Fetch category names from a WordPress site."""
@@ -50,8 +60,55 @@ class PostExporter:
         self.category_cache[site_name] = categories
         return categories
 
-    def fetch_posts_from_site(self, site_name: str, site_config: Dict) -> List[Dict]:
-        """Fetch all posts from a WordPress site."""
+    def fetch_authors(self, site_name: str, site_config: Dict) -> Dict[int, Dict]:
+        """
+        Fetch all authors/users from a WordPress site.
+
+        Returns:
+            Dict mapping author ID to author data (name, slug)
+        """
+        if site_name in self.author_cache:
+            return self.author_cache[site_name]
+
+        logger.info(f" Fetching authors from {site_name}...")
+        authors = {}
+        base_url = site_config['url'].rstrip('/')
+        api_url = f"{base_url}/wp-json/wp/v2/users"
+        auth = HTTPBasicAuth(site_config['username'], site_config['password'])
+
+        try:
+            response = requests.get(api_url, params={'per_page': 100}, auth=auth, timeout=10)
+            response.raise_for_status()
+
+            for user in response.json():
+                authors[user['id']] = {
+                    'id': user['id'],
+                    'name': user.get('name', ''),
+                    'slug': user.get('slug', ''),
+                    'description': user.get('description', '')
+                }
+            logger.info(f" ✓ Fetched {len(authors)} authors")
+        except Exception as e:
+            logger.warning(f" Could not fetch authors from {site_name}: {e}")
+            # Fallback: create empty dict if authors can't be fetched
+            # Author IDs will still be exported, just without names
+
+        self.author_cache[site_name] = authors
+        return authors
+
+    def fetch_posts_from_site(self, site_name: str, site_config: Dict,
+                              authors_map: Optional[Dict[int, Dict]] = None) -> List[Dict]:
+        """
+        Fetch all posts from a WordPress site.
+
+        Args:
+            site_name: Site name
+            site_config: Site configuration
+            authors_map: Optional authors mapping for filtering
+
+        Returns:
+            List of post data
+        """
         logger.info(f"\nFetching posts from {site_name}...")
 
         posts = []
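Review note: `fetch_authors` in the hunk above issues a single request with `per_page=100`, so a site with more than 100 users would only get names for the first page. A minimal sketch of paging through the results follows. The names `collect_authors` and `fake_fetch_page` are hypothetical and not part of this commit, and `fetch_page` stands in for the `requests.get` call; it is assumed to return an empty list once the pages run out.

```python
from typing import Callable, Dict, List


def collect_authors(fetch_page: Callable[[int], List[dict]],
                    per_page: int = 100) -> Dict[int, dict]:
    """Gather users page by page until a short (or empty) page is returned."""
    authors: Dict[int, dict] = {}
    page = 1
    while True:
        users = fetch_page(page)
        for user in users:
            authors[user['id']] = {
                'id': user['id'],
                'name': user.get('name', ''),
                'slug': user.get('slug', ''),
            }
        # A page with fewer than per_page entries is the last one.
        if len(users) < per_page:
            break
        page += 1
    return authors


# Hypothetical stand-in data source: 250 fake users served 100 at a time.
_FAKE_USERS = [{'id': i, 'name': f'User {i}', 'slug': f'user-{i}'}
               for i in range(1, 251)]


def fake_fetch_page(page: int) -> List[dict]:
    start = (page - 1) * 100
    return _FAKE_USERS[start:start + 100]


authors = collect_authors(fake_fetch_page)
```

In the exporter itself, the callable would wrap the existing request and pass `page` alongside `per_page` in `params`; how the server responds to an out-of-range page is not covered here and would need checking.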
@@ -59,14 +116,23 @@ class PostExporter:
         api_url = f"{base_url}/wp-json/wp/v2/posts"
         auth = HTTPBasicAuth(site_config['username'], site_config['password'])
 
+        # Build base params
+        base_params = {'page': 1, 'per_page': 100, '_embed': True}
+
+        # Add author filter if specified
+        if self.author_ids:
+            base_params['author'] = ','.join(map(str, self.author_ids))
+            logger.info(f" Filtering by author IDs: {self.author_ids}")
+
         for status in ['publish', 'draft']:
             page = 1
             while True:
                 try:
+                    params = {**base_params, 'page': page, 'status': status}
                     logger.info(f" Fetching page {page} ({status} posts)...")
                     response = requests.get(
                         api_url,
-                        params={'page': page, 'per_page': 100, 'status': status},
+                        params=params,
                         auth=auth,
                         timeout=10
                     )
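Review note: the hunk above builds the query once in `base_params` and then overrides `page` and `status` on every request, with author IDs joined into one comma-separated `author` value. The same idea as a standalone helper, as a sketch only: `build_post_params` is a hypothetical name, not a function in this commit.

```python
from typing import Dict, List, Optional


def build_post_params(page: int, status: str,
                      author_ids: Optional[List[int]] = None,
                      per_page: int = 100) -> Dict[str, object]:
    """Build the query parameters for one page of the posts request."""
    params: Dict[str, object] = {
        'page': page,
        'per_page': per_page,
        'status': status,
        '_embed': True,
    }
    # Only add the author key when IDs were given, so an unfiltered
    # export sends the same query as before.
    if author_ids:
        params['author'] = ','.join(map(str, author_ids))
    return params


filtered = build_post_params(2, 'draft', author_ids=[1, 2, 3])
unfiltered = build_post_params(1, 'publish')
```

Because the ID filter is sent to the server, it takes effect before pagination, whereas the author-name filter in the next hunk is applied to each page after it has been downloaded.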
@@ -76,7 +142,28 @@ class PostExporter:
                     if not page_posts:
                         break
 
+                    # Filter by author name if specified
+                    if self.author_filter and authors_map:
+                        filtered_posts = []
+                        for post in page_posts:
+                            author_id = post.get('author')
+                            if author_id and author_id in authors_map:
+                                author_name = authors_map[author_id]['name'].lower()
+                                author_slug = authors_map[author_id]['slug'].lower()
+
+                                # Check if author matches filter
+                                for filter_name in self.author_filter:
+                                    filter_lower = filter_name.lower()
+                                    if (filter_lower in author_name or
+                                            filter_lower == author_slug):
+                                        filtered_posts.append(post)
+                                        break
+
+                        page_posts = filtered_posts
+                        logger.info(f" ✓ Got {len(page_posts)} posts after author filter")
+
                     posts.extend(page_posts)
+                    if page_posts:
                         logger.info(f" ✓ Got {len(page_posts)} posts")
 
                     page += 1
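Review note: the matching rule in the hunk above is a case-insensitive substring match on the display name or an exact match on the slug, and posts whose author is missing from the map are dropped. A standalone sketch of that rule, which may make it easier to test in isolation; `author_matches` and `filter_posts_by_author` are hypothetical names and do not exist in this commit.

```python
from typing import Dict, List


def author_matches(author: Dict[str, str], filters: List[str]) -> bool:
    """Return True if any filter is a substring of the name or equals the slug."""
    name = author.get('name', '').lower()
    slug = author.get('slug', '').lower()
    for raw in filters:
        wanted = raw.lower()
        if wanted in name or wanted == slug:
            return True
    return False


def filter_posts_by_author(posts: List[dict],
                           authors_map: Dict[int, Dict[str, str]],
                           filters: List[str]) -> List[dict]:
    """Keep posts whose author is known and matches one of the filters."""
    kept = []
    for post in posts:
        author = authors_map.get(post.get('author'))
        if author and author_matches(author, filters):
            kept.append(post)
    return kept


authors_map = {
    1: {'name': 'John Doe', 'slug': 'john-doe'},
    2: {'name': 'Jane Smith', 'slug': 'jsmith'},
}
posts = [{'id': 10, 'author': 1}, {'id': 11, 'author': 2}, {'id': 12, 'author': 99}]
johns = filter_posts_by_author(posts, authors_map, ['john'])
```

Two behaviours worth keeping in mind when reviewing the hunk: the slug comparison is exact, so `john-d` does not match `john-doe`, and the guard `self.author_filter and authors_map` means the name filter is skipped entirely when the author map comes back empty.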
@@ -94,7 +181,8 @@ class PostExporter:
         logger.info(f"✓ Total posts from {site_name}: {len(posts)}\n")
         return posts
 
-    def extract_post_details(self, post: Dict, site_name: str, category_map: Dict) -> Dict:
+    def extract_post_details(self, post: Dict, site_name: str, category_map: Dict,
+                             author_map: Optional[Dict[int, Dict]] = None) -> Dict:
         """Extract post details for CSV export."""
         title = post.get('title', {})
         if isinstance(title, dict):
@@ -122,6 +210,13 @@ class PostExporter:
             for cat_id in category_ids
         ]) if category_ids else ''
 
+        # Get author name from author map
+        author_id = post.get('author', '')
+        author_name = ''
+        if author_map and author_id:
+            author_data = author_map.get(author_id, {})
+            author_name = author_data.get('name', '')
+
         return {
             'site': site_name,
             'post_id': post['id'],
@@ -129,7 +224,8 @@ class PostExporter:
             'title': title.strip(),
             'slug': post.get('slug', ''),
             'url': post.get('link', ''),
-            'author_id': post.get('author', ''),
+            'author_id': author_id,
+            'author_name': author_name,
             'date_published': post.get('date', ''),
             'date_modified': post.get('modified', ''),
             'categories': category_names,
@@ -158,7 +254,7 @@ class PostExporter:
             return ""
 
         fieldnames = [
-            'site', 'post_id', 'status', 'title', 'slug', 'url', 'author_id',
+            'site', 'post_id', 'status', 'title', 'slug', 'url', 'author_id', 'author_name',
             'date_published', 'date_modified', 'categories', 'tags', 'excerpt',
             'content_preview', 'seo_title', 'meta_description', 'focus_keyword', 'word_count',
         ]
@@ -173,24 +269,46 @@ class PostExporter:
         logger.info(f"✓ CSV exported to: {output_file}")
         return str(output_file)
 
-    def run(self) -> str:
-        """Run the complete export process."""
+    def run(self, site_filter: Optional[str] = None) -> str:
+        """
+        Run the complete export process.
+
+        Args:
+            site_filter: Optional site name to export from (default: all sites)
+
+        Returns:
+            Path to exported CSV file
+        """
         logger.info("="*70)
         logger.info("EXPORTING ALL POSTS")
         logger.info("="*70)
+
+        if self.author_filter:
+            logger.info(f"Author filter: {self.author_filter}")
+        if self.author_ids:
+            logger.info(f"Author IDs: {self.author_ids}")
+        if site_filter:
+            logger.info(f"Site filter: {site_filter}")
+
         logger.info("Sites configured: " + ", ".join(self.sites.keys()))
 
         for site_name, config in self.sites.items():
+            # Skip sites if filter is specified
+            if site_filter and site_name != site_filter:
+                logger.info(f"Skipping {site_name} (not in filter)")
+                continue
+
             categories = self.fetch_category_names(site_name, config)
-            posts = self.fetch_posts_from_site(site_name, config)
+            authors = self.fetch_authors(site_name, config)
+            posts = self.fetch_posts_from_site(site_name, config, authors)
 
             if posts:
                 for post in posts:
-                    post_details = self.extract_post_details(post, site_name, categories)
+                    post_details = self.extract_post_details(post, site_name, categories, authors)
                     self.all_posts.append(post_details)
 
         if not self.all_posts:
-            logger.error("No posts found on any site")
+            logger.warning("No posts found matching criteria")
             return ""
 
         self.all_posts.sort(key=lambda x: (x['site'], x['post_id']))

1007
src/seo/post_migrator.py
Normal file
File diff suppressed because it is too large