
Automatic eBay Price Tracking with a Python Scraper: A Hands-On Walkthrough

2026-06-09


Author: 菜鸟AI编辑部

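Summary

Build an eBay price-monitoring system in Python: rotate IPs through a dynamic proxy pool to avoid rate limiting, and parse DOM nodes to scrape product…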

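The core requirement of e-commerce price monitoring is simple: fetch eBay listing prices on a schedule, store a snapshot, and raise an alert when a price drops. eBay pages are mostly server-rendered, so there is little JavaScript to deal with, but there are other obstacles: convoluted URL structures, range prices, and strict rate limits. When crawling category data in bulk over long periods, high-frequency requests from a single IP are almost certain to be throttled or blocked.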

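The approach below implements a complete eBay price-monitoring pipeline in Python, covering product search, price parsing, data persistence, price-change detection, and price-drop alerts. The code is meant to be runnable as is, but it is provided for technical demonstration only.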

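Locating the Page Data

The browser developer tools (F12) are enough to inspect the DOM nodes. The CSS selectors below worked at the time of writing. Note that the platform runs A/B tests and class names can change, so re-verify a selector whenever it stops matching: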

Field	CSS selector
Product title	#itemTitle / h1.x-item-title__mainTitle
Current price	.x-price-primary .ux-textspans
Strikethrough original price	.x-price-approx__price
Shipping	.ux-labels-values__values-content .ux-textspans
Search result item	.srp-results .s-item
Search result price	.s-item__price
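
Because class names can change, one way to soften breakage is to keep several candidate selectors per field and take the first that matches. The sketch below is only an illustration: the helper `first_text`, the `SELECTORS` mapping, and the hand-written `SAMPLE_HTML` fragment are hypothetical and are not taken from eBay or from the scraper in this article.

```python
from typing import Optional
from bs4 import BeautifulSoup

# Candidate selectors per field, tried in order (values from the table above).
SELECTORS = {
    "title": ["h1.x-item-title__mainTitle", "#itemTitle"],
    "price": [".x-price-primary .ux-textspans"],
}

def first_text(soup: BeautifulSoup, field: str) -> Optional[str]:
    """Return the text of the first candidate selector that matches, else None."""
    for css in SELECTORS[field]:
        el = soup.select_one(css)
        if el is not None:
            return el.get_text(strip=True)
    return None

# Hypothetical HTML fragment, written by hand for this example only.
SAMPLE_HTML = """
<h1 class="x-item-title__mainTitle"><span class="ux-textspans">Mechanical Keyboard</span></h1>
<div class="x-price-primary"><span class="ux-textspans">US $59.99</span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
title = first_text(soup, "title")
price_text = first_text(soup, "price")
print(title, "|", price_text)
```

When every candidate for a field returns None, that is a useful signal that the selectors need to be re-checked in the developer tools.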


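Search Result Scraping and the Proxy Setup

eBay polices its search endpoint tightly. Consecutive requests from a single IP trigger rate limiting within a short time. A proxy pool is therefore used for IP rotation: requests go through a single entry point, and the provider switches the exit IP on its side. A typical proxy pool is described as holding more than 300,000 IPs, with latency kept under 100 ms.

Core Logic of the Proxy Integration

IP rotation is driven by a random value passed in a request header. Compared with relying on long-lived TCP connections, this approach is more robust: even if a connection drops, the next request still moves to a new exit IP.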

import requests
import random

# Tunnel proxy base configuration (generic proxy pool; replace with your real credentials)
proxyHost = "proxy.example.com"
proxyPort = "31111"
proxyUser = "your_username"
proxyPass = "your_password"

proxyMeta = f"http://{proxyUser}:{proxyPass}@{proxyHost}:{proxyPort}"
proxies = {"http": proxyMeta, "https": proxyMeta}
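
The header-driven switching described above can be sketched as a small helper that builds the per-request header. This is a minimal sketch under assumptions: the header name `Proxy-Tunnel` is the one used by the scraper later in this article, the function name `tunnel_headers` is hypothetical, and the idea that reusing one value keeps the same exit IP is an assumption about the proxy provider that should be checked against its documentation.

```python
import random
from typing import Dict, Optional

def tunnel_headers(sticky_id: Optional[int] = None) -> Dict[str, str]:
    """Build the header that asks the tunnel proxy which exit IP to use.

    Assumption: the provider reads a `Proxy-Tunnel` header and treats a new
    value as a request for a new exit IP. Passing `sticky_id` reuses one value,
    which (again an assumption) keeps the same exit IP for a session.
    """
    value = sticky_id if sticky_id is not None else random.randint(1, 10000)
    return {"Proxy-Tunnel": str(value)}

rotating = tunnel_headers()            # fresh random value on every call
sticky = tunnel_headers(sticky_id=42)  # fixed value for session-bound requests
print(rotating, sticky)
```

The resulting dictionary would be passed as `headers=` together with the `proxies` mapping defined above when calling `requests.get`.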

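The full search scraper keeps request context in a requests session, parses pages with BeautifulSoup, and stores product data in a dataclass. A randomized delay between requests is built in to lower the chance of triggering anti-bot controls.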

import requests
from bs4 import BeautifulSoup
from urllib.parse import quote
from dataclasses import dataclass
from typing import Optional, List
import time
import random
import sqlite3
import hashlib

@dataclass
class Product:
    """Product data entity"""
    ebay_id: str = ""
    title: str = ""
    url: str = ""
    price: float = 0.0
    original_price: float = 0.0
    currency: str = "USD"
    shipping: str = ""
    seller: str = ""
    condition: str = ""
    image_url: str = ""
    uid: str = ""

    def __post_init__(self):
        if not self.uid and self.ebay_id:
            self.uid = hashlib.md5(self.ebay_id.encode()).hexdigest()[:12]

class EbaySearchScraper:
    """eBay search result scraper (with proxy pool support)"""
    BASE_URL = "https://www.ebay.com/sch/i.html"

    def __init__(self, proxy_user: str = "", proxy_pass: str = ""):
        self.session = requests.Session()
        # Mimic standard browser request headers
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
            "Accept-Language": "en-US,en;q=0.9",
        })
        self.use_proxy = bool(proxy_user and proxy_pass)
        self.proxies = None
        if self.use_proxy:
            proxy_meta = f"http://{proxy_user}:{proxy_pass}@proxy.example.com:31111"
            self.proxies = {"http": proxy_meta, "https": proxy_meta}

    def _get(self, url, **kwargs):
        """Wrap session.get; a fresh random header value requests a new exit IP"""
        headers = {}
        if self.use_proxy:
            headers["Proxy-Tunnel"] = str(random.randint(1, 10000))
        return self.session.get(url, proxies=self.proxies, headers=headers, timeout=15, **kwargs)

    def search(self, keyword: str, max_pages: int = 3, delay: float = 3.0) -> List[Product]:
        """Fetch search results page by page"""
        all_products = []
        for page in range(1, max_pages + 1):
            params = {"_nkw": keyword, "_pgn": page, "_ipg": 60}
            try:
                resp = self._get(self.BASE_URL, params=params)
                if resp.status_code != 200:
                    break
                products = self._parse_search(resp.text)
                all_products.extend(products)
                time.sleep(delay + random.uniform(0.5, 2.0))
            except Exception:
                break
        return all_products

    def _parse_search(self, html: str) -> List[Product]:
        """Parse the search page HTML"""
        soup = BeautifulSoup(html, "html.parser")
        products = []
        for item in soup.select(".srp-results .s-item"):
            try:
                title_el = item.select_one(".s-item__title")
                link_el = item.select_one(".s-item__link")
                if not all([title_el, link_el]):
                    continue
                title = title_el.get_text(strip=True)
                url = link_el.get("href", "")
                ebay_id = url.split("/itm/")[1].split("?")[0] if "/itm/" in url else ""
                price = self._parse_price(item.select_one(".s-item__price"))
                shipping = item.select_one(".s-item__shipping").get_text(strip=True) if item.select_one(".s-item__shipping") else ""
                image_url = item.select_one(".s-item__image-img").get("src", "") if item.select_one(".s-item__image-img") else ""
                if ebay_id and price > 0:
                    products.append(Product(
                        ebay_id=ebay_id, title=title, url=url,
                        price=price, shipping=shipping, image_url=image_url
                    ))
            except Exception:
                continue
        return products

    @staticmethod
    def _parse_price(el) -> float:
        """Clean the price text and convert it to float; for a range price, take the first value"""
        if not el:
            return 0.0
        text = el.get_text(strip=True).split("to")[0].replace("$", "").replace("C", "").replace(",", "")
        try:
            return float(text)
        except ValueError:
            return 0.0

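Parsing Product Detail Pages

The detail scraper reuses the search scraper's session and proxy configuration to fetch the full data from each detail page, including the original price, exact shipping cost, and seller information.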

class EbayDetailScraper:
    """Product detail page scraper and parser"""
    def __init__(self, scraper: EbaySearchScraper):
        self._get = scraper._get

    def fetch_detail(self, product: Product) -> Product:
        try:
            resp = self._get(product.url)
            if resp.status_code != 200:
                return product
            soup = BeautifulSoup(resp.text, "html.parser")
            # Parse the current price
            price_el = soup.select_one(".x-price-primary .ux-textspans")
            if price_el:
                product.price = self._parse(price_el.get_text(strip=True))
            # Parse the strikethrough original price
            orig_el = soup.select_one(".x-price-approx__price .ux-textspans--STRIKETHROUGH")
            if orig_el:
                product.original_price = self._parse(orig_el.get_text(strip=True))
            # Parse the shipping cost
            for sec in soup.select(".ux-labels-values__values-content .ux-textspans"):
                text = sec.get_text(strip=True).lower()
                if "free" in text or "$" in text:
                    product.shipping = text
                    break
            # Parse the seller name
            seller_el = soup.select_one(".x-seller-info__name")
            if seller_el:
                product.seller = seller_el.get_text(strip=True)
        except Exception:
            pass
        return product

    @staticmethod
    def _parse(text: str) -> float:
        """Clean a detail-page price string"""
        cleaned = text.replace("$", "").replace(",", "").strip()
        try:
            return float(cleaned.split()[0])
        except (ValueError, IndexError):
            return 0.0

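Persisting Price History

A lightweight SQLite database serves as the storage layer. Product metadata and price history live in separate tables, and the history table is indexed on ebay_id to keep lookups fast. The class supports upserting products, appending price snapshots, and querying for price comparisons.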

class PriceDatabase:
    def __init__(self, db_path: str = "ebay_prices.db"):
        self.conn = sqlite3.connect(db_path)
        # Product metadata table
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS products (
                ebay_id TEXT PRIMARY KEY,
                title TEXT, url TEXT, image_url TEXT, seller TEXT,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
        """)
        # Price history table
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS price_history (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                ebay_id TEXT, price REAL, original_price REAL, shipping TEXT,
                recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                FOREIGN KEY (ebay_id) REFERENCES products(ebay_id)
            )
        """)
        self.conn.execute("CREATE INDEX IF NOT EXISTS idx_hist ON price_history(ebay_id)")
        self.conn.commit()

    def upsert_product(self, p: Product):
        """Insert or update product metadata"""
        self.conn.execute(
            "INSERT OR REPLACE INTO products (ebay_id, title, url, image_url, seller) VALUES (?,?,?,?,?)",
            (p.ebay_id, p.title, p.url, p.image_url, p.seller)
        )
        self.conn.commit()

    def record_price(self, p: Product):
        """Write a price snapshot"""
        self.conn.execute(
            "INSERT INTO price_history (ebay_id, price, original_price, shipping) VALUES (?,?,?,?)",
            (p.ebay_id, p.price, p.original_price, p.shipping)
        )
        self.conn.commit()

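    def get_latest(self, ebay_id: str) -> Optional[dict]:
        """Return the most recent recorded price for one product"""
        row = self.conn.execute(
            # id DESC breaks ties between snapshots recorded within the same second
            "SELECT price, recorded_at FROM price_history WHERE ebay_id=? ORDER BY recorded_at DESC, id DESC LIMIT 1",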
            (ebay_id,)
        ).fetchone()
        return {"price": row[0], "date": row[1]} if row else None

    def find_drops(self, threshold: float = 0.1) -> List[dict]:
        """Find products whose price dropped by at least the threshold between consecutive snapshots"""
        rows = self.conn.execute("""
            SELECT h1.ebay_id, p.title, h1.price old_price, h2.price new_price,
                   (h1.price - h2.price)/h1.price drop_ratio, h2.recorded_at
            FROM price_history h1
            JOIN price_history h2 ON h1.ebay_id = h2.ebay_id
            JOIN products p ON h1.ebay_id = p.ebay_id
            WHERE h2.recorded_at > h1.recorded_at
            AND h1.id = (SELECT MAX(id) FROM price_history WHERE ebay_id=h1.ebay_id AND id < h2.id)
            AND (h1.price - h2.price)/h1.price >= ?
            ORDER BY drop_ratio DESC
        """, (threshold,)).fetchall()
        return [{
            "ebay_id": r[0], "title": r[1], "old": r[2], "new": r[3],
            "drop_pct": round(r[4]*100,1), "date": r[5]
        } for r in rows]
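
The consecutive-snapshot comparison can be tried in isolation with an in-memory database. The sketch below is a simplified variant, not the class above: it keeps only the columns needed for the comparison, pairs rows by id rather than by timestamp, and uses made-up item IDs and prices. It also excludes a zero previous price explicitly to avoid dividing by zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE price_history (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        ebay_id TEXT, price REAL
    )
""")
# Hypothetical item IDs and prices, inserted in chronological order.
conn.executemany(
    "INSERT INTO price_history (ebay_id, price) VALUES (?, ?)",
    [("A1", 100.0), ("B2", 50.0), ("A1", 80.0), ("B2", 49.0)],
)

def find_drops(conn, threshold):
    """Pair each snapshot with the one immediately before it for the same item."""
    return conn.execute("""
        SELECT cur.ebay_id, prev.price, cur.price,
               (prev.price - cur.price) / prev.price AS drop_ratio
        FROM price_history cur
        JOIN price_history prev
          ON prev.id = (SELECT MAX(id) FROM price_history
                        WHERE ebay_id = cur.ebay_id AND id < cur.id)
        WHERE prev.price > 0
          AND (prev.price - cur.price) / prev.price >= ?
        ORDER BY drop_ratio DESC
    """, (threshold,)).fetchall()

drops = find_drops(conn, 0.10)
print(drops)
```

With a 10% threshold only item A1 (100.0 to 80.0) qualifies; the 2% change on B2 is filtered out.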

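Price-Change Detection and Alerts

The current price is compared with the previous snapshot, and an alert fires when the drop reaches a set threshold. The modules are then tied together in a single tracking job.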

class PriceAlert:
    def __init__(self, db: PriceDatabase):
        self.db = db

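    def check(self, products: List[Product]) -> List[dict]:
        """Compare each product with its previous snapshot and collect price-drop alerts"""
        alerts = []
        for p in products:
            # Read the previous snapshot BEFORE recording the new one; otherwise
            # the comparison would be against the row that was just inserted.
            latest = self.db.get_latest(p.ebay_id)
            self.db.upsert_product(p)
            self.db.record_price(p)
            if not latest or latest["price"] <= 0 or p.price <= 0:
                continue
            drop_rate = (latest["price"] - p.price) / latest["price"]
            # A drop of 5% or more triggers an alert
            if drop_rate >= 0.05: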
                alerts.append({
                    "type": "↓",
                    "ebay_id": p.ebay_id,
                    "title": p.title,
                    "old": latest["price"],
                    "new": p.price,
                    "drop": round(drop_rate*100,1),
                    "url": p.url
                })
        return alerts

# Main entry point
def track(keywords: List[str], delay: float = 3.0):
    db = PriceDatabase()
    scraper = EbaySearchScraper(proxy_user="your_user", proxy_pass="your_password")
    detail = EbayDetailScraper(scraper)
    alert = PriceAlert(db)
    all_alerts = []

    for kw in keywords:
        print(f"=== Search keyword: {kw} ===")
        products = scraper.search(kw, max_pages=2, delay=delay)
        # Only the first 10 results get a detail-page fetch
        top = products[:10]
        for idx, p in enumerate(top):
            print(f"Fetching detail {idx+1}/{len(top)}: {p.title[:30]}...")
            detail.fetch_detail(p)
            time.sleep(delay + random.uniform(1, 3))
        all_alerts.extend(alert.check(products))

    if all_alerts:
        print("=" * 50)
        print(f"Detected {len(all_alerts)} price changes:")
        for item in all_alerts:
            print(f"{item['type']} down {item['drop']}% | ${item['old']:.2f} → ${item['new']:.2f} | {item['title'][:35]}")
    else:
        print("No significant price changes detected")

# Start the monitoring job
if __name__ == "__main__":
    track(["mechanical keyboard", "rtx 4070", "sony wh-1000xm5"])
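
The entry point above runs a single pass. For repeated runs, one simple option is a loop that calls the job at a fixed interval. The helper below is a hypothetical sketch, not part of the original code: the name `run_periodically` and its parameters are illustrative, and the `sleep` argument is injectable only so the loop can be exercised without waiting.

```python
import time
from typing import Callable

def run_periodically(job: Callable[[], None], interval_s: float, max_runs: int,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `job` up to `max_runs` times, pausing `interval_s` seconds between runs.

    A failing run is reported and skipped so one bad batch does not stop the loop.
    Returns the number of runs that completed without raising.
    """
    succeeded = 0
    for i in range(max_runs):
        try:
            job()
            succeeded += 1
        except Exception as exc:
            print(f"run {i + 1} failed: {exc}")
        if i < max_runs - 1:  # no pause after the final run
            sleep(interval_s)
    return succeeded

# Example wiring (not executed here): four passes, six hours apart.
# run_periodically(lambda: track(["rtx 4070"]), interval_s=6 * 3600, max_runs=4)
```

For anything long-running, a system scheduler such as cron is usually a more robust choice than an in-process loop.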

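Proxy Selection and Technical Notes

Choosing a Proxy Mode

The proxy mode should match the eBay scraping scenario:

Search results / regular detail pages: use dynamic forwarding, which isolates requests across a large IP pool, combined with header-driven IP switching.
Logged-in / session-bound scraping: use fixed forwarding, which pins the IP for a short period so the session stays valid.

There are two IP-switching mechanisms:

TCP Keep-Alive: the session connection is reused, and the IP changes automatically once the connection is re-established.
Forced switching via request header: a random value forces an IP change regardless of connection state. This is the mechanism preferred here.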

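Troubleshooting

Empty search results: set the Accept-Language header to en-US, and replace the default requests User-Agent with a browser-like one.
Proxy 407 error: proxy authentication failed; check the credentials.
Price parsing errors: multi-currency and range prices need extended cleaning logic.
CSS selector no longer matches: a front-end update changed the class names; re-inspect the node with F12.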
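
For the price-parsing item above, one possible extension is to extract the numbers with a regular expression instead of stripping known characters. This is a sketch under assumptions: the function name `parse_price` and its return shape are hypothetical, and it assumes US-style formatting (comma as thousands separator, dot as decimal point), so a string such as "12,99" would be misread.

```python
import re
from typing import Optional, Tuple

# A number with optional thousands separators and an optional decimal part.
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")

def parse_price(text: str) -> Optional[Tuple[str, float, float]]:
    """Parse strings like '$12.99', 'C $1,299.00' or '$10.00 to $25.50'.

    Returns (currency_marker, low, high); low == high for a single price.
    Returns None when the text contains no number.
    """
    numbers = _NUMBER.findall(text)
    if not numbers:
        return None
    # Only the first two numbers matter: a single price or a low/high range.
    values = [float(n.replace(",", "")) for n in numbers[:2]]
    # Whatever precedes the first number is kept as the currency marker.
    marker = text[: text.index(numbers[0])].strip()
    return marker, min(values), max(values)

single = parse_price("C $1,299.00")
ranged = parse_price("$10.00 to $25.50")
print(single, ranged)
```

Keeping both ends of a range lets the caller decide whether to track the low value, the high value, or both.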

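Scope and Compliance

Compliance: follow eBay's robots.txt rules and keep at least 3 seconds between requests. Use the data for personal price analysis only; commercial redistribution is prohibited.
Scenario limits: this is a scheduled batch design and does not support real-time monitoring of flash sales. Some dynamically rendered pages require a headless browser such as Playwright.
Performance limits: high-frequency bulk polling raises the risk of triggering anti-bot controls; for large-scale monitoring, split the work into tasks and distribute it.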
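
The two compliance rules above (respect robots.txt, keep at least 3 seconds between requests) can be enforced in code rather than by convention. The sketch below uses the standard library's `urllib.robotparser`; the class `MinIntervalThrottle`, the helper `build_robot_rules`, and the sample rules are hypothetical and do not reflect eBay's actual robots.txt.

```python
import time
from urllib.robotparser import RobotFileParser

def build_robot_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

class MinIntervalThrottle:
    """Ensure at least `min_interval` seconds pass between consecutive calls."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted request

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        pause = 0.0
        if self._last is not None:
            pause = max(0.0, self.min_interval - (now - self._last))
            if pause:
                self._sleep(pause)
        self._last = now + pause
        return pause

# Hypothetical rules for illustration only; not eBay's real robots.txt.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""
rules = build_robot_rules(SAMPLE_ROBOTS)
print(rules.can_fetch("my-tracker", "https://example.com/sch/i.html"))
print(rules.can_fetch("my-tracker", "https://example.com/private/x"))
```

In a scraper, `throttle.wait()` would be called before each request, and any URL for which `can_fetch` returns False would be skipped.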

Source: Internet
