Automated eBay Price Tracking with a Python Scraper: A Complete Hands-On Guide from Scratch (2025)
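Abstract
Build an eBay price monitoring system in Python: rotate IPs through a dynamic proxy pool to avoid rate limiting, and scrape product data by parsing DOM nodes.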
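The core requirement of e-commerce price monitoring is simple: fetch product prices from eBay on a schedule, store a snapshot each time, and raise an alert when a price drops. eBay pages are mostly server-side rendered, so there is no need to execute JavaScript, but there are still obstacles: convoluted URL structures, range prices, and strict rate limits. When crawling category data in bulk over a long period, high-frequency requests from a single IP will almost certainly be throttled or blocked.
The solution below implements a complete eBay price monitoring pipeline in Python, covering product search, price parsing, data persistence, price change detection, and price-drop notification. The code is meant to be deployable as-is, but it is provided for technical demonstration only.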
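Locating Page Data
Inspecting DOM nodes with the browser developer tools (F12) is enough. The CSS selectors that currently work are listed below. Note that the platform runs A/B tests and class names may change dynamically, so re-verify the selectors whenever they stop matching: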
| Data item | CSS selector |
| --- | --- |
| Product title | #itemTitle / h1.x-item-title__mainTitle |
| Current price | .x-price-primary .ux-textspans |
| Strikethrough original price | .x-price-approx__price |
| Shipping | .ux-labels-values__values .ux-textspans |
| Search result item | .srp-results .s-item |
| Listing-page price | .s-item__price |
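
Because class names can change under A/B tests, one way to soften selector breakage is to try several candidate selectors in order and take the first that matches. The sketch below is a minimal illustration, not part of the original solution: `first_match` and `TITLE_SELECTORS` are hypothetical names, and a plain dict stands in for `soup.select_one` so the example runs without BeautifulSoup installed.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def first_match(select_one: Callable[[str], Optional[T]],
                selectors: Iterable[str]) -> Optional[T]:
    """Return the first non-None result of select_one over the candidates."""
    for selector in selectors:
        node = select_one(selector)
        if node is not None:
            return node
    return None


# Hypothetical candidate list: the legacy id first, then the newer class.
TITLE_SELECTORS = ["#itemTitle", "h1.x-item-title__mainTitle"]

# Stand-in for soup.select_one, backed by a dict so the sketch runs without bs4.
fake_page = {"h1.x-item-title__mainTitle": "Mechanical Keyboard"}
title = first_match(fake_page.get, TITLE_SELECTORS)
print(title)
```

With a real page, passing `soup.select_one` as the first argument should behave the same way, since it also returns `None` when nothing matches.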

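Search Listing Collection and Proxy Setup
eBay polices its search endpoint strictly: consecutive requests from a single IP trigger rate limiting within a short time. The fix is to rotate IPs through a proxy pool: requests go to a single gateway, and the provider switches the exit IP on its side. A standard proxy pool typically holds more than 300,000 IPs and keeps latency under 100 ms.
Core Logic of Proxy Integration
The exit IP is switched dynamically by passing a random value in a request header. Compared with the TCP keep-alive mode, this approach is more robust: even if a connection drops, the next request still moves to a new exit IP.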
import requests
import random

# Tunnel proxy base configuration (generic proxy pool; replace with real credentials)
proxyHost = "proxy.example.com"
proxyPort = "31111"
proxyUser = "your_username"
proxyPass = "your_password"
proxyMeta = f"http://{proxyUser}:{proxyPass}@{proxyHost}:{proxyPort}"
proxies = {"http": proxyMeta, "https": proxyMeta}
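
One practical detail when building the proxy URL from credentials: if the username or password contains characters such as `@`, `:` or `/`, the plain f-string above produces an ambiguous URL. A minimal sketch of a safer builder follows; `build_proxy_url` is a hypothetical helper, and the credentials and host are placeholders.

```python
from urllib.parse import quote


def build_proxy_url(user: str, password: str, host: str, port: int) -> str:
    """Build an HTTP proxy URL with percent-encoded credentials."""
    # safe="" makes quote() encode every reserved character, including "/" and "@".
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


# Placeholder credentials containing reserved characters.
proxy_url = build_proxy_url("your_username", "p@ss:word/1", "proxy.example.com", 31111)
safe_proxies = {"http": proxy_url, "https": proxy_url}
print(proxy_url)
```

The resulting dict can be passed as the `proxies` argument in the same way as the hand-built one.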
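The full search scraper uses a requests session to keep request context, parses pages with BeautifulSoup, and stores product information in a dataclass. To lower the chance of triggering anti-bot controls, it sleeps for a random interval between requests.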
import requests
from bs4 import BeautifulSoup
from urllib.parse import quote
from dataclasses import dataclass
from typing import Optional, List
import time
import random
import sqlite3
import hashlib


@dataclass
class Product:
    """Product data entity."""
    ebay_id: str = ""
    title: str = ""
    url: str = ""
    price: float = 0.0
    original_price: float = 0.0
    currency: str = "USD"
    shipping: str = ""
    seller: str = ""
    condition: str = ""
    image_url: str = ""
    uid: str = ""

    def __post_init__(self):
        # Derive a short, stable identifier from the eBay item ID
        if not self.uid and self.ebay_id:
            self.uid = hashlib.md5(self.ebay_id.encode()).hexdigest()[:12]


class EbaySearchScraper:
    """eBay search listing scraper (with proxy pool integration)."""
    BASE_URL = "https://www.ebay.com/sch/i.html"

    def __init__(self, proxy_user: str = "", proxy_pass: str = ""):
        self.session = requests.Session()
        # Mimic standard browser request headers
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
            "Accept-Language": "en-US,en;q=0.9",
        })
        self.use_proxy = bool(proxy_user and proxy_pass)
        self.proxies = None
        if self.use_proxy:
            proxy_meta = f"http://{proxy_user}:{proxy_pass}@proxy.example.com:31111"
            self.proxies = {"http": proxy_meta, "https": proxy_meta}

    def _get(self, url, **kwargs):
        """Wrap session.get and switch the exit IP on every request."""
        headers = {}
        if self.use_proxy:
            # A fresh random value asks the proxy for a new exit IP
            headers["Proxy-Tunnel"] = str(random.randint(1, 10000))
        return self.session.get(url, proxies=self.proxies, headers=headers, timeout=15, **kwargs)

    def search(self, keyword: str, max_pages: int = 3, delay: float = 3.0) -> List[Product]:
        """Fetch the search listing page by page."""
        all_products = []
        for page in range(1, max_pages + 1):
            params = {"_nkw": keyword, "_pgn": page, "_ipg": 60}
            try:
                resp = self._get(self.BASE_URL, params=params)
                if resp.status_code != 200:
                    break
                products = self._parse_search(resp.text)
                all_products.extend(products)
                # Base delay plus random jitter between pages
                time.sleep(delay + random.uniform(0.5, 2.0))
            except Exception:
                # Stop paging on any network or parsing failure
                break
        return all_products

    def _parse_search(self, html: str) -> List[Product]:
        """Parse the search page HTML."""
        soup = BeautifulSoup(html, "html.parser")
        products = []
        for item in soup.select(".srp-results .s-item"):
            try:
                title_el = item.select_one(".s-item__title")
                link_el = item.select_one(".s-item__link")
                if not all([title_el, link_el]):
                    continue
                title = title_el.get_text(strip=True)
                url = link_el.get("href", "")
                # The item ID is the path segment after /itm/, without the query string
                ebay_id = url.split("/itm/")[1].split("?")[0] if "/itm/" in url else ""
                price = self._parse_price(item.select_one(".s-item__price"))
                shipping_el = item.select_one(".s-item__shipping")
                shipping = shipping_el.get_text(strip=True) if shipping_el else ""
                image_el = item.select_one(".s-item__image-img")
                image_url = image_el.get("src", "") if image_el else ""
                if ebay_id and price > 0:
                    products.append(Product(
                        ebay_id=ebay_id, title=title, url=url,
                        price=price, shipping=shipping, image_url=image_url
                    ))
            except Exception:
                # Skip items whose markup does not match the expected layout
                continue
        return products

    @staticmethod
    def _parse_price(el) -> float:
        """Clean the price text and convert it to float; for a range price, take the first value."""
        if not el:
            return 0.0
        text = el.get_text(strip=True).split("to")[0].replace("$", "").replace("C", "").replace(",", "")
        try:
            return float(text)
        except ValueError:
            return 0.0
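
The random interval inside `search` can also be pulled out into a small helper so that every request path shares the same pacing rule. The sketch below is only an illustration under that assumption: `jittered_delay` and `pause` are hypothetical names, and the sleep function is injectable so the logic can be exercised without actually waiting.

```python
import random
import time
from typing import Callable, Optional


def jittered_delay(base: float, jitter_low: float = 0.5, jitter_high: float = 2.0,
                   rng: Optional[random.Random] = None) -> float:
    """Return base plus a uniformly random jitter in [jitter_low, jitter_high]."""
    if rng is None:
        rng = random.Random()
    return base + rng.uniform(jitter_low, jitter_high)


def pause(base: float, sleep: Callable[[float], None] = time.sleep,
          rng: Optional[random.Random] = None) -> float:
    """Sleep for a jittered delay and return the number of seconds slept."""
    delay = jittered_delay(base, rng=rng)
    sleep(delay)
    return delay


# Demo with a recording stand-in for time.sleep, so nothing actually blocks.
recorded = []
pause(3.0, sleep=recorded.append)
print(recorded)
```

In the scraper, `pause(delay)` would take the place of the inline `time.sleep(delay + random.uniform(0.5, 2.0))` call.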
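Parsing Product Detail Pages
The detail scraper reuses the search scraper's session and proxy configuration to pull the full data from the detail page, including the original price, exact shipping cost, and seller information.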
class EbayDetailScraper:
    """Product detail page scraper and parser."""

    def __init__(self, scraper: EbaySearchScraper):
        # Reuse the search scraper's session, headers and proxy rotation
        self._get = scraper._get

    def fetch_detail(self, product: Product) -> Product:
        try:
            resp = self._get(product.url)
            if resp.status_code != 200:
                return product
            soup = BeautifulSoup(resp.text, "html.parser")
            # Current price
            price_el = soup.select_one(".x-price-primary .ux-textspans")
            if price_el:
                product.price = self._parse(price_el.get_text(strip=True))
            # Strikethrough original price
            orig_el = soup.select_one(".x-price-approx__price .ux-textspans--STRIKETHROUGH")
            if orig_el:
                product.original_price = self._parse(orig_el.get_text(strip=True))
            # Shipping: take the first value that mentions "free" or a dollar amount
            for sec in soup.select(".ux-labels-values__values-content .ux-textspans"):
                text = sec.get_text(strip=True).lower()
                if "free" in text or "$" in text:
                    product.shipping = text
                    break
            # Seller name
            seller_el = soup.select_one(".x-seller-info__name")
            if seller_el:
                product.seller = seller_el.get_text(strip=True)
        except Exception:
            # On any failure, return the product with whatever was filled in so far
            pass
        return product

    @staticmethod
    def _parse(text: str) -> float:
        """Clean a detail-page price string."""
        cleaned = text.replace("$", "").replace(",", "").strip()
        try:
            return float(cleaned.split()[0])
        except (ValueError, IndexError):
            return 0.0
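
The detail scraper stores shipping as raw text. If a numeric shipping cost is needed, for example to compare total landed prices, the text can be normalized afterwards. The following is a minimal sketch under the assumption that shipping strings look like "Free shipping" or "US $12.50 shipping"; `parse_shipping_cost` is a hypothetical helper, and real pages may use other wordings.

```python
import re
from typing import Optional

# Matches amounts such as 5, 12.50 or 1,234.00
_AMOUNT = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def parse_shipping_cost(text: str) -> Optional[float]:
    """Return 0.0 for free shipping, the first amount found, or None if unknown."""
    lowered = text.lower()
    if "free" in lowered:
        return 0.0
    match = _AMOUNT.search(lowered)
    if match is None:
        # e.g. shipping calculated at checkout: no usable number in the text
        return None
    return float(match.group(0).replace(",", ""))


for sample in ["Free shipping", "US $12.50 shipping", "Calculated at checkout"]:
    print(sample, "->", parse_shipping_cost(sample))
```

Returning `None` rather than `0.0` for unparseable text keeps "unknown" distinct from "free".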
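Persisting Historical Price Data
The storage layer is built on lightweight SQLite. Product base data and price history live in separate tables, with an index on the history table to keep lookups fast. The layer supports product upserts, writing history records, and price comparison queries.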
class PriceDatabase:
    def __init__(self, db_path: str = "ebay_prices.db"):
        self.conn = sqlite3.connect(db_path)
        # Product base information table
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS products (
                ebay_id TEXT PRIMARY KEY,
                title TEXT, url TEXT, image_url TEXT, seller TEXT,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
        """)
        # Price history table
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS price_history (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                ebay_id TEXT, price REAL, original_price REAL, shipping TEXT,
                recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                FOREIGN KEY (ebay_id) REFERENCES products(ebay_id)
            )
        """)
        self.conn.execute("CREATE INDEX IF NOT EXISTS idx_hist ON price_history(ebay_id)")
        self.conn.commit()

    def upsert_product(self, p: Product):
        """Insert or update product base information."""
        self.conn.execute(
            "INSERT OR REPLACE INTO products (ebay_id, title, url, image_url, seller) VALUES (?,?,?,?,?)",
            (p.ebay_id, p.title, p.url, p.image_url, p.seller)
        )
        self.conn.commit()

    def record_price(self, p: Product):
        """Write a price snapshot."""
        self.conn.execute(
            "INSERT INTO price_history (ebay_id, price, original_price, shipping) VALUES (?,?,?,?)",
            (p.ebay_id, p.price, p.original_price, p.shipping)
        )
        self.conn.commit()

    def get_latest(self, ebay_id: str) -> Optional[dict]:
        """Return the most recent price snapshot for one product."""
        # CURRENT_TIMESTAMP has one-second resolution, so id breaks ties
        row = self.conn.execute(
            "SELECT price, recorded_at FROM price_history WHERE ebay_id=? "
            "ORDER BY recorded_at DESC, id DESC LIMIT 1",
            (ebay_id,)
        ).fetchone()
        return {"price": row[0], "date": row[1]} if row else None

    def find_drops(self, threshold: float = 0.1) -> List[dict]:
        """Find consecutive snapshot pairs whose price drop meets the threshold."""
        # h1 is the snapshot immediately before h2 for the same product
        rows = self.conn.execute("""
            SELECT h1.ebay_id, p.title, h1.price AS old_price, h2.price AS new_price,
                   (h1.price - h2.price)/h1.price AS drop_ratio, h2.recorded_at
            FROM price_history h1
            JOIN price_history h2 ON h1.ebay_id = h2.ebay_id
            JOIN products p ON h1.ebay_id = p.ebay_id
            WHERE h2.recorded_at > h1.recorded_at
              AND h1.id = (SELECT MAX(id) FROM price_history WHERE ebay_id=h1.ebay_id AND id < h2.id)
              AND (h1.price - h2.price)/h1.price >= ?
            ORDER BY drop_ratio DESC
        """, (threshold,)).fetchall()
        return [{
            "ebay_id": r[0], "title": r[1], "old": r[2], "new": r[3],
            "drop_pct": round(r[4]*100, 1), "date": r[5]
        } for r in rows]
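
To see the comparison logic in isolation, the sketch below runs against an in-memory SQLite database with a trimmed-down `price_history` table. It is an illustrative variant, not the article's `find_drops`: `latest_drop_ratio` is a hypothetical helper that compares only the two most recent snapshots of one item, and the item IDs and prices are made-up sample data.

```python
import sqlite3
from typing import Optional


def latest_drop_ratio(conn: sqlite3.Connection, ebay_id: str) -> Optional[float]:
    """Relative drop between the two most recent snapshots, or None if unavailable."""
    rows = conn.execute(
        "SELECT price FROM price_history WHERE ebay_id=? ORDER BY id DESC LIMIT 2",
        (ebay_id,),
    ).fetchall()
    if len(rows) < 2 or rows[1][0] <= 0:
        # Fewer than two snapshots, or an unusable previous price
        return None
    new_price, old_price = rows[0][0], rows[1][0]
    return (old_price - new_price) / old_price


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_history ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, ebay_id TEXT, price REAL)"
)
# Made-up sample snapshots: item 1001 drops from 100 to 80, item 1002 has one snapshot.
conn.executemany(
    "INSERT INTO price_history (ebay_id, price) VALUES (?, ?)",
    [("1001", 100.0), ("1001", 80.0), ("1002", 50.0)],
)
ratio = latest_drop_ratio(conn, "1001")
print(ratio)
```

Ordering by the autoincrement `id` sidesteps the one-second resolution of `CURRENT_TIMESTAMP` when two snapshots land in the same second.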
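Price Change Detection and Alerting
The current price is compared with the most recent historical snapshot, and an alert fires when the drop reaches a set threshold. Finally, the modules are wired together into a single tracking task.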
class PriceAlert:
    def __init__(self, db: PriceDatabase):
        self.db = db

    def check(self, products: List[Product]) -> List[dict]:
        """Detect price changes and build price-drop alerts."""
        alerts = []
        for p in products:
            # Read the previous snapshot first: once the new price is recorded,
            # get_latest() would return the current price and the drop would always be 0
            latest = self.db.get_latest(p.ebay_id)
            self.db.upsert_product(p)
            self.db.record_price(p)
            if not latest or latest["price"] <= 0 or p.price <= 0:
                continue
            drop_rate = (latest["price"] - p.price) / latest["price"]
            # Alert when the drop is 5% or more
            if drop_rate >= 0.05:
                alerts.append({
                    "type": "↓",
                    "ebay_id": p.ebay_id,
                    "title": p.title,
                    "old": latest["price"],
                    "new": p.price,
                    "drop": round(drop_rate*100, 1),
                    "url": p.url
                })
        return alerts


# Main entry point
def track(keywords: List[str], delay: float = 3.0):
    db = PriceDatabase()
    scraper = EbaySearchScraper(proxy_user="your_user", proxy_pass="your_password")
    detail = EbayDetailScraper(scraper)
    alert = PriceAlert(db)
    all_alerts = []
    for kw in keywords:
        print(f"=== Search keyword: {kw} ===")
        products = scraper.search(kw, max_pages=2, delay=delay)
        # Only the first 10 results get a detail-page fetch
        top = products[:10]
        for idx, p in enumerate(top):
            print(f"Fetching detail {idx+1}/{len(top)}: {p.title[:30]}...")
            detail.fetch_detail(p)
            time.sleep(delay + random.uniform(1, 3))
        all_alerts.extend(alert.check(products))
    if all_alerts:
        print("=" * 50)
        print(f"Detected {len(all_alerts)} price changes:")
        for item in all_alerts:
            print(f"{item['type']} drop {item['drop']}% | ${item['old']:.2f} → ${item['new']:.2f} | {item['title'][:35]}")
    else:
        print("No significant price changes detected")


# Start the monitoring task
if __name__ == "__main__":
    track(["mechanical keyboard", "rtx 4070", "sony wh-1000xm5"])
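
The entry point above runs one tracking pass. To turn it into a recurring job without an external scheduler, a plain loop is often enough. The sketch below is one possible shape, not the article's implementation: `run_periodically` is a hypothetical helper, the six-hour interval is an arbitrary example, and the sleep function is injectable so the demo finishes instantly.

```python
import time
from typing import Callable


def run_periodically(task: Callable[[], None], interval_seconds: float,
                     max_runs: int, sleep: Callable[[float], None] = time.sleep) -> int:
    """Run task up to max_runs times, waiting interval_seconds between runs."""
    completed = 0
    for run in range(max_runs):
        try:
            task()
            completed += 1
        except Exception as exc:
            # One failed pass should not kill the whole schedule
            print(f"run {run + 1} failed: {exc}")
        if run < max_runs - 1:
            sleep(interval_seconds)
    return completed


# Demo with stand-ins: the task appends a marker, and "sleep" only records the wait.
calls = []
waits = []
done = run_periodically(lambda: calls.append("tick"), 6 * 3600, max_runs=3, sleep=waits.append)
print(done, waits)
```

In a real deployment the task would be something like `lambda: track(["mechanical keyboard"])`, and cron or a systemd timer is an equally reasonable alternative to an in-process loop.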
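Proxy Selection and Technical Notes
Choosing a Proxy Mode
Different eBay collection scenarios call for different proxy modes:
- Search listings / regular detail pages: use the dynamic forwarding mode, which isolates requests across a large IP pool and switches IPs per request through the header.
- Logged-in / session-bound collection: use the fixed forwarding mode, which pins an IP for a short time to keep the session valid.
There are two IP switching mechanisms:
- TCP Keep-Alive: the session connection is reused, and the IP changes automatically when the connection is re-established.
- Forced switch via request header: a random value forces an IP change regardless of connection state. This solution prefers this mechanism.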
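
The two scenarios above can be expressed with one small helper that either draws a fresh header value per request or keeps reusing a pinned one. This is a sketch under an explicit assumption: that the proxy provider maps an identical `Proxy-Tunnel` value to the same exit IP for as long as that IP stays alive. That behavior is provider-specific and should be checked against the provider's documentation. `TunnelSelector` is a hypothetical class, not part of the original code.

```python
import random
from typing import Dict, Optional


class TunnelSelector:
    """Produce Proxy-Tunnel headers in rotating or sticky mode."""

    def __init__(self, sticky: bool = False, rng: Optional[random.Random] = None):
        self._rng = rng if rng is not None else random.Random()
        self._sticky = sticky
        self._pinned: Optional[str] = None

    def _new_value(self) -> str:
        return str(self._rng.randint(1, 10000))

    def headers(self) -> Dict[str, str]:
        if not self._sticky:
            # Rotating mode: a fresh random value on every request
            return {"Proxy-Tunnel": self._new_value()}
        if self._pinned is None:
            self._pinned = self._new_value()
        # Sticky mode: reuse one value (ASSUMED to keep the same exit IP)
        return {"Proxy-Tunnel": self._pinned}

    def reset(self) -> None:
        """Drop the pinned value so the next sticky request picks a new one."""
        self._pinned = None


sticky = TunnelSelector(sticky=True)
first, second = sticky.headers(), sticky.headers()
print(first == second)
```

A logged-in flow would call `reset()` only when the session is deliberately restarted.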
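Common Troubleshooting
- Empty search results: set the Accept-Language request header to en-US, and do not use the default requests User-Agent; it must look like a browser identifier.
- Proxy 407 errors: username/password authentication failed; check the credentials.
- Price parsing errors: multi-currency and range-price cases require extending the cleaning logic.
- Broken CSS selectors: a front-end release changed the class names; inspect the nodes again with F12.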
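
For the multi-currency and range-price case, one way to extend the cleaning logic is a regex-based parser that returns the currency together with both ends of the range. The sketch below is a starting point under stated assumptions: `parse_price_text`, `ParsedPrice` and the prefix table are hypothetical, the prefix list is not exhaustive, and a bare `$` is assumed to mean USD, which only holds on the US site.

```python
import re
from typing import NamedTuple, Optional


class ParsedPrice(NamedTuple):
    currency: Optional[str]  # None when no known prefix is present
    low: float
    high: float


# Hypothetical prefix table; more specific prefixes come before the bare "$".
_CURRENCY_PREFIXES = [
    ("US$", "USD"), ("AU$", "AUD"), ("C$", "CAD"),
    ("£", "GBP"), ("EUR", "EUR"), ("$", "USD"),
]
_NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def parse_price_text(text: str) -> Optional[ParsedPrice]:
    """Parse strings such as '$12.99 to $25.00' or 'C $1,299.00'."""
    amounts = [float(m.replace(",", "")) for m in _NUMBER.findall(text)]
    if not amounts:
        return None
    # Remove all whitespace so 'C $' and 'C$' are matched the same way
    compact = "".join(text.split()).upper()
    currency = next((code for prefix, code in _CURRENCY_PREFIXES if prefix in compact), None)
    # A single price yields low == high; a range yields its first and last amounts
    return ParsedPrice(currency, amounts[0], amounts[-1])


for sample in ["$12.99 to $25.00", "C $1,299.00", "Tap item to see current price"]:
    print(sample, "->", parse_price_text(sample))
```

Keeping both ends of the range lets the caller decide whether to track the minimum, the maximum, or skip range listings entirely.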
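Scope and Compliance
- Compliance: follow eBay's robots.txt rules and keep the interval between requests at 3 seconds or more. Use the data for personal price analysis only; commercial redistribution is prohibited.
- Scenario limits: this solution runs as a scheduled batch job and does not support real-time monitoring such as flash sales. Some dynamically rendered pages require a headless browser such as Playwright.
- Performance limits: high-frequency bulk polling raises the chance of triggering anti-bot controls; for large-scale monitoring, split the tasks and deploy them across multiple workers.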
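
The robots.txt check can be automated with the standard library before any URL is fetched. The sketch below parses a small sample rule set; the rules and the `my-price-bot` user agent are invented for illustration and are NOT eBay's actual robots.txt, and `build_parser` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only; NOT eBay's actual robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Build a parser from robots.txt content that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)
allowed = parser.can_fetch("my-price-bot", "https://www.example.com/itm/123")
blocked = parser.can_fetch("my-price-bot", "https://www.example.com/private/report")
print(allowed, blocked)

# Against a live site, the rules would be loaded with:
#   parser = RobotFileParser()
#   parser.set_url("https://www.ebay.com/robots.txt")
#   parser.read()
```

Calling `can_fetch` before each request, and skipping URLs it rejects, keeps the crawler within whatever rules the site currently publishes.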