Crawling domestic and international weather data from 2345 Tianqi (2345天气王)

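Most of the 2345 weather crawlers on CSDN use an old endpoint. That endpoint has not been shut down yet, but old endpoints like http://tianqi.2345.com/t/wea_history/js no longer show up anywhere in the traffic the site generates.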

Finding the endpoint

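Open Chrome's Network panel, refresh the page, and look for suspicious requests. The response data may be encoded, so searching for Chinese text usually does not work. The weather data on the page contains 2021, so search for 2021 instead; exactly one endpoint turns up.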

Opening the endpoint shows JSON data whose text is Unicode-escaped.
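The `\uXXXX` sequences are ordinary JSON string escapes, so `json.loads` should restore the Chinese text without a separate decoding step. A minimal sketch, using a made-up payload rather than real output from the endpoint:

```python
import json

# Hypothetical payload shaped like an escaped JSON response; not real API output.
raw = '{"data": "\\u6674 \\u5317\\u98ce2\\u7ea7"}'

# json.loads turns every \uXXXX escape back into the character it stands for.
decoded = json.loads(raw)
print(decoded["data"])  # prints the readable Chinese text
```

This also explains why searching the Network panel for Chinese text finds nothing: the raw response only contains the escaped form.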

Endpoint parameters

The endpoint takes four parameters: areaInfo[areaId], areaInfo[areaType], date[year], and date[month].

http://tianqi.2345.com/Pc/GetHistory?areaInfo[areaId]=349727&areaInfo[areaType]=1&date[year]=2021&date[month]=2

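The first parameter, areaInfo[areaId], is the area ID. It is not easy to find, but the network traffic shows that interCitySelectData2.js is full of IDs for international cities, while the IDs for domestic (Chinese) cities are in citySelectData2.js.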

http://tianqi.2345.com/tqpcimg/tianqiimg/theme4/js/citySelectData2.js

http://tianqi.2345.com/tqpcimg/tianqiimg/theme4/js/interCitySelectData2.js
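Since the layout of these two files is not described here, one cautious way to locate an ID is to search the downloaded text for a city name and read the surrounding characters by eye. The sketch below assumes nothing about the file format; `find_snippets` is a hypothetical helper, and the sample string is made up purely for illustration.

```python
import re

def find_snippets(js_text, keyword, width=30):
    """Return the text surrounding each occurrence of keyword in js_text."""
    snippets = []
    for match in re.finditer(re.escape(keyword), js_text):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(js_text))
        snippets.append(js_text[start:end])
    return snippets

# Made-up sample; the real layout of citySelectData2.js may differ.
sample = "var demo = '12345-Demo City|67890-Other Town';"
print(find_snippets(sample, "Other Town", width=6))
```

With the real files, `js_text` would be the body of a request to one of the two URLs above, and `keyword` the city name as it is written in that file.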

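The purpose of the second parameter, areaInfo[areaType], is unclear; the request still returns normally without it. The last two parameters are the year and the month.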
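Putting the four parameters together, the request URL can be assembled with the standard library instead of string concatenation. This is a minimal sketch; `build_history_url` and `BASE_URL` are names introduced here for illustration, and the default `area_type=1` is simply the value seen in the example URL.

```python
from urllib.parse import urlencode

BASE_URL = "http://tianqi.2345.com/Pc/GetHistory"

def build_history_url(area_id, year, month, area_type=1):
    """Return the history URL for one area and one month."""
    params = {
        "areaInfo[areaId]": area_id,
        "areaInfo[areaType]": area_type,
        "date[year]": year,
        "date[month]": month,
    }
    # safe="[]" keeps the brackets literal so the result matches the URL
    # observed in the browser; without it they become %5B and %5D.
    return BASE_URL + "?" + urlencode(params, safe="[]")

print(build_history_url(349727, 2021, 2))
```

Keeping the parameters in a dict makes it easy to swap in another area ID or loop over years and months.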

Full code


import json
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
years = [2015, 2016, 2017, 2018, 2019, 2020]

index_ = ['date', 'MaxTemp', 'MinTemp', 'Weather', 'Wind']  # weather fields to keep
data = pd.DataFrame(columns=index_)  # start from an empty DataFrame

for y in years:
    for m in months:
        url = 'http://tianqi.2345.com/Pc/GetHistory?areaInfo[areaId]=349727&areaInfo[areaType]=1&date[year]=' + str(y) + '&date[month]=' + str(m)
        response = requests.get(url=url)
        if response.status_code == 200:  # skip months whose request did not succeed
            # The 'data' field holds an HTML table as a string
            html_str = json.loads(response.text)['data']
            soup = BeautifulSoup(html_str, 'lxml')
            tr = soup.table.select("tr")
            for i in tr[1:]:  # skip the header row
                td = i.select('td')
                tmp = []
                for j in td:
                    # Strip the tags and keep only the cell text
                    tmp.append(re.sub('<.*?>', "", str(j)))
                data_spider = pd.DataFrame(tmp).T
                data_spider.columns = index_  # rename the columns
                data = pd.concat((data, data_spider), axis=0)  # append this row

data.to_excel('weatherdata_ny.xlsx')
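The row-extraction step can also be tried offline without BeautifulSoup or lxml. The sketch below swaps in the standard library's `html.parser` as an alternative, not the method used above; the `TableRows` class and the sample HTML are made up for illustration, and the real table may contain different columns.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows built from <th> have no <td> cells
                self.rows.append(self._row)
            self._row = None

# Made-up fragment standing in for the HTML string in the 'data' field.
sample_html = (
    "<table><tr><th>date</th><th>max</th></tr>"
    "<tr><td>2021-02-01</td><td><span>10</span></td></tr></table>"
)
parser = TableRows()
parser.feed(sample_html)
print(parser.rows)
```

Nested tags inside a cell are ignored and only their text is kept, which mirrors what the `re.sub` call in the full code does.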


Result

  • Copyright: Copyright is owned by the author. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.
  • Copyrights © 2015-2024 galaxy