Batch-Extracting Image Links from a CSV File (Database Export) with Python

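The old website's content hotlinks images hosted on third-party sites, so any of those images could stop loading at any time. The fix is to extract the image links from the database file (a little over 30 MB) and download them in bulk to OSS storage.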

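First, export the website database as a CSV file, then use a Python script to extract the image links from it in bulk. Note that Python's csv module limits the size of a single field (131072 characters by default), and a database export can easily contain longer fields, so raise the limit before reading the file:

    csv.field_size_limit(500 * 1024 * 1024)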
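To see why the limit matters, here is a minimal sketch that parses an oversized field from memory, first with the default limit and then with the raised one. The sample data and the `read_rows` helper are made up for illustration and are not part of the original script.

```python
import csv
import io

# Hypothetical sample: one field of 200,000 characters, above the default limit of 131072
big_field = "x" * 200_000
data = f"id,content\n1,{big_field}\n"

def read_rows(text):
    """Parse CSV text held in memory and return all rows as lists."""
    return list(csv.reader(io.StringIO(text)))

# Set the limit to its default value explicitly so the demonstration is reproducible
csv.field_size_limit(131072)

try:
    read_rows(data)
    failed = False
except csv.Error as exc:
    failed = True
    print("with the default limit:", exc)

# Raise the limit to 500 MB, the value used in this article, and parse again
csv.field_size_limit(500 * 1024 * 1024)
rows = read_rows(data)
print(len(rows), len(rows[1][1]))  # prints: 2 200000
```

The first attempt raises `csv.Error` because the field is larger than the limit; after the limit is raised, the same data parses normally.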

The following code extracts the image links from the CSV file and prints them.

The Python code is as follows:

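import csv
import re

# Database exports can contain very large fields; raise the csv module's limit first
csv.field_size_limit(500 * 1024 * 1024)

# Match http(s) URLs ending in a common image extension.
# Stopping at whitespace, quotes and angle brackets keeps a match from running across HTML attributes.
IMAGE_URL = re.compile(r'https?://[^\s"\'<>]+?\.(?:png|jpe?g|gif)', re.IGNORECASE)

def extract_image_links(csv_file):
    image_links = []
    with open(csv_file, 'r', encoding='utf-8', newline='') as file:
        reader = csv.reader(file)
        for row in reader:
            for item in row:
                # Use the regular expression to extract image links from each field
                image_links.extend(IMAGE_URL.findall(item))
    return image_links

csv_file = 'your_file.csv'  # replace with the path to your CSV export
image_links = extract_image_links(csv_file)
for link in image_links:
    print(link)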
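The article's goal is to download the extracted images in bulk, which the script above does not cover. What follows is only a rough sketch of that step using the standard library, not the author's actual method: the helper names (`unique_links`, `local_path`, `download_all`) and the `images` target directory are assumptions made for illustration. Uploading to OSS is left out because it depends on the storage provider's SDK.

```python
import os
import urllib.request
from urllib.parse import urlsplit

def unique_links(links):
    """Drop duplicate links while keeping first-seen order."""
    return list(dict.fromkeys(links))

def local_path(url, dest_dir):
    """Map a URL to a file path under dest_dir that mirrors the URL's host and path."""
    parts = urlsplit(url)
    return os.path.join(dest_dir, parts.netloc, *parts.path.strip("/").split("/"))

def download_all(links, dest_dir="images"):
    """Download each unique link into dest_dir; return the links that failed."""
    failed = []
    for url in unique_links(links):
        target = local_path(url, dest_dir)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        try:
            urllib.request.urlretrieve(url, target)
        except OSError as exc:  # URLError and HTTPError are subclasses of OSError
            failed.append(url)
            print(f"failed: {url} ({exc})")
    return failed
```

Calling `download_all(image_links)` after the extraction step would fetch every distinct link once and return the ones that could not be retrieved, which is useful for spotting third-party images that are already gone. Mirroring the host and path keeps files with the same name from different sites from overwriting each other.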

Original content. Reproduction requires the author's permission.