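Preface
This article introduces the datasets most commonly used for semantic segmentation, along with how each one is preprocessed and loaded. It is part of my semantic segmentation column. If you are interested in the field, take a look at the rest of the column, which walks through classic models and their code in detail, including reproducible implementations.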
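Mask modes
Before getting to the datasets, let's look at the format of the annotation files.
In semantic segmentation, annotation files are usually stored in P mode (palette mode) or L mode (grayscale mode). In the figure, the image on the left is shown in L mode and the one on the right in P mode.
The two modes store the same content: both are single-channel 8-bit images, and only the interpretation of the values differs. In P mode, each pixel value is an index into a palette. With 8 bits there are at most 256 indices (0-255), each mapped to one palette color, and each color stands for one class. This makes P-mode annotations much easier to inspect visually. In L mode, each pixel value is a gray level, and again each gray level stands for one class.
We can check an image's mode like this:
from PIL import Image

print(Image.open('image.png').mode)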
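To make the difference concrete, here is a minimal sketch, assuming Pillow and NumPy are installed. The tiny 2x3 mask and the three palette colors are made up for illustration. The same index array is wrapped once as an L-mode image and once as a P-mode image: the stored values are identical, and only the P-mode image carries a palette.

```python
import numpy as np
from PIL import Image

# Hypothetical 2x3 label map with class IDs 0, 1 and 2
mask = np.array([[0, 1, 2],
                 [2, 1, 0]], dtype=np.uint8)

# L mode: a 2-D uint8 array is interpreted as gray levels
img_l = Image.fromarray(mask)

# P mode: the same values plus a palette (illustrative colors for IDs 0, 1, 2).
# The palette is padded to 256 entries * 3 channels.
palette = [0, 0, 0, 128, 0, 0, 0, 128, 0]
img_p = Image.fromarray(mask)
img_p.putpalette(palette + [0] * (768 - len(palette)))

print(img_l.mode, img_p.mode)

# The stored indices are identical in both images
same_values = np.array_equal(np.asarray(img_l), np.asarray(img_p))
print(same_values)
```

Opened in an image viewer, the L-mode version of such a mask looks almost black because the class IDs are small gray levels, while the P-mode version shows one distinct color per class.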
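The two modes are also read and written differently. An L-mode image can be read with either PIL or cv2 (read it as grayscale) and saved directly:
import cv2
import numpy as np
from PIL import Image

# Read, option 1: cv2 with flag 0 (grayscale)
label = cv2.imread(label_path, 0)
# Read, option 2: PIL
label = np.asarray(Image.open(label_path), dtype=np.int32)
# Write, option 1: gray_image is a NumPy array
cv2.imwrite('gray_image.png', gray_image)
# Write, option 2: gray_image is a PIL image
gray_image.save('gray_image.png')
For P mode, we can only use PIL (cv2 does not support P mode; it expands the palette into color values when reading, so the class indices are lost):
import imgviz
import numpy as np
from PIL import Image

# Read
label = np.asarray(Image.open(label_path), dtype=np.int32)

# Write
def save_colored_mask(mask, save_path):
    lbl_pil = Image.fromarray(mask.astype(np.uint8), mode="P")
    colormap = imgviz.label_colormap()
    lbl_pil.putpalette(colormap.flatten())
    lbl_pil.save(save_path)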
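You may occasionally run into annotation files in RGB mode as well (this is rare, and is mostly done to make the labels easier to view). These are ordinary color images, and they must be converted to P or L mode when the data is loaded, otherwise training will fail. One of the datasets below is handled this way. The three modes are summarized in the following table. (In fact, both L-mode and P-mode files can be read the same way with PIL.Image.open().)
| Mode | Channels | Meaning of each pixel | Typical use | Common way to read | Notes |
|---|---|---|---|---|---|
| P (Palette) | 1 (index) | Class ID, looked up in the palette to get a color | Semantic segmentation labels (class-index type) | PIL.Image.open() | Pixel values are class indices; the palette maps them to RGB |
| L (Luminance) | 1 (gray level) | Gray level 0-255 | Grayscale images, depth maps, label maps | PIL.Image.open().convert('L') or cv2.imread(..., 0) | Directly encodes brightness or class ID; no palette |
| RGB | 3 | True color (R, G, B each 0-255) | Color photos, visualizations | cv2.imread() / PIL.Image.open().convert('RGB') | Each pixel is a color value; note that cv2 returns channels in BGR order |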
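One practical consequence of the table: how a P-mode file is read decides whether you get class indices or colors. The sketch below, assuming Pillow and NumPy, builds a small P-mode mask in memory (the data and palette are invented for the example), saves it as PNG, and reads it back both ways.

```python
import io

import numpy as np
from PIL import Image

# Build a small P-mode mask in memory (illustrative data)
mask = np.array([[0, 1],
                 [1, 2]], dtype=np.uint8)
img = Image.fromarray(mask)
palette = [0, 0, 0, 128, 0, 0, 0, 128, 0]
img.putpalette(palette + [0] * (768 - len(palette)))

# Round-trip through PNG, as if the mask had been stored on disk
buf = io.BytesIO()
img.save(buf, format="PNG")
buf.seek(0)
loaded = Image.open(buf)

indices = np.asarray(loaded)                 # (2, 2): class indices, use these as labels
colors = np.asarray(loaded.convert("RGB"))   # (2, 2, 3): palette colors, for display only

print(loaded.mode, indices.shape, colors.shape)
```

For training, keep the index array. Calling `.convert('RGB')` or `.convert('L')` on a P-mode label replaces the indices with color or brightness values.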
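PASCAL-VOC2012
Download
Dataset name: PASCAL-VOC2012
Download page: The PASCAL Visual Object Classes Challenge 2012 (VOC2012)
Download the training/validation archive from that page (the one that is about 2GB).
Dataset overview
VOC2012 is part of the long-running Pascal Visual Object Classes (VOC) challenge series and is widely used for image classification, object detection, and semantic segmentation. Its segmentation task covers 20 object classes plus background and is mainly used to evaluate object-level segmentation accuracy.
Key facts:
- Images: 1,464 training images, 1,449 validation images, and 1,456 test images.
- Classes: 20 object classes such as people, animals, vehicles, and furniture, with pixel-level labels for every image.
- Format: each image comes with a label map whose pixel values are the class IDs.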
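Besides the class IDs 0-20 (background plus the 20 object classes), the VOC label PNGs also use the value 255 for object boundaries and "difficult" regions, which are normally ignored in training and evaluation. A minimal NumPy sketch for inspecting a label map under that convention (the helper name `class_pixel_counts` and the toy array are hypothetical):

```python
import numpy as np

def class_pixel_counts(mask, num_classes=21, ignore_index=255):
    """Count pixels per class, skipping the ignore value."""
    valid = mask[mask != ignore_index]
    return np.bincount(valid, minlength=num_classes)

# Toy label map: two background pixels, three "person" (15) pixels, one void pixel
toy = np.array([[0, 0, 15],
                [255, 15, 15]], dtype=np.uint8)
counts = class_pixel_counts(toy)
print(counts[0], counts[15], counts.sum())
```

When training with PyTorch, the same convention is usually handled by passing `ignore_index=255` to `torch.nn.CrossEntropyLoss`.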
Data loading (dataloader)
VOC2012 annotations are stored in P mode.
import os
import random

import numpy as np
import torch
import torchvision.transforms as T
from PIL import Image
from torch.utils.data import Dataset, DataLoader

VOC_CLASSES = [
    'background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
    'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse',
    'motorbike', 'person', 'potted plant', 'sheep', 'sofa', 'train', 'tv/monitor'
]

VOC_COLORMAP = [
    [0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0], [0, 0, 128], [128, 0, 128],
    [0, 128, 128], [128, 128, 128], [64, 0, 0], [192, 0, 0], [64, 128, 0],
    [192, 128, 0], [64, 0, 128], [192, 0, 128], [64, 128, 128], [192, 128, 128],
    [0, 64, 0], [128, 64, 0], [0, 192, 0], [128, 192, 0], [0, 64, 128]
]


class VOCSegmentation(Dataset):
    def __init__(self, root, split='train', img_size=320, augment=True):
        super().__init__()
        self.root = root
        self.split = split
        self.img_size = img_size
        self.augment = augment

        img_dir = os.path.join(root, 'JPEGImages')
        mask_dir = os.path.join(root, 'SegmentationClass')
        split_file = os.path.join(root, 'ImageSets', 'Segmentation', f'{split}.txt')
        if not os.path.exists(split_file):
            raise FileNotFoundError(split_file)

        # The split file lists one image ID per line
        with open(split_file, 'r') as f:
            file_names = [x.strip() for x in f.readlines()]
        self.images = [os.path.join(img_dir, x + '.jpg') for x in file_names]
        self.masks = [os.path.join(mask_dir, x + '.png') for x in file_names]
        assert len(self.images) == len(self.masks)
        print(f"{split} set loaded: {len(self.images)} samples")

        self.normalize = T.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

    def __getitem__(self, index):
        img = Image.open(self.images[index]).convert('RGB')
        # P-mode mask: classes 0-20 (255 marks void/boundary pixels)
        mask = Image.open(self.masks[index])

        # Resize: bilinear for the image, nearest for the mask so labels stay valid
        img = img.resize((self.img_size, self.img_size), Image.BILINEAR)
        mask = mask.resize((self.img_size, self.img_size), Image.NEAREST)

        # Convert to tensors
        img = T.functional.to_tensor(img)
        mask = torch.from_numpy(np.array(mask)).long()

        # Data augmentation
        if self.augment:
            # Flip image and mask together
            if random.random() > 0.5:
                img = T.functional.hflip(img)
                mask = T.functional.hflip(mask)
            # Color jitter applies to the image only
            if random.random() > 0.5:
                img = T.ColorJitter(brightness=0.2, contrast=0.2,
                                    saturation=0.2, hue=0.2)(img)

        img = self.normalize(img)
        return img, mask

    def __len__(self):
        return len(self.images)


def get_dataloader(data_path, batch_size=4, img_size=320, num_workers=4):
    train_dataset = VOCSegmentation(root=data_path, split='train', img_size=img_size, augment=True)
    val_dataset = VOCSegmentation(root=data_path, split='val', img_size=img_size, augment=False)
    train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size,
                              pin_memory=True, num_workers=num_workers)
    val_loader = DataLoader(val_dataset, shuffle=False, batch_size=batch_size,
                            pin_memory=True, num_workers=num_workers)
    return train_loader, val_loader
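The 21 colors in VOC_COLORMAP do not have to be typed by hand. They follow the bit-shuffling scheme commonly used for the VOC palette, in which the bits of the class index are spread across the R, G, and B channels. A sketch of that scheme (the function name `voc_colormap` is mine, not part of any library):

```python
import numpy as np

def voc_colormap(n=256):
    """Generate the first n colors of the VOC-style palette."""
    cmap = np.zeros((n, 3), dtype=np.uint8)
    for i in range(n):
        r = g = b = 0
        c = i
        # Consume the index three bits at a time, filling each channel
        # from its most significant bit downward
        for j in range(8):
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap[i] = [r, g, b]
    return cmap

cmap = voc_colormap()
print(cmap[:21].tolist())
```

The first 21 rows reproduce the hand-written list above, for example `[192, 128, 128]` for class 15 (person) and `[0, 64, 128]` for class 20 (tv/monitor).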
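CamVid
Download
Dataset name: CamVid
CamVid comes in two versions. The first is the official release, which has 32 classes and a finer-grained class split. The official site is often hard to reach, so the dataset can be downloaded from Kaggle instead: CamVid (Cambridge-Driving Labeled Video Database).
The second version has 11 classes (not counting background). It merges semantically similar classes, so the split is coarser and the task is somewhat easier. (If you cannot find it online, leave a comment or send me a message and I can share it.)
Dataset overview
CamVid is mainly used for semantic segmentation of driving scenes. It contains images annotated with classes such as roads, traffic signs, and vehicles, and is intended to advance scene understanding for autonomous driving.
Key facts:
- Images: 701 frames taken from video sequences, split into training, validation, and test sets.
- Classes: 32 classes (an 11-class version also exists), including roads, buildings, vehicles, and pedestrians.
- Challenges: the footage comes from urban traffic, so weather, lighting, and traffic density all vary.
Data loading (dataloader)
The CamVid masks I downloaded from Kaggle are RGB images, so they first have to be converted to single-channel masks (L mode):
import numpy as np

# Cam_COLORMAP is the list of class colors defined in the loader code below
def mask_to_class(mask):
    mask_class = np.zeros((mask.shape[0], mask.shape[1]), dtype=np.uint8)
    for idx, color in enumerate(Cam_COLORMAP):
        color = np.array(color)
        # Match every pixel against the current color
        matches = np.all(mask == color, axis=-1)
        mask_class[matches] = idx
    return mask_class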
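The loop above compares the whole image against every color, so the mask is scanned once per class. A possible alternative, sketched here with a hypothetical helper `rgb_to_index` and a shortened three-color map, packs each RGB triple into one integer and uses a lookup table. Unlike `mask_to_class`, this sketch sends colors that are not in the map to 255 rather than 0; that is my own choice so that unknown pixels can be ignored, not something the article's code does.

```python
import numpy as np

def rgb_to_index(mask_rgb, colormap, unknown=255):
    """Map an (H, W, 3) uint8 RGB mask to an (H, W) uint8 class-index mask."""
    # Lookup table over all 2^24 possible colors, filled with the "unknown" value.
    # In a real loader this table would be built once and reused.
    lut = np.full(256 ** 3, unknown, dtype=np.uint8)
    for idx, (r, g, b) in enumerate(colormap):
        lut[(r * 256 + g) * 256 + b] = idx
    # Pack each pixel's (R, G, B) into a single integer code
    m = mask_rgb.astype(np.int64)
    codes = (m[..., 0] * 256 + m[..., 1]) * 256 + m[..., 2]
    return lut[codes]

# Shortened, illustrative colormap (first three CamVid colors)
demo_colormap = [[64, 128, 64], [192, 0, 128], [0, 128, 192]]
demo_mask = np.array([[[64, 128, 64], [192, 0, 128]],
                      [[0, 128, 192], [1, 2, 3]]], dtype=np.uint8)
demo_index = rgb_to_index(demo_mask, demo_colormap)
print(demo_index)
```

The table costs about 16 MB of memory, which is the price for a single pass over the image.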
The complete data loading code is as follows (this is the 32-class version; for the 11-class version, just change Cam_CLASSES and Cam_COLORMAP):
import os

import albumentations as A
import numpy as np
from albumentations.pytorch.transforms import ToTensorV2
from PIL import Image
from torch.utils.data import Dataset, DataLoader

# 32 classes
Cam_CLASSES = ['Animal', 'Archway', 'Bicyclist', 'Bridge', 'Building', 'Car', 'CartLuggagePram',
               'Child', 'Column_Pole', 'Fence', 'LaneMkgsDriv', 'LaneMkgsNonDriv', 'Misc_Text',
               'MotorcycleScooter', 'OtherMoving', 'ParkingBlock', 'Pedestrian', 'Road',
               'RoadShoulder', 'Sidewalk', 'SignSymbol', 'Sky', 'SUVPickupTruck', 'TrafficCone',
               'TrafficLight', 'Train', 'Tree', 'Truck_Bus', 'Tunnel', 'VegetationMisc', 'Void', 'Wall']

# Class colors, also used for visualization
Cam_COLORMAP = [
    [64, 128, 64], [192, 0, 128], [0, 128, 192], [0, 128, 64], [128, 0, 0], [64, 0, 128],
    [64, 0, 192], [192, 128, 64], [192, 192, 128], [64, 64, 128], [128, 0, 192], [192, 0, 64],
    [128, 128, 64], [192, 0, 192], [128, 64, 64], [64, 192, 128], [64, 64, 0], [128, 64, 128],
    [128, 128, 192], [0, 0, 192], [192, 128, 128], [128, 128, 128], [64, 128, 192], [0, 0, 64],
    [0, 64, 64], [192, 64, 128], [128, 128, 0], [192, 128, 192], [64, 0, 64], [192, 192, 0],
    [0, 0, 0], [64, 192, 0]
]


# Convert an RGB mask to class IDs
def mask_to_class(mask):
    mask_class = np.zeros((mask.shape[0], mask.shape[1]), dtype=np.uint8)
    for idx, color in enumerate(Cam_COLORMAP):
        color = np.array(color)
        # Match every pixel against the current color
        matches = np.all(mask == color, axis=-1)
        mask_class[matches] = idx
    return mask_class


class CamVidDataset(Dataset):
    def __init__(self, image_dir, label_dir):
        self.image_dir = image_dir
        self.label_dir = label_dir
        # Note: the random flips are applied to every split that uses this class
        self.transform = A.Compose([
            A.Resize(448, 448),
            A.HorizontalFlip(),
            A.VerticalFlip(),
            A.Normalize(),
            ToTensorV2(),
        ])
        # Images and labels are paired by sorted file order
        self.images = sorted(os.listdir(image_dir))
        self.labels = sorted(os.listdir(label_dir))
        assert len(self.images) == len(self.labels), "Images and labels count mismatch!"

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img_path = os.path.join(self.image_dir, self.images[idx])
        label_path = os.path.join(self.label_dir, self.labels[idx])
        image = np.array(Image.open(img_path).convert("RGB"))
        label_rgb = np.array(Image.open(label_path).convert("RGB"))
        # RGB to class index
        label_class = mask_to_class(label_rgb)
        # Albumentations expects (H, W, 3) for the image and (H, W) for the mask
        transformed = self.transform(image=image, mask=label_class)
        return transformed['image'], transformed['mask']


def get_dataloader(data_path, batch_size=4, num_workers=4):
    train_dir = os.path.join(data_path, 'train')
    val_dir = os.path.join(data_path, 'val')
    trainlabel_dir = os.path.join(data_path, 'train_labels')
    vallabel_dir = os.path.join(data_path, 'val_labels')
    train_dataset = CamVidDataset(train_dir, trainlabel_dir)
    val_dataset = CamVidDataset(val_dir, vallabel_dir)
    train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size,
                              pin_memory=True, num_workers=num_workers)
    val_loader = DataLoader(val_dataset, shuffle=False, batch_size=batch_size,
                            pin_memory=True, num_workers=num_workers)
    return train_loader, val_loader
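Going the other way, from class indices (for example a model prediction) back to a color image, is a single NumPy indexing operation. A sketch, assuming a colormap list shaped like Cam_COLORMAP (shortened here) and a hypothetical helper name `class_to_rgb`:

```python
import numpy as np

def class_to_rgb(mask_class, colormap):
    """Turn an (H, W) class-index mask into an (H, W, 3) uint8 color image."""
    cmap = np.asarray(colormap, dtype=np.uint8)  # (num_classes, 3)
    return cmap[mask_class]                      # fancy indexing looks up one color per pixel

# Illustrative three-class colormap and prediction
demo_colormap = [[64, 128, 64], [192, 0, 128], [0, 128, 192]]
pred = np.array([[0, 2],
                 [1, 1]], dtype=np.uint8)
vis = class_to_rgb(pred, demo_colormap)
print(vis.shape)
```

The result can be passed to `Image.fromarray(vis)` for saving or display.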
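Cityscapes
Download
Dataset name: Cityscapes
Download page: Download – Cityscapes Dataset
You need to register and log in before downloading. Download all three packages, because the preprocessing below uses them.
leftImg8bit contains the images, gtFine contains the fine annotations, and gtCoarse contains the coarse annotations. Only gtFine is actually used for training, but the preprocessing step also needs gtCoarse, otherwise it raises an error.
Dataset overview
Cityscapes is a semantic segmentation dataset focused on urban street scenes, well suited to autonomous driving and urban scene understanding. It provides detailed pixel-level annotations covering streets, buildings, traffic lights, and more.
Key facts:
- Images: 5,000 high-resolution images, of which 2,975 are for training, 500 for validation, and 1,525 for testing.
- Classes: 30 classes, of which 19 are commonly used for semantic segmentation, such as road, sidewalk, building, and vehicles.
- Challenges: the images are high resolution (2048x1024) and show complex urban street scenes.
Dataset preprocessing
After downloading the three packages, we need to generate the training labels with code. You can either install the corresponding package directly or download the tools from GitHub.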
pip install cityscapesscripts
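After installing, go to the preparation folder and find createTrainIdLabelImgs.py.
Add the following line, set to the path of your Cityscapes directory, then run the file. The directory must contain both the gtFine and gtCoarse folders.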
os.environ['CITYSCAPES_DATASET'] = r"your Cityscape"
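Running the script generates the label files. Note that these are the 19-class labels. The original gtFine folder already contains label files, but those use the full label set (IDs 0 to 33), whereas the 19-class version is what is normally used. Files ending in labelIds are the full-label files, and files ending in labelTrainIds are the 19-class label files we just generated.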
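The relationship between the two ID schemes can be sketched as a lookup table. The ID pairs below are copied from my reading of the label table in cityscapesscripts (treat them as an assumption and check them against `cityscapesscripts.helpers.labels`); every other ID becomes 255, the ignore value. The function name is hypothetical, and the real script works from the polygon JSON files rather than from the labelIds images.

```python
import numpy as np

# labelId -> trainId for the 19 evaluated classes (assumed from the cityscapesscripts label table)
ID_TO_TRAINID = {
    7: 0, 8: 1, 11: 2, 12: 3, 13: 4, 17: 5, 19: 6, 20: 7, 21: 8, 22: 9,
    23: 10, 24: 11, 25: 12, 26: 13, 27: 14, 28: 15, 31: 16, 32: 17, 33: 18,
}

def label_ids_to_train_ids(label_ids, ignore=255):
    """Convert a uint8 labelIds array to trainIds with a 256-entry lookup table."""
    lut = np.full(256, ignore, dtype=np.uint8)
    for label_id, train_id in ID_TO_TRAINID.items():
        lut[label_id] = train_id
    return lut[label_ids]

# Toy example: road (7), car (26), and two IDs that are not evaluated (0 and 4)
toy_ids = np.array([[7, 26],
                    [0, 4]], dtype=np.uint8)
toy_train = label_ids_to_train_ids(toy_ids)
print(toy_train)
```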
Next, we reorganize the image and label files:
import os
import shutil

dataset_path = r"D:\博客记录\语义分割\data\Cityscape"


def copy_split(src_root, dst_dir, suffix=None):
    """Copy the files from every city subfolder of src_root into the flat folder dst_dir.

    If suffix is given, only files whose names end with it are copied.
    """
    os.makedirs(dst_dir, exist_ok=True)
    for city in os.listdir(src_root):
        city_dir = os.path.join(src_root, city)
        for name in os.listdir(city_dir):
            if suffix is None or name.endswith(suffix):
                shutil.copy(os.path.join(city_dir, name), os.path.join(dst_dir, name))


for split in ('train', 'val', 'test'):
    # Images: leftImg8bit/<split>/<city>/* -> <split>_images/
    copy_split(os.path.join(dataset_path, 'leftImg8bit', split),
               os.path.join(dataset_path, f'{split}_images'))
    # Labels: keep only the 19-class files, gtFine/<split>/<city>/*labelTrainIds.png -> <split>_labels(19)/
    copy_split(os.path.join(dataset_path, 'gtFine', split),
               os.path.join(dataset_path, f'{split}_labels(19)'),
               suffix='labelTrainIds.png')
The result is shown in the figure below (the files in the test_labels(19) folder are not valid labels, of course):
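Because the images and labels now live in separate flat folders, it is worth checking that every image has a matching label before training. A small sketch using only file names, assuming the standard Cityscapes naming (`<city>_<seq>_<frame>_leftImg8bit.png` and `<city>_<seq>_<frame>_gtFine_labelTrainIds.png`); the helper name and the example file names are illustrative:

```python
def pair_cityscapes_files(image_names, label_names):
    """Pair image and label file names by their shared prefix.

    Returns (pairs, missing): matched (image, label) tuples and images without a label.
    """
    img_suffix = "_leftImg8bit.png"
    lbl_suffix = "_gtFine_labelTrainIds.png"
    labels = {n[:-len(lbl_suffix)]: n for n in label_names if n.endswith(lbl_suffix)}
    pairs, missing = [], []
    for name in sorted(image_names):
        if not name.endswith(img_suffix):
            continue
        stem = name[:-len(img_suffix)]
        if stem in labels:
            pairs.append((name, labels[stem]))
        else:
            missing.append(name)
    return pairs, missing

images = ["aachen_000000_000019_leftImg8bit.png", "aachen_000001_000019_leftImg8bit.png"]
labels = ["aachen_000000_000019_gtFine_labelTrainIds.png"]
pairs, missing = pair_cityscapes_files(images, labels)
print(len(pairs), missing)
```

In practice the two lists would come from `os.listdir()` on the `train_images` and `train_labels(19)` folders.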
Data loading (dataloader)
Cityscapes label images are in L mode.
import os

import albumentations as A
import numpy as np
from albumentations.pytorch.transforms import ToTensorV2
from PIL import Image
from torch.utils.data import Dataset, DataLoader

CITYSCAPES_CLASSES = [
    "road", "sidewalk", "building", "wall", "fence", "pole", "traffic light",
    "traffic sign", "vegetation", "terrain", "sky", "person", "rider", "car",
    "truck", "bus", "train", "motorcycle", "bicycle", "background",
]

CITYSCAPES_COLORMAP = [
    [128, 64, 128],   # road
    [244, 35, 232],   # sidewalk
    [70, 70, 70],     # building
    [102, 102, 156],  # wall
    [190, 153, 153],  # fence
    [153, 153, 153],  # pole
    [250, 170, 30],   # traffic light
    [220, 220, 0],    # traffic sign
    [107, 142, 35],   # vegetation
    [152, 251, 152],  # terrain
    [70, 130, 180],   # sky
    [220, 20, 60],    # person
    [255, 0, 0],      # rider
    [0, 0, 142],      # car
    [0, 0, 70],       # truck
    [0, 60, 100],     # bus
    [0, 80, 100],     # train
    [0, 0, 230],      # motorcycle
    [119, 11, 32],    # bicycle
    [0, 0, 0],        # background
]


class CityScapes(Dataset):
    def __init__(self, image_dir, label_dir):
        self.image_dir = image_dir
        self.label_dir = label_dir
        # Note: the random flips are applied to every split that uses this class
        self.transform = A.Compose([
            A.Resize(448, 448),
            A.HorizontalFlip(),
            A.VerticalFlip(),
            A.Normalize(),
            ToTensorV2(),
        ])
        # Images and labels are paired by sorted file order
        self.images = sorted(os.listdir(image_dir))
        self.labels = sorted(os.listdir(label_dir))
        assert len(self.images) == len(self.labels), "Images and labels count mismatch!"

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img_path = os.path.join(self.image_dir, self.images[idx])
        label_path = os.path.join(self.label_dir, self.labels[idx])
        image = np.array(Image.open(img_path).convert("RGB"))
        # labelTrainIds: 0-18 are the classes, 255 marks ignored pixels
        label = np.array(Image.open(label_path))
        # Albumentations expects (H, W, 3) for the image and (H, W) for the mask
        transformed = self.transform(image=image, mask=label)
        return transformed['image'], transformed['mask']


def get_dataloader(data_path, batch_size=4, num_workers=4):
    train_dir = os.path.join(data_path, 'train_images')
    val_dir = os.path.join(data_path, 'val_images')
    trainlabel_dir = os.path.join(data_path, 'train_labels(19)')
    vallabel_dir = os.path.join(data_path, 'val_labels(19)')
    train_dataset = CityScapes(train_dir, trainlabel_dir)
    val_dataset = CityScapes(val_dir, vallabel_dir)
    train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size,
                              pin_memory=True, num_workers=num_workers)
    val_loader = DataLoader(val_dataset, shuffle=False, batch_size=batch_size,
                            pin_memory=True, num_workers=num_workers)
    return train_loader, val_loader
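The labelTrainIds files contain 0-18 for the 19 classes and 255 for everything else, whereas CITYSCAPES_CLASSES lists a 20th "background" entry at index 19. The article does not say how its training code reconciles the two, so the following is only a sketch of one possible option: remap 255 to index 19 so that labels line up with the 20-entry class list. The helper name is hypothetical. The other common option is to leave the labels unchanged and ignore 255 in the loss.

```python
import numpy as np

def remap_ignore_to_background(label, ignore=255, background=19):
    """Return a copy of the label map in which ignored pixels become the background class."""
    out = label.copy()
    out[out == ignore] = background
    return out

# Toy label map: road (0), car (13), and two ignored pixels
toy = np.array([[0, 255],
                [13, 255]], dtype=np.uint8)
remapped = remap_ignore_to_background(toy)
print(remapped)
```

With this remapping the model needs 20 output channels; with the ignore approach it needs 19.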
ADE20K
Download
Dataset name: ADE20K
Download page: ADE20K dataset
Download it from that page. Registration is required, which is a bit of a hassle.
Dataset overview
ADE20K is a large-scale semantic segmentation dataset covering a wide variety of everyday scenes. It provides a broad set of class annotations for evaluating segmentation algorithms.
Key facts:
- Images: more than 20K images, split into training, validation, and test sets.
- Classes: 150 semantic classes covering objects, scene elements, and some object parts (for example furniture, road, sky).
- Challenges: ADE20K is a diverse, many-class dataset with a large number of annotated object categories, suited to large-scale semantic segmentation research.
Data loading (dataloader)
ADE20K annotations are also in L mode.
import os

import albumentations as A
import numpy as np
from albumentations.pytorch.transforms import ToTensorV2
from PIL import Image
from torch.utils.data import Dataset, DataLoader

ADE_CLASSES = [
    "background", "wall", "building", "sky", "floor", "tree", "ceiling", "road", "bed",
    "windowpane", "grass", "cabinet", "sidewalk", "person", "earth", "door", "table",
    "mountain", "plant", "curtain", "chair", "car", "water", "painting", "sofa", "shelf",
    "house", "sea", "mirror", "rug", "field", "armchair", "seat", "fence", "desk", "rock",
    "wardrobe", "lamp", "bathtub", "railing", "cushion", "base", "box", "column",
    "signboard", "chest of drawers", "counter", "sand", "sink", "skyscraper", "fireplace",
    "refrigerator", "grandstand", "path", "stairs", "runway", "case", "pool table",
    "pillow", "screen door", "stairway", "river", "bridge", "bookcase", "blind",
    "coffee table", "toilet", "flower", "book", "hill", "bench", "countertop", "stove",
    "palm", "kitchen island", "computer", "swivel chair", "boat", "bar", "arcade machine",
    "hovel", "bus", "towel", "light", "truck", "tower", "chandelier", "awning",
    "streetlight", "booth", "television receiver", "airplane", "dirt track", "apparel",
    "pole", "land", "bannister", "escalator", "ottoman", "bottle", "buffet", "poster",
    "stage", "van", "ship", "fountain", "conveyer belt", "canopy", "washer", "plaything",
    "swimming pool", "stool", "barrel", "basket", "waterfall", "tent", "bag", "minibike",
    "cradle", "oven", "ball", "food", "step", "tank", "trade name", "microwave", "pot",
    "animal", "bicycle", "lake", "dishwasher", "screen", "blanket", "sculpture", "hood",
    "sconce", "vase", "traffic light", "tray", "ashcan", "fan", "pier", "crt screen",
    "plate", "monitor", "bulletin board", "shower", "radiator", "glass", "clock", "flag"
]

ADE_COLORMAP = [
    [0, 0, 0], [120, 120, 120], [180, 120, 120], [6, 230, 230], [80, 50, 50], [4, 200, 3],
    [120, 120, 80], [140, 140, 140], [204, 5, 255], [230, 230, 230], [4, 250, 7],
    [224, 5, 255], [235, 255, 7], [150, 5, 61], [120, 120, 70], [8, 255, 51], [255, 6, 82],
    [143, 255, 140], [204, 255, 4], [255, 51, 7], [204, 70, 3], [0, 102, 200],
    [61, 230, 250], [255, 6, 51], [11, 102, 255], [255, 7, 71], [255, 9, 224], [9, 7, 230],
    [220, 220, 220], [255, 9, 92], [112, 9, 255], [8, 255, 214], [7, 255, 224],
    [255, 184, 6], [10, 255, 71], [255, 41, 10], [7, 255, 255], [224, 255, 8],
    [102, 8, 255], [255, 61, 6], [255, 194, 7], [255, 122, 8], [0, 255, 20], [255, 8, 41],
    [255, 5, 153], [6, 51, 255], [235, 12, 255], [160, 150, 20], [0, 163, 255],
    [140, 140, 140], [250, 10, 15], [20, 255, 0], [31, 255, 0], [255, 31, 0],
    [255, 224, 0], [153, 255, 0], [0, 0, 255], [255, 71, 0], [0, 235, 255], [0, 173, 255],
    [31, 0, 255], [11, 200, 200], [255, 82, 0], [0, 255, 245], [0, 61, 255], [0, 255, 112],
    [0, 255, 133], [255, 0, 0], [255, 163, 0], [255, 102, 0], [194, 255, 0], [0, 143, 255],
    [51, 255, 0], [0, 82, 255], [0, 255, 41], [0, 255, 173], [10, 0, 255], [173, 255, 0],
    [0, 255, 153], [255, 92, 0], [255, 0, 255], [255, 0, 245], [255, 0, 102],
    [255, 173, 0], [255, 0, 20], [255, 184, 184], [0, 31, 255], [0, 255, 61], [0, 71, 255],
    [255, 0, 204], [0, 255, 194], [0, 255, 82], [0, 10, 255], [0, 112, 255], [51, 0, 255],
    [0, 194, 255], [0, 122, 255], [0, 255, 163], [255, 153, 0], [0, 255, 10],
    [255, 112, 0], [143, 255, 0], [82, 0, 255], [163, 255, 0], [255, 235, 0],
    [8, 184, 170], [133, 0, 255], [0, 255, 92], [184, 0, 255], [255, 0, 31], [0, 184, 255],
    [0, 214, 255], [255, 0, 112], [92, 255, 0], [0, 224, 255], [112, 224, 255],
    [70, 184, 160], [163, 0, 255], [153, 0, 255], [71, 255, 0], [255, 0, 163],
    [255, 204, 0], [255, 0, 143], [0, 255, 235], [133, 255, 0], [255, 0, 235],
    [245, 0, 255], [255, 0, 122], [255, 245, 0], [10, 190, 212], [214, 255, 0],
    [0, 204, 255], [20, 0, 255], [255, 255, 0], [0, 153, 255], [0, 41, 255], [0, 255, 204],
    [41, 0, 255], [41, 255, 0], [173, 0, 255], [0, 245, 255], [71, 0, 255], [122, 0, 255],
    [0, 255, 184], [0, 92, 255], [184, 255, 0], [0, 133, 255], [255, 214, 0],
    [25, 194, 194], [102, 255, 0], [92, 0, 255]
]


class ADE20kDataset(Dataset):
    def __init__(self, image_dir, label_dir):
        self.image_dir = image_dir
        self.label_dir = label_dir
        # Note: the random flips are applied to every split that uses this class
        self.transform = A.Compose([
            A.Resize(448, 448),
            A.HorizontalFlip(),
            A.VerticalFlip(),
            A.Normalize(),
            ToTensorV2(),
        ])
        # Images and labels are paired by sorted file order
        self.images = sorted(os.listdir(image_dir))
        self.labels = sorted(os.listdir(label_dir))
        assert len(self.images) == len(self.labels), "Images and labels count mismatch!"

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img_path = os.path.join(self.image_dir, self.images[idx])
        label_path = os.path.join(self.label_dir, self.labels[idx])
        image = np.array(Image.open(img_path).convert("RGB"))
        label = np.array(Image.open(label_path))
        # Albumentations expects (H, W, 3) for the image and (H, W) for the mask
        transformed = self.transform(image=image, mask=label)
        return transformed['image'], transformed['mask']


def get_dataloader(data_path, batch_size=4, num_workers=4):
    train_dir = os.path.join(data_path, 'images', 'training')
    val_dir = os.path.join(data_path, 'images', 'validation')
    trainlabel_dir = os.path.join(data_path, 'annotations', 'training')
    vallabel_dir = os.path.join(data_path, 'annotations', 'validation')
    train_dataset = ADE20kDataset(train_dir, trainlabel_dir)
    val_dataset = ADE20kDataset(val_dir, vallabel_dir)
    train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size,
                              pin_memory=True, num_workers=num_workers)
    val_loader = DataLoader(val_dataset, shuffle=False, batch_size=batch_size,
                            pin_memory=True, num_workers=num_workers)
    return train_loader, val_loader
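In the annotation PNGs of this folder layout (images/ and annotations/ with training and validation subfolders), 0 marks unlabeled pixels and 1-150 are the classes, which is why ADE_CLASSES starts with "background" and has 151 entries. If you would rather train on exactly 150 classes and ignore unlabeled pixels, a common trick is to shift the labels down by one and send 0 to 255. This is not what the loader above does; it is a NumPy sketch of the alternative, with a hypothetical function name:

```python
import numpy as np

def reduce_zero_label(label, ignore=255):
    """Shift labels 1..150 to 0..149 and turn the unlabeled value 0 into the ignore value."""
    out = label.astype(np.int16) - 1   # 0 -> -1, 1..150 -> 0..149
    out[out == -1] = ignore
    return out.astype(np.uint8)

# Toy label map: unlabeled (0), wall (1), flag (150)
toy = np.array([[0, 1, 150]], dtype=np.uint8)
reduced = reduce_zero_label(toy)
print(reduced)
```

With this variant the class list and colormap should drop their first entry so that index 0 is "wall".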
COCO2017
Download
Dataset name: COCO2017
Download page: COCO - Common Objects in Context
Download three files: the 2017 training images, the 2017 validation images, and the 2017 train/val annotations. The annotations are in JSON format and will be converted below.
Dataset overview
COCO2017 is a large-scale dataset designed for object detection, segmentation, and keypoint detection. It places particular emphasis on context and provides rich annotations with object instances at many scales.
Key facts:
- Images: about 118K training images, 5K validation images, and 41K test images.
- Classes: 80 object classes such as people, animals, vehicles, and furniture.
- Annotations: besides object detection and instance segmentation annotations, COCO also provides segmentation masks, keypoint annotations, and more.
- Complexity: COCO contains many occlusions, overlapping objects, and varied backgrounds, which makes it suitable for challenging segmentation tasks.
Dataset preprocessing
After downloading the three files, we use instances_train2017 and instances_val2017 from the annotations to generate the mask label files. I converted them to P mode.
The conversion code is:
import os

import imgviz
import numpy as np
from PIL import Image
from pycocotools import mask as maskUtils
from pycocotools.coco import COCO
from tqdm import tqdm


def coco_to_semantic_mask_pmode(coco_json_path, image_dir, save_mask_dir):
    # image_dir is not used by the conversion itself; the sizes come from the JSON
    os.makedirs(save_mask_dir, exist_ok=True)
    coco = COCO(coco_json_path)
    img_ids = coco.getImgIds()
    cat_ids = coco.getCatIds()
    # Map COCO category IDs to contiguous labels; +1 keeps 0 for background
    catid2label = {cat_id: i + 1 for i, cat_id in enumerate(cat_ids)}
    colormap = imgviz.label_colormap()  # color mapping for the palette

    for img_id in tqdm(img_ids):
        img_info = coco.loadImgs(img_id)[0]
        ann_ids = coco.getAnnIds(imgIds=img_id)
        anns = coco.loadAnns(ann_ids)
        h, w = img_info['height'], img_info['width']
        mask = np.zeros((h, w), dtype=np.uint8)

        # Paint each instance; where instances overlap, the later annotation wins
        for ann in anns:
            cat_id = ann['category_id']
            label = catid2label[cat_id]
            rle = coco.annToRLE(ann)
            m = maskUtils.decode(rle)
            mask[m == 1] = label

        # Save in P mode (with a palette)
        mask_img = Image.fromarray(mask, mode='P')
        # The palette must be a flat list
        palette = [v for color in colormap for v in color]
        palette += [0] * (256 * 3 - len(palette))  # pad to 256 colors
        mask_img.putpalette(palette)
        mask_img.save(os.path.join(save_mask_dir, img_info['file_name'].replace('.jpg', '.png')))

    print(f"Saved {len(img_ids)} P-mode semantic segmentation masks to {save_mask_dir}")


# ==== Usage ====
# Training set
coco_to_semantic_mask_pmode(
    coco_json_path='../../data/COCO2017/annotations1/instances_train2017.json',
    image_dir='../../data/COCO2017/train2017',
    save_mask_dir='../../data/COCO2017/train2017_labels'
)
# Validation set
coco_to_semantic_mask_pmode(
    coco_json_path='../../data/COCO2017/annotations1/instances_val2017.json',
    image_dir='../../data/COCO2017/val2017',
    save_mask_dir='../../data/COCO2017/val2017_labels'
)
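The reason the conversion builds `catid2label` is that COCO category IDs are not contiguous: the 80 classes use IDs between 1 and 90 with gaps (for example, there is no ID 12). A small standard-library sketch of the same mapping on a shortened, illustrative ID list (the function name is mine):

```python
def build_catid2label(cat_ids):
    """Map raw category IDs to contiguous labels 1..N, keeping 0 free for background."""
    return {cat_id: i + 1 for i, cat_id in enumerate(cat_ids)}

# First few COCO category IDs; note the jump from 11 to 13
cat_ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14]
catid2label = build_catid2label(cat_ids)
print(catid2label[11], catid2label[13])
```

With all 80 categories, the labels run from 1 to 80, which lines up with a class list that has "background" at index 0.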
Data loading (dataloader)
After the preprocessing above, the COCO2017 annotations are in P mode.
import os

import albumentations as A
import imgviz
import numpy as np
from albumentations.pytorch.transforms import ToTensorV2
from PIL import Image
from torch.utils.data import Dataset, DataLoader

COCO_CLASSES = [
    "background", "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
    "truck", "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
    "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
    "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
    "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl",
    "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza",
    "donut", "cake", "chair", "couch", "potted plant", "bed", "dining table", "toilet",
    "tv", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave", "oven",
    "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
    "hair drier", "toothbrush"
]

COCO_COLORMAP = imgviz.label_colormap()


class COCO2017Dataset(Dataset):
    def __init__(self, image_dir, label_dir):
        self.image_dir = image_dir
        self.label_dir = label_dir
        # Note: the random flips are applied to every split that uses this class
        self.transform = A.Compose([
            A.Resize(448, 448),
            A.HorizontalFlip(),
            A.VerticalFlip(),
            A.Normalize(),
            ToTensorV2(),
        ])
        # Images and labels are paired by sorted file order
        self.images = sorted(os.listdir(image_dir))
        self.labels = sorted(os.listdir(label_dir))
        assert len(self.images) == len(self.labels), "Images and labels count mismatch!"

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img_path = os.path.join(self.image_dir, self.images[idx])
        label_path = os.path.join(self.label_dir, self.labels[idx])
        image = np.array(Image.open(img_path).convert("RGB"))
        # P-mode mask: np.array() returns the class indices (0 = background, 1-80 = classes)
        label = np.array(Image.open(label_path))
        # Albumentations expects (H, W, 3) for the image and (H, W) for the mask
        transformed = self.transform(image=image, mask=label)
        return transformed['image'], transformed['mask']


def get_dataloader(data_path, batch_size=4, num_workers=4):
    train_dir = os.path.join(data_path, 'train2017')
    val_dir = os.path.join(data_path, 'val2017')
    trainlabel_dir = os.path.join(data_path, 'train2017_labels')
    vallabel_dir = os.path.join(data_path, 'val2017_labels')
    train_dataset = COCO2017Dataset(train_dir, trainlabel_dir)
    val_dataset = COCO2017Dataset(val_dir, vallabel_dir)
    train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size,
                              pin_memory=True, num_workers=num_workers)
    val_loader = DataLoader(val_dataset, shuffle=False, batch_size=batch_size,
                            pin_memory=True, num_workers=num_workers)
    return train_loader, val_loader
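Conclusion
The datasets above are the ones most commonly used for semantic segmentation, and each has its own character:
- VOC2012 is better suited to beginners and basic experiments.
- COCO2017 and ADE20K are suited to large-scale, many-class, complex segmentation tasks.
- CamVid and Cityscapes focus on semantic segmentation for autonomous driving and urban scenes.
I hope this article was helpful. If you spot any mistakes, corrections are welcome!
If you would like to repost this article, feel free to do so; just credit the original source. Thanks!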