The tensorflow_datasets library provides freely available datasets for training and evaluating deep learning models.

import tensorflow_datasets as tfds

print(len(tfds.list_builders()))

print(tfds.list_builders()[:5])

278

['abstract_reasoning', 'accentdb', 'aeslc', 'aflw2k3d', 'ag_news_subset']

A total of 278 datasets were available at the time of writing, and the number keeps growing.

Getting the celeb_a and MNIST datasets

1. Call tfds.builder() with the name of the dataset

2. Run the download_and_prepare() method

3. Call the as_dataset() method

celeba_bldr=tfds.builder('celeb_a')

print(celeba_bldr.info.features)

FeaturesDict({
'attributes': FeaturesDict({
'5_o_Clock_Shadow': tf.bool,
'Arched_Eyebrows': tf.bool,
'Attractive': tf.bool,
'Bags_Under_Eyes': tf.bool,
'Bald': tf.bool,
'Bangs': tf.bool,
'Big_Lips': tf.bool,
'Big_Nose': tf.bool,
'Black_Hair': tf.bool,
'Blond_Hair': tf.bool,
'Blurry': tf.bool,
'Brown_Hair': tf.bool,
'Bushy_Eyebrows': tf.bool,
'Chubby': tf.bool,
'Double_Chin': tf.bool,
'Eyeglasses': tf.bool,
'Goatee': tf.bool,
'Gray_Hair': tf.bool,
'Heavy_Makeup': tf.bool,
'High_Cheekbones': tf.bool,
'Male': tf.bool,
'Mouth_Slightly_Open': tf.bool,
'Mustache': tf.bool,
'Narrow_Eyes': tf.bool,
'No_Beard': tf.bool,
'Oval_Face': tf.bool,
'Pale_Skin': tf.bool,
'Pointy_Nose': tf.bool,
'Receding_Hairline': tf.bool,
'Rosy_Cheeks': tf.bool,
'Sideburns': tf.bool,
'Smiling': tf.bool,
'Straight_Hair': tf.bool,
'Wavy_Hair': tf.bool,
'Wearing_Earrings': tf.bool,
'Wearing_Hat': tf.bool,
'Wearing_Lipstick': tf.bool,
'Wearing_Necklace': tf.bool,
'Wearing_Necktie': tf.bool,
'Young': tf.bool,
}),
'image': Image(shape=(218, 178, 3), dtype=tf.uint8),
'landmarks': FeaturesDict({
'lefteye_x': tf.int64,
'lefteye_y': tf.int64,
'leftmouth_x': tf.int64,
'leftmouth_y': tf.int64,
'nose_x': tf.int64,
'nose_y': tf.int64,
'righteye_x': tf.int64,
'righteye_y': tf.int64,
'rightmouth_x': tf.int64,
'rightmouth_y': tf.int64,
}),
})

print(celeba_bldr.info.features['image'])

Image(shape=(218, 178, 3), dtype=tf.uint8)

print(celeba_bldr.info.features['attributes'].keys())

dict_keys(['5_o_Clock_Shadow', 'Arched_Eyebrows', 'Attractive', 'Bags_Under_Eyes', 'Bald', 'Bangs', 'Big_Lips', 'Big_Nose', 'Black_Hair', 'Blond_Hair', 'Blurry', 'Brown_Hair', 'Bushy_Eyebrows', 'Chubby', 'Double_Chin', 'Eyeglasses', 'Goatee', 'Gray_Hair', 'Heavy_Makeup', 'High_Cheekbones', 'Male', 'Mouth_Slightly_Open', 'Mustache', 'Narrow_Eyes', 'No_Beard', 'Oval_Face', 'Pale_Skin', 'Pointy_Nose', 'Receding_Hairline', 'Rosy_Cheeks', 'Sideburns', 'Smiling', 'Straight_Hair', 'Wavy_Hair', 'Wearing_Earrings', 'Wearing_Hat', 'Wearing_Lipstick', 'Wearing_Necklace', 'Wearing_Necktie', 'Young'])

print(celeba_bldr.info.citation)

@inproceedings{conf/iccv/LiuLWT15,

added-at = {2018-10-09T00:00:00.000+0200},

author = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},

biburl = {https://www.bibsonomy.org/bibtex/250e4959be61db325d2f02c1d8cd7bfbb/dblp},

booktitle = {ICCV},

crossref = {conf/iccv/2015},

ee = {http://doi.ieeecomputersociety.org/10.1109/ICCV.2015.425},

interhash = {3f735aaa11957e73914bbe2ca9d5e702},

intrahash = {50e4959be61db325d2f02c1d8cd7bfbb},

isbn = {978-1-4673-8391-2},

keywords = {dblp},

pages = {3730-3738},

publisher = {IEEE Computer Society},

timestamp = {2018-10-11T11:43:28.000+0200},

title = {Deep Learning Face Attributes in the Wild.},

url = {http://dblp.uni-trier.de/db/conf/iccv/iccv2015.html#LiuLWT15},

year = 2015

}

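Dataset structure

The dataset's features are stored as a dictionary with three keys: 'image', 'landmarks', and 'attributes'.

image: the face image of a celebrity.

landmarks: a dictionary of positions extracted from the face, e.g. the locations of the eyes and the nose.

attributes: a dictionary of 40 facial attributes of the person in the image (facial expression, makeup, hair characteristics, and so on).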
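
As a rough illustration of this nesting, the sketch below builds a hand-written stand-in for a single example. The values and the two helper functions (`active_attributes`, `landmark_point`) are made up for the illustration; real examples hold tensors, and only the key names come from the feature listing above.

```python
# Hypothetical stand-in for one celeb_a example. Real examples hold tensors;
# the values below are invented purely to show the nesting.
example = {
    "image": [[[0, 0, 0]]],  # placeholder; the real image has shape (218, 178, 3)
    "landmarks": {"lefteye_x": 69, "lefteye_y": 109, "nose_x": 77, "nose_y": 142},
    "attributes": {"Male": False, "Smiling": True, "Eyeglasses": False},
}


def active_attributes(ex):
    """Return the sorted names of the attributes whose flag is True."""
    return sorted(name for name, flag in ex["attributes"].items() if flag)


def landmark_point(ex, part):
    """Combine the '<part>_x' and '<part>_y' entries into one (x, y) tuple."""
    lm = ex["landmarks"]
    return (lm[f"{part}_x"], lm[f"{part}_y"])


print(sorted(example.keys()))
print(active_attributes(example))
print(landmark_point(example, "nose"))
```

The same two-level indexing (`example['attributes']['Male']`) is what the later map step relies on to pull out a label.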

download_and_prepare()

It downloads the data and stores it in the folder designated for all TensorFlow datasets.

(If the data already exists in that folder, it is not downloaded again.)
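
The skip-if-present behaviour can be pictured with a small plain-Python sketch. This is not how tensorflow_datasets implements it; `prepare_once`, `fake_fetch`, and the file name are hypothetical and only illustrate the idea of checking the target folder before fetching.

```python
import tempfile
from pathlib import Path


def prepare_once(data_dir, name, fetch):
    """Call fetch() only if <data_dir>/<name> is missing.

    Returns (path, downloaded), where downloaded says whether fetch() ran.
    """
    target = Path(data_dir) / name
    if target.exists():
        return target, False  # already prepared: nothing to do
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(fetch())
    return target, True


with tempfile.TemporaryDirectory() as tmp:
    calls = []

    def fake_fetch():
        # Stand-in for a real download; records that it was called.
        calls.append(1)
        return b"payload"

    _, first = prepare_once(tmp, "celeb_a.bin", fake_fetch)
    _, second = prepare_once(tmp, "celeb_a.bin", fake_fetch)
    print(first, second, len(calls))
```

The second call finds the file already in place and returns without fetching, which is the property that makes repeated runs cheap.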

celeba_bldr.download_and_prepare() #for Mac(Bonita) /Users/tensorflow_datasets/download/manual

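The download_dir parameter lets you store the data in a location of your choice.

Because of a server error at the time of writing, I downloaded the data manually.

# Normally it is handled as follows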

celeba_bldr.download_and_prepare()

datasets=celeba_bldr.as_dataset(shuffle_files=False)

datasets.keys()

dict_keys(['test', 'train', 'validation'])

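For the errors that follow, check the book and try again later!!

Book: 머신러닝 교과서 with 파이썬, 사이킷런, 텐서플로, 개정 3판 (Machine Learning Textbook with Python, scikit-learn, and TensorFlow, revised 3rd edition)

The code below was not run as a hands-on exercise.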

import tensorflow as tf

ds_train = datasets['train']

assert isinstance(ds_train, tf.data.Dataset)

example = next(iter(ds_train))

print(type(example))

print(example.keys())

dict_keys(['image', 'landmarks', 'attributes'])

# Keep only the image and the 'Male' attribute, cast to an integer label
ds_train = ds_train.map(lambda item: (item['image'], tf.cast(item['attributes']['Male'], tf.int32)))

ds_train=ds_train.batch(18)

images, labels=next(iter(ds_train))

print(images.shape, labels)

(18, 218, 178, 3) tf.Tensor([0 0 0 1 1 1 0 1 1 0 1 1 0 1 0 1 1 1], shape=(18,), dtype=int32)
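
Since the steps above could not be run here, a plain-Python analogue of the map and batch steps may help show what they do. It is only a sketch: it does not use tf.data, the records are hypothetical with strings standing in for image tensors, and `to_image_label` and `batched` are illustrative names.

```python
from itertools import islice


def to_image_label(item):
    """Mirror of the function passed to map(): keep the image, cast 'Male' to int."""
    return item["image"], int(item["attributes"]["Male"])


def batched(iterable, size):
    """Yield lists of up to `size` consecutive elements; the last may be shorter."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Hypothetical records: every third one is flagged as 'Male'.
records = [{"image": f"img{i}", "attributes": {"Male": i % 3 == 0}} for i in range(40)]

pairs = map(to_image_label, records)
first_batch = next(batched(pairs, 18))
images, labels = zip(*first_batch)
print(len(images), labels[:6])
```

Unlike this list-based version, a batched tf.data.Dataset stacks the elements into tensors, which is why the image batch above has shape (18, 218, 178, 3).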

import matplotlib.pyplot as plt

# Show the 18 images of the batch in a 3x6 grid, each titled with its label
fig = plt.figure(figsize=(12, 8))

for i, (image, label) in enumerate(zip(images, labels)):
    ax = fig.add_subplot(3, 6, i+1)
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(image)
    ax.set_title('{}'.format(label), size=15)

plt.show()
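
The third argument of add_subplot is a 1-based index that runs left to right, then top to bottom, which is why the loop passes i+1. A small sketch of that index arithmetic follows; `grid_position` is a hypothetical helper, not part of matplotlib.

```python
def grid_position(i, ncols=6):
    """Map a 0-based loop index to the (row, col) cell of a row-major grid."""
    return divmod(i, ncols)


# The 18 loop indices fill a 3x6 grid row by row.
for i in (0, 5, 6, 17):
    row, col = grid_position(i)
    print(f"i={i} -> subplot index {i + 1}, row {row}, col {col}")
```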
