Blocks
A disk has a block size, which is the minimum amount of data that it can read or write.
Filesystems for a single disk build on this by dealing with data in blocks, which are an
integral multiple of the disk block size. Filesystem blocks are typically a few kilobytes
in size, while disk blocks are normally 512 bytes. This is generally transparent to the
filesystem user, who is simply reading or writing a file of whatever length. However,
there are tools to perform filesystem maintenance, such as df and fsck, that operate at
the filesystem block level.
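As a rough sketch of that arithmetic (the 512-byte and 4 KB sizes are illustrative defaults, and BlockMath is a made-up name), the snippet below shows how a file's length maps onto a whole number of filesystem blocks:

```java
public class BlockMath {
    // Illustrative sizes only: 512-byte disk sectors, 4 KB filesystem blocks.
    static final long DISK_BLOCK = 512;
    static final long FS_BLOCK = 8 * DISK_BLOCK;   // 4096 bytes, an integral multiple

    /** Number of whole filesystem blocks needed to hold fileLength bytes. */
    static long blocksNeeded(long fileLength) {
        return (fileLength + FS_BLOCK - 1) / FS_BLOCK;   // ceiling division
    }

    public static void main(String[] args) {
        // A 10,000-byte file occupies 3 filesystem blocks (12,288 bytes on disk).
        System.out.println(blocksNeeded(10_000));
    }
}
```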
HDFS, too, has the concept of a block, but it is a much larger unit: 128 MB by default in
current releases (older versions defaulted to 64 MB). As in a filesystem for a single disk,
files in HDFS are broken into block-sized chunks, which are stored as independent units.
Unlike a filesystem for a single disk, however, a file in HDFS that is smaller than a single
block does not occupy a full block's worth of underlying storage; a 1 MB file stored with a
128 MB block size uses 1 MB of disk space, not 128 MB.
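For a concrete view of this from the Java API, the sketch below asks HDFS for a file's block size and where each of its blocks is stored. It assumes a running cluster and an existing file at /user/tom/data.txt (the path is only an example); FileSystem, FileStatus, and BlockLocation are the standard org.apache.hadoop.fs classes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Example path; replace with a file that exists in your cluster.
        Path file = new Path("/user/tom/data.txt");
        FileStatus status = fs.getFileStatus(file);

        System.out.println("File length: " + status.getLen() + " bytes");
        System.out.println("Block size:  " + status.getBlockSize() + " bytes");

        // One BlockLocation per block, listing the datanodes holding a replica.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    b.getOffset(), b.getLength(), String.join(",", b.getHosts()));
        }
    }
}
```

Run against a multi-block file, this prints one line per block, each usually reported on several datanodes because of replication.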
Why Is a Block in HDFS So Large?
HDFS blocks are large compared to disk blocks in order to minimize the cost of seeks. If the
block is large enough, the time spent transferring the data from the disk can be made
significantly longer than the time spent seeking to the start of the block, so a large file
made of many blocks can be read at close to the disk's sustained transfer rate.
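To see the trade-off, here is a back-of-the-envelope calculation with assumed round numbers (a 10 ms seek and a 100 MB/s transfer rate, neither of which is a measured figure): keeping the seek down to about 1% of the transfer time requires a block on the order of 100 MB.

```java
public class WhyBigBlocks {
    public static void main(String[] args) {
        // Assumed, illustrative hardware figures, not measurements.
        double seekTimeSeconds = 0.010;          // 10 ms average seek
        double transferRateBytesPerSec = 100e6;  // 100 MB/s sustained transfer
        double seekOverheadTarget = 0.01;        // seek should be ~1% of transfer time

        // transferTime = blockSize / rate; we want seekTime <= target * transferTime,
        // so blockSize >= seekTime * rate / target.
        double blockSizeBytes = seekTimeSeconds * transferRateBytesPerSec / seekOverheadTarget;

        System.out.printf("A block of roughly %.0f MB keeps seek overhead near 1%%%n",
                blockSizeBytes / 1e6);  // prints roughly 100 MB
    }
}
```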
Reference: IBM and Apache Hadoop