Blocks
A disk has a block size, which is the minimum amount of data that it can read or write.
Filesystems for a single disk build on this by dealing with data in blocks, which are an
integral multiple of the disk block size. Filesystem blocks are typically a few kilobytes
in size, while disk blocks are normally 512 bytes. This is generally transparent to the
filesystem user, who is simply reading or writing a file of whatever length. However,
there are tools to perform filesystem maintenance, such as df and fsck, that operate at
the filesystem block level.
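As a rough sketch of that arithmetic (the 512-byte and 4 KB sizes are illustrative defaults, and BlockMath is a made-up name), the snippet below shows how a file's length maps onto a whole number of filesystem blocks:

```java
public class BlockMath {
    // Illustrative sizes only: 512-byte disk sectors, 4 KB filesystem blocks.
    static final long DISK_BLOCK = 512;
    static final long FS_BLOCK = 8 * DISK_BLOCK;   // 4096 bytes, an integral multiple

    /** Number of whole filesystem blocks needed to hold fileLength bytes. */
    static long blocksNeeded(long fileLength) {
        return (fileLength + FS_BLOCK - 1) / FS_BLOCK;   // ceiling division
    }

    public static void main(String[] args) {
        // A 10,000-byte file occupies 3 filesystem blocks (12,288 bytes on disk).
        System.out.println(blocksNeeded(10_000));
    }
}
```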
HDFS, too, has the concept of a block, but it is a much larger unit: 128 MB by default in
current releases (older versions defaulted to 64 MB). As in a filesystem for a single disk,
files in HDFS are broken into block-sized chunks, which are stored as independent units.
Unlike a filesystem for a single disk, however, a file in HDFS that is smaller than a single
block does not occupy a full block's worth of underlying storage; a 1 MB file stored with a
128 MB block size uses 1 MB of disk space, not 128 MB.
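For a concrete view of this from the Java API, the sketch below asks HDFS for a file's block size and where each of its blocks is stored. It assumes a running cluster and an existing file at /user/tom/data.txt (the path is only an example); FileSystem, FileStatus, and BlockLocation are the standard org.apache.hadoop.fs classes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Example path; replace with a file that exists in your cluster.
        Path file = new Path("/user/tom/data.txt");
        FileStatus status = fs.getFileStatus(file);

        System.out.println("File length: " + status.getLen() + " bytes");
        System.out.println("Block size:  " + status.getBlockSize() + " bytes");

        // One BlockLocation per block, listing the datanodes holding a replica.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    b.getOffset(), b.getLength(), String.join(",", b.getHosts()));
        }
    }
}
```

Run against a multi-block file, this prints one line per block, each usually reported on several datanodes because of replication.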
Why Is a Block in HDFS So Large?
HDFS blocks are large compared to disk blocks in order to minimize the cost of seeks. If the
block is large enough, the time spent transferring the data from the disk can be made
significantly longer than the time spent seeking to the start of the block, so a large file
made of many blocks can be read at close to the disk's sustained transfer rate.
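To see the trade-off, here is a back-of-the-envelope calculation with assumed round numbers (a 10 ms seek and a 100 MB/s transfer rate, neither of which is a measured figure): keeping the seek down to about 1% of the transfer time requires a block on the order of 100 MB.

```java
public class WhyBigBlocks {
    public static void main(String[] args) {
        // Assumed, illustrative hardware figures, not measurements.
        double seekTimeSeconds = 0.010;          // 10 ms average seek
        double transferRateBytesPerSec = 100e6;  // 100 MB/s sustained transfer
        double seekOverheadTarget = 0.01;        // seek should be ~1% of transfer time

        // transferTime = blockSize / rate; we want seekTime <= target * transferTime,
        // so blockSize >= seekTime * rate / target.
        double blockSizeBytes = seekTimeSeconds * transferRateBytesPerSec / seekOverheadTarget;

        System.out.printf("A block of roughly %.0f MB keeps seek overhead near 1%%%n",
                blockSizeBytes / 1e6);  // prints roughly 100 MB
    }
}
```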
Reference: IBM and Apache Hadoop