The first rule of Hadoop cluster capacity planning is that Hadoop can accommodate changes. If you overestimate your storage requirements, you can scale the cluster down. If you need more storage than you budgeted for, you can add nodes; in fact, you can start out with a small cluster and add nodes as your data set grows.
Another best practice for Hadoop cluster capacity planning is to consider data redundancy needs. One advantage of storing data in a Hadoop cluster is that Hadoop replicates the data across nodes, which protects against data loss if a node fails. These replicas consume storage space, which you must factor into your capacity planning. For example, if you estimate that you will have 5 TB of data and you keep the Hadoop default of three replicas, your cluster must accommodate 15 TB of data.
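The replication arithmetic above can be sketched as a small helper. This is a minimal illustration, not part of the original article: the function name required_raw_capacity_tb is hypothetical, and it assumes raw capacity is simply logical data size multiplied by the replication factor (in HDFS the default factor comes from the dfs.replication setting, which defaults to 3). It deliberately ignores intermediate data, OS overhead and free-space headroom, which a real plan would add on top.

```python
def required_raw_capacity_tb(data_tb: float, replication_factor: int = 3) -> float:
    """Hypothetical helper: raw storage (TB) needed to hold data_tb of data.

    Assumes every block is stored replication_factor times, so raw capacity
    grows linearly with the replication factor. Headroom for temporary data,
    OS overhead and free space is NOT included.
    """
    if data_tb < 0:
        raise ValueError("data_tb must be non-negative")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return data_tb * replication_factor


if __name__ == "__main__":
    # The article's example: 5 TB of data with the default three replicas.
    print(required_raw_capacity_tb(5))  # 15
    # Lowering replication to 2 trades fault tolerance for capacity.
    print(required_raw_capacity_tb(5, replication_factor=2))  # 10
```

Because the relationship is linear, changing the replication factor has a direct, proportional effect on the storage you must provision.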
To read the entire article, visit https://searchdatacenter.techtarget.com/tip/Hadoop-cluster-capacity-planning-best-practices