Big Data and Hadoop Tutorial 1: Learning about Big Data
Today we are starting a course on Hadoop development. If you are a regular visitor to our site, you may know that we already run a course on Android development, and this new course will run alongside it. It has become hard to manage all the data collected by websites and applications, so this is the right time to learn about Big Data and Hadoop.
Firstly, what is Big Data? We can say it is a collection of a large amount of data, and "large" here doesn’t mean megabytes or gigabytes. Data collected in terabytes and petabytes can be termed Big Data. It is very difficult to store, process, maintain, analyse and visualize this much data.
Store refers to the data that gets collected from the users of your application or website. As the size of the data increases, it becomes difficult to find a place to keep it. Process means using the collected data to derive certain information from it. With a large heap of data, it becomes difficult to extract the pieces you need and work on them. Maintain means keeping the collected data safe somewhere so that it can be used later and does not get corrupted. As the amount of data grows, so does the risk that some of it is corrupted or lost. Analyse means using the collected data to retrieve or calculate some result. Analysing such a big amount of data is time consuming and costly. Visualize means reviewing the collected data in the form of a chart or graph. In most cases data is shown as pie charts or graphs so that general users, or those who don’t understand the technical wording, can follow it. As the amount of data increases, it becomes difficult to sort and arrange it for visualization.
Let’s now discuss the characteristics of big data. There are three main characteristics: Volume, Velocity and Variety. Volume means how big your data is, that is, the total amount of data available in terms of bytes. Velocity means how fast the data is being generated and piled up. For example, Twitter generated around 5,000 to 6,000 tweets every second, which works out to roughly 430 to 520 million tweets a day; just imagine that velocity. Variety means the form in which the data is presented or collected. The first form is structured data, for example a MySQL database, in which data is stored in an organized manner. The second is semi-structured data, in which the data is only partially structured, as in XML or JSON. The third is unstructured data such as plain text, audio or video. Most of the data we generate nowadays is unstructured: we post pictures and videos on social networking sites and chat with others.
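To make the three varieties more concrete, here is a minimal Python sketch. It is only an illustration under a few assumptions: Python’s built-in sqlite3 module stands in for MySQL so that the example runs without a database server, and the table, the names and the sample text are all made up for this example.

```python
import json
import sqlite3

# Structured: rows that follow a fixed schema in a relational table.
# An in-memory SQLite database stands in for MySQL here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO users (name, age) VALUES (?, ?)", ("Asha", 29))
structured_row = conn.execute("SELECT id, name, age FROM users").fetchone()
conn.close()

# Semi-structured: JSON describes its own fields, and the fields
# may differ from one record to the next.
semi_structured = json.loads('{"name": "Asha", "tags": ["hadoop", "android"]}')

# Unstructured: free text with no schema at all. Even a simple
# word count needs us to decide how to split the text ourselves.
unstructured = "Loved the new phone, but the battery drains too fast!"
word_count = len(unstructured.split())

print(structured_row)
print(semi_structured["tags"])
print(word_count)
```

Notice that the structured row can be queried by column name straight away, the JSON record has to be parsed before its fields are known, and the plain text gives us nothing but characters to work with.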
Now let’s talk about the sources of Big Data. Big Data is generated by social media, banks, instruments, websites and stock markets. Most users are active on social media sites, which increases the amount of data those sites have to store and process. Banks generate data about their customers and the transactions those customers carry out. Instruments include RFID readers, security cameras and similar devices that monitor security and produce a continuous stream of data such as audio and video. Websites generate data in the form of posts and the files they offer users for download. The stock market is one of the major sources of big data, covering all the buying and selling of stocks and the money transactions taking place between buyers and brokers.
Lastly, we will learn about the use cases of Big Data, that is, where the collected data is actually put to use. The main use cases are recommendation engines, call detail record (CDR) analysis, fraud detection, market basket analysis and sentiment analysis. Recommendation engines are used all over the internet, in many web applications and websites. For example, YouTube records your previous searches and suggests videos related to those topics. The same thing happens on other websites and when you search for something on Google. Another example is online shopping websites, which show the products most likely to suit you by analysing your previous purchases and searches. The second use case is call detail record analysis, which telecom companies perform to retain their customers and keep them from leaving for another provider. If you want to know more about it, please comment below and I will write a detailed article about it. Next is fraud detection, which deals with fraud involving credit cards, banking transactions and so on. Big data helps to analyse the data collected during transactions and catch fraudulent activity. The fourth is market basket analysis, which finds products that are frequently bought together so that, when you buy one product, the seller can add or suggest the others and get you to buy them as well. For example, when you buy a domain name for a website, the seller automatically adds or offers hosting and security for it. Another example is online shopping: when you buy a smartphone, the site suggests an extended warranty or a phone case and encourages you to buy them together. The final one is sentiment analysis (often written as "sentimental analysis"), which works out the opinion or emotion expressed in text such as posts, reviews and comments. For example, posts on social networking sites about flood victims can be analysed to see how people feel about the event, and a company can measure whether users are reacting positively or negatively to its product.
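Market basket analysis is easier to understand with a small example. The sketch below is only a toy illustration, not how a real shop implements it: the transactions list is invented, and the confidence helper is a hypothetical name introduced for this example. It counts how often each pair of products appears in the same basket and then works out how often buyers of one product also bought another.

```python
from collections import Counter
from itertools import combinations

# Made-up purchase history: each set is one customer's basket.
transactions = [
    {"smartphone", "phone case", "extended warranty"},
    {"smartphone", "phone case"},
    {"smartphone", "charger"},
    {"laptop", "laptop bag"},
    {"smartphone", "phone case", "charger"},
]

item_counts = Counter()  # number of baskets that contain each item
pair_counts = Counter()  # number of baskets that contain each pair of items
for basket in transactions:
    item_counts.update(basket)
    # Sort the items so that (a, b) and (b, a) are counted as the same pair.
    pair_counts.update(combinations(sorted(basket), 2))


def confidence(bought, suggested):
    """Share of baskets containing `bought` that also contain `suggested`."""
    pair = tuple(sorted((bought, suggested)))
    return pair_counts[pair] / item_counts[bought]


top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
print(confidence("smartphone", "phone case"))
```

With this data the most frequent pair is the smartphone and the phone case, which is why a shop would suggest a case when you add a smartphone to your cart. On real data the baskets run into millions, and that is where tools like Hadoop come in.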
Here we end our discussion of Big Data, and I think I have covered the basics you need to get started. In our next article we will learn about managing Big Data using Hadoop and what other alternatives are available. If you like our articles and want us to start a tutorial series on a topic other than Android and Hadoop, please comment below with the name of the topic.