Monday, May 11, 2020

Basic Data Analysis Commands Using R Studio

==============================================

Creating a summary of stats for the dataset

Command : 


      summary(dataset_price_personal_computers) 

Output : 
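As a rough illustration of the kind of output this command produces, here is a minimal sketch on a small made-up data frame. The name pc_sample and its values are hypothetical and only stand in for dataset_price_personal_computers. Numeric columns are summarised with a minimum, quartiles, mean, and maximum, while character columns are reported only by length and class.

```r
# Hypothetical stand-in for dataset_price_personal_computers (values are made up)
pc_sample <- data.frame(
  price = c(1499, 1795, 1595, 1849),
  ram   = c(4, 2, 4, 8),
  cd    = c("no", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# One block of statistics per column
pc_summary <- summary(pc_sample)
print(pc_summary)

# summary() on a single numeric column returns named statistics
price_stats <- summary(pc_sample$price)
print(price_stats[["Mean"]])
```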



===================================================================

Checking str before converting data


Command : 


      str(dataset_price_personal_computers)

Output : 
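To show what to look for in the str() output, here is a minimal sketch on a made-up data frame (pc_sample is a hypothetical stand-in for the real dataset). The point of the check is to spot columns stored as chr, since those need converting before any numeric analysis.

```r
# Hypothetical stand-in for dataset_price_personal_computers (values are made up)
pc_sample <- data.frame(
  price = c(1499, 1795, 1595),
  cd    = c("no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Compact view of the structure: one line per column with its type
str(pc_sample)

# The same information as a named vector of classes
col_classes <- sapply(pc_sample, class)
print(col_classes)

# Names of the columns that are still stored as text
chr_cols <- names(col_classes)[col_classes == "character"]
print(chr_cols)
```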



===================================================================

Converting 'chr' Values to integer (OR) Converting “yes” to 1 and “no” to 0

Commands:


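# Replace "yes" with "1" and "no" with "0" in each chr column.
# cd, multi and premium are the columns converted to numeric in the next step;
# the values are still text after gsub().
for (col in c("cd", "multi", "premium")) {
  dataset_price_personal_computers[[col]] <- gsub("yes", "1", dataset_price_personal_computers[[col]])
  dataset_price_personal_computers[[col]] <- gsub("no", "0", dataset_price_personal_computers[[col]])
}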


Output : 
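To make the effect of the replacement visible, here is a minimal sketch on a made-up vector. The vector cd and the ifelse() alternative are assumptions for illustration, not part of the original commands. Note that gsub() returns text, so the values are "1" and "0" as characters until they are converted.

```r
# Made-up yes/no values standing in for one column of the dataset
cd <- c("no", "yes", "no", "yes")

# Text replacement with gsub(): the result is still character
cd_text <- gsub("yes", "1", cd)
cd_text <- gsub("no", "0", cd_text)
print(cd_text)
print(class(cd_text))

# Alternative (an assumption, not from the post): map straight to integers
cd_int <- ifelse(cd == "yes", 1L, 0L)
print(cd_int)
print(class(cd_int))
```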



===================================================================== 

Transform ‘chr’ columns cd, multi & premium to numeric

Commands:


dataset_price_personal_computers <- transform(dataset_price_personal_computers, cd = as.numeric(cd), multi = as.numeric(multi), premium = as.numeric(premium))

Output : 
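As a minimal sketch of what this step does, assume a made-up data frame whose columns already hold "0" and "1" as text (pc_sample is hypothetical). The last lines show why the yes/no replacement has to happen first: as.numeric() turns text that is not a number into NA.

```r
# Hypothetical data: text columns that already contain "0"/"1"
pc_sample <- data.frame(
  cd = c("0", "1"), multi = c("0", "0"), premium = c("1", "1"),
  stringsAsFactors = FALSE
)

# Convert the three text columns to numeric in one call
pc_sample <- transform(pc_sample,
                       cd = as.numeric(cd),
                       multi = as.numeric(multi),
                       premium = as.numeric(premium))
str(pc_sample)

# Text that is not a number becomes NA (R also raises a warning)
leftover <- suppressWarnings(as.numeric("yes"))
print(is.na(leftover))
```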



======================================================================

Running Correlation on a data set

Commands:


cor(dataset_price_personal_computers)

Output : 
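To illustrate the shape of the result, here is a minimal sketch on a made-up, all-numeric data frame (pc_sample and its values are hypothetical). cor() returns a square matrix with one row and one column per variable, and it stops with an error if any column is not numeric, which is why the conversion steps come first.

```r
# Hypothetical numeric data; ram rises exactly in step with price
pc_sample <- data.frame(
  price = c(1000, 1500, 2000, 2500),
  ram   = c(2, 4, 6, 8),
  cd    = c(0, 1, 0, 1)
)

# Pearson correlation between every pair of columns
cor_matrix <- cor(pc_sample)
print(round(cor_matrix, 2))

# A single pair can be read from the matrix by name
print(cor_matrix["price", "ram"])
```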






Qualitative and Quantitative Methodologies

The data type of an attribute describes what kind of values were collected for that variable. Data types are mainly divided into quantitative data and qualitative data. Quantitative data is information about quantities, and therefore numbers, while qualitative data is descriptive and concerns phenomena that can be observed but not measured.

Qualitative methodology



Qualitative research is empirical research where the data are not in the form of numbers[1]. Qualitative research is multimethod in focus, involving an interpretive, naturalistic approach to its subject matter. This means that qualitative researchers study things in their natural settings, attempting to make sense of, or interpret, phenomena in terms of the meanings people bring to them [2].

The aim of qualitative research is to understand the social reality of individuals, groups, and cultures as nearly as possible as its participants feel it or live it. Thus, people and groups are studied in their natural settings [3].

Qualitative descriptions can play an important role in suggesting possible relationships, causes, effects, and dynamic processes. Qualitative research uses a descriptive, narrative style. This can be of particular benefit to practitioners, who can turn to qualitative reports to examine forms of knowledge that might otherwise be unavailable, and so gain new insight.

Because of the time and costs involved, qualitative designs do not generally draw samples from large-scale data sets. The time required for data collection, analysis, and interpretation is lengthy. Analysis of qualitative data is difficult, and expert knowledge of an area is necessary to interpret it. Great care must be taken when doing so, for example, when looking for symptoms of mental illness.

Quantitative methodology



Quantitative methodology means gathering data in a numerical form that can be put into categories, placed in rank order, or measured in units of measurement. Quantitative researchers aim to establish general laws of behavior and phenomena across different settings or contexts.

Quantitative data can be interpreted with statistical analysis, and since statistics are based on the principles of mathematics, the quantitative approach is viewed as scientifically objective and rational [4] [5]. Quantitative data is based on measured values and can be checked by others, because numerical data is less open to ambiguities of interpretation.

Large sample sizes are needed for more accurate analysis. Small scale quantitative studies may be less reliable because of the low quantity of data [5]. Poor knowledge of the application of the statistical analysis may negatively affect analysis and subsequent interpretation[6]. 
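One common way to make the sample-size point concrete is the standard error of a sample mean, which is the standard deviation divided by the square root of the sample size. This is a textbook formula rather than something taken from the cited sources, and the numbers below are made up for illustration.

```r
# Standard error of a sample mean: sd / sqrt(n)
standard_error <- function(sd, n) sd / sqrt(n)

# Same spread in the data, two different sample sizes (made-up values)
se_small <- standard_error(sd = 10, n = 25)
se_large <- standard_error(sd = 10, n = 2500)

print(se_small)  # 2
print(se_large)  # 0.2
```

With 100 times as many observations, the uncertainty around the mean shrinks by a factor of 10 in this sketch.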

Compare and contrast qualitative data vs quantitative data





Qualitative data can be produced through:


· Texts and reports
· Audio and video recordings
· Images and symbols
· Interview transcripts and focus groups
· Observations and notes

Quantitative data can be generated through:
· Tests
· Experiments
· Surveys
· Market reports and Metrics


To strengthen your understanding of qualitative and quantitative data, think of a few examples. For qualitative data, consider identifiers like the color of your clothes, your type of hair, and your nose shape. For quantitative data, consider measurable things like your height, weight, age, and shoe size.
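The same distinction shows up in how the two kinds of data are handled in R. The sketch below uses a made-up data frame (people and its values are hypothetical): a qualitative variable is stored as a factor and can be counted, while a quantitative variable is numeric and can be averaged.

```r
# Made-up example: one qualitative and one quantitative variable
people <- data.frame(
  hair_type = factor(c("curly", "straight", "curly")),
  height_cm = c(170, 182, 165)
)

# Qualitative data: count how often each category occurs
hair_counts <- table(people$hair_type)
print(hair_counts)

# Quantitative data: arithmetic such as the mean is meaningful
mean_height <- mean(people$height_cm)
print(mean_height)
```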



========================================================================



References

[1] Punch, K. (1998). Introduction to Social Research: Quantitative and Qualitative Approaches.

[2] Denzin, N., & Lincoln, Y. (1994). Handbook of Qualitative Research. Thousand Oaks, CA, US: Sage Publications Inc.

[3] McLeod, S. A. (2019, July 30). Qualitative vs. quantitative research. Simply Psychology. https://www.simplypsychology.org/qualitative-quantitative.html

[4] Carr, L. T. (1994). The strengths and weaknesses of quantitative and qualitative research: what method for nursing?. Journal of Advanced Nursing, 20(4), 716-721.

[5] Denscombe, M. (2010). The Good Research Guide: for small-scale social research. McGraw Hill.

[6] Antonius, R. (2003). Interpreting quantitative data with SPSS. Sage.

[7] Minichiello, V. (1990). In-Depth Interviewing: Researching People. Longman Cheshire.

[8] Devin Pickell. (2019, Mar 4). Qualitative vs quantitative data. what’s the difference? https://learn.g2.com/qualitative-vs-quantitative-data



Sunday, May 10, 2020

Data Collection Techniques in Data Science


Any research is only as good as the data that drives it, so choosing the right data collection technique can make a significant difference. In this post, I will look at three different data collection techniques (surveys, interviews, and focus group discussions) and assess their suitability under various conditions.


Data Collection 

Data collection is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes. The data collection component of research is common to all fields of study including physical and social sciences, humanities, business, etc.[1].



Surveys


Surveys, as we consider them here, are standalone instruments of data collection that are administered to the sample subjects by mail, by telephone, or online. They have long been one of the most popular data collection methods.

Surveys give researchers a chance to carefully structure and formulate the data collection plan with precision. Respondents can take these surveys at a convenient time and think about their answers at their own pace. The reach is theoretically unlimited: a survey can reach every corner of the globe if the medium allows it. Surveys can be administered to participants in a variety of ways. They can simply be sent by email or fax, or they can be administered over the Internet. These days, the online survey has become the most popular way of gathering data from target participants. Besides the convenience of data gathering, researchers can collect data from people around the world.

On the other hand, the survey that the researcher used from the very beginning, as well as the method of administering it, cannot be changed during the process of data gathering. Although this inflexibility can be seen as a weakness of the survey method, it can also be a strength, since precision and fairness can both be exercised in the study. Questions that touch on controversies may not be answered precisely by the participants, because of the probable difficulty of recalling the information related to them. The truth behind these controversies may not be revealed as accurately as with alternative data gathering methods such as face-to-face interviews and focus groups.

Interviews.


There are three fundamental types of research interviews: structured, semi-structured, and unstructured. Structured interviews are, essentially, verbally administered questionnaires, in which a list of predetermined questions are asked, with little or no variation and with no scope for follow-up questions to responses that warrant further elaboration [2]. The purpose of the research interview is to explore the views, experiences, beliefs, or motivations of individuals on specific matters (e.g. factors that influence their attendance at the dentist). Qualitative methods such as interviews are believed to provide a 'deeper' understanding of social phenomena than would be obtained from purely quantitative methods such as questionnaires.

Interviews help researchers uncover rich, deep insight and learn information that they might otherwise have missed. The presence of an interviewer can give respondents additional comfort while answering the survey and ensure correct interpretation of the questions. The physical presence of a persistent, well-trained interviewer can significantly improve the response rate.

Reaching out to all respondents to conduct interviews is a massive, time-consuming exercise that leads to a significant increase in the cost of conducting a survey. To ensure the effectiveness of the whole exercise, the interviewers must be well trained in the necessary soft skills and the relevant subject matter.

Focus Group Discussions. 


Focus Group Discussions take the interactive benefits of an interview to the next level by bringing a carefully chosen group together for a moderated discussion on the subject of the survey. Focus groups share many common features with less structured interviews, but there is more to them than merely collecting similar data from many participants at once. A focus group is a group discussion on a particular topic organized for research purposes. This discussion is guided, monitored, and recorded by a researcher (sometimes called a moderator or facilitator) [2].

The presence of several relevant people together at the same time can encourage them to engage in a healthy discussion and help researchers uncover information that they may not have anticipated. It helps the researchers corroborate the facts immediately; any inaccurate response will most likely be countered by other members of the group. It allows the researchers to see both sides of the coin and build a balanced perspective on the issue.

Finding groups of people who are relevant to the survey and persuading them to come together for the session at the same time can be a difficult task. The presence of excessively loud members in the focus group can subdue the opinions of those who are less vocal. The members of a focus group can often fall prey to groupthink if one of them turns out to be remarkably persuasive and influential. This will bury the diversity of opinion that might otherwise have emerged. The moderator of a focus group discussion must be on guard to prevent this from happening.


References

[1] Kabir, Syed Muhammad. (2016). Methods Of Data Collection.

[2] Gill, P., Stewart, K., Treasure, E. et al. Methods of data collection in qualitative research: interviews and focus groups. Br Dent J 204, 291–295 (2008). https://doi.org/10.1038/bdj.2008.192.

[3] Gaurav. J. (2017, Aug 16). 4 Data Collection Techniques: Which One’s Right for You?. Retrieved from https://humansofdata.atlan.com/2017/08/4-data-collection-techniques-ones-right/

[4] Sarah Mae Sincero (Mar 18, 2012). Advantages and Disadvantages of Surveys. Retrieved Feb 13, 2020 from Explorable.com: https://explorable.com/advantages-and-disadvantages-of-surveys



R – A TRUTHFUL PROGRAMMING LANGUAGE FOR DATA SCIENTISTS


Statistical programming languages play an important role in the day-to-day work of data scientists. They are used to develop and apply algorithms, analyze unstructured data, perform statistical computations and data analysis, and represent and visualize data graphically. Capturing, communicating, storing, analyzing, and aggregating data manually takes a lot of effort, and manual work does not guarantee accuracy in complex calculations. Statistical programming languages reduce that manual effort and increase the accuracy of these operations.

R for Data Science


Data science is an exciting discipline that allows you to turn raw data into understanding, insight, and knowledge. The goal of "R for Data Science" is to help you learn the most important tools in R that will allow you to do data science [1].

Introduction to R.

R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity [2].


R Environment.

R is an integrated suite of software facilities for data manipulation, calculation, and graphical display. It includes an effective data handling and storage facility, a suite of operators for calculations on arrays, in particular matrices, a large, coherent, integrated collection of intermediate tools for data analysis, graphical facilities for data analysis and display either on-screen or on hardcopy, and a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities [2].
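A few of the facilities listed above can be tried in a handful of lines. This is only a minimal sketch with made-up values; the names m, factorial_recursive, and total are chosen for illustration.

```r
# Operators for calculations on matrices
m <- matrix(1:4, nrow = 2)   # filled column by column: rows are (1, 3) and (2, 4)
m_squared <- m %*% m         # matrix product
print(m_squared)             # rows are (7, 15) and (10, 22)

# A user-defined recursive function with a conditional
factorial_recursive <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_recursive(n - 1)
}
print(factorial_recursive(5))  # 120

# A loop that accumulates a running total
total <- 0
for (i in 1:5) {
  total <- total + i
}
print(total)  # 15
```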

R Language.

R is an extremely versatile open-source programming language for statistics and data science [3]. In R, you can carry out any kind of statistical computation using function-based syntax or program-based code, with very powerful debugging facilities, and the language has many interfaces to other programming languages. The resulting statistics can then be displayed using R's high-level graphical tools [4]. Among data scientists working with big data in business, industry, and government, as well as in medicine and academia, you will find many using the R environment and its packages (a comparison between languages will be discussed later). R has the following features [5]:

1. A short, concise syntax that speeds up work on your data.
2. A variety of formats for loading and storing data, for both local and over-the-internet tasks.
3. The ability to perform tasks in memory using a consistent syntax.
4. A long list of tools (functions, packages) for data analysis tasks; some are built in and the rest are open source.
5. Several easy ways to present statistical results graphically, and the ability to store these graphs on disk.
6. The ability to automate analyses, create new functions (R is a programming language), and extend the existing language features.
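Two of these features, creating new functions and storing graphs on disk, can be sketched in a few lines. The function name price_per_ram, the data, and the file name are all made up for illustration, and the plot is written to a temporary folder with the base pdf() device.

```r
# A small user-defined function (hypothetical example)
price_per_ram <- function(price, ram) price / ram

ram   <- c(2, 4, 8)
price <- c(1500, 1800, 2400)
ratio <- price_per_ram(price, ram)
print(ratio)  # 750 450 300

# Store a graph on disk: open a file device, draw, then close the device
plot_file <- file.path(tempdir(), "price_vs_ram.pdf")
pdf(plot_file)
plot(ram, price, xlab = "RAM", ylab = "Price", main = "Price vs RAM")
invisible(dev.off())

print(file.exists(plot_file))
```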

Users don’t need to reload their data every time, because the system can save the data between sessions along with the history of their commands. If you prefer a GUI, there are many free GUIs for R, such as RStudio, R Commander, StatET, ESS, and JGR (Java GUI for R).
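Besides the workspace that R can save when a session ends, a single object can be saved and restored explicitly. The sketch below uses base R's saveRDS() and readRDS() on a made-up data frame; the object name and file path are assumptions for illustration.

```r
# Made-up data to keep between sessions
pc_sample <- data.frame(price = c(1499, 1795), ram = c(4, 2))

# Write the object to disk (a temporary folder is used here)
rds_file <- file.path(tempdir(), "pc_sample.rds")
saveRDS(pc_sample, rds_file)

# In a later session, read it back without rebuilding it
restored <- readRDS(rds_file)
print(identical(pc_sample, restored))
```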

Advantages of R.

1. Programmers don’t need to reload data every time; the system can save data between sessions, along with the history of their commands.
2. It supports various formats for storing and loading data, for both local and over-the-internet tasks.
3. It is highly compatible and can be paired with other programming languages such as C, C++, Java, and Python.
4. It is easy to integrate with various database management systems and technologies like Hadoop.
5. R is well known as the lingua franca of statistics.
6. Reporting the results of an analysis is easy, and R also helps to build interactive web apps that allow users to play with the results.
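As a minimal sketch of the point about storage formats, the lines below write a made-up data frame to a CSV file and read it back with base R. The object and file names are hypothetical. read.csv() also accepts a URL in place of a local path, which is one way to load data over the internet.

```r
# Made-up data to store
pc_sample <- data.frame(price = c(1499, 1795), cd = c("no", "yes"))

# Store it as CSV in a temporary folder
csv_file <- file.path(tempdir(), "pc_sample.csv")
write.csv(pc_sample, csv_file, row.names = FALSE)

# Load it again from the local file
reloaded <- read.csv(csv_file, stringsAsFactors = FALSE)
print(reloaded)
```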

 

Disadvantages of R.    

1. R and its packages can be much slower than other languages such as Python.
2. R lacks basic security features, which places several restrictions on embedding it in a web application.
3. R requires the entire data set to be held in one place, in memory, so it needs more memory.


As a tool and a programming language, R has several advantages and disadvantages for data scientists compared with other statistical programming languages.


References :
[1] H. Wickham & G.Grolemund, "R for Data Science", January 2017, O’Reilly Media Inc.
[2] Introduction to R. Retrieved from https://www.r-project.org/about.html
[3] W. McKinney, “Python for data analysis”,1st ed., 2013, O’Reilly Media Inc., pp.453.
[4] D. Rotolo and L.Leydesdorff, “Matching Medline / Pubmed data with a web of science: a routine in r language”, vol. 66, no. 10, 2015, Journal of the Association for Information Science and Technology, pp. 2155–2159
[5] T. Siddiqui and M. Al Kadri, “Review of Programming Languages and Tools for Big Data Analytics”, May-June 2017, International Journal of Advanced Research in Computer Science, pp.1113.