Showing posts with label SAS analytics certification. Show all posts

Sunday, 30 August 2015

Data Preparation using SAS

Before any data analysis can begin, one task is critical to the success of the project: data preparation. You may have heard that data production has been expanding at an astonishing pace in recent years. Experts now point to a 4300% increase in annual data generation by 2020. This growth is attributed to the switch from analog to digital technologies and to the rapid increase in data generated by individuals and corporations alike. Most of the data generated in the last few years is unstructured.

In this context, it is essential to turn unstructured data into a structured dataset before any meaningful analysis can be done.
“Data preparation means manipulation of data into a form suitable for further analysis and processing”



“Data Preparation techniques consist of Cleaning, Integration, Selection and Transformation”
We will discuss some of these data preparation techniques using SAS. An INFORMAT tells SAS how to read data that contains special characters such as dollar signs, commas and date separators. A FORMAT tells SAS how to display the stored values with those characters.

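/* Assumes the DP libref has already been assigned with a LIBNAME statement */
Data DP.Practice;

length City $10;
input City $ ID $ Age Salary DOJ Profit;
/* Informats describe how the raw text is read; formats describe how values are displayed */
informat Salary dollar6. DOJ ddmmyy10. Profit dollar7.2;
format Salary dollar6. DOJ ddmmyy10. Profit dollar7.2;
label DOJ = "Date of Joining";
rename Salary = Salary_of_Employee;
/* A period marks a missing Age so the remaining values stay aligned with their variables */
datalines;
Bangalore T101 24 $2,000 12/12/2010 $300.50
Pune T102 29 $3,000 11/10/2006 $400.50
Hyderabad T103 . $5,000 12/10/2008 $500.70
Delhi T104 . $6,000 12/12/2009 $450.00
Pune T105 . $7,000 12/12/2009 $450.00
;
run;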


In the SAS code above, we used both INFORMAT and FORMAT to read and display data that contains special characters. The INFORMAT statement reads Salary as a numeric variable from text such as $5,000, which is 6 characters wide including the $ and the comma. The FORMAT statement displays the stored number in the same form when the dataset is printed. The RENAME and LABEL statements modify the variable metadata to make the dataset easier to understand.
Next, we will apply a transformation to a dataset so that more advanced analytical techniques can be used on it. The dataset holds various attributes of customers who have or have not subscribed to an edition. It contains a categorical variable, status, whose value is either “Subscribed” or “Not Subscribed”. We can transform this categorical variable into a dichotomous variable in order to run a logistic regression on the dataset.
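One way to confirm that the informats, formats, label and rename were applied is to inspect the dataset after the DATA step has run. The following is a minimal sketch, assuming the DP libref is assigned and DP.Practice was created as shown above.

```sas
/* Sketch: inspect the dataset created above.
   Assumes the DP libref is assigned and DP.Practice exists. */

/* Lists each variable with its type, length, format, informat and label */
proc contents data = DP.Practice;
run;

/* Prints the rows; the LABEL option shows labels as column headings */
proc print data = DP.Practice label;
run;
```

In the PROC CONTENTS listing, the salary variable should appear under its new name, Salary_of_Employee.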

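Data media01;
/* Declared before SET so that the length takes effect */
length status $15;
set DP.media;
/* UPCASE makes the comparison independent of letter case */
If upcase(status) = "SUBSCRIBED" then status = "0";
else status = "1";
run;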

In the SAS code above, we applied a simple IF-THEN/ELSE statement to transform the dataset called media. Transforming a categorical variable into a dichotomous variable lets us apply the analytical techniques we want to run on the dataset. Once the transformation is done, the dataset is ready for the next stage, i.e. data analysis.
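Before moving on to the analysis, it can help to check the recoded variable and then try it as the response of a model. The sketch below is only illustrative: the predictor names age and income are hypothetical and are not confirmed to exist in DP.media.

```sas
/* Check the recoding: status should now hold only "0" and "1" */
proc freq data = media01;
tables status;
run;

/* Hypothetical model: age and income are assumed predictor names,
   not variables confirmed to exist in DP.media */
proc logistic data = media01;
model status(event = "1") = age income;
run;
```

The EVENT= option states which of the two status values the model treats as the event of interest.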

The more you torture your data, i.e. the more data preparation you do, the better the outcome of the data analysis.

Wednesday, 12 August 2015

Import and Export of dataset using SAS and R

For an analyst, data is the primary raw material used to draw conclusions and inferences for business decisions. Raw data on its own is of little help for this. Hence, we need to load the data into statistical analysis software, where it can be sliced and diced to support better decision making. In this post, we will discuss the steps to import and export a dataset using SAS and R.

There are different methods to import and export a dataset using SAS and R. The method can differ depending on the file format you are importing or exporting.


Import a dataset using SAS:

Proc import datafile = "F:\SASPractise\Data\Files\class.csv" out = class dbms = csv replace;
run;

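/* dbms = tab treats the input as tab-delimited, so use it when the values are separated by tabs */
Proc import datafile = "F:\SASPractise\Data\Files\class.csv" out = class1 dbms = tab replace;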
run;

Proc import datafile = "F:\SASPractise\Data\Files\class.txt" out = class2 dbms = dlm replace;
run;

Proc import datafile = "F:\SASPractise\Data\Files\classdup.xls" out = classdup dbms = xls replace;
Sheet = "Sheet1";
run;

The PROC IMPORT statement has three major options:
DATAFILE - specifies the complete path and filename of the input PC file
OUT - identifies the output SAS dataset with either a one- or two-level SAS name. By default, a one-level name creates a temporary dataset in the WORK library; a two-level name lets you specify a permanent library
DBMS - specifies the type of data to import into SAS, for example csv, tab, dlm or xls
When you import a dataset, log messages like the ones below confirm that the dataset was successfully created.

Class was successfully created.
PROCEDURE IMPORT used.
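
Besides reading the log, the imported dataset itself can be inspected. The following is a minimal sketch, assuming the first PROC IMPORT above ran and created the temporary dataset class.

```sas
/* Sketch: inspect the dataset created by the first PROC IMPORT above */

/* Shows the variable names and types that PROC IMPORT assigned */
proc contents data = class;
run;

/* Prints only the first five rows as a quick check */
proc print data = class (obs = 5);
run;
```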


Export a dataset using SAS:

Proc export data = class.exp1 outfile = "F:\SASPractise\Data\Files\exp1.xls" dbms = tab replace; 
run;

Proc export data = class.exp2 outfile = "F:\SASPractise\Data\Files\exp2.csv" dbms = csv replace; 
run;

Proc export data = class.exp3 outfile = "F:\SASPractise\Data\Files\exp3.txt" dbms = dlm replace; 
delimiter = ",";
run;

Import a dataset using R:

There are multiple ways to import a dataset in R. We will begin with the simplest method in R to import a dataset.

file.choose() method

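Example1 <- read.csv(file.choose())

A file selection window will open. file.choose() only returns the path of the file you select, so the path is passed to read.csv() to actually import the data. This works for csv files; other delimited text files can be read with read.table() or read.delim(), while xls files need a reader from a separate package.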

read.csv() method

Example2 <- read.csv("c:\\users\\desktop\\pollution.csv")
View(Example2)

Export a dataset using R:

write.csv() method

write.csv(Example2, "c:\\users\\desktop\\example2.csv")

Getting the data into statistical analysis software is a primary step before any analysis. Being able to move data from one system to another gives an analyst the flexibility to use the best algorithms and techniques available in SAS, R and other tools. Once the data is loaded, you can begin data manipulation and data analysis to draw conclusions and inferences with the techniques available in the software.