How to Read a Space-Delimited File in SAS
How to Import Text Files into SAS
Text files are a common file format to use when importing or exporting data from one data source to another. When importing text files from other data sources or databases, there are many variations in the data structure and delimiters that you can come across.
This article aims to address some of the more common challenges that arise when attempting to import different variations of text files into SAS. A few different tips and methods are also provided along the way.
Topics covered include:
- Importing tab-delimited text files with PROC IMPORT
- Importing special character delimited text files with PROC IMPORT
- Importing space-delimited text files with PROC IMPORT
- Using PROC IMPORT to generate Data Step code for importing text files
Software
Before we continue, make sure you have access to SAS Studio. It's free!
Data Sets
The examples used in this article are based on the text files listed below. The text files are derived from the SASHELP datasets CARS and ORSALES:
- Cars_tab.txt - download
- Cars_pipe.txt - download
- Orsales_space.txt - download
Before running any of the examples below, you will need to replace the path '/home/your_username/SASCrunch' with a directory that you have read/write access to in your environment.
This can vary depending on the machine you are running SAS on, the version of SAS you are running and the Operating System (OS) you are using.
If you are using SAS OnDemand for Academics, you must first upload the files to the SAS server.
Please visit here for instructions if you are not sure how to do this.
1. Importing a Tab-delimited Text File with PROC IMPORT
With a tab-delimited text file, the variables (columns) are separated by a tab and the files typically end with a ".txt" extension.
In this example, the input file is the cars_tab.txt file. This is a text file based on the SASHELP.CARS dataset.
The first thing you need following the PROC IMPORT keyword is the DATAFILE option. DATAFILE is required so that SAS knows where the file you would like to import is stored and what the name of that file is. Inside the quotation marks following DATAFILE, you need to add the complete path, including the filename and file extension. Be sure to replace '/home/your_username/SASCrunch' with the correct directory on your machine or environment where cars_tab.txt is saved. In this example, "/home/your_username/SASCrunch" is the path, "cars_tab" is the filename, and ".txt" is the file extension.
To import tab-delimited text files, both the DBMS and DELIMITER options will need to be used. The DBMS value used for this example is DLM. The DLM value tells SAS that you would like to specify a custom delimiter for the dataset.
After closing off the PROC IMPORT statement with a semicolon, a second option, DELIMITER, is added. The value of DELIMITER for a tab-delimited file is '09'x, which is the hexadecimal representation of a TAB on an ASCII platform.
Finally, the REPLACE option is included to allow for multiple re-runs and overwrites of the CARS_TAB dataset in WORK. If you prefer not to overwrite the newly imported SAS dataset, you can just remove the REPLACE option.
Using these parameters, the following code will import the tab-delimited cars_tab.txt file and output a SAS dataset in WORK called CARS_TAB:
proc import datafile = '/home/your_username/SASCrunch/cars_tab.txt'
out = cars_tab
dbms = dlm
replace;
delimiter = '09'x;
run;
After running the above code, you will notice something is a bit off with the output dataset:
If you were to open the cars_tab.txt file directly using Notepad, Wordpad, TextEdit or similar on your computer, you would notice that this file has an extra row of invalid data in it. This type of situation often occurs when the text file is created from another data source.
Fortunately, SAS provides an option that you can add to your PROC IMPORT call to skip this extra line of data that you don't need. By adding the DATAROW option, you can let SAS know at which row the data (observations) start. In this case, we know that the first row has the headings, the second row has no data, and the observations start on the third row, so we set datarow = 3 :
proc import datafile = '/home/your_username/SASCrunch/cars_tab.txt'
out = cars_tab
dbms = dlm
replace
;
delimiter = '09'x;
datarow = 3;
run;
2. Importing Text Files Delimited with Special Characters
Since text files can contain any number of special characters as delimiters, the DELIMITER statement can be used with just about any keyboard character.
For example, if the values of a text file are delimited with the pipe bar "|", you can simply specify the pipe bar symbol in the DELIMITER statement, similar to how we used '09'x for tab-delimited files. In this example, the cars_pipe.txt file is read in to create the CARS_PIPE SAS dataset in the WORK library:
proc import datafile = '/home/your_username/SASCrunch/cars_pipe.txt'
out = cars_pipe
dbms = dlm
replace
;
delimiter = '|';
run;
After updating the path in the DATAFILE option and running the above code, you will notice that while the columns have been read in correctly, the variable names are not correct and actual values are being used as the variable names:
If you were to open the cars_pipe.txt file directly using Notepad, Wordpad, TextEdit or similar text editors on your computer, you would notice that this text file has no column headings and the data starts directly in the first row.
To get around this, you need to let SAS know that there are no column headings provided in the input text file. By default, there is a GETNAMES option in PROC IMPORT which is set to YES. With this setting equal to YES, SAS assumes that the first row of data contains the column headings, which ultimately end up as the SAS variable names. When this is not the case, simply set GETNAMES = NO to let SAS know there are no column headings provided in the input file:
proc import datafile = '/home/your_username/SASCrunch/cars_pipe.txt'
out = cars_pipe
dbms = dlm
replace
;
getnames = no;
delimiter = '|';
run;
To fix the variable names, you could for example use the SAS Data Step with the RENAME statement to create a new dataset. As an example, the following Data Step code would create a dataset called CARS_PIPE_CLEAN, which uses the RENAME statement to set the appropriate variable names as shown here:
data cars_pipe_clean;
set cars_pipe;
rename var1 = make
var2 = model
var3 = type
var4 = origin
/*var5 = ...
var15 = ...*/
;
run;
3. Importing Space-delimited Text Files with PROC IMPORT
Space-delimited text files are yet another common file type you may come across that you would like to import into SAS. By default, setting DBMS = DLM in your PROC IMPORT statement will use space as the delimiter, so you don't need to explicitly use the DELIMITER option in this case.
For example, the orsales_space.txt text file contains space-delimited columns, and can be imported into SAS with DBMS = DLM :
proc import datafile = '/home/your_username/SASCrunch/orsales_space.txt'
out = orsales
dbms = dlm
replace
;
run;
Running a PROC FREQ on the Product_Category variable of the newly imported dataset reveals a problem with some of the values:
proc freq data = orsales;
tables product_category;
run;
As shown in the Results, "Assorted Sports Articles" is now only "Assorted Sports A" in this newly imported dataset:
This type of situation can often occur when importing datasets into SAS because PROC IMPORT will only check a portion of the records before determining what the appropriate variable types and lengths should be on the output SAS dataset.
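One quick way to confirm that a length issue is behind the truncation, sketched here under the assumption that the ORSALES dataset from the import above exists in WORK, is to list the variable attributes with PROC CONTENTS and check the length assigned to Product_Category:

```sas
/* List variable names, types and lengths for the imported dataset. */
/* Assumes WORK.ORSALES was created by the PROC IMPORT call above.  */
proc contents data = orsales;
run;
```

If the length reported for Product_Category is shorter than the longest value in the text file, the values were cut off during the import.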
The solution to this problem is to include the GUESSINGROWS option with your PROC IMPORT call. By specifying a number for GUESSINGROWS, you can tell SAS how many rows it should scan in your incoming dataset before determining what the appropriate lengths and variable types should be.
In this example import, there are 912 rows of data. Here, by setting GUESSINGROWS = 912 we can be sure that SAS will pick the largest width necessary to avoid truncation of any data when it completes the import. A new dataset, ORSALES_GUESSINGROWS, is then created so you can see the difference in results:
proc import datafile = '/home/your_username/SASCrunch/orsales_space.txt'
out = orsales_guessingrows
dbms = dlm
replace
;
guessingrows = 912;
run;
By running a PROC FREQ to generate a frequency table on the newly created dataset, we can test whether or not the GUESSINGROWS option was effective:
proc freq data = orsales_guessingrows;
tables product_category;
run;
As you can see from the output, the Product_Category value "Assorted Sports Articles" now shows up correctly and is no longer truncated:
It's important to note that GUESSINGROWS can be extremely computationally intensive and may significantly increase the time it takes to import your dataset into SAS. The larger the value you set for GUESSINGROWS, the longer the processing will take, but the more reliable the results will be. The run time will of course depend on your environment, the number of records and the number of variables found in your data.
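When the number of rows is not known in advance, one possible variation is to ask SAS to scan the whole file. The sketch below assumes a SAS release that accepts the MAX keyword for GUESSINGROWS (recent SAS 9.4 releases and SAS Studio generally do); the output name ORSALES_MAX is only an illustrative choice and does not appear elsewhere in this article:

```sas
/* Sketch: scan every row before deciding variable types and lengths. */
/* Assumption: this SAS release supports GUESSINGROWS = MAX.           */
/* ORSALES_MAX is a hypothetical output dataset name.                  */
proc import datafile = '/home/your_username/SASCrunch/orsales_space.txt'
    out = orsales_max
    dbms = dlm
    replace
    ;
    guessingrows = max;
run;
```

The same trade-off applies: scanning all rows gives the most reliable lengths, at the cost of a longer import on large files.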
4. Importing a Tab-delimited File using Data Step
Although the amount of SAS code required to import a text file using Data Step is longer than the code required for PROC IMPORT, using Data Step code allows for greater flexibility.
By using Data Step code, the variable names, lengths and types can be manually specified at the time of import. The advantage is that this allows you to format the dataset exactly the way you want as soon as it is created in SAS, rather than having to make additional modifications later on.
First, as with any SAS Data Step code, you need to specify the name and location for the dataset you are going to create. Here, a dataset named CARS_DATASTEP_TAB will be created in the WORK library.
The next step is to use the INFILE statement. The INFILE statement in this example is made up of 6 components:
- The location of the text file – '/home/your_username/SASCrunch/cars_tab.txt' in this example
- Delimiter option – the delimiter found in the input file, enclosed in quotation marks (delimiter is '09'x in this example since it is a tab-delimited file)
- MISSOVER option – Tells SAS not to move on to the next record when the current record runs out of values; any remaining variables are set to missing
- FIRSTOBS – The first row that contains the observations in the input file (set to 3 in this example since the observations start on the third row in the cars_tab.txt file)
- DSD – Tells SAS that when a delimiter is found within a quoted value in the dataset, it should be treated as part of the value and not as a delimiter
- LRECL – Maximum length for an entire record (32767 is used here, which ensures no truncation for records of up to 32767 characters)
After the INFILE statement, the simplest way to ensure that your variable names, lengths, types and formats are specified correctly is to use a FORMAT statement for each variable. After an appropriate format has been assigned to each variable, the variables that you would like to import should be listed in order after an INPUT statement. Note that character variables should have a dollar sign ($) after each variable name.
Note that you can also specify INFORMATs and LENGTHs optionally here, but in most cases the FORMAT and INPUT statements should be all you need for a successful import.
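As a rough sketch of the LENGTH alternative mentioned above, the step below reads only the first five (character) columns of cars_tab.txt and declares their lengths up front. The dataset name CARS_LENGTH_SKETCH is hypothetical, and the widths are assumptions borrowed from the SASHELP.CARS variable attributes rather than measured from the text file, so adjust them to your data:

```sas
/* Sketch: declare character lengths with LENGTH before INPUT.          */
/* Assumptions: the widths below mirror SASHELP.CARS and the dataset    */
/* name is illustrative only. Columns after DriveTrain are not read.    */
data work.cars_length_sketch;
    infile '/home/your_username/SASCrunch/cars_tab.txt'
        delimiter = '09'x
        missover
        firstobs = 3
        dsd
        lrecl = 32767;
    length Make $13 Model $40 Type $8 Origin $6 DriveTrain $5;
    input Make $ Model $ Type $ Origin $ DriveTrain $;
run;
```

Because the LENGTH statement appears before INPUT, it fixes the storage length of each character variable, which guards against values being cut off at a shorter width.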
Below is the Data Step code that would successfully import the cars_tab.txt file into a SAS dataset. As mentioned, be sure to update the path to the correct location of the cars_tab.txt file in your environment before running the following code:
data work.cars_datastep_tab;
infile '/home/your_username/SASCrunch/cars_tab.txt'
delimiter ='09'x
missover
firstobs =3
DSD
lrecl = 32767;
format Make $5. ;
format Model $30. ;
format Type $6. ;
format Origin $6. ;
format DriveTrain $5. ;
format MSRP $9. ;
format Invoice $9. ;
format EngineSize best12. ;
format Cylinders best12. ;
format Horsepower best12. ;
format MPG_City best12. ;
format MPG_Highway best12. ;
format Weight best12. ;
format Wheelbase best12. ;
format Length best12. ;
input
Make $
Model $
Type $
Origin $
DriveTrain $
MSRP $
Invoice $
EngineSize
Cylinders
Horsepower
MPG_City
MPG_Highway
Weight
Wheelbase
Length
;
run;
After running the above code, you should see the CARS_DATASTEP_TAB data set, shown partially here:
5. Generating Data Step Code with PROC IMPORT
When the variable names, types, lengths or formats that SAS is automatically generating with PROC IMPORT are not what you are looking for, and you don't want to type out 40+ lines of code as in the previous example, PROC IMPORT can still be a time-saving tool.
Going back to the cars_pipe.txt text file, recall that this text file did not contain column headings.
Re-run the following code to import cars_pipe.txt into SAS and create a temporary dataset, CARS_PIPE, to be stored in WORK:
proc import datafile = '/home/your_username/SASCrunch/cars_pipe.txt'
out = cars_pipe
dbms = dlm
replace
;
getnames = no;
delimiter = '|';
run;
After running the above code, go to the Log that is created and notice that SAS Data Step code is actually being generated as a result of the PROC IMPORT:
By simply copying and pasting this code from your log into your SAS program, you can now use this code as a template to start your Data Step code, modifying it as needed to adjust variable names, types and lengths.
For example, you can replace the variable names VAR1-VAR15 with the original variable names from CARS, as shown here:
data WORK.CARS_PIPE_CUSTOM ;
infile '/home/your_username/SASCrunch/cars_pipe.txt' delimiter = '|' MISSOVER DSD lrecl = 32767;
informat make $5. ;
informat model $30. ;
informat type $6. ;
informat origin $6. ;
informat drivetrain $5. ;
informat msrp nlnum32. ;
informat invoice nlnum32. ;
informat enginesize best32. ;
informat cylinders best32. ;
informat horsepower best32. ;
informat mpg_city best32. ;
informat mpg_highway best32. ;
informat weight best32. ;
informat wheelbase best32. ;
informat length best32. ;
format make $5. ;
format model $30. ;
format type $6. ;
format origin $6. ;
format drivetrain $5. ;
format msrp nlnum12. ;
format invoice nlnum12. ;
format enginesize best12. ;
format cylinders best12. ;
format horsepower best12. ;
format mpg_city best12. ;
format mpg_highway best12. ;
format weight best12. ;
format wheelbase best12. ;
format length best12. ;
input
make $
model $
type $
origin $
drivetrain $
msrp
invoice
enginesize
cylinders
horsepower
mpg_city
mpg_highway
weight
wheelbase
length
;
run;
Source: https://sascrunch.com/importing-text-files/