Read CSV in R

Once you've installed and configured R to your liking, it's time to start using it to work with data. Yes, you can type your data directly into R's interactive console. But for any kind of serious work, you're a lot more likely to already have data in a file somewhere, either locally or on the Web. Here are several ways to get data into R for further work.

[This story is part of Computerworld's 'Beginner's guide to R.' To read from the beginning, check out the introduction; there are links on that page to the other pieces in the series.]

Sample data

If you just want to play with some test data to see how they load and what basic functions you can run, the default installation of R comes with several data sets. Type:

data()

into the R console and you'll get a listing of pre-loaded data sets. Not all of them are useful (body temperature series of two beavers?), but these do give you a chance to try analysis and plotting commands. And some online tutorials use these sample sets.

One of the less esoteric data sets is mtcars, data about various automobile models that come from Motor Trend magazine. (I'm not sure what year the data are from, but given that there are entries for the Valiant and Duster 360, I'm guessing they're not very recent; still, it's a bit more compelling than whether beavers have fevers.)

You'll get a printout of the entire data set if you type the name of the data set into the console, like so:

mtcars

There are better ways of examining a data set, which I'll get into later in this series. Also, R does have a print() function for printing with more options, but R beginners rarely seem to use it.
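
As a quick sketch of what those alternatives can look like, the snippet below uses a few base R functions on mtcars. The choice of functions here is my own illustration, not a list taken from later parts of the series.

```r
# mtcars ships with base R (in the datasets package), so no file is needed.
head(mtcars)   # first six rows only, instead of the whole table
str(mtcars)    # compact summary: one line per column, with its type
dim(mtcars)    # number of rows, then number of columns

# print() accepts extra options, such as limiting the significant digits shown
print(head(mtcars, 3), digits = 3)
```

For a first look at an unfamiliar data set, head() and str() are usually easier to read than a full printout.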

Existing local data

R has a function dedicated to reading comma-separated files. To import a local CSV file named filename.txt and store the data into one R variable named mydata, the syntax would be:

mydata <- read.csv('filename.txt')

(Aside: What's that <- where you expect to see an equals sign? It's the R assignment operator. I said R syntax was a bit quirky. More on this in the section on R syntax quirks.)

And if you're wondering what kind of object is created with this command, mydata is an extremely handy data type called a data frame -- basically a table of data. A data frame is organized with rows and columns, similar to a spreadsheet or database table.
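
To see this for yourself without hunting for a file, here is a minimal sketch that writes a tiny, made-up CSV to a temporary file and reads it back. The file contents and the csv_path variable are assumptions for illustration only.

```r
# Hypothetical sample data, written to a temporary file so the example runs anywhere
csv_path <- tempfile(fileext = ".txt")
writeLines(c("name,score", "Ann,90", "Bob,85"), csv_path)

mydata <- read.csv(csv_path)

class(mydata)   # "data.frame"
dim(mydata)     # rows, then columns
names(mydata)   # column names taken from the header row
```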

The read.csv function assumes that your file has a header row, so row 1 is the name of each column. If that's not the case, you can add header=FALSE to the command:

mydata <- read.csv('filename.txt', header=FALSE)

In this case, R will read the first line as data, not column headers (and assigns default column header names you can change later).
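
Here is a small sketch of that behavior, again using a made-up temporary file. The data and the replacement column names are assumptions chosen for illustration.

```r
# Hypothetical file with no header row
headerless <- tempfile(fileext = ".txt")
writeLines(c("Ann,90", "Bob,85"), headerless)

mydata <- read.csv(headerless, header = FALSE)
names(mydata)   # default names: "V1" "V2"

# Rename the columns afterwards (names picked here just as an example)
names(mydata) <- c("name", "score")
```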

If your data use another character to separate the fields, not a comma, R also has the more general read.table function. So if your separator is a tab, for instance, this would work:

mydata <- read.table('filename.txt', sep='\t', header=TRUE)

The command above also indicates there's a header row in the file with header=TRUE.

If, say, your separator is a character such as |, you would change the separator part of the command to sep='|'.
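
The sketch below tries both separators on two made-up temporary files that hold the same data. Note that a tab is written as "\t" (backslash-t) inside an R string. The file contents and variable names are assumptions for illustration.

```r
# Two hypothetical files with identical content but different separators
tab_file  <- tempfile(fileext = ".txt")
pipe_file <- tempfile(fileext = ".txt")
writeLines(c("name\tscore", "Ann\t90", "Bob\t85"), tab_file)
writeLines(c("name|score", "Ann|90", "Bob|85"), pipe_file)

# sep tells read.table() which character splits the fields
from_tab  <- read.table(tab_file,  sep = "\t", header = TRUE)
from_pipe <- read.table(pipe_file, sep = "|",  header = TRUE)

from_tab
from_pipe
```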

One of the easiest and most reliable ways of getting data into R is to use text files, in particular CSV (comma-separated values) files. The CSV file format uses commas to separate the different elements in a line, and each line of data is in its own line in the text file, which makes CSV files ideal for representing tabular data.

The additional benefit of CSV files is that almost any data application supports export of data to the CSV format. This is certainly the case for most spreadsheet applications, including Microsoft Excel and OpenOffice Calc.

In the following examples, assume that you have a CSV file stored in a convenient folder in your file system. To convert an Excel spreadsheet to CSV format, you need to choose File→Save As, which gives you the option to save your file in a variety of formats.

Keep in mind that a CSV file can represent only a single worksheet of a spreadsheet. Finally, be sure to use the topmost row of your worksheet (row 1) for the column headings.

In R, you use the read.csv() function to import data in CSV format. This function has a number of arguments, but the only essential argument is file, which specifies the location and filename. To read a file called elements.csv located at f:, build the path with file.path() and pass it to read.csv().
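
The call described above might look something like the following sketch. The variable names elements_path and elements are my own, and the file.exists() guard is an addition so the snippet runs harmlessly on a machine that has no such file.

```r
# file.path() joins path pieces with "/", so you don't type separators by hand
elements_path <- file.path("f:", "elements.csv")
elements_path

# Read the file only if it actually exists on this machine
if (file.exists(elements_path)) {
  elements <- read.csv(elements_path)
  str(elements)   # lists the observations, variables and each column's type
}
```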

R imports the data into a data frame. In this example, the data frame has ten observations of nine variables.

Notice that in versions of R before 4.0.0, the default option is to convert character strings into factors. Thus, the columns Name, Block, State.At.STP, Occurrence, and Description all have been converted to factors. (Since R 4.0.0, stringsAsFactors defaults to FALSE, so these columns are imported as character vectors instead.) Also, notice that R converts spaces in the column names to periods (for example, in the column State.At.STP).

This default of converting strings to factors when you use read.table() or read.csv() in older versions of R can be a source of great confusion. You’re often better off importing data that contains strings in such a way that the strings aren’t converted to factors, but remain character vectors. To make that explicit regardless of your R version, pass the argument stringsAsFactors=FALSE to read.csv() or read.table().
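
A minimal sketch of the difference, using a tiny made-up stand-in for the elements file (only two columns and two rows, invented for illustration). Setting stringsAsFactors explicitly in both calls makes the result the same on any version of R.

```r
# Hypothetical stand-in for the elements file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Name,Block", "Hydrogen,s", "Carbon,p"), csv_path)

as_factors <- read.csv(csv_path, stringsAsFactors = TRUE)
as_strings <- read.csv(csv_path, stringsAsFactors = FALSE)

class(as_factors$Name)   # "factor"
class(as_strings$Name)   # "character"
```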

If you have a file in the EU (European Union) format (where commas are used as decimal separators and semicolons are used as field separators), you need to import it to R using the read.csv2() function.
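
As a rough sketch, the snippet below writes a small made-up file in that format and reads it with read.csv2(). The item names and prices are invented for illustration.

```r
# Hypothetical EU-style file: semicolons between fields, commas as decimal marks
eu_path <- tempfile(fileext = ".csv")
writeLines(c("item;price", "bread;2,50", "milk;0,99"), eu_path)

eu_data <- read.csv2(eu_path)

eu_data$price               # parsed as the numbers 2.50 and 0.99
is.numeric(eu_data$price)   # TRUE
```

Reading the same file with plain read.csv() would not split the fields correctly, because it expects commas as the separator.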