====== Data description ======
===== Starting R for the first time =====
The recommended way to use ''R'' is to create a new empty directory (named something like "Digging Numbers") and start ''R'' from the command line in that directory. This way, data and command history are saved just for this workspace. Here is an example for UNIX operating systems.
$ mkdir diggingnumbers # just the first time
$ cd diggingnumbers
$ R
After that, your ''R'' session is active in the current directory. You can always check the directory you are working in with the command
> getwd()
===== Importing data =====
To read data from the raw data file into R:
> spearheads <- read.csv("spearheads.csv", header=TRUE)
From that moment on you can access the dataset with the ''data frame'' object named ''spearheads''.
You can save some typing by running
> attach(spearheads)
every time you start a new session in that workspace. This lets you refer to variables directly, for example ''Maxle'' instead of ''spearheads$Maxle''.
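If you would rather not attach the data frame, ''with()'' evaluates a single expression inside it and leaves the search path untouched. The sketch below uses a small made-up data frame named ''toy_spearheads'' (a hypothetical name, with invented ''Maxle'' values) so that it runs without the real file.

```r
# Toy stand-in for the real dataset; the values are invented for illustration.
toy_spearheads <- data.frame(Num = 1:3, Maxle = c(12.4, 22.6, 17.9))

# Refer to a column by its bare name without calling attach().
mean_maxle <- with(toy_spearheads, mean(Maxle))
print(mean_maxle)

# The same calculation with the explicit $ notation.
print(mean(toy_spearheads$Maxle))
```

Both calls give the same result; ''with()'' simply keeps the column lookup local to one expression.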
Once you have read in a dataset, you can verify the names of the variables using the "names" command:
> names(spearheads)
This displays the column names of the table. It is also a handy way to check the capitalization and spelling of the field (column) names, since a missing or added capital in a field name will result in an error.
For additional information regarding the data set enter:
> str(spearheads)
This displays a more detailed summary of the data, as follows:
'data.frame': 40 obs. of 14 variables:
$ Num : int 1 2 3 4 5 6 7 8 9 10 ...
$ Mat : int 2 2 2 2 2 2 2 2 2 2 ...
$ Con : int 3 3 3 3 3 3 3 2 2 1 ...
$ Loo : int 1 1 1 1 1 1 1 1 1 1 ...
$ Peg : int 2 2 2 NA 1 2 2 2 2 2 .... etc.
The output shows, first, that the data are stored in memory as a data frame. It also tells you that there are 40 records (observations) of 14 variables. The output then lists each variable's name, its data type, and the first few values stored in it after import. This is particularly important information, since some of the variables listed as "int" are not actually numerical data. Material type (''Mat''), for example, is categorical data that has been entered as a numeric code. For some commonly used statistical routines, R needs to be told that the variable really contains the levels of a factor (a categorical variable); otherwise it could yield nonsensical results. There is no point, for example, in asking for the average value of ''Mat''.
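As a minimal sketch of what "informing R" can look like, the integer codes can be converted with ''factor()''. The data frame ''toy'' below is a hypothetical stand-in with invented codes, not the real ''spearheads'' data, and the meaning of each code is deliberately left unlabeled.

```r
# Toy stand-in; the codes are invented, not taken from the real file.
toy <- data.frame(Mat = c(2L, 2L, 1L, 2L, 1L))

str(toy$Mat)                 # stored as plain integers after import

# Tell R that the codes are categories rather than quantities.
toy$Mat <- factor(toy$Mat)

str(toy$Mat)                 # now a factor with one level per code
mat_counts <- table(toy$Mat) # counting per category is meaningful
print(mat_counts)
```

Once ''Mat'' is a factor, ''mean(toy$Mat)'' returns ''NA'' with a warning instead of a misleading number.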
==== A note about importing data from external sources ====
Especially when you are importing files that you haven't produced yourself, **always** inspect text-format data with a text editor (e.g. ''vi'', ''emacs'', ''gedit'', ''wordpad''). Don't make assumptions based on the file extension (like ".csv"); look at the data first. This is good practice for any user of external data.
You might find that files produced in a different country use different locale settings for the decimal separator (comma vs. point). By default, ''read.csv()'' expects a comma as the field separator and a point as the decimal separator. If your file doesn't load correctly, inspect it and use the options of the ''read.csv()'' command such as ''sep'' (field separator) and ''dec'' (decimal separator).
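The following sketch shows the two options in use. The string ''raw'' is an invented sample in the style of a file written with semicolons between fields and commas as decimal marks; with a real file you would pass the file name instead of ''text =''.

```r
# Invented sample: semicolon field separator, comma decimal separator.
raw <- "Num;Maxle\n1;12,4\n2;22,6\n"

# State both separators explicitly so Maxle is read as a number.
right <- read.csv(text = raw, sep = ";", dec = ",")
str(right)

# read.csv2() is a variant of read.csv() whose defaults are
# sep = ";" and dec = ",", so it reads the same sample directly.
same <- read.csv2(text = raw)
str(same)
```

Checking the result with ''str()'' confirms whether a measurement column was read as ''num'' rather than as text.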
===== Quitting R =====
When you are done with your first tutorial, quit the ''**R**'' session with the ''q()'' command, and answer ''y'' to the //Save workspace image// question.
> q()
Save workspace image? [y/n/c]:
This keeps all the variables you created available for your next session.
If you want to be sure the R data was actually saved in //that// directory, run ''ls -a'' after quitting ''R''; you should find two files, ''.RData'' and ''.Rhistory''.
----
[[Start]] · **Data description** · [[Transforming variables]] · [[Tables]] · [[Pictorial displays]] · [[Measures of position and variability]] · [[Sampling]] · [[Tests of difference]] · [[Tests of distribution]] · [[Correlation]] · [[Tests of association]]