Observations are distinct on a variable list if they differ with respect to that variable list. This command may be sufficient for your needs. Alternatively, contract will reduce the dataset to distinct observations and their frequencies. Using contract destroys the existing dataset, however, and therefore will be inefficient whenever you wish to continue using the present dataset, which is likely in most problems.
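As a rough illustration of the contract approach, here is a minimal sketch on a made-up five-observation dataset; the variables g and x are hypothetical. Wrapping the call in preserve and restore is one way to get the original data back afterwards, at the cost of an extra copy.

```stata
* Made-up example data: g and x are hypothetical variables
clear
input str1 g x
"a" 1
"a" 1
"a" 2
"b" 1
"b" 1
end

preserve                      // keep a copy of the full dataset
contract g x                  // one row per distinct (g, x), plus _freq
list                          // distinct combinations and their frequencies
scalar nContracted = _N       // number of distinct combinations
restore                       // bring the original five observations back
```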
Unique observations are also often interpreted to mean those that occur precisely once in the data. To put it another way, is uniqueness a property of the input (each value occurs once in the original) or of the output (each value occurs once in the result, because duplicates have been set aside)? Suppose, however, that we need to calculate the number of distinct observations for ourselves.
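One way to do that count by hand is to tag the first observation in each group and count the tags. The sketch below uses made-up data (g and x are hypothetical variables) and covers both readings of "unique": distinct combinations, and combinations that occur precisely once.

```stata
* Made-up example data: g and x are hypothetical variables
clear
input str1 g x
"a" 1
"a" 1
"a" 2
"b" 1
"b" 1
end

* Tag the first observation of each distinct (g, x) combination
bysort g x: generate byte first = (_n == 1)
count if first
scalar nDistinct = r(N)       // number of distinct combinations

* Tag combinations that occur precisely once in the data
bysort g x: generate byte once = (_N == 1)
count if once
scalar nOnce = r(N)           // number of combinations seen only once
```

Nothing is dropped here, so the dataset remains available for further work.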
OK, so you should be telling us that in all future questions, as explained here.

Vincent Thorne. Removing the by groups doesn't solve the issue: the problem seems to come from the string variable to be counted. Trying the same command with an integer variable yields expected results, and no error occurs. Trying Carlo's code, I get the same error, i. Moreover, my colleagues on Windows do not experience this issue using Stata. Any idea what the issue is here?

Vincent: welcome to this forum. The first question in this case is: is your copy of Stata fully updated? This was a bug fixed within the lifetime of Stata update 03mar 6. This has been fixed.

The income1 variable contains the income of the first person in the household, income2 the second, etc. Now consider adding up the total income of the household.
In wide form, instead of using the total function we need the rowtotal function. It adds things up just like total, but while total adds up the values of a single variable across multiple observations, rowtotal adds up the values of multiple variables within a single observation. However, the input rowtotal needs is quite different. Rather than acting on a single mathematical expression, it acts on a list of variables, or varlist.
When a Stata command or function takes a varlist, this means both that it needs a list of variables and that it will understand certain shortcuts for specifying that list. In this case we want to act on all the income variables, but there are sixteen of them (one household has sixteen people in it) and typing them all out would be tiresome.
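As a minimal sketch of rowtotal, assuming a made-up household dataset with only three income variables instead of sixteen (the variable name householdIncome is just an illustration):

```stata
* Made-up wide-form data: one row per household, three people at most
clear
input household income1 income2 income3
1 100 200 .
2 50 . .
end

* income* is a varlist shortcut for every variable starting with "income"
egen householdIncome = rowtotal(income*)
list
```

Note that rowtotal treats missing values as zero when adding, so a household whose incomes are all missing gets a total of 0 unless the missing option is specified.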
So we'll take a brief digression into shortcuts for specifying lists of variables. An asterisk acts as a wildcard: describe income* tells the describe command to act on all variables that match the pattern "income followed by anything." The wildcard can go anywhere: *1 matches all the variables with information about the first individual in the household, but also the variables with information about the eleventh individual. Be careful your wildcards don't match more than what you want!
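The sketch below builds an empty stand-in dataset to show what the asterisk wildcard picks up. The age, race, and income variables for eleven people are assumptions made for illustration, not the layout of the real data.

```stata
* Hypothetical wide-form layout: age, race, and income for persons 1 to 11
clear
set obs 1
forvalues i = 1/11 {
    generate age`i' = .
    generate race`i' = .
    generate income`i' = .
}

describe income*   // income1 through income11
describe *1        // age1 race1 income1, but also age11 race11 income11
```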
A question mark matches exactly one character: income? matches income1 through income9, but not income10 because it is income followed by two characters. Another shortcut is to put a dash between two variables, which gives you those two variables and all the variables in between them. A range that runs from the first variable for the first individual to the last one gives you just the variables with information about the first individual.
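Both shortcuts can be tried on a stand-in dataset. The variable names and their order below (age, race, income for each of eleven people) are assumptions made for illustration only.

```stata
* Hypothetical wide-form layout: age, race, and income for persons 1 to 11
clear
set obs 1
forvalues i = 1/11 {
    generate age`i' = .
    generate race`i' = .
    generate income`i' = .
}

describe income?        // income1 through income9; income10 and income11 do not match
describe age1-income1   // age1, race1, income1: everything between the two in dataset order
```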
The order used in resolving this shortcut is the order the variables are listed in the variables window or a describe command. You can use the order command to put the variables in a convenient order. Many of the tasks we carried out in long form can easily be done in wide form, with three changes. We discuss how to do this efficiently in Stata Programming Essentials, but in most cases it's easier to work in long form.

Exercise: Create indicator variables for each individual, indicating whether they are black or not. You'll need 16 of them, so use copy and paste. Make sure your indicator variable is missing if race is missing. Then create two household-level indicator variables: one for "At least one individual in this household is black" and one for "All of the individuals in this household are black."

Panel data, or longitudinal data, are data where subjects are observed repeatedly over time and the timing is important. If timing isn't important then we call it repeated measures data.
The National Longitudinal Survey of Youth is an example of panel data, and we'll use a small extract from it as an example. Note that this extract combines income variables from different years with slightly different definitions into a single income variable, so you really wouldn't want to use this extract for actual research. Create a do file called panel. In particular, identify the primary keys and the data structure that implies, and figure out the nature of the edu variable.
What does it suggest about the data collection process that income and edu are frequently missing for the same observation? What does it tell you about age that it is never missing? What is a level one unit in this data set? What is a level two unit? Which variables are level one variables? Which are level two variables? Most of the techniques we learned for working with individuals in households carry over directly to panel data.
For example, you can find the total income each subject earned during the study period with the total function, by id. But what if you wanted to know their income the first time they appear in the study? Recall that income[1] means "the value of income for the first observation." Within a by group that is the first observation in the group, so the answer depends on how the observations are ordered. You need to be careful because Stata's default sorting algorithm is not stable. This means it will put ties in whatever order will make it run fastest.
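A minimal sketch of the total-income calculation, assuming a made-up panel with variables id, year, and income (the name totalIncome is just an illustration):

```stata
* Made-up panel data: one row per subject per year
clear
input id year income
1 1979 100
1 1980 200
2 1979 50
2 1980 .
end

* total() adds income across all of a subject's observations;
* missing values are treated as zero in the sum
bysort id: egen totalIncome = total(income)
list, sepby(id)
```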
So if you run sort id, or bysort id:, the observations for each person could be in any order. In practice, if the data are already sorted or mostly sorted the order that will make the sort run fastest is usually to leave things alone. But you can't count on that. So if you're going to run code that depends on the sort order, be sure the data are actually in the right order.

Exercise: Create endingIncome, the subject's income the last time they appear in the study.
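One way to make the order explicit is to put the time variable in parentheses in bysort, which sorts by it within each group without making it part of the group definition. This sketch assumes made-up data with id, year, and income, entered deliberately out of order; startingIncome is an illustrative name.

```stata
* Made-up panel data, deliberately not in year order
clear
input id year income
1 1980 200
1 1979 100
2 1981 70
2 1979 50
end

* Groups are defined by id; within each id, rows are sorted by year,
* so income[1] is the income in the subject's earliest year
bysort id (year): generate startingIncome = income[1]
list, sepby(id)
```

With plain bysort id: the same line would run without error but could pick up whichever observation happened to land first.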
Sometimes you need to carry out calculations that take into account not just the current observation, but neighboring observations. The edu variable is missing for years where the subject was not interviewed.
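As a minimal sketch of referring to a neighboring observation, the subscript _n-1 picks out the previous observation within the by group. The data and the incomeChange variable below are made up for illustration and are not part of the extract.

```stata
* Made-up panel data: one row per subject per year
clear
input id year income
1 1979 100
1 1980 150
2 1979 50
2 1980 40
end

* income[_n-1] is the previous year's income for the same subject;
* for each subject's first year there is no previous row, so the result is missing
bysort id (year): generate incomeChange = income - income[_n-1]
list, sepby(id)
```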