If you are confident that the data on individuals in your project is correct, you can skip this step. Parity Builder cannot tell you whether specific attribute values are correct, but it can help you identify three possible problems: missing attribute values; illegal or impossible attribute values; and multiple individuals sharing the same identification (ID attribute). All three problems are present in the tutorial's data.
Domain Report
The best way to identify data problems (other than duplicate IDs) is to produce a listing of all the distinct values for your numerical and categorical attributes. In the Generate rosters menu, click Report domains. Parity Builder will write a report labeled Attribute Domain Analysis in the Messages tab. The domain report will list all categorical and numerical attributes, and the domain (set of values) of each. Incorrect or anomalous values will point to data problems. Generate that report now. Observe the following anomalies:
- The gender attribute has three possible values, one of which ("n") would appear to be erroneous.
- The experience attribute has a "null" value, indicating that at least one individual is missing an experience rating.
- The ranking attribute includes a value of "3g", which is not a number. In addition, someone has a ranking of 0, but the ranking scale is supposed to be 1 to 3.
We will deal with each of these problems below.
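For readers who want to see the idea behind a domain report outside the program, here is a minimal Python sketch. It is not Parity Builder's implementation; the records, the field names, and the function attribute_domains are hypothetical and chosen only to mirror the anomalies listed above.

```python
from collections import defaultdict

# Hypothetical records; names and values are illustrative, not the tutorial file.
records = [
    {"Name": "Aaron", "Gender": "m", "Exp": "y", "Ranking": "2"},
    {"Name": "Casey", "Gender": "f", "Exp": None, "Ranking": "1"},
    {"Name": "Timothy", "Gender": "n", "Exp": "n", "Ranking": "3"},
    {"Name": "William", "Gender": "m", "Exp": "y", "Ranking": "3g"},
    {"Name": "Christine", "Gender": "f", "Exp": "n", "Ranking": "0"},
]


def attribute_domains(rows, attributes):
    """Return the set of distinct values seen for each attribute."""
    domains = defaultdict(set)
    for row in rows:
        for attr in attributes:
            # A missing entry is recorded as None, much like a "null" in a report.
            domains[attr].add(row.get(attr))
    return dict(domains)


domains = attribute_domains(records, ["Gender", "Exp", "Ranking"])
for attr, values in domains.items():
    # Sort on the string form so None and strings can be listed together.
    print(attr, sorted(values, key=str))
```

Scanning the printed sets by eye is the whole technique: an unexpected member such as "n", None, or "3g" stands out immediately.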
Duplicate IDs
Let us begin with the issue of duplicate IDs. Click on the Individuals tab and note that the first two children share the name "Aaron". Parity Builder checks for duplicate IDs whenever the ID attribute is set, and adjusts all but the first occurrence by appending an underscore ("_") and the number of the table row containing the individual. Thus the second instance of an "Aaron", in row 2, becomes "Aaron_2", while the second instance of an "Ashley", in row 13, becomes "Ashley_13". Occasionally, however, you may inadvertently introduce a duplicate ID as a result of adding a new individual or editing an existing one. To create an example of this, click on the name "Aaron_2" and delete the last two characters, restoring it to "Aaron". (Be sure to hit the Enter key to commit the change.) Also edit "Benjamin_15" and "Benjamin_16" back to just "Benjamin".
Now click on the Individuals menu, select the Filter... submenu, and then select Duplicate ID. Note that Parity Builder displays just the records of the individuals with duplicate IDs. Edit the names to eliminate duplicates, either by restoring the previous values (for instance, returning the second "Aaron" to "Aaron_2") or by entering some other difference (such as a middle or last initial). As you press the Enter key after each change, Parity Builder removes the affected records from the display. When you have fixed all the duplications, the table should be empty. Go back to the Individuals > Filter... submenu and select Show all to restore the full table display.
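The renaming rule and the Duplicate ID filter can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program's code: the function names are hypothetical, rows are numbered from 1, and the filter is assumed to show every record that shares an ID, including the first occurrence.

```python
from collections import Counter


def disambiguate_ids(ids):
    """Append "_<row>" (1-based row number) to every repeat of an ID."""
    seen = set()
    result = []
    for row, ident in enumerate(ids, start=1):
        if ident in seen:
            result.append(f"{ident}_{row}")
        else:
            seen.add(ident)
            result.append(ident)
    return result


def duplicate_rows(ids):
    """Return the 1-based rows whose ID occurs more than once."""
    counts = Counter(ids)
    return [row for row, ident in enumerate(ids, start=1) if counts[ident] > 1]


ids = ["Aaron", "Aaron", "Abby"]  # hypothetical ID column
print(disambiguate_ids(ids))
print(duplicate_rows(ids))
```

The sketch does not handle the case where a generated ID such as "Aaron_2" collides with an ID that was already entered by hand; a fuller version would need to check for that.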
Missing Data
Parity Builder cannot balance attribute values if it does not know those values for all individuals. In the Attribute Domain Analysis, we observed that the experience attribute (Exp) has a "null" value, meaning that at least one individual is missing an experience rating. Return to the Individuals > Filter... submenu, and this time select Missing data. Parity Builder will display only those individuals missing some data. Here "Casey" and "Jessica" are missing the experience entry. Code both as inexperienced ('n' or 'N', without the quotation marks) and note that each row is removed from the table as it is fixed. Once again, restore the table with Individuals > Filter... > Show all.
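The logic of a missing-data filter is simple enough to sketch in Python. The records and the function missing_data below are hypothetical, and treating both None and an empty string as "missing" is an assumption made for the example.

```python
# Hypothetical records; None and "" both stand for an empty cell.
records = [
    {"Name": "Brian", "Exp": "y"},
    {"Name": "Casey", "Exp": None},
    {"Name": "Jessica", "Exp": ""},
]


def missing_data(rows, attributes):
    """Return the rows in which at least one listed attribute has no value."""
    return [
        row for row in rows
        if any(row.get(attr) in (None, "") for attr in attributes)
    ]


incomplete = missing_data(records, ["Exp"])
print([row["Name"] for row in incomplete])

# Filling in the values empties the filtered view, as in the tutorial.
for row in incomplete:
    row["Exp"] = "n"
print(missing_data(records, ["Exp"]))
```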
Invalid Data
Recall from the Attribute Domain Analysis that the Ranking attribute contained invalid entries, one of which was not a number. In the Individuals > Filter... submenu, select Invalid data. This option restricts the table to records with numerical attributes that do not look like numbers. In this case, "William_148" has a ranking of "3g", which is not a valid number. Delete the extraneous 'g' to fix the problem. As before, restore the full table afterward.
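A check of this kind, "does the entry look like a number?", might be written as follows in Python. This is a sketch rather than Parity Builder's actual test: is_number and invalid_entries are hypothetical names, and Python's float() is used as the definition of "looks like a number".

```python
def is_number(text):
    """Return True if text can be parsed as a floating-point number."""
    try:
        float(text)
    except (TypeError, ValueError):
        return False
    return True


def invalid_entries(rows, attribute):
    """Return rows whose non-empty value for attribute is not a number."""
    return [
        row for row in rows
        if row.get(attribute) not in (None, "") and not is_number(row[attribute])
    ]


# Hypothetical records mirroring the tutorial's example.
records = [
    {"Name": "William_148", "Ranking": "3g"},
    {"Name": "Christine", "Ranking": "0"},
]
print([row["Name"] for row in invalid_entries(records, "Ranking")])
```

Note that "0" passes the check even though it is outside the 1 to 3 scale: a syntactic test like this says nothing about whether a number is sensible.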
The Invalid data filter only finds numerical values that are not numbers. It cannot tell when an entry is an inappropriate number (such as listing the weight of an adult as 15 pounds, when it should be 150). Similarly, it cannot tell when the value of a categorical attribute is invalid, because Parity Builder has no way of knowing the valid domain for categorical attributes. In looking at the Attribute Domain Analysis, we saw that the value 'n' appears at least once in the Gender attribute. This is likely a typographic error in the data entry, since the only valid genders are 'f' and 'm'. We also saw that, in addition to the invalid value for Ranking that we just fixed, rankings of 0, 1, 2 and 3 were encountered. All rankings are supposed to be on a scale from 1 to 3, so the 0 value is incorrect.
Click on the Individuals > Filter... submenu, then click Attribute value. This will allow you to list all individuals with a specific value of a particular attribute. Select Gender in the attribute list, then 'n' in the value list, and click Apply. We see a list of all individuals whose gender is set to 'n', which in this case is just Timothy. Since Timothy is traditionally a male name, and 'n' is next to 'm' on a keyboard, we will assume this is just a typographical error. Correct it by clicking in the corresponding cell and editing it. When you finish, the list should be empty.
Return to the Attribute value menu option, and this time select the Ranking attribute. To filter a numerical attribute, you specify the lower and upper limits of a range of values you want to see. In this case, you are only interested in the (nonsensical) value 0, so leave the lower limit at 0.0 and change the upper limit to 0.0 as well. Parity Builder lists the only individual (Christine) with this ranking. Assume that, after checking, you find that Christine's correct ranking is 1. Change it here and then save the project.
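The two forms of the Attribute value filter, an exact match for categorical attributes and a range for numerical ones, can be sketched in Python as below. The function names and records are hypothetical, and the range is assumed to include both limits, which is what setting both limits to 0.0 implies.

```python
# Hypothetical records; Ranking is stored as a number here.
records = [
    {"Name": "Timothy", "Gender": "n", "Ranking": 2.0},
    {"Name": "Christine", "Gender": "f", "Ranking": 0.0},
    {"Name": "Aaron", "Gender": "m", "Ranking": 3.0},
]


def filter_by_value(rows, attribute, value):
    """Categorical filter: keep rows whose attribute equals value exactly."""
    return [row for row in rows if row.get(attribute) == value]


def filter_by_range(rows, attribute, lower, upper):
    """Numerical filter: keep rows with lower <= attribute <= upper."""
    return [row for row in rows if lower <= row[attribute] <= upper]


print([r["Name"] for r in filter_by_value(records, "Gender", "n")])
print([r["Name"] for r in filter_by_range(records, "Ranking", 0.0, 0.0)])

# Values outside the intended 1 to 3 scale can be found with two range queries
# or, more directly, by negating the valid range.
out_of_scale = [r["Name"] for r in records if not 1.0 <= r["Ranking"] <= 3.0]
print(out_of_scale)
```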
Our next step is to add any necessary restrictions.