So, I have been working on a little tool to read Civilization III scenario files and represent their data in a nicely structured way, and that has me stumped. While I am able to extract what I need, the structured part bothers me. So, here is a question for all of you who have dealt with this: how do you read the data from the BIC file and store it?
I have identified three major approaches to reading the file:
1. The straightforward approach: hard-code the structure of the file into your program, and read the file sequentially. In other words, your code would look like this:
Read 4 bytes
put that into the file identifier variable
Read 4 bytes
check that this says VER#
Read 4 bytes
Put that into the number of headers variable
Read 4 bytes
Put that into the header length variable
Read 4 bytes
Put that into the major version variable
etc, etc, ad infinitum
This approach seems like a very big pain in the @$$, because you essentially need to put all of the structure of the file into one big blob of code. Of course, you could break it up into pieces, and then from your main reading method say:
ReadFileIdentifier();
ReadVersion();
ReadBuildings();
etc;
but that still doesn't really eliminate the problem.
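To make approach 1 concrete, here is a minimal Python sketch of the sequential read described above. The field order follows the pseudocode in this post; the little-endian byte order, the sample bytes, and the name `read_header` are assumptions made for illustration, not a statement of the real BIC layout.

```python
import io
import struct

def read_header(stream):
    """Read the leading fields one after another, in the order the post lists them."""
    file_id = stream.read(4)            # file identifier
    tag = stream.read(4)                # expected to say VER#
    if tag != b"VER#":
        raise ValueError(f"expected VER#, got {tag!r}")
    raw = stream.read(12)               # three consecutive 4-byte integers
    if len(raw) != 12:
        raise ValueError("truncated header")
    # Little-endian signed integers are an assumption.
    num_headers, header_length, major_version = struct.unpack("<3i", raw)
    return {
        "file_id": file_id,
        "num_headers": num_headers,
        "header_length": header_length,
        "major_version": major_version,
    }

# Made-up sample bytes, only to exercise the function.
sample = b"BIC " + b"VER#" + struct.pack("<3i", 1, 720, 12)
header = read_header(io.BytesIO(sample))
```

Every field added to the format means another line in this function, which is exactly the maintenance burden complained about above.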
2. Read the file and when a known header is encountered, read the section. So, your code would look like this:
Read four bytes.
If they are BLDG, read the building section.
If they are ESPN, read the espionage section.
etc...
This does improve on the code a little bit, because now you are sort of independent of the order of sections, and if the developers decide to add a new section to the BIC file, that won't break your code - it will just ignore the new section. However, that still does not solve all of the problems: you are still dependent on the hard-coded format of one entry. For instance, for a citizen entry, you would still have to read it the same way you would have read the whole file:
Read four bytes
that is the length of data
read 32 bytes
that is the name of the citizen
read 32 bytes
that is the pedia entry for the citizen
etc.
etc.
(Note: the numbers may not be correct, as I am doing this from memory. It is not my intention to provide the BIC format document here, though, but rather to just illustrate the point.)
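Approach 2 can be sketched in Python as a lookup table from section tag to handler. The sketch assumes that every section is a 4-byte tag, a 4-byte entry count, and then length-prefixed entries, which is what makes skipping an unknown section possible; that layout, the little-endian byte order, and all function names here are assumptions for illustration.

```python
import io
import struct

def read_entries(stream):
    """Return the raw payload of every entry in one section (assumed layout)."""
    (count,) = struct.unpack("<i", stream.read(4))
    payloads = []
    for _ in range(count):
        (length,) = struct.unpack("<i", stream.read(4))
        payloads.append(stream.read(length))
    return payloads

def read_buildings(payloads):
    return payloads      # placeholder: a real handler would decode each entry

def read_espionage(payloads):
    return payloads      # placeholder as well

# Hypothetical dispatch table: known tag -> handler.
HANDLERS = {b"BLDG": read_buildings, b"ESPN": read_espionage}

def read_sections(stream):
    result = {}
    while True:
        tag = stream.read(4)
        if len(tag) < 4:                  # end of file
            break
        payloads = read_entries(stream)   # always consumed, so unknown sections are skipped
        handler = HANDLERS.get(tag)
        if handler is not None:
            result[tag.decode("ascii")] = handler(payloads)
    return result

def make_section(tag, payloads):
    """Build made-up test data in the assumed layout."""
    body = b"".join(struct.pack("<i", len(p)) + p for p in payloads)
    return tag + struct.pack("<i", len(payloads)) + body

sample = make_section(b"XXXX", [b"??"]) + make_section(b"BLDG", [b"abc", b"de"])
sections = read_sections(io.BytesIO(sample))
```

The unknown `XXXX` section is read past without complaint, but each handler still has to know the byte layout of its own entries.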
3. Sort of a variation on #2: read through the whole file, searching for four consecutive capital letters followed by a number. These conditions are only satisfied by section identifiers, such as BLDG, CTZN, etc. Store the positions of the headers in a vector. Then, go through the positions, identify what kind of section each one is, and read the data from there appropriately. This approach still has the same problem as #2: you hard-code the format of a single entry of data for each type.
Now, then, the question is: why is it even so important to be concerned with hard-coding the format of the BIC file? Well, the problem here is that the format apparently changes quite a bit. And especially if we want to read all the different versions of the BIC file, we need quite a bit of flexibility.
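The scan in approach 3 might look roughly like this in Python. Here "followed by a number" is interpreted as a 4-byte little-endian integer in a plausible range; that interpretation, the `max_count` cutoff, and the function name are assumptions, and a scan like this could in principle also hit capitalized text inside the data.

```python
import re
import struct

# Lookahead so that overlapping runs of capital letters are all examined.
TAG_PATTERN = re.compile(rb"(?=([A-Z]{4}))")

def find_section_offsets(data, max_count=100_000):
    """Return (offset, tag) for every four-capital-letter tag followed by a count."""
    found = []
    for match in TAG_PATTERN.finditer(data):
        start = match.start()
        if start + 8 > len(data):
            continue                      # no room for a 4-byte number after the tag
        (count,) = struct.unpack_from("<i", data, start + 4)
        if 0 <= count <= max_count:       # plausibility check, an assumption
            found.append((start, match.group(1).decode("ascii")))
    return found

# Made-up sample: two tags with counts, separated by unrelated bytes.
sample = (
    b"\x00\x01"
    + b"BLDG" + struct.pack("<i", 2)
    + b"some text"
    + b"CTZN" + struct.pack("<i", 1)
)
offsets = find_section_offsets(sample)
```

The resulting list plays the role of the vector of header positions; reading each section from its offset still needs the per-entry format.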
Soooo, the way I am reading and storing the data is this:
I have a separate class for each type of entry. Each class has data members corresponding to the data format of the entry. When I encounter a known section, I get all of the data members of the class through reflection and read the data according to them. The problem, however, is that the order in which I read the data matters, for obvious reasons. That is no big deal if the compiler does not perform any optimizations, because then the data members are returned in the exact same order as they were declared in the source. If I compile with optimizations, however, the order of the data members changes, and I can no longer rely on this system to keep track of the file format.
So, for instance, a class for a citizen would look like this:
Code:
public int Length;
public int DefaultCitizen;
public string SingularName;
public string CivilopediaEntry;
public string PluralName;
public int Prerequisite;
public int LuxuryBonus;
public int ResearchBonus;
public int TaxBonus;
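One way to avoid depending on the order reflection returns is to state the order explicitly as data. The Python sketch below is not the reflection approach itself but a swapped-in alternative: an ordered list of (name, type code) pairs drives the read. The field names come from the class above; the 32-byte string width (quoted from memory earlier in this post), the little-endian byte order, the text encoding, and the sample values are all assumptions.

```python
import struct

# Ordered description of one citizen entry; "i" is a 4-byte int, "32s" a 32-byte string.
CITIZEN_FIELDS = [
    ("Length", "i"),
    ("DefaultCitizen", "i"),
    ("SingularName", "32s"),
    ("CivilopediaEntry", "32s"),
    ("PluralName", "32s"),
    ("Prerequisite", "i"),
    ("LuxuryBonus", "i"),
    ("ResearchBonus", "i"),
    ("TaxBonus", "i"),
]

def read_record(data, fields, offset=0):
    """Unpack one entry according to an ordered (name, struct code) list."""
    fmt = "<" + "".join(code for _, code in fields)   # "<" means little-endian, no padding
    values = struct.unpack_from(fmt, data, offset)
    record = {}
    for (name, _), value in zip(fields, values):
        if isinstance(value, bytes):
            # Cut at the first NUL byte; the encoding is an assumption.
            value = value.split(b"\x00", 1)[0].decode("latin-1")
        record[name] = value
    return record

# Made-up entry, only to exercise the function.
sample = struct.pack(
    "<ii32s32s32siiii",
    116, 1, b"Laborer", b"CTZN_Laborer", b"Laborers", -1, 0, 0, 0,
)
citizen = read_record(sample, CITIZEN_FIELDS)
```

Because the list is plain data, a different version of the format could be handled by swapping in a different list rather than a different reading routine.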
So, that's that. Now it's your turn to share the experience.