Our DataFrame is the list of French mayors in 2014:
https://www.data.gouv.fr/storage/f/2014-04-25T17-51-58/maires-25-04-2014.xlsx
This file is also available locally as data/maires-25-04-2014.xlsx,
so there is no need to download it again.
Load the file in a DataFrame¶
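One possible way to do the first load is sketched below. It assumes the local copy named in the introduction exists and that an Excel engine such as openpyxl is installed; `XLSX_PATH` and `load_raw` are names chosen here for illustration only.

```python
from pathlib import Path

import pandas as pd

# Local copy of the file, as given in the introduction.
XLSX_PATH = Path("data/maires-25-04-2014.xlsx")


def load_raw(path=XLSX_PATH):
    """Read the first sheet as-is, keeping every row (comment lines included)."""
    # header=None stops pandas from using the first comment line as column names.
    return pd.read_excel(path, header=None)


# Only touch the disk if the file is really there.
if XLSX_PATH.exists():
    raw = load_raw()
    print(raw.shape)
    print(raw.head(6))
```

Loading with `header=None` first makes the problems in the sheet easy to see before deciding how to fix them.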
In [ ]:
Correct the dataframe¶
We can see there are issues:
- first 3 lines are just comments to ignore
- last line holds sums which we don't want
- the column names are in the fourth line
- the column names are too long (e.g. 'Code du département (Maire)'), so let's define our own names and ignore line 4 (the title line) too
Show head of the resulting DataFrame.
Note: it can be useful to reload the DataFrame with the right arguments.
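The corrections listed above can be sketched on a tiny stand-in table. Everything in it is an assumption made for illustration: the rows are invented placeholders, the real sheet has more columns, and the short names `dept`, `city` and `population` are hypothetical.

```python
import pandas as pd

# Stand-in for the raw sheet: 3 comment rows, 1 title row, 2 data rows, 1 totals row.
# All values are invented placeholders, not taken from the real file.
raw = pd.DataFrame(
    [
        ["comment 1", None, None],
        ["comment 2", None, None],
        ["comment 3", None, None],
        ["Code du département (Maire)", "Libellé de la commune", "Population"],
        ["75", "Paris", "1000"],
        ["13", "Marseille", "500"],
        ["Total", None, "1500"],
    ]
)

# Hypothetical short names replacing the long original titles.
short_names = ["dept", "city", "population"]

# Drop the 3 comment rows, the title row (positions 0..3) and the last (totals) row.
mayors = raw.iloc[4:-1].reset_index(drop=True)
mayors.columns = short_names

print(mayors.head())
```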
In [ ]:
Read the documentation of read_excel
and reload the table with the right options so that you get the perfect table directly, without needing any of the previous corrections.
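As a hedged sketch, the `skiprows`, `skipfooter`, `header` and `names` parameters of `pd.read_excel` can cover the issues described earlier. The list `COLUMN_NAMES` is hypothetical: it must be replaced by one name per column of the real sheet, whose exact layout is not assumed here.

```python
from pathlib import Path

import pandas as pd

XLSX_PATH = Path("data/maires-25-04-2014.xlsx")

# Hypothetical short names: adjust so there is exactly one per column in the sheet.
COLUMN_NAMES = ["dept", "city", "population", "last_name", "first_name", "gender", "birth"]

READ_OPTIONS = {
    "skiprows": 4,    # 3 comment lines + the original title line
    "skipfooter": 1,  # the final line of sums
    "header": None,   # no header row is left after skipping
    "names": COLUMN_NAMES,
}

# Only read if the local copy is present.
if XLSX_PATH.exists():
    mayors = pd.read_excel(XLSX_PATH, **READ_OPTIONS)
    print(mayors.head())
```

Keeping the options in a dictionary makes it easy to compare them with the list of issues one by one.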
In [ ]:
Cast birth and population¶
The birth date and population columns are stored as strings, which is not useful; cast them to the types they should be (a date and a number).
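A minimal sketch of the cast, on invented sample rows. The column names and the day/month/year text format are assumptions; check the real values before choosing a `format`.

```python
import pandas as pd

# Invented sample rows; both columns start out as strings.
mayors = pd.DataFrame(
    {
        "city": ["A", "B"],
        "birth": ["25/04/1964", "01/12/1950"],
        "population": ["1000", "250"],
    }
)

# Parse the dates with an explicit (assumed) day/month/year format.
mayors["birth"] = pd.to_datetime(mayors["birth"], format="%d/%m/%Y")
# Convert the population strings to numbers.
mayors["population"] = pd.to_numeric(mayors["population"])

print(mayors.dtypes)
```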
In [ ]:
Add a column 'age'¶
Use the birth date to add a column 'age'. You may need to convert the result to years, since Timedelta values are expressed in days by default.
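One way to get an age in years is to divide the time difference by a one-year `Timedelta`. The sketch below uses invented birth dates, and the reference date (the date in the file name) and the 365.25-day year are assumptions.

```python
import pandas as pd

# Invented birth dates, already parsed as datetimes.
mayors = pd.DataFrame({"birth": pd.to_datetime(["1964-04-25", "1950-12-01"])})

# Assumed reference date: the date in the file name (25 April 2014).
reference = pd.Timestamp("2014-04-25")

# Dividing a timedelta by a one-year Timedelta yields a float number of years.
one_year = pd.Timedelta(days=365.25)
mayors["age"] = (reference - mayors["birth"]) / one_year

print(mayors)
```

The result is approximate because of the average year length, which is usually good enough for statistics on ages.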
Looking at data¶
- display the line of Paris
- sort all cities per population, largest first
- give total population
- give percentage of male mayors
- give statistics on the age of mayors
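The five tasks above might look like this on a small invented table. The column names and the "M"/"F" coding of the gender column are assumptions and may differ in the real file.

```python
import pandas as pd

# Invented rows; only the shape of the table matters here.
mayors = pd.DataFrame(
    {
        "city": ["Paris", "B", "C", "D"],
        "population": [1000, 250, 4000, 10],
        "gender": ["F", "M", "M", "M"],
        "age": [55.0, 61.5, 47.0, 70.5],
    }
)

# Row for Paris, selected with a boolean mask.
paris = mayors[mayors["city"] == "Paris"]

# Cities sorted by population, largest first.
by_population = mayors.sort_values("population", ascending=False)

# Total population.
total_population = mayors["population"].sum()

# Share of male mayors: the mean of a boolean column is a proportion.
percent_male = (mayors["gender"] == "M").mean() * 100

# Count, mean, std, min, quartiles and max of the ages.
age_stats = mayors["age"].describe()

print(paris, by_population, total_population, percent_male, age_stats, sep="\n\n")
```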
In [ ]:
Group data¶
Let's group all cities of the same department and
- sum the population with np.sum
- average the age of mayors with np.mean
- count the number of cities with np.size
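A sketch of the grouping on invented rows, using named aggregation. It passes the string names "sum", "mean" and "size" rather than the NumPy functions, because recent pandas versions may emit a FutureWarning when given `np.sum` or `np.mean`; the column names are hypothetical.

```python
import pandas as pd

# Invented rows: two cities in department "01", one in "02".
mayors = pd.DataFrame(
    {
        "dept": ["01", "01", "02"],
        "city": ["A", "B", "C"],
        "population": [1000, 250, 4000],
        "age": [50.0, 60.0, 47.0],
    }
)

# One output column per (input column, aggregation) pair.
per_dept = mayors.groupby("dept").agg(
    population=("population", "sum"),
    mean_age=("age", "mean"),
    n_cities=("city", "size"),
)

print(per_dept)
```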
In [ ]: