import pandas as pd
import numpy as np
np.random.seed(2)
pd.set_option('display.precision', 3)

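size = 20
# A: standard normal floats; B and C: random integers in [0, 5)
df = pd.DataFrame({'A': np.random.randn(size),
                   'B': np.random.randint(5, size=size),
                   'C': np.random.randint(5, size=size)})
df['B'] += 3  # shift B so that its values lie in [3, 7]
df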

df2 = df.groupby('B').mean()              # mean of A and C within each value of B
df2['countB'] = df.groupby('B').size()    # number of rows in each group
df2

df.groupby('B').groups

{3: [3, 10, 14, 16], 4: [1, 9, 13, 15, 18, 19], 5: [2, 6, 8, 11, 12, 17], 6: [0], 7: [4, 5, 7]}

df.groupby('B').get_group(7)
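
To make the structure of a group more concrete, here is a minimal sketch on a small hypothetical frame (`toy` and its values are invented for illustration, not taken from the data above). It assumes only the standard `groupby` behaviour: `.groups` maps each key to the index labels of its rows, iterating yields `(key, sub-DataFrame)` pairs, and `get_group` returns one of those sub-frames.

```python
import pandas as pd

# Hypothetical toy frame, not the df used above
toy = pd.DataFrame({'B': [3, 4, 3, 5, 4],
                    'A': [1.0, 2.0, 3.0, 4.0, 5.0]})

grouped = toy.groupby('B')

# .groups maps each key to the index labels of the rows in that group
group_labels = {key: list(idx) for key, idx in grouped.groups.items()}

# Iterating over the GroupBy object yields (key, sub-DataFrame) pairs
group_sizes = {key: len(sub) for key, sub in grouped}

# get_group(key) returns the sub-DataFrame for a single key
rows_for_3 = grouped.get_group(3)
```

Nothing is computed until an aggregation, a transformation, or an iteration is requested; the `GroupBy` object only records how the rows are split.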

df.groupby(['B','C']).first()  # get first value of A for each group

df.groupby(['B','C']).get_group((3,3))

dfm = df.groupby(['B','C']).first()
dfm

dfm.groupby(level=1).sum()
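
An index level can also be selected by name rather than by position, which tends to be easier to read. The sketch below uses a hypothetical frame (`toy`, `indexed`) with the same `B`/`C` index layout; the values are made up.

```python
import pandas as pd

# Hypothetical toy data with a two-level (B, C) index
toy = pd.DataFrame({'B': [3, 3, 4, 4],
                    'C': [0, 1, 0, 1],
                    'A': [1.0, 2.0, 3.0, 4.0]})
indexed = toy.set_index(['B', 'C'])

# Group on the second index level, once by position and once by name
by_position = indexed.groupby(level=1).sum()
by_name = indexed.groupby(level='C').sum()
```

Both calls should give the same result here, since level 1 of the index is the level named `C`.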

df.groupby('B').agg(['mean', 'last'])  # some functions are predefined and can therefore be referred to by name

# sum of A, and mean of the even values of C (NaN when a group has no even value)
df.groupby('B').agg({'A': "sum", 'C': lambda x: x[x % 2 == 0].mean()})
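
When a custom function is applied, the resulting column keeps the name of the source column, which hides what was computed. One possible alternative is pandas' named aggregation, sketched here on a hypothetical frame; `toy`, `even_mean`, and the output column names are assumptions made for this example.

```python
import pandas as pd

# Hypothetical toy frame; group 5 deliberately has no even value in C
toy = pd.DataFrame({'B': [3, 3, 4, 4, 4, 5],
                    'A': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                    'C': [2, 3, 1, 1, 4, 1]})

def even_mean(s):
    """Mean of the even values of a Series (NaN if there are none)."""
    return s[s % 2 == 0].mean()

# Named aggregation: output_column=(input_column, function or function name)
summary = toy.groupby('B').agg(
    A_sum=('A', 'sum'),
    C_even_mean=('C', even_mean),
)
```

Each keyword becomes a column of the result, so the output is self-describing and stays flat instead of carrying a two-level column header.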

df = pd.DataFrame({'month': np.random.randint(1,4,size=10), 
                   'day sales': np.random.randint(50,size=10)}).sort_values('month')

df

df.groupby('month').mean()

df.groupby('month').transform("mean")

# select the column first so that only 'day sales' is transformed
df['mean day sales'] = df.groupby('month')['day sales'].transform("mean")
df
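
The example above groups on an integer `month` column. With real timestamps, one way to get the same grouping is to derive a monthly period from the date column and group on it. This is a sketch under assumptions: the `sales` frame, its `date` column, and all values are hypothetical.

```python
import pandas as pd

# Hypothetical daily sales with real timestamps (values are made up)
sales = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-05', '2024-01-20', '2024-02-03',
                            '2024-02-14', '2024-03-09']),
    'day sales': [20, 26, 23, 21, 43],
})

# Derive a monthly period from each timestamp and use it as the grouping key
month = sales['date'].dt.to_period('M')

# Aggregation: one value per month
monthly_mean = sales.groupby(month)['day sales'].mean()

# Transformation: one value per original row, aligned on the original index
sales['mean day sales'] = sales.groupby(month)['day sales'].transform('mean')
```

The aggregation shrinks the data to one row per month, whereas `transform` returns a result with the same length and index as the input, which is why it can be assigned back as a new column.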

	A	B	C
0	-0.417	6	1
1	-0.056	4	4
2	-2.136	5	2
3	1.640	3	3
4	-1.793	7	0
5	-0.842	7	3
6	0.503	5	0
7	-1.245	7	2
8	-1.058	5	2
9	-0.909	4	0
10	0.551	3	4
11	2.292	5	2
12	0.042	5	0
13	-1.118	4	2
14	0.539	3	4
15	-0.596	4	1
16	-0.019	3	3
17	1.175	5	0
18	-0.748	4	2
19	0.009	4	1

	A	C	countB
B
3	0.678	3.500	4
4	-0.570	1.667	6
5	0.136	1.000	6
6	-0.417	1.000	1
7	-1.293	1.667	3

		A
B	C
3	3	1.640
3	4	0.551
4	0	-0.909
	1	-0.596
	2	-1.118
	4	-0.056
5	0	0.503
5	2	-2.136
6	1	-0.417
7	0	-1.793
	2	-1.245
	3	-0.842

	A
C
0	-2.200
1	-1.013
2	-4.499
3	0.799
4	0.495

Grouping data by a column

Structure of a group

Grouping by several columns

Grouping by an index level

Applying different operations

Grouping while preserving the original structure

Grouping by dates

	A		C
	mean	last	mean	last
B
3	0.678	-0.019	3.500	3
4	-0.570	0.009	1.667	1
5	0.136	1.175	1.000	0
6	-0.417	-0.417	1.000	1
7	-1.293	-1.245	1.667	2

	month	day sales	mean day sales
0	1	20	23.0
1	1	26	23.0
2	2	23	21.0
3	2	22	21.0
5	2	37	21.0
6	2	10	21.0
7	2	8	21.0
8	2	26	21.0
4	3	43	39.0
9	3	35	39.0

	A	C
B
3	2.712	4.0
4	-3.418	2.0
5	0.817	1.0
6	-0.417	NaN
7	-3.880	1.0