Copies and views in Pandas¶

Arrays and DataFrames can hold a large amount of data, so it is good to avoid copies, even temporary ones, whenever possible. Extracting a sub-array just to read it therefore does not require a copy. If we then want to modify it, we must ask ourselves whether the original table should be modified as well.

In practice, Pandas decides whether to make a copy or a view. If it makes a view, modifying the sub-table also modifies the main table; if it makes a copy, the main table is left untouched. Pandas also emits a warning (SettingWithCopyWarning) to flag this uncertainty.
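The difference between a view and a copy is easiest to see on a plain NumPy array, where the rules are fixed: basic slicing returns a view and indexing with a list returns a copy. The following is a minimal sketch; the names a, sub_view and sub_copy are only illustrative.

```python
import numpy as np

a = np.arange(5)            # [0, 1, 2, 3, 4]
sub_view = a[1:4]           # basic slicing: a view on the memory of a
sub_copy = a[[1, 2, 3]]     # indexing with a list: an independent copy

sub_view[0] = 99            # writes through to a[1]
sub_copy[1] = -1            # a[2] is left untouched

print(a.tolist())           # [0, 99, 2, 3, 4]
print(sub_view.tolist())    # [99, 2, 3]
print(sub_copy.tolist())    # [1, -1, 3]
print(np.shares_memory(a, sub_view), np.shares_memory(a, sub_copy))
```

With a DataFrame the same two outcomes exist, but which one you get is less predictable, as the cells that follow show.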

In [1]:
import pandas as pd
from IPython.display import display, HTML

CSS = """
.output {
    flex-direction: row;
}
"""
HTML('<style>{}</style>'.format(CSS))
Out[1]:
In [2]:
print(pd.__version__)  # behaviour can change with the version
2.2.1
In [3]:
df = pd.DataFrame({'A':list('qwer'), 'B':list('asdf')})
df
Out[3]:
A B
0 q a
1 w s
2 e d
3 r f
In [4]:
df2 = df.loc[1:3,:]
df.loc[2,'B'] = 'X'   # careful: here 2 is a label (by chance it has the same value as the position)
df2.loc[1,'A'] = 'Z'
/tmp/ipykernel_897/1281438451.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df2.loc[1,'A'] = 'Z'

df2 is potentially a copy of part of df (but it could be a view).

In [5]:
from IPython.display import display
display(df, df2)
A B
0 q a
1 Z s
2 e X
3 r f
A B
1 Z s
2 e X
3 r f

So we can see here that df2 is a view of df, since a change made to either one is visible in the other (both have X and Z).

However, if I add a column C to df2 and then assign that column to df, columns A and B of the two dataframes are still views, but the C columns of the two dataframes are distinct, and therefore copies!

In [6]:
df = pd.DataFrame({'A':list('qwer'), 'B':list('asdf')})
df2 = df.loc[1:3,:]
df2['C'] = df.A + df.B 

df.loc[2,'B'] = 'X'
df2.loc[1,'A'] = 'Z'

df['C'] = df2['C']
df2.loc[3,'C'] = 'AB'
/tmp/ipykernel_897/3317636207.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df2['C'] = df.A + df.B
In [7]:
display(df, df2)
A B C
0 q a NaN
1 Z s ws
2 e X ed
3 r f rf
A B C
1 Z s ws
2 e X ed
3 r f AB

Note that the result may be different, for example with another version of Pandas, since you cannot know in advance whether Pandas treats df2 as a view or a copy.
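One way to inspect what you actually got is to ask NumPy whether two columns are backed by the same memory. This is only a diagnostic sketch: the helper shares_memory below is hypothetical (it is not part of Pandas), it uses numeric columns for simplicity, and the answer for a plain slice depends on the Pandas version and settings, so no expected value is given for it.

```python
import numpy as np
import pandas as pd

def shares_memory(left, right, column):
    """Return True if `column` of both frames is backed by overlapping memory."""
    return np.shares_memory(left[column].to_numpy(), right[column].to_numpy())

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]})
sub_slice = df.loc[1:3, :]          # view or copy: decided by Pandas
sub_copy = df.loc[1:3, :].copy()    # explicit copy

slice_shares = shares_memory(df, sub_slice, 'A')   # version dependent
copy_shares = shares_memory(df, sub_copy, 'A')     # an explicit copy never shares

print(slice_shares)
print(copy_shares)                  # False
```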

copy to make sure you have a copy¶

If you want to be sure of the result, make copies:

In [8]:
df = pd.DataFrame({'A':list('qwer'), 'B':list('asdf')})
df2 = df.loc[1:3,:].copy()
df.loc[1,'A'] = 'X'
df2.loc[2,'B'] = 'Z'
display(df, df2)
A B
0 q a
1 X s
2 e d
3 r f
A B
1 w s
2 e Z
3 r f

To sum up:

  • in read-only mode, views are fine
  • in write mode, copies are preferred.
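
The two rules above can be sketched as a small pattern: read directly from the extracted sub-table, and take an explicit copy before writing. The function with_total and the column names x, y and total are assumptions made for this illustration only.

```python
import pandas as pd

def with_total(frame):
    """Return a new frame with a 'total' column; the input is not modified."""
    result = frame.copy()                        # explicit copy: writes stay local
    result['total'] = result['x'] + result['y']
    return result

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})
subset = df.loc[0:1, :]                # labels 0 and 1, only read here

x_sum = int(subset['x'].sum())         # reading needs no copy
enriched = with_total(subset)          # writing happens on a copy

print(x_sum)                           # 3
print(list(df.columns))                # ['x', 'y']
print(enriched['total'].tolist())      # [11, 22]
```

Because the write goes to an explicit copy, df keeps its two original columns and no SettingWithCopyWarning is raised.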