5. Data Manipulation¶
5.1. Combining DataFrame¶
5.1.1. Mutating Joins¶
- Datasets
Python
import pandas as pd
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
right = pd.DataFrame({'A': ['A0', 'A1', 'A6', 'A7'],
'F': ['B4', 'B5', 'B6', 'B7'],
'G': ['C4', 'C5', 'C6', 'C7'],
'H': ['D4', 'D5', 'D6', 'D7']},
index=[4, 5, 6, 7])
print(left)
print(right)
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
A F G H
4 A0 B4 C4 D4
5 A1 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
R
left = data.frame(A = c('A0', 'A1', 'A2', 'A3'),
B = c('B0', 'B1', 'B2', 'B3'),
C = c('C0', 'C1', 'C2', 'C3'),
D = c('D0', 'D1', 'D2', 'D3'))
left
right = data.frame(A = c('A0', 'A1', 'A6', 'A7'),
F = c('B4', 'B5', 'B6', 'B7'),
G = c('C4', 'C5', 'C6', 'C7'),
H = c('D4', 'D5', 'D6', 'D7'))
right
> left
A B C D
1 A0 B0 C0 D0
2 A1 B1 C1 D1
3 A2 B2 C2 D2
4 A3 B3 C3 D3
> right
A F G H
1 A0 B4 C4 D4
2 A1 B5 C5 D5
3 A6 B6 C6 D6
4 A7 B7 C7 D7
- Left join
Python
- Code:
# left join
left_join = left.merge(right, on='A', how='left')
print(left_join)
- Result:
A B C D F G H
0 A0 B0 C0 D0 B4 C4 D4
1 A1 B1 C1 D1 B5 C5 D5
2 A2 B2 C2 D2 NaN NaN NaN
3 A3 B3 C3 D3 NaN NaN NaN
R
- Code:
library(dplyr)
# left join
dplyr::left_join(left,right, by = 'A')
# or
left %>% left_join(right, by ='A')
- Result:
> dplyr::left_join(left,right, by = 'A')
A B C D F G H
1 A0 B0 C0 D0 B4 C4 D4
2 A1 B1 C1 D1 B5 C5 D5
3 A2 B2 C2 D2 <NA> <NA> <NA>
4 A3 B3 C3 D3 <NA> <NA> <NA>
Warning message:
Column `A` joining factors with different levels, coercing to character vector
> left %>% left_join(right, by ='A')
A B C D F G H
1 A0 B0 C0 D0 B4 C4 D4
2 A1 B1 C1 D1 B5 C5 D5
3 A2 B2 C2 D2 <NA> <NA> <NA>
4 A3 B3 C3 D3 <NA> <NA> <NA>
- Right join
Python
- Code:
# right join
right_join = left.merge(right, on='A', how='right')
print(right_join)
- Result:
A B C D F G H
0 A0 B0 C0 D0 B4 C4 D4
1 A1 B1 C1 D1 B5 C5 D5
2 A6 NaN NaN NaN B6 C6 D6
3 A7 NaN NaN NaN B7 C7 D7
R
- Code:
library(dplyr)
# right join
dplyr::right_join(left,right, by = 'A')
left %>% right_join(right, by ='A')
- Result:
> dplyr::right_join(left,right, by = 'A')
A B C D F G H
1 A0 B0 C0 D0 B4 C4 D4
2 A1 B1 C1 D1 B5 C5 D5
3 A6 <NA> <NA> <NA> B6 C6 D6
4 A7 <NA> <NA> <NA> B7 C7 D7
Warning message:
Column `A` joining factors with different levels, coercing to character vector
> left %>% right_join(right, by ='A')
A B C D F G H
1 A0 B0 C0 D0 B4 C4 D4
2 A1 B1 C1 D1 B5 C5 D5
3 A6 <NA> <NA> <NA> B6 C6 D6
4 A7 <NA> <NA> <NA> B7 C7 D7
Warning message:
Column `A` joining factors with different levels, coercing to character vector
- Inner join
Python
- Code:
# inner join
inner_join = left.merge(right, on='A', how='inner')
print(inner_join)
- Result:
A B C D F G H
0 A0 B0 C0 D0 B4 C4 D4
1 A1 B1 C1 D1 B5 C5 D5
R
- Code:
library(dplyr)
# inner join
dplyr::inner_join(left,right, by = 'A')
left %>% inner_join(right, by ='A')
- Result:
> dplyr::inner_join(left,right, by = 'A')
A B C D F G H
1 A0 B0 C0 D0 B4 C4 D4
2 A1 B1 C1 D1 B5 C5 D5
Warning message:
Column `A` joining factors with different levels, coercing to character vector
> left %>% inner_join(right, by ='A')
A B C D F G H
1 A0 B0 C0 D0 B4 C4 D4
2 A1 B1 C1 D1 B5 C5 D5
Warning message:
Column `A` joining factors with different levels, coercing to character vector
- Full join
Python
- Code:
# full join
full_join = left.merge(right, on='A', how='outer')
print(full_join)
- Result:
A B C D F G H
0 A0 B0 C0 D0 B4 C4 D4
1 A1 B1 C1 D1 B5 C5 D5
2 A2 B2 C2 D2 NaN NaN NaN
3 A3 B3 C3 D3 NaN NaN NaN
4 A6 NaN NaN NaN B6 C6 D6
5 A7 NaN NaN NaN B7 C7 D7
R
- Code:
library(dplyr)
# full join
dplyr::full_join(left,right, by = 'A')
left %>% full_join(right, by ='A')
- Result:
> dplyr::full_join(left,right, by = 'A')
A B C D F G H
1 A0 B0 C0 D0 B4 C4 D4
2 A1 B1 C1 D1 B5 C5 D5
3 A2 B2 C2 D2 <NA> <NA> <NA>
4 A3 B3 C3 D3 <NA> <NA> <NA>
5 A6 <NA> <NA> <NA> B6 C6 D6
6 A7 <NA> <NA> <NA> B7 C7 D7
Warning message:
Column `A` joining factors with different levels, coercing to character vector
> left %>% full_join(right, by ='A')
A B C D F G H
1 A0 B0 C0 D0 B4 C4 D4
2 A1 B1 C1 D1 B5 C5 D5
3 A2 B2 C2 D2 <NA> <NA> <NA>
4 A3 B3 C3 D3 <NA> <NA> <NA>
5 A6 <NA> <NA> <NA> B6 C6 D6
6 A7 <NA> <NA> <NA> B7 C7 D7
Warning message:
Column `A` joining factors with different levels, coercing to character vector
5.1.2. Filtering Joins¶
5.2. DataFrame Operations¶
TO DO …..