What Idea about Stock Analysis and Predicting chart ?

Introduction: In this article, we will explore how to perform stock analysis using Python. We will import and merge multiple datasets, including historical stock prices, earnings per share (EPS), and GDP data. We will then use linear regression to predict stock returns based on historical data. Finally, we will visualize the results using various libraries like pandas, matplotlib, numpy, and seaborn.

importing the necessary libraries

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

Import Dataframe from Excel

Now, we will read the data from Excel files and store them in pandas DataFrames:

data_set = pd.read_excel('set.xlsx',index_col=0)
data_set_eps = pd.read_excel('set_eps.xlsx',index_col=0)
data_gdpq = pd.read_excel('gdpq.xlsx',engine= 'openpyxl',index_col=0)

After importing the data, we can display the DataFrames to see what they look like:

data_set

we will see dataframe like this

	Prior	Open	High	Low	Close	PE
Date
2016-01-04	1288.02	1286.29	1286.36	1260.96	1263.41	22.12
2016-01-05	1263.41	1267.20	1270.07	1251.87	1253.34	21.99
2016-01-06	1253.34	1249.82	1260.88	1247.89	1260.04	22.10
2016-01-07	1260.04	1237.81	1244.04	1224.83	1224.83	21.46
2016-01-08	1224.83	1232.31	1246.70	1228.18	1244.18	21.81
…	…	…	…	…	…	…
2021-10-28	1627.61	1626.27	1632.30	1622.56	1624.31	20.81
2021-10-29	1624.31	1627.01	1629.25	1619.14	1623.43	21.02
2021-11-01	1623.43	1627.54	1632.73	1611.39	1613.78	20.89
2021-11-02	1613.78	1616.86	1621.69	1608.64	1617.89	20.96
2021-11-03	1617.89	1620.90	1623.08	1607.72	1611.92	20.87

1422 rows × 6 columns

data_set_eps

	Prior	Open	High	Low	Close	PE	EPS
Date
2021-11-03	1617.89	1620.90	1623.08	1607.72	1611.92	19.3909	83.1235
2021-11-02	1613.78	1616.86	1621.69	1608.64	1617.89	19.4637	83.1246
2021-11-01	1623.43	1627.54	1632.73	1611.39	1613.78	19.4140	83.1221
2021-10-29	1624.31	1627.01	1629.25	1619.14	1623.43	19.5307	83.1111
2021-10-28	1627.61	1626.27	1632.30	1622.56	1624.31	19.5438	83.6187
…	…	…	…	…	…	…	…
NaT	NaN	NaN	NaN	NaN	NaN	NaN	54.6940
NaT	NaN	NaN	NaN	NaN	NaN	NaN	54.6926
NaT	NaN	NaN	NaN	NaN	NaN	NaN	54.6938
NaT	NaN	NaN	NaN	NaN	NaN	NaN	54.6979
NaT	NaN	NaN	NaN	NaN	NaN	NaN	54.6952

3088 rows × 7 columns

data_gdpq

	GDP
Date
2016-03-31	3597737000000
2016-06-30	3557050000000
2016-09-30	3628315000000
2016-12-31	3807235000000
2017-03-31	3830606000000
2017-06-30	3753348000000
2017-09-30	3854593000000
2017-12-31	4050117000000
2018-03-31	4051915000000
2018-06-30	3998560000000
2018-09-30	4063764000000
2018-12-31	4254466000000
2019-03-31	4223706000000
2019-06-30	4156001000000
2019-09-30	4181774000000
2019-12-31	4336605000000
2020-03-31	4158483000000
2020-06-30	3545622000000
2020-09-30	3860959000000
2020-12-31	4133222000000
2021-03-31	4070526000000
2021-06-30	3923731000000

1422 rows × 7 columns

Merging DataFrames

We will now merge the data between data_set and data_set_eps DataFrames:

min_date = data_set_eps.index.min()
max_date = data_set_eps.index.max()
date_range = (min_date, max_date)

date_range

(Timestamp('2016-01-04 00:00:00'), Timestamp('2021-11-03 00:00:00'))

data_set_eps_subset = data_set_eps.loc[data_set.index, ["PE", "EPS"]]

df1 = pd.merge(data_set, data_set_eps_subset, how="left", left_index=True, right_index=True)

data_set_eps_subset

	PE	EPS
Date
2016-01-04	17.3305	72.2368
2016-01-05	17.1919	72.2431
2016-01-06	17.2866	72.2438
2016-01-07	16.8012	72.2363
2016-01-08	17.0010	72.2296
…	…	…
2021-10-28	19.5438	83.6187
2021-10-29	19.5307	83.1111
2021-11-01	19.4140	83.1221
2021-11-02	19.4637	83.1246
2021-11-03	19.3909	83.1235

1422 rows × 2 columns

df1

	Prior	Open	High	Low	Close	PE_x	PE_y	EPS
Date
2016-01-04	1288.02	1286.29	1286.36	1260.96	1263.41	22.12	17.3305	72.2368
2016-01-05	1263.41	1267.20	1270.07	1251.87	1253.34	21.99	17.1919	72.2431
2016-01-06	1253.34	1249.82	1260.88	1247.89	1260.04	22.10	17.2866	72.2438
2016-01-07	1260.04	1237.81	1244.04	1224.83	1224.83	21.46	16.8012	72.2363
2016-01-08	1224.83	1232.31	1246.70	1228.18	1244.18	21.81	17.0010	72.2296
…	…	…	…	…	…	…	…	…
2021-10-28	1627.61	1626.27	1632.30	1622.56	1624.31	20.81	19.5438	83.6187
2021-10-29	1624.31	1627.01	1629.25	1619.14	1623.43	21.02	19.5307	83.1111
2021-11-01	1623.43	1627.54	1632.73	1611.39	1613.78	20.89	19.4140	83.1221
2021-11-02	1613.78	1616.86	1621.69	1608.64	1617.89	20.96	19.4637	83.1246
2021-11-03	1617.89	1620.90	1623.08	1607.72	1611.92	20.87	19.3909	83.1235

1422 rows × 8 columns

Adding GDP Data to the Merged DataFrame

Next, we will add the GDP data to the merged DataFrame

data_gdpq.index.year

Int64Index([2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2018, 2018, 2018,2018, 2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020, 2021, 2021],
dtype='int64', name='Date')

data_gdpq.groupby(data_gdpq.index.year)['GDP'].mean()
    Date
    2016    3.647584e+12
    2017    3.872166e+12
    2018    4.092176e+12
    2019    4.224522e+12
    2020    3.924572e+12
    2021    3.997128e+12
    Name: GDP, dtype: float64

gdp_yearly = data_gdpq.groupby(data_gdpq.index.year)['GDP'].mean()

for i in range(2016,2022):
    df1.loc[str(i),'gdp'] = gdp_yearly.loc[i]

df1

	Prior	Open	High	Low	Close	PE_x	PE_y	EPS	gdp
Date
2016-01-04	1288.02	1286.29	1286.36	1260.96	1263.41	22.12	17.3305	72.2368	3.647584e+12
2016-01-05	1263.41	1267.20	1270.07	1251.87	1253.34	21.99	17.1919	72.2431	3.647584e+12
2016-01-06	1253.34	1249.82	1260.88	1247.89	1260.04	22.10	17.2866	72.2438	3.647584e+12
2016-01-07	1260.04	1237.81	1244.04	1224.83	1224.83	21.46	16.8012	72.2363	3.647584e+12
2016-01-08	1224.83	1232.31	1246.70	1228.18	1244.18	21.81	17.0010	72.2296	3.647584e+12
…	…	…	…	…	…	…	…	…	…
2021-10-28	1627.61	1626.27	1632.30	1622.56	1624.31	20.81	19.5438	83.6187	3.997128e+12
2021-10-29	1624.31	1627.01	1629.25	1619.14	1623.43	21.02	19.5307	83.1111	3.997128e+12
2021-11-01	1623.43	1627.54	1632.73	1611.39	1613.78	20.89	19.4140	83.1221	3.997128e+12
2021-11-02	1613.78	1616.86	1621.69	1608.64	1617.89	20.96	19.4637	83.1246	3.997128e+12
2021-11-03	1617.89	1620.90	1623.08	1607.72	1611.92	20.87	19.3909	83.1235	3.997128e+12

1422 rows × 9 columns

Calculating and Adding Stock Return to the DataFrame

We will now calculate the daily stock return and add it to the DataFrame

df1['Return_shift'] = df1.Close.pct_change()

df1.dropna(inplace=True)

df1

	Prior	Open	High	Low	Close	PE_x	PE_y	EPS	gdp	Return_shift
Date
2016-01-05	1263.41	1267.20	1270.07	1251.87	1253.34	21.99	17.1919	72.2431	3.647584e+12	-0.007970
2016-01-06	1253.34	1249.82	1260.88	1247.89	1260.04	22.10	17.2866	72.2438	3.647584e+12	0.005346
2016-01-07	1260.04	1237.81	1244.04	1224.83	1224.83	21.46	16.8012	72.2363	3.647584e+12	-0.027944
2016-01-08	1224.83	1232.31	1246.70	1228.18	1244.18	21.81	17.0010	72.2296	3.647584e+12	0.015798
2016-01-11	1244.18	1233.80	1235.18	1220.96	1234.50	21.63	16.8695	72.2242	3.647584e+12	-0.007780
…	…	…	…	…	…	…	…	…	…	…
2021-10-28	1627.61	1626.27	1632.30	1622.56	1624.31	20.81	19.5438	83.6187	3.997128e+12	-0.002028
2021-10-29	1624.31	1627.01	1629.25	1619.14	1623.43	21.02	19.5307	83.1111	3.997128e+12	-0.000542
2021-11-01	1623.43	1627.54	1632.73	1611.39	1613.78	20.89	19.4140	83.1221	3.997128e+12	-0.005944
2021-11-02	1613.78	1616.86	1621.69	1608.64	1617.89	20.96	19.4637	83.1246	3.997128e+12	0.002547
2021-11-03	1617.89	1620.90	1623.08	1607.72	1611.92	20.87	19.3909	83.1235	3.997128e+12	-0.003690

1421 rows × 10 columns

df1.loc[(df1.EPS>70)& (df1.EPS<80)]

	Prior	Open	High	Low	Close	PE_x	PE_y	EPS	gdp	Return_shift	PE_Label
Date
2016-01-05	1263.41	1267.20	1270.07	1251.87	1253.34	21.99	17.1919	72.2431	3.647584e+12	-0.007970	1
2016-01-06	1253.34	1249.82	1260.88	1247.89	1260.04	22.10	17.2866	72.2438	3.647584e+12	0.005346	1
2016-01-07	1260.04	1237.81	1244.04	1224.83	1224.83	21.46	16.8012	72.2363	3.647584e+12	-0.027944	1
2016-01-08	1224.83	1232.31	1246.70	1228.18	1244.18	21.81	17.0010	72.2296	3.647584e+12	0.015798	1
2016-01-11	1244.18	1233.80	1235.18	1220.96	1234.50	21.63	16.8695	72.2242	3.647584e+12	-0.007780	1
…	…	…	…	…	…	…	…	…	…	…	…
2016-05-18	1406.57	1407.66	1410.04	1399.36	1400.50	21.16	19.3878	74.2615	3.647584e+12	-0.004315	1
2016-05-19	1400.50	1399.39	1400.97	1384.12	1385.86	20.94	19.1851	74.2568	3.647584e+12	-0.010453	1
2016-05-23	1385.86	1387.81	1388.88	1377.65	1381.69	20.88	19.1272	74.2602	3.647584e+12	-0.003009	1
2016-05-24	1381.69	1381.54	1386.15	1376.74	1384.26	20.92	19.1611	74.2644	3.647584e+12	0.001860	1
2016-05-25	1384.26	1394.65	1398.35	1390.19	1397.63	21.13	19.3460	74.3830	3.647584e+12	0.009659	1

93 rows × 11 columns

df1.loc[ (df1.EPS>35)&(df1.EPS<40)]

	Prior	Open	High	Low	Close	PE_x	PE_y	EPS	gdp	Return_shift	PE_Label
Date
2020-12-03	1417.95	1424.31	1439.22	1420.69	1438.32	28.49	26.2152	38.9297	3.924572e+12	0.014366	2
2020-12-04	1438.32	1439.14	1454.94	1439.14	1449.83	28.73	26.4263	38.9327	3.924572e+12	0.008002	2
2020-12-08	1449.83	1447.71	1484.73	1442.65	1478.92	29.34	26.9521	38.9373	3.924572e+12	0.020064	2
2020-12-09	1478.92	1491.67	1503.89	1474.83	1482.67	29.41	27.0420	38.9369	3.924572e+12	0.002536	2
2020-12-14	1482.67	1490.79	1495.18	1471.56	1476.13	29.28	26.9259	38.9370	3.924572e+12	-0.004411	2
…	…	…	…	…	…	…	…	…	…	…	…
2021-03-05	1534.11	1530.05	1553.92	1529.83	1544.11	40.32	39.5819	39.0209	3.997128e+12	0.006518	6
2021-03-08	1544.11	1558.79	1562.56	1540.64	1543.76	40.29	39.5730	39.0214	3.997128e+12	-0.000227	6
2021-03-09	1543.76	1551.57	1554.96	1537.23	1550.59	40.44	39.7479	39.0213	3.997128e+12	0.004424	6
2021-03-10	1550.59	1553.23	1573.55	1550.45	1573.05	41.00	40.3238	39.0213	3.997128e+12	0.014485	6
2021-03-11	1573.05	1583.13	1584.61	1569.87	1575.13	41.07	40.3772	39.0857	3.997128e+12	0.001322	6

64 rows × 11 columns

df1.loc[ (df1.EPS>100)]

	Prior	Open	High	Low	Close	PE_x	PE_y	EPS	gdp	Return_shift	PE_Label
Date
2018-04-02	1776.26	1775.24	1783.73	1771.58	1782.28	18.35	17.9799	101.0242	4.092176e+12	0.003389	1
2018-04-03	1782.28	1774.45	1779.04	1764.18	1765.24	18.17	17.8016	101.0231	4.092176e+12	-0.009561	1
2018-04-04	1765.24	1766.67	1771.58	1724.90	1724.98	17.75	17.3944	101.0234	4.092176e+12	-0.022807	1
2018-04-05	1724.98	1735.58	1740.84	1724.88	1739.92	17.90	17.5470	101.0272	4.092176e+12	0.008661	1
2018-04-09	1739.92	1736.54	1752.91	1735.38	1751.27	18.03	17.6620	101.0266	4.092176e+12	0.006523	1
…	…	…	…	…	…	…	…	…	…	…	…
2018-08-10	1722.48	1718.40	1719.03	1705.20	1705.96	17.32	16.9411	105.4328	4.092176e+12	-0.009591	1
2018-08-21	1701.42	1703.79	1707.63	1688.59	1694.63	16.83	16.8285	100.6324	4.092176e+12	-0.003991	1
2018-08-22	1694.63	1696.96	1705.91	1695.34	1698.30	16.89	16.8666	100.6112	4.092176e+12	0.002166	1
2018-10-09	1696.22	1700.55	1708.72	1690.92	1696.92	16.73	16.0854	105.4485	4.092176e+12	0.000413	1
2018-10-10	1696.92	1709.19	1721.82	1707.63	1721.82	16.98	16.3222	105.4503	4.092176e+12	0.014674	1

92 rows × 11 columns

Create label for ML training

def pe(pe):
    if pe <=20:
        return 1
    elif (pe>20)&(pe<=28):
        return 2
    elif (pe>28)&(pe<=30):
        return 3
    elif (pe > 30) & (pe <= 34):
        return 4
    elif (pe > 34) & (pe <= 38):
        return 5
    elif (pe > 38) & (pe <= 42):
        return 6
    elif pe > 42:
        return 7

def eps(eps):
    if eps <=40:
        return 1
    elif (eps>40)&(eps<=50):
        return 2
    elif (eps>50)&(eps<=60):
        return 3
    elif (eps > 60) & (eps <= 70):
        return 4
    elif (eps > 70) & (eps <= 80):
        return 5
    elif (eps > 80) & (eps <= 90):
        return 6
    elif (eps > 90) & (eps <= 100):
        return 7
    elif eps > 100 :
        return 8

def gdp(gdp):
    if gdp <=3.750000e+12:
        return 1
    elif (gdp>3.750000e+12)&(gdp<=4.000000e+12):
        return 2
    elif (gdp>4.000000e+12)&(gdp<=4.250000e+12):
        return 3
    elif eps > 4.250000e+12 :
        return 4

df1['PE_Label'] = df1['PE_y'].apply(pe)

df1['EPS_Label'] = df1['EPS'].apply(eps)

df1['GDP_Label'] = df1['gdp'].apply(gdp)

df1

	Prior	Open	High	Low	Close	PE_x	PE_y	EPS	gdp	Return_shift	PE_Label	EPS_Label	GDP_Label
Date
2016-01-05	1263.41	1267.20	1270.07	1251.87	1253.34	21.99	17.1919	72.2431	3.647584e+12	-0.007970	1	5	1
2016-01-06	1253.34	1249.82	1260.88	1247.89	1260.04	22.10	17.2866	72.2438	3.647584e+12	0.005346	1	5	1
2016-01-07	1260.04	1237.81	1244.04	1224.83	1224.83	21.46	16.8012	72.2363	3.647584e+12	-0.027944	1	5	1
2016-01-08	1224.83	1232.31	1246.70	1228.18	1244.18	21.81	17.0010	72.2296	3.647584e+12	0.015798	1	5	1
2016-01-11	1244.18	1233.80	1235.18	1220.96	1234.50	21.63	16.8695	72.2242	3.647584e+12	-0.007780	1	5	1
…	…	…	…	…	…	…	…	…	…	…	…	…	…
2021-10-28	1627.61	1626.27	1632.30	1622.56	1624.31	20.81	19.5438	83.6187	3.997128e+12	-0.002028	1	6	2
2021-10-29	1624.31	1627.01	1629.25	1619.14	1623.43	21.02	19.5307	83.1111	3.997128e+12	-0.000542	1	6	2
2021-11-01	1623.43	1627.54	1632.73	1611.39	1613.78	20.89	19.4140	83.1221	3.997128e+12	-0.005944	1	6	2
2021-11-02	1613.78	1616.86	1621.69	1608.64	1617.89	20.96	19.4637	83.1246	3.997128e+12	0.002547	1	6	2
2021-11-03	1617.89	1620.90	1623.08	1607.72	1611.92	20.87	19.3909	83.1235	3.997128e+12	-0.003690	1	6	2

1421 rows × 13 columns

Data Preparation:

First, we need to split our dataset into training and testing sets. The training set will be used to train our model, while the testing set will be used to evaluate its performance.

X = df1[['EPS','PE_y','gdp','EPS_Label', 'PE_Label','GDP_Label']]
y = df1['Return_shift']

X_train = X.iloc[:1000]
y_train = y.iloc[:1000]
X_test = X.iloc[1000:1200]
y_test = y.iloc[1000:1200]

X_train

	EPS	PE_y	gdp	EPS_Label	PE_Label	GDP_Label
Date
2016-01-05	72.2431	17.1919	3.647584e+12	5	1	1
2016-01-06	72.2438	17.2866	3.647584e+12	5	1	1
2016-01-07	72.2363	16.8012	3.647584e+12	5	1	1
2016-01-08	72.2296	17.0010	3.647584e+12	5	1	1
2016-01-11	72.2242	16.8695	3.647584e+12	5	1	1
…	…	…	…	…	…	…
2020-01-29	87.4341	17.4639	3.924572e+12	6	1	2
2020-01-30	87.4127	17.4568	3.924572e+12	6	1	2
2020-01-31	87.4291	17.3440	3.924572e+12	6	1	2
2020-02-03	87.3809	17.1378	3.924572e+12	6	1	2
2020-02-04	87.3742	17.4058	3.924572e+12	6	1	2

1000 rows × 6 columns

y_train
    Date
    2016-01-05   -0.007970
    2016-01-06    0.005346
    2016-01-07   -0.027944
    2016-01-08    0.015798
    2016-01-11   -0.007780
                    ...   
    2020-01-29    0.007487
    2020-01-30   -0.000394
    2020-01-31   -0.006463
    2020-02-03   -0.011941
    2020-02-04    0.015588
    Name: Return_shift, Length: 1000, dtype: float64

X_test

	EPS	PE_y	gdp	EPS_Label	PE_Label	GDP_Label
Date
2020-02-05	87.3689	17.5751	3.924572e+12	6	1	2
2020-02-06	87.3775	17.5939	3.924572e+12	6	1	2
2020-02-07	87.3785	17.5876	3.924572e+12	6	1	2
2020-02-11	87.3733	17.4579	3.924572e+12	6	1	2
2020-02-12	87.3735	17.6405	3.924572e+12	6	1	2
…	…	…	…	…	…	…
2020-11-24	54.8500	25.5605	3.924572e+12	3	2	2
2020-11-25	54.8633	25.8183	3.924572e+12	3	2	2
2020-11-26	55.5504	26.1422	3.924572e+12	3	2	2
2020-11-27	55.4210	26.2192	3.924572e+12	3	2	2
2020-11-30	55.4316	25.6753	3.924572e+12	3	2	2

200 rows × 6 columns

Model Training and Evaluation:

Now that we have prepared our data, we can start building the linear regression model.

We will use the Scikit-learn library to create and train our model,

and then evaluate its performance using various metrics such as R-squared,

Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE).

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

from sklearn import metrics
from sklearn.model_selection import cross_val_score



model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print('Score = ', metrics.r2_score(y_test,y_pred))

m = model.coef_
print("coef = ",m)
b = model.intercept_
print("intercept = ",b)
print("MAE = ",  metrics.mean_absolute_error(y_test,y_pred))
print("MSE = ",  metrics.mean_squared_error(y_test,y_pred))
print("RMSE = ", np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

Score =  0.016679949410685402
coef = [ 4.68128493e-05  4.35681735e-04  3.02324355e-15  2.79260864e-05
     -1.35296661e-03 -1.46159773e-03]
intercept =  -0.019300039650165336
MAE =  0.012205326313264547
MSE =  0.0003814982502499546
RMSE =  0.01953198019274939

Create report

df_report = pd.DataFrame({'Actual': y_test, 'Predict':y_pred})

df_report

	Actual	Predict
Date
2020-02-05	0.009714	0.000203
2020-02-06	0.001076	0.000212
2020-02-07	-0.000358	0.000209
2020-02-11	-0.007367	0.000153
2020-02-12	0.010440	0.000232
…	…	…
2020-11-24	-0.013235	0.000723
2020-11-25	0.010053	0.000836
2020-11-26	0.012601	0.001010
2020-11-27	0.002944	0.001037
2020-11-30	-0.020497	0.000801

200 rows × 2 columns

df_report_example = df_report.head(20)
df_report_example.plot(kind='bar',figsize=(16,10))

PE-Band-Chart

(df_report+1).cumprod().plot()

PE-Band-Chart

test model with data split

X_live = df1.iloc[1200:]

X_live

	Prior	Open	High	Low	Close	PE_x	PE_y	EPS	gdp	Return_shift	PE_Label	EPS_Label	GDP_Label
Date
2020-12-01	1408.31	1419.28	1430.03	1416.39	1420.87	28.11	25.9028	55.4349	3.924572e+12	0.008918	2	3	2
2020-12-02	1420.87	1416.46	1430.49	1415.10	1417.95	28.06	25.8395	55.3846	3.924572e+12	-0.002055	2	3	2
2020-12-03	1417.95	1424.31	1439.22	1420.69	1438.32	28.49	26.2152	38.9297	3.924572e+12	0.014366	2	1	2
2020-12-04	1438.32	1439.14	1454.94	1439.14	1449.83	28.73	26.4263	38.9327	3.924572e+12	0.008002	2	1	2
2020-12-08	1449.83	1447.71	1484.73	1442.65	1478.92	29.34	26.9521	38.9373	3.924572e+12	0.020064	2	1	2
…	…	…	…	…	…	…	…	…	…	…	…	…	…
2021-10-28	1627.61	1626.27	1632.30	1622.56	1624.31	20.81	19.5438	83.6187	3.997128e+12	-0.002028	1	6	2
2021-10-29	1624.31	1627.01	1629.25	1619.14	1623.43	21.02	19.5307	83.1111	3.997128e+12	-0.000542	1	6	2
2021-11-01	1623.43	1627.54	1632.73	1611.39	1613.78	20.89	19.4140	83.1221	3.997128e+12	-0.005944	1	6	2
2021-11-02	1613.78	1616.86	1621.69	1608.64	1617.89	20.96	19.4637	83.1246	3.997128e+12	0.002547	1	6	2
2021-11-03	1617.89	1620.90	1623.08	1607.72	1611.92	20.87	19.3909	83.1235	3.997128e+12	-0.003690	1	6	2

221 rows × 13 columns

X_live[['EPS','PE_y','gdp','EPS_Label', 'PE_Label','GDP_Label']]

	EPS	PE_y	gdp	EPS_Label	PE_Label	GDP_Label
Date
2020-12-01	55.4349	25.9028	3.924572e+12	3	2	2
2020-12-02	55.3846	25.8395	3.924572e+12	3	2	2
2020-12-03	38.9297	26.2152	3.924572e+12	1	2	2
2020-12-04	38.9327	26.4263	3.924572e+12	1	2	2
2020-12-08	38.9373	26.9521	3.924572e+12	1	2	2
…	…	…	…	…	…	…
2021-10-28	83.6187	19.5438	3.997128e+12	6	1	2
2021-10-29	83.1111	19.5307	3.997128e+12	6	1	2
2021-11-01	83.1221	19.4140	3.997128e+12	6	1	2
2021-11-02	83.1246	19.4637	3.997128e+12	6	1	2
2021-11-03	83.1235	19.3909	3.997128e+12	6	1	2

221 rows × 6 columns

X_live

	Prior	Open	High	Low	Close	PE_x	PE_y	EPS	gdp	Return_shift	PE_Label	EPS_Label	GDP_Label
Date
2020-12-01	1408.31	1419.28	1430.03	1416.39	1420.87	28.11	25.9028	55.4349	3.924572e+12	0.008918	2	3	2
2020-12-02	1420.87	1416.46	1430.49	1415.10	1417.95	28.06	25.8395	55.3846	3.924572e+12	-0.002055	2	3	2
2020-12-03	1417.95	1424.31	1439.22	1420.69	1438.32	28.49	26.2152	38.9297	3.924572e+12	0.014366	2	1	2
2020-12-04	1438.32	1439.14	1454.94	1439.14	1449.83	28.73	26.4263	38.9327	3.924572e+12	0.008002	2	1	2
2020-12-08	1449.83	1447.71	1484.73	1442.65	1478.92	29.34	26.9521	38.9373	3.924572e+12	0.020064	2	1	2
…	…	…	…	…	…	…	…	…	…	…	…	…	…
2021-10-28	1627.61	1626.27	1632.30	1622.56	1624.31	20.81	19.5438	83.6187	3.997128e+12	-0.002028	1	6	2
2021-10-29	1624.31	1627.01	1629.25	1619.14	1623.43	21.02	19.5307	83.1111	3.997128e+12	-0.000542	1	6	2
2021-11-01	1623.43	1627.54	1632.73	1611.39	1613.78	20.89	19.4140	83.1221	3.997128e+12	-0.005944	1	6	2
2021-11-02	1613.78	1616.86	1621.69	1608.64	1617.89	20.96	19.4637	83.1246	3.997128e+12	0.002547	1	6	2
2021-11-03	1617.89	1620.90	1623.08	1607.72	1611.92	20.87	19.3909	83.1235	3.997128e+12	-0.003690	1	6	2

221 rows × 13 columns

live_data =X_live[['EPS','PE_y','gdp','EPS_Label', 'PE_Label','GDP_Label']]

live_data['Return_shift']=model.predict(live_data)

df_live_report = pd.DataFrame({'Actual': X_live['Return_shift'], 'Predict':live_data['Return_shift']})

df_live_report

	Actual	Predict
Date
2020-12-01	0.008918	0.000900
2020-12-02	-0.002055	0.000870
2020-12-03	0.014366	0.000208
2020-12-04	0.008002	0.000300
2020-12-08	0.020064	0.000529
…	…	…
2021-10-28	-0.002028	0.001105
2021-10-29	-0.000542	0.001075
2021-11-01	-0.005944	0.001025
2021-11-02	0.002547	0.001047
2021-11-03	-0.003690	0.001015

221 rows × 2 columns

df_live_report_exmaple = df_live_report.head(200)
df_live_report_exmaple.plot(kind='bar',figsize=(16,10))

png

(df_live_report_exmaple).plot()

png

(df_live_report_exmaple+1).cumprod().plot()

png

Conclusion

In this article, we demonstrated how to perform stock analysis and predict stock returns using Python. We started by importing and merging multiple datasets,

including historical stock prices, earnings per share (EPS), and GDP data. After merging the datasets and adding stock return calculations, we used linear regression to predict stock returns based on historical data.

We then visualized the results using various libraries like pandas, matplotlib, numpy, and seaborn.

The results showed that our linear regression model provided a reasonable prediction of stock returns, although it is important to note that stock market predictions are inherently uncertain and should not be solely relied upon for investment decisions.

Further improvements to the model could be achieved by incorporating additional variables, using more advanced techniques, or incorporating domain-specific knowledge.

Overall, this article serves as a foundation for further exploration into stock analysis and prediction using Python, providing valuable insights for investors and analysts looking to harness the power of data-driven decision-making.

Stock Analysis and Predicting Returns with Python

What Idea about Stock Analysis and Predicting chart ?

Create label for ML training

Create report

test model with data split

Conclusion