34 Commits

Author SHA1 Message Date
Nazar Kanaev
2a4d974965 go fmt 2024-10-07 12:20:45 +01:00
Karol Kosek
b9b3d2350c atom: Stop unescaping special HTML characters
The HTML data in Atom is escaped because the data needs to be put as a
string into an XML file. If we are accessing it by reading the string
value, then it is already unescaped, as opposed to getting the raw
XML data.

XHTML data doesn't need to be unescaped either, since the elements are
already encoded as-is in the tree. :)

Closes #198
2024-06-16 11:35:32 +01:00
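
The point made in the commit above can be illustrated with a minimal sketch using Go's standard `encoding/xml`. The `entry` type and `decodeTitle` helper are hypothetical names invented for this illustration, not the project's actual code. The sketch shows that a string field already receives the character data with the XML-level escaping resolved, so a further unescape pass would alter the HTML the feed author wrote.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// entry is a hypothetical, pared-down Atom entry used only for this sketch.
type entry struct {
	Title string `xml:"title"`
}

// decodeTitle returns the title as encoding/xml hands it back: character
// data with XML entity references already resolved once.
func decodeTitle(doc string) (string, error) {
	var e entry
	if err := xml.Unmarshal([]byte(doc), &e); err != nil {
		return "", err
	}
	return e.Title, nil
}

func main() {
	doc := `<entry><title type="html">&lt;b&gt;Hi&lt;/b&gt; &amp;amp; bye</title></entry>`
	title, err := decodeTitle(doc)
	if err != nil {
		panic(err)
	}
	// The string now holds the HTML as written by the feed author. A second
	// unescape would turn the remaining &amp; into a bare &.
	fmt.Println(title)
}
```

Here the decoded title is `<b>Hi</b> &amp; bye`: the XML layer has been removed, and the `&amp;` that remains belongs to the HTML itself.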
Will Harding
3adcddc70c Pull atom xhtml title from nested elements
The Atom spec says that any title marked with a type of "xhtml" should be
contained in a div element[1] so we need to use the full XML text when
extracting the text.

[1] https://www.rfc-editor.org/rfc/rfc4287#section-3.1
2023-09-23 21:08:22 +01:00
Nazar Kanaev
850ce195a0 fix atom links 2023-09-07 18:19:17 +01:00
Nazar Kanaev
bc18557820 handle isPermalink in rss feeds 2023-05-20 23:26:22 +01:00
Pierre Prinetti
c1bcc0c517 Run go fmt
This patch is the result of running `go fmt ./...` with Go v1.16.15.
2022-07-04 15:20:49 +01:00
Nazar Kanaev
ee2a825cf0 get rss link when atom link is present
found in: https://rss.nytimes.com/services/xml/rss/nyt/Arts.xml

when both rss and atom link elements are present, the xml parser returns
an empty string. provide a default namespace to capture the rss link properly.
2022-05-03 15:35:57 +01:00
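
The fix described in the commit above can be sketched with `encoding/xml` as follows. The struct names, the `channelLink` helper, and the namespace label `"rss"` are assumptions chosen for illustration. Setting `Decoder.DefaultSpace` gives unprefixed elements a namespace, and the field tag `"rss link"` then matches only the plain RSS `link`, leaving `atom:link` aside.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// rssChannel and rssFeed are hypothetical, pared-down types. The tag
// "rss link" means: local name "link" in the namespace "rss".
type rssChannel struct {
	Link string `xml:"rss link"`
}

type rssFeed struct {
	Channel rssChannel `xml:"channel"`
}

// channelLink decodes the feed with a default namespace so that unprefixed
// elements can be told apart from atom:link.
func channelLink(doc string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	dec.DefaultSpace = "rss" // arbitrary label applied to unprefixed tags
	var f rssFeed
	if err := dec.Decode(&f); err != nil {
		return "", err
	}
	return f.Channel.Link, nil
}

func main() {
	doc := `<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<link>https://example.com/</link>
<atom:link href="https://example.com/feed.xml" rel="self" type="application/rss+xml"/>
</channel>
</rss>`
	link, err := channelLink(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(link)
}
```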
Nazar Kanaev
be7af0ccaf handle invalid chars in non-utf8 xml 2022-02-14 15:23:55 +00:00
Nazar Kanaev
18221ef12d use bytes.Buffer instead 2022-02-14 11:05:38 +00:00
Nazar Kanaev
d7253a60b8 strip out invalid xml characters 2022-02-12 23:42:44 +00:00
Nazar Kanaev
2de3ddff08 fix test 2022-02-12 23:41:01 +00:00
nkanaev
52cc8ecbbd fix encoding 2022-01-24 16:47:32 +00:00
nkanaev
bff7476b58 refactoring 2022-01-24 12:50:52 +00:00
nkanaev
26b87dee98 remove html tags from titles 2021-11-10 10:54:12 +00:00
Karol Kosek
19ecfcd0bc ParseRSS: accept any file with audio/ media type as podcast
There are some podcasts that use audio/opus files (mostly as an alternative,
but still), which causes the audio attachment not to be displayed.

Instead of increasing the list of allowed formats (because audio/mp3 would be
quite useful on the list too), I guess it'd be better to give any audio/ media
type to the user agent and let it worry about it. :^)
2021-07-28 09:31:27 +01:00
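
The approach described in the commit above amounts to a prefix check instead of an allow-list. A minimal sketch follows; `isAudio` is a hypothetical helper name, and the trimming and lower-casing are extra assumptions not stated in the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// isAudio reports whether an enclosure's media type belongs to the audio/
// family, rather than comparing it against a fixed list of formats.
func isAudio(mediaType string) bool {
	mt := strings.ToLower(strings.TrimSpace(mediaType))
	return strings.HasPrefix(mt, "audio/")
}

func main() {
	for _, mt := range []string{"audio/opus", "audio/mpeg", "image/png"} {
		fmt.Println(mt, isAudio(mt))
	}
}
```

With this check, `audio/opus` and `audio/mpeg` are accepted and `image/png` is rejected, and whether a given format can actually be played is left to the client.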
Nazar Kanaev
d203d38de6 fix empty feed parsing 2021-07-01 14:10:22 +01:00
Nazar Kanaev
e54df07a40 use rdf description 2021-04-15 10:29:35 +01:00
Nazar Kanaev
f8455236dc rdf date & content 2021-04-15 10:27:50 +01:00
Nazar Kanaev
fbb0dfed47 remove bom 2021-04-07 10:25:30 +01:00
Nazar Kanaev
144fc1606a remove feed hacks from storage 2021-04-05 20:59:15 +01:00
Nazar Kanaev
fa2fad0ff6 cleanup 2021-04-05 10:01:20 +01:00
Nazar Kanaev
63ad971890 unset audio/image if present in the content 2021-04-04 21:31:25 +01:00
Nazar Kanaev
0828d6782e extract date parser to a new file 2021-04-04 20:45:13 +01:00
Nazar Kanaev
cf5856bdf7 set missing times 2021-04-04 20:42:52 +01:00
Nazar Kanaev
e50c7e1a51 handle html type atom text 2021-04-02 22:26:45 +01:00
Nazar Kanaev
0a0db68905 feedburner 2021-04-02 22:26:45 +01:00
Nazar Kanaev
36bc84d99a increase lookup length 2021-04-02 22:26:44 +01:00
Nazar Kanaev
7dbfecdba1 extract thumbnails from vimeo feeds 2021-04-02 22:26:44 +01:00
Nazar Kanaev
fafa6286d4 parser fixes 2021-04-02 22:26:44 +01:00
Nazar Kanaev
cc51fe01c2 give priority to content:encoded 2021-04-02 22:26:44 +01:00
Nazar Kanaev
51cbdea31f podcasts 2021-04-02 22:26:44 +01:00
Nazar Kanaev
6685bce51c extract data from media elements 2021-04-02 22:26:44 +01:00
Nazar Kanaev
e0e6166cdf fix feed sniff reader 2021-04-02 22:26:44 +01:00
Nazar Kanaev
c469749eaa rename packages 2021-04-02 22:26:44 +01:00